10 Oct 2008
Search Engines Extracting Table Data on the Web
The Web is filled with page after page after page of data. That data is usually organized differently from one site and one page to another, and contained in text, in pictures, in videos, in audio, in columns, in rows, in frames, and many other formats.
When a search engine spider comes to a page on the Web, it will try to go through all of the text it finds, make note of links to other pages, consider alt text for images, and read meta tags.
Search engine spiders will decide whether or not the content of pages should be indexed by the search engine, and determine which links to follow next.
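To make those steps a little more concrete, here is a minimal sketch in Python of what a spider might collect from a single page. It uses only the standard library's `html.parser`. The class name `PageSignals`, the function `should_index`, and the sample page are all made up for illustration, and the indexing rule (honoring a robots meta tag that says "noindex") is just one assumed signal, not a description of how any real search engine decides.

```python
from html.parser import HTMLParser


class PageSignals(HTMLParser):
    """Collects visible text, links, image alt text, and meta tags from one page."""

    def __init__(self):
        super().__init__()
        self.text = []        # pieces of visible text, in page order
        self.links = []       # href values of <a> tags
        self.alt_texts = []   # alt attributes of <img> tags
        self.meta = {}        # meta name -> content
        self._skip = 0        # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("alt"):
            self.alt_texts.append(attrs["alt"])
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Ignore whitespace and anything inside <script>/<style>.
        if not self._skip and data.strip():
            self.text.append(data.strip())


def should_index(signals):
    # Hypothetical rule: skip pages whose robots meta tag contains "noindex".
    return "noindex" not in signals.meta.get("robots", "").lower()


SAMPLE = """<html><head><title>Demo</title>
<meta name="description" content="A demo page">
<meta name="robots" content="index, follow"></head>
<body><p>Hello <a href="/about">about us</a></p>
<img src="cat.jpg" alt="A sleeping cat">
<script>var x = 1;</script></body></html>"""

signals = PageSignals()
signals.feed(SAMPLE)
print(signals.text, signals.links, signals.alt_texts, should_index(signals))
```

A real spider would also resolve relative links, respect robots.txt, and weigh many more signals before choosing which links to follow next.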
Sometimes search engine spiders will pick out part of a page to treat a little differently for one reason or another. They might extract specific types of information, or look for data in specific formats. For instance, Google might find a list on a page, and send information about the list to the database for Google Sets. I wrote about some of the details in a post about the Google Sets patent.
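As a rough illustration of pulling one kind of structure out of a page, the sketch below gathers the items of each HTML list it finds. The `ListExtractor` class and the sample markup are hypothetical, and this is not the method described in the Google Sets patent; it only shows what "finding a list on a page" could look like in the simplest case.

```python
from html.parser import HTMLParser


class ListExtractor(HTMLParser):
    """Gathers the items of each <ul>/<ol> on a page as a list of strings."""

    def __init__(self):
        super().__init__()
        self.lists = []     # finished lists, each a list of item strings
        self._stack = []    # lists that are still open (allows nesting)
        self._item = None   # text pieces of the <li> currently being read

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._flush_item()
            self._stack.append([])
        elif tag == "li" and self._stack:
            self._flush_item()
            self._item = []

    def handle_endtag(self, tag):
        if tag == "li":
            self._flush_item()
        elif tag in ("ul", "ol") and self._stack:
            self._flush_item()
            items = self._stack.pop()
            if items:
                self.lists.append(items)

    def handle_data(self, data):
        # Only keep text that appears inside a list item.
        if self._item is not None and data.strip():
            self._item.append(data.strip())

    def _flush_item(self):
        # Store the current item's text in the innermost open list.
        if self._item is not None and self._stack:
            text = " ".join(self._item)
            if text:
                self._stack[-1].append(text)
        self._item = None


SAMPLE = "<p>Colors:</p><ul><li>red</li><li>green</li><li>blue</li></ul>"

extractor = ListExtractor()
extractor.feed(SAMPLE)
print(extractor.lists)
```

The same idea extends to tables by tracking rows and cells instead of list items, though real pages often use tables for layout, which makes deciding what counts as data much harder.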
How Search Engines May Use Images to Rank Web Pages
Images on a web page can provide a chance to express ideas in a visual way that can convey a considerable amount of information, and may also add to the attractiveness and perceived quality of a site.
When search engines rank pages in search results, images may have some impact in those rankings.
A search engine might look at the captions associated with pictures, or at alt text, which stands in for an image when people browse the Web with images turned off or use screen-reading software.
Search engines might also look at text surrounding an image, especially text within the same HTML container, block, or segment.
Those indexing services could also associate other content on a page with an image, including the page’s title.
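Here is a small sketch of how those signals could be gathered for each image: its alt text, the text that shares its HTML container (which picks up a caption when one is present), and the page's title. The class name `ImageContext`, the set of tags treated as containers, and the sample page are assumptions made for the example; how a search engine actually segments a page is not something the sketch tries to reproduce.

```python
from html.parser import HTMLParser

# Assumed set of tags treated as an image's "container" for this sketch.
BLOCK_TAGS = {"div", "p", "figure", "td", "li", "section"}


class ImageContext(HTMLParser):
    """Pairs each <img> with its alt text and the text in its innermost block."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.images = []       # one dict per <img>
        self._in_title = False
        self._blocks = []      # stack of open blocks

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in BLOCK_TAGS:
            self._blocks.append({"tag": tag, "text": [], "images": []})
        elif tag == "img":
            record = {"src": attrs.get("src") or "",
                      "alt": attrs.get("alt") or "",
                      "nearby_text": ""}
            self.images.append(record)
            if self._blocks:
                self._blocks[-1]["images"].append(record)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in BLOCK_TAGS and self._blocks and self._blocks[-1]["tag"] == tag:
            # Closing a block: attach its text to every image it held directly.
            block = self._blocks.pop()
            text = " ".join(block["text"])
            for record in block["images"]:
                record["nearby_text"] = text

    def handle_data(self, data):
        piece = data.strip()
        if not piece:
            return
        if self._in_title:
            self.title += piece
        elif self._blocks:
            self._blocks[-1]["text"].append(piece)


SAMPLE = """<html><head><title>Cat photos</title></head><body>
<figure><img src="cat.jpg" alt="A sleeping cat">
<figcaption>Tabby napping on a sofa</figcaption></figure>
<p>Unrelated paragraph.</p></body></html>"""

context = ImageContext()
context.feed(SAMPLE)
print(context.title, context.images)
```

The sketch assumes well-formed markup with explicit closing tags; pages that leave block tags unclosed would need a more forgiving parser.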
Where’s The Online TV Search?
As consumers, we should be thrilled the TV networks have started delivering nice, deep inventories of video clips and longer-form videos online. That’s great until you want to explore the current and archived stuff on each domain.
Maybe I should cut some slack to our beloved cable and broadcast networks, who are used to having audiences find their fare through on-air guides and remote controls. After all, TV audiences don’t conduct free-form searches to find shows. But I don't think any video providers deserve this break.
So where’s the online TV search? Last week, I asked many Future TV Show 2008 attendees about findability matters.
Their responses were very interesting, at least to me. While I won’t name names, I heard several executives flatly say their site searching and browsing capabilities were terrible. The rest I would classify as apathetic, which probably comes from years of limited options and lack of control.
Yet online, these TV networks are no different from other web publishers these days, and they sound like them too. They are paying attention to acquiring visitors and keeping them on their domains.
At the NYC show, I heard many familiar questions: How do I get people to my site? How can I get more video streams and page views? How can I really make money online? How do I measure our success?
We’re all learning that online video isn’t exactly the same animal as its on-air cousin. There are differences in terms of consumption patterns, for starters. The destinations that succeed will learn how to engage and optimize their new online audiences through effective video search, discovery and sharing mechanisms.