10 Oct 2008

Search Engines Extracting Table Data on the Web

The Web is filled with page after page after page of data. That data is usually organized differently from one site to the next, and often from one page to the next. It can appear as text, pictures, video, or audio, and it can be laid out in columns, rows, frames, and many other formats.

When a search engine spider comes to a page on the Web, it will try to go through all of the text it finds, make note of links to other pages, consider alt text for images, and view meta data tags.

Search engine spiders will decide whether or not the content of a page should be indexed by the search engine, and determine which links to follow next.
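
To make the idea concrete, here is a rough sketch in Python of the kinds of things a spider might collect from a single page: visible text, links, image alt text, and meta tags. It uses only the standard library's html.parser module. The names PageScanner and scan_page are made up for this illustration, and nothing here is meant to describe how any real search engine's crawler is built.

```python
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Hypothetical helper: gathers text, links, alt text, and meta tags."""

    def __init__(self):
        super().__init__()
        self.text = []       # visible text fragments, in document order
        self.links = []      # href values from <a> tags
        self.alt_texts = []  # alt attributes from <img> tags
        self.meta = {}       # <meta name=... content=...> pairs
        self._skip = 0       # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("alt"):
            self.alt_texts.append(attrs["alt"])
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Ignore script/style bodies and whitespace-only fragments.
        if not self._skip and data.strip():
            self.text.append(data.strip())


def scan_page(html):
    """Parse one HTML string and return the populated scanner."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    return scanner
```

A real spider would go on to decide which of the collected links to visit next and whether the text is worth indexing. This sketch stops at gathering the raw material.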

Sometimes search engine spiders will pick out part of a page to treat a little differently for one reason or another. They might extract specific types of information, or look for data in specific formats. For instance, Google might find a list on a page, and send information about the list to the database for Google Sets. I wrote about some of the details in a post about the Google Sets patent.
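
As an illustration of pulling one specific format out of a page, the sketch below collects the items of every HTML list (ul or ol) it finds. It is only a guess at the general shape of such a step, written with Python's standard html.parser module. The names ListExtractor and extract_lists are hypothetical, and this is not a description of how Google Sets actually gathers its data.

```python
from html.parser import HTMLParser


class ListExtractor(HTMLParser):
    """Hypothetical helper: collects the item text of each <ul>/<ol>."""

    def __init__(self):
        super().__init__()
        self.lists = []  # finished lists, each a list of item strings
        self._open = []  # stack of lists still being read (handles nesting)

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._open.append({"items": [], "in_item": False})
        elif tag == "li" and self._open:
            current = self._open[-1]
            current["items"].append("")
            current["in_item"] = True

    def handle_endtag(self, tag):
        if not self._open:
            return
        if tag == "li":
            self._open[-1]["in_item"] = False
        elif tag in ("ul", "ol"):
            finished = self._open.pop()
            # Collapse runs of whitespace and drop empty items.
            items = [" ".join(item.split()) for item in finished["items"]]
            items = [item for item in items if item]
            if items:
                self.lists.append(items)

    def handle_data(self, data):
        # Only text inside an open <li> of the innermost list is kept.
        if self._open and self._open[-1]["in_item"]:
            self._open[-1]["items"][-1] += data


def extract_lists(html):
    """Return every list found in the HTML string as a list of items."""
    extractor = ListExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.lists
```

Nested lists are reported as separate lists, with the inner one appearing first because it closes first. Text outside any list is ignored.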
