By Danny Sullivan, April 17, 1996
Search engines are one of the primary ways that Internet users find web sites. Every day, search engines "crawl" the web: they visit web sites, then store the text of web pages they find into giant catalogs.
Those using search engines enter a few keywords, push the "submit" buttons and wait while the search engines check their catalogs for web pages that seem to best match the keywords. Usually, hundreds or thousands of matching web pages are found. In most cases, the 10 most "relevant" matches are displayed first, and users can then choose to see more results if they don't find what they are looking for.
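The crawl, catalog and match steps described above can be sketched in a few lines of Python. This is only a toy illustration, not how any particular search engine works: the function names (tokenize, build_catalog, search), the example URLs, and above all the scoring rule (adding up how often each keyword appears on a page) are assumptions made for the sake of the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_catalog(pages):
    """Store a word count for each page, keyed by URL (a stand-in for a catalog)."""
    return {url: Counter(tokenize(text)) for url, text in pages.items()}


def search(catalog, query, limit=10):
    """Return up to `limit` URLs whose pages contain every keyword.

    Assumption: relevance is just the total number of keyword occurrences.
    Real engines use their own, unpublished rules.
    """
    terms = tokenize(query)
    scored = []
    for url, counts in catalog.items():
        if all(counts[term] > 0 for term in terms):
            score = sum(counts[term] for term in terms)
            scored.append((score, url))
    # Highest score first; ties are broken alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored[:limit]]


# Hypothetical pages, invented for this example.
pages = {
    "a.example/": "Orange County guide. Orange County listings.",
    "b.example/": "Orange County",
    "c.example/": "Texas county list",
}
catalog = build_catalog(pages)
print(search(catalog, "Orange County"))
```

Under this made-up scoring rule, a page that simply repeats the keywords more often outranks one that mentions them once, which hints at why relevance as judged by a machine can differ from relevance as judged by a person.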
Naturally, everyone who runs a web site wants to be in the "top ten" of search results. This is for good reason. By and large, users will find a result they like in the top ten, and then they will head out to one of those pages. That means being listed number 11 could be as bad as not appearing at all.
Life is made more complicated by the fact that what the search engines consider to be relevant may not be what a human being considers relevant. Sites that many would feel should appear in the top ten may appear much lower in the listings. Even more maddening is that not all search engines work the same. Finally, human beings can try to "trick" search engines into thinking pages are more relevant than they really are.
Because Maximized Online (the company I'm general manager for) creates and manages a variety of web sites for clients, I wanted to better understand how search engines operate and react to the content of web pages. I wondered how deep different search engines go into a site, whether the use of meta tags helps pages, and a variety of other questions.
Each of the search engines has help pages that shed some light on how it determines document relevancy, but none of them fully answered my questions. A study was needed to gain answers. Over the past four months, I have made changes to pages in Maximized Online's InfoPages.com web site and kept track of how the different engines have reacted.
The results provide insight and a common ground for anyone interested in how search engines work. I've also created The Webmaster's Guide to Search Engines and Directories, which provides at-a-glance information about search engine operation. I hope you find the study helpful.
By "default settings," I mean that I went to each search engine and simply typed in the words Orange County without using any advanced functions or operators and without changing any of the options that may have been present. I wanted to mimic what I believe the majority of people might do when they visit a site, and that means sticking to lowest-common-denominator settings and thinking.
InfoPages was nowhere to be found. Could I do anything to correct this? For clues, I examined the results from the various search engines side-by-side. I also examined the content of top-ranking pages. From this, I made the following conclusions:
Next it was time to test my conclusions. Over the following months, I made small changes to pages within the InfoPages site. Here are the results (see also more detailed results):
Similar changes were made to the InfoPages listings page, in order to see how this "inside" page performed vs. the main home page. Then the search engines were prompted to return to the site in mid-January and recatalog the pages.
Results
By the end of February, immediate improvements were seen on WebCrawler. InfoPages first made the top listings, then later dropped to the second page of results shown (WebCrawler displays 25 pages at a time). Even better success occurred in Lycos, where InfoPages eventually became the number 3 listing. Elsewhere, the small changes didn't seem to be enough to get InfoPages anywhere near the top listings.
While many dislike spamming, it does occur, so its effectiveness needed to be tested. There were hints that it might work: a top ranked page from Texas was successful because it listed information for every county in Texas, an example of inadvertent spamming. All those repetitions of "County," with one "Orange" thrown in, pushed the page to the top of WebCrawler's listings. Similarly, a travel guide uses repetition of the words Orange County and California apparently as a means to make it more likely to appear in listings. It was successful with several engines.
To test spamming, the InfoPages home page was left untouched, but in mid-February the listings page had "orange county orange county orange county orange county orange county orange county orange county orange county orange county" added to hidden comments on the page. That's nine repetitions, for those who don't want to count, making 11 references in all on the page.
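For anyone who would rather not count by hand, a short Python sketch can tally how often a phrase appears in a page's source, hidden comments included. The count_phrase helper and the sample markup are hypothetical, made up to mirror the test described above (two visible references plus nine in a comment); this is not the actual InfoPages page.

```python
import re


def count_phrase(html, phrase):
    """Count non-overlapping, case-insensitive occurrences of `phrase`
    in the raw page source, including any text inside HTML comments."""
    # Collapse runs of whitespace so line breaks inside the phrase don't matter.
    normalized = re.sub(r"\s+", " ", html.lower())
    return normalized.count(phrase.lower())


# Hypothetical page source: two visible references and nine hidden ones.
hidden = " ".join(["orange county"] * 9)
sample_page = (
    "<title>Orange County InfoPages</title>\n"
    "<!-- " + hidden + " -->\n"
    "<p>Orange County listings</p>\n"
)

print(count_phrase(sample_page, "orange county"))  # prints 11
```

Counting against the raw source rather than the rendered text matters here, because the repetitions are invisible to a human visitor while still being present in the page's source.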
Results
The InfoPages listing page quickly moved to the top of the Excite listings, testimony to how quickly Excite keeps up with the latest changes on the web, yet a sad indication that the engine can easily be tricked. On other search engines, the changes appear to have had little effect. The jury is still out with several search engines, because despite the changes having been made over a month ago, some catalogs have yet to be updated.
Results
Adding meta tags made no difference: the InfoPages home page did not appear any higher in the listings.
Catalog Updates: Some search engines take a month or longer before they update their catalogs. They may be crawling each night, but those new findings aren't available to the public until the catalogs are updated. For example, until mid-April, WebCrawler's catalog only listed finds through February 1996. The same seemed true for AltaVista. On the other hand, Excite's catalog consistently reflected changes soon after they were made, an indication that the catalog is constantly updated. The comparison chart shows my best estimates of how often search engines update their catalogs.