
Personal Computers

Internet Search Engines

Temperamental Angels of Serendipity

By Carter B. Horsley

If one read The Wall Street Journal and The New York Times and listened regularly to CNBC, one would get the impression that Internet companies like Yahoo! and others that are in the "search engine" and/or "portal" business are pretty sensational.

If, on the other hand, one actually surfed the net, one would have a very different opinion. Clearly, investment bankers and analysts are not surfers, for their pronouncements and reports conflict dramatically with reality. We are not talking here about euphoric forecasts of future growth and revenue for Internet companies, in which there is general long-term confidence. We are talking about basics, something many financial investors tend to overlook.

A little history, first.

America Online (AOL) created a sensational company that began as a cyberspace "community" that personal computer users could reach via their modems and that provided a huge, neatly categorized collection of information as well as a nicely designed e-mail system. Furthermore, users could communicate with other members of the AOL "family" through "chat rooms," "buddy lists" and "instant messages." AOL essentially put a new and attractive graphic interface on traditional "ftp" (file transfer protocol) systems and threw in an efficient electronic communications system.

All of this utilized the Internet, and when the World Wide Web took off in the mid- and late-’90s, AOL quickly added the function of Internet Service Provider (ISP) to its arsenal of amenities, flooding the markets with its CDs and supporting them with a major marketing program. It grew very rapidly, sometimes with big stumbles, and now provides Internet service to more than 18 million Americans, in the process gobbling up CompuServe, one of its rivals, and forging scores of important strategic alliances with content providers.

For many AOL initiates, the chat rooms, e-mail and instant messages are a treasure trove of obsessive curiosity and pleasure. After many months, however, their novelty wanes a bit, and most AOL members use the service primarily to get onto the Internet and for research and e-mail.

In the last two years, electronic commerce has come of age, thanks in large part to the very well-organized Amazon.com, and over the last year Wall Street went wild over the prospects and widespread influence of e-commerce.

AOL’s organizational skills did not go unnoticed as the constant flood of new websites became overwhelming and search engines grew greatly in importance.

Yahoo! was the first major search engine to emerge and it seemed a godsend with its well organized subject categories. It was soon followed by Lycos, Excite, AltaVista, Infoseek and HotBot, later by Northern Light, Metacrawler, Google and Looksmart.

Initially, one would enter a "keyword" or phrase and quickly be told that the search had turned up, say, 121,199 "results," and the surfer would then begin scrolling through them, each usually described in two or three lines. The results were returned a page at a time, with 10, 30 or 100 results on each page, depending on what choices the search engine offered.

Surfers soon noticed, however, that not all search engines were alike in their results, because each one used a different formula, or "algorithm," for scanning the Internet and retrieving data pertinent to the requested search. Some sorted the results better, with good sites showing up near the top, while others returned far fewer or far more results than their rivals, indicating that their nets were not catching everything and were often picking up more than they should.

As a result, experienced surfers would usually run their searches on several different search engines, and for a while some were preferred over others. Yahoo! managed to stay on top for a long while because it automatically continued the search on AltaVista, which returned many more results. HotBot took over leadership for a while, then Infoseek and then Looksmart, although all of these had a much smaller share of the traffic than Yahoo!

In 1998, however, almost all of the search engines did two things that drastically changed their operations and effectiveness as search engines. These changes were influenced by the rapid expansion of websites and by the e-commerce hysteria that began to sweep the nation.

The changes involved numerous makeovers of layouts and, much more importantly, a cap on the number of results returned, with most search engines now limiting them to 200 even if there were 2 million available for a particular search. In addition, most of the search engines began to emulate AOL by creating "portal" communities to capture e-commerce for themselves and create more potential sources of revenue.

While limiting search results to 200 may not at first glance seem too severe, in practice it has rendered many searches very incomplete, because the returned results are most often irrelevant, wrong, repetitive and not useful, a reflection of the inadequacy of the search engines’ sorting programs. At the same time, and much less obviously, many of the search engines began selling positioning on the search results, seeking revenue from advertisers (including websites) who want to appear on the result lists when certain keywords are entered. What was insidious about this policy was that it shoved other results off the already drastically shortened returns, rendering them even more inaccurate and incomplete.

The "portal" notion was an attempt to recreate the success of AOL by incorporating many non-search related features into the sites such as e-mail, chat rooms, message boards and shopping malls, all aimed at keeping the surfer on the site longer and possibly generating considerable revenue.

There is nothing wrong with competition, of course, but the search engines, and Yahoo! in particular, clearly began to focus all of their attention on becoming "portals" and gaining revenue, virtually abandoning their raison d’être: being search engines.

A recent search for The City Review, for example, brought up seven major mentions on Yahoo! compared with more than two hundred on Excite, Infoseek, AltaVista, Google and Looksmart, about 50 on HotBot and about 30 on some of the others.

The City Review, however, has submitted each of its major pages to Yahoo! on several occasions, yet they do not appear, while searches on their subject matter usually return a lot of drivel.

The seven pages that are listed in Yahoo! do, however, generate a lot of traffic, a reflection of Yahoo!’s continued dominance as a brand name.

Recently, The City Review resubmitted its "metmus.html" page, a very long article with many photographs about the Metropolitan Museum of Art, and about a week later received a rather puzzling e-mail from "adam@yahoo-inc.com". It said that the submitted URL (webpage address on the Internet) was "one part of a larger site that already exists in Yahoo!"

"After reviewing both the submitted URL and the existing URl that encompasses it we've determined that the existing listing is adequate for users to find your site. In general, rather than separately listing every subpage of a large site, we try to find the core or hub page of the site and point users to that. Our users appreciate this because it means they do not have to sift through multiple listings from the same site, and it gives your site more prominence as it won't get lost among many separate listings of other sites," it continued.

Such logic is garbage and in fact bears little relation to the existing listings on Yahoo! It is contemptuous, inaccurate, misleading and unprofessional. Yahoo! does list some specific webpages at The City Review on the expansion plans of the Museum of Modern Art, Ellis Island and a Richard Diebenkorn exhibition, but does not list major articles on Jackson Pollock and Mark Rothko and scores of long essays on major individual landmarks of New York, just to mention a few subjects.

The problem is not merely frustrating. It is also subversive to the interests of the Internet. Search engines do not merely provide an important service. They provide a vital and critically needed service, and it is becoming evident that the economics of the Internet now need radical rethinking. It is hard to believe that the letter from Yahoo! was a note written by an individual in response to the specific submission rather than a form letter.

Yahoo! is entitled, of course, to do whatever it wants, but its credibility is severely tested by such arrogant gibberish.

On June 26, 2000, however, Google, a relatively new and superb search engine, announced that Yahoo! had agreed to make Google its main "web pages" search engine, a very significant development. The agreement was scheduled to be implemented within 30 days and actually began to be implemented in about two weeks. The change is a very dramatic improvement for Yahoo!, which is good news for everyone. Yahoo! still retains its basic structure and its own "web categories" and "web sites" that it returns first on searches, but if a user clicks on "web pages" at the bottom of the first page of results, the Google search engine returns the results. (For The City Review, for example, Yahoo! showed only five category results, but the Google-driven web page results now totaled more than 450!) (7/8/00)

For many serious surfers, the alternative is to find websites that are devoted narrowly to a specific subject and that provide extensive links to relevant sites. There are a great many of these, mostly non-profit and mostly not computer-generated; that is, their link lists are compiled by individual webmeisters based on their own surfing and on submissions from other sites. More than 250 websites, for example, have links to either the home page or another specific page at The City Review, and easily half of those links were put up without input from The City Review, simply because the webmeisters of those sites found them of interest on their own.

This approach has been used by Miningco.com, which recently changed its name to About.com. The site has more than 600 individual "guides" who oversee and organize thousands of categories of recommended sites. In principle, this seems good, and many of the recommended sites are rewarding, but in practice the lists are inconsistent in quality and comprehensiveness. The City Review does not show up in About.com’s own site search, although it is included in a few of its subcategories and should be included in a couple of hundred.

The enormity of the constantly expanding Web is understandably bound to create severe backlog problems, but there does not appear to be much reason for optimism as some of the search engines appear to be arbitrarily deleting many existing links from their databases. A July 8, 1999 article by Lisa Guernsey in The New York Times noted that a new study reported in the journal Nature (and available at http://www.wwwmetrics.com) indicated that "of 11 search engines studied, not one indexes more than 16 percent of the Web" and "most cover less." (7/12/99)

As private concerns, of course, the search engines need funding to operate and expand. It is probably un-American and un-Internetish to suggest that the search engines be operated by a public entity, and there is no need for public regulation. There is a need, however, for good business practices and ethics. It is not nice, to put it mildly, for big bucks to censor other sites by crowding them off the search engines, and it violates the spirit of the Internet. The search engines, of course, have many problems, not the least of which are unscrupulous "spammers" who try to finagle and manipulate their lists.

(There is another major problem that is not directly related to the search engines and involves some of the proliferating "hosting" communities such as GeoCities and Tripod, companies that provide Internet service and free home pages to users in return for being able to target them with advertising. Some Internet awards sites have begun to specifically exclude these sites because, once surfers visit them, they are often captured and cannot easily back out. These sites have grown dramatically and are capitalizing on the fact that once users have signed up with them and created their own pages, they are unlikely, or at least very reluctant, to opt out. Such surfing "captures," however, are infuriating and only serve to undermine the viability of the Internet.)

The privatization pitfalls of the Internet are somewhat understandable because it is still an emerging phenomenon, albeit a truly major and significant one. There are no easy solutions, especially since marketers have long placed a high value on target lists and junk mail is as American as apple pie.

The free flow of information and its widespread dissemination has been the promise of the Internet, a promise that has astounded and enraptured most surfers. To the dismay of many advertisers, surfers generally do not like to be waylaid by advertising on their way to desired information. The advertising industry is still trying to develop workable business models that combine its desire for "click-through" information with minimal space for banners and boxes and reasonable rate structures. Many advertisers understand that brand recognition and sponsorship prestige are important, but many others are, understandably, very result-oriented. While advertisers are eager to use Internet technology to better target consumers, surfers resist invasions of their privacy and, as long as bandwidth problems remain, are also annoyed at having to wait longer for sites to "load."

The Internet will survive these problems, thank goodness, but they are not minor.

Conceivably, the search engines might adopt a subscription model for serious surfers that, for a modest annual fee, would let them search through all the "results" rather than just 200. Hopefully, of course, they will also continue to improve their sorting algorithms and work on reducing their backlogs. Presumably, they could introduce the subscription model by offering it free for the first month so that people could see its benefits, and hopefully the fee would not be too high for mere academics and the curious to afford.

Surfers, of course, have a "free" mentality, but as more and more of them experience the current failings of the search engines, it is probable that a subscription model would work if surfers can be reasonably confident that the searches are very extensive and as comprehensive as possible within realistic expectations, assumptions that at the moment are very big "ifs."

The Internet has wrought, and is wringing, many major changes in the workplace and in the personal lives of its users. Its speed, even at 28K, is mind-boggling, even as its breadth is intimidating and Sisyphean. Like all juggernauts, of course, it is experiencing growing pains, and the changes that come with its growth do not follow conventional financial forecasts.

The Internet is really about serendipity and search engines are its angels, wafting marvels all around us, but usually with a bit of impish gamesmanship in the form of pitfalls, dross and irrelevancies that are there to make the jewels shine brighter.

For the surfer who first stumbles across the Internet Movie Database (www.imdb.com) or Amazon.com (www.amazon.com), there are few greater thrills, as such stupendous sites unleash a sense of empowerment and enrichment that generates hope for much better worlds, etc., at our real/virtual fingertips.

A great site with excellent information on search engines is at http://www.searchenginewatch.com.

 
