This report critically evaluates and compares four popular World Wide Web search engines that focus exclusively on news content. After a cursory analysis of the policies, scope, and interface of each engine, their search results are compared. Results are analyzed for relevance to the query, date of publication, publication source, and country of publication.
News search engines are commonly used by Web users to find news on specific subjects from a variety of sources. This experiment was designed to evaluate how four popular news search engines compare, both in the relevance of the results they return and in the nature of the sources of those results. In particular, the publication, its date, whether the article was a reprint from a news bureau or original reporting, and the country of publication were noted.
After briefly examining the search engines listed in two popular directories, Yahoo! and DMOZ, as well as those given awards by Search Engine Watch, I settled on evaluating the four search engines that seemed superior to the others. My selections coincided with the Search Engine Watch award winners, suggesting that these are likely the best of the current news search engines.
The search engines selected were Altavista News [ http://www.altavista.com/news/ ], Daypop [ http://www.daypop.com ], Google News [ http://news.google.com ], and Yahoo! News [ http://news.yahoo.com ]. (According to Search Engine Watch, AllTheWeb News uses the same Overture technology as Altavista News; Altavista was chosen instead because it has the older and better-known brand name.)
According to SearchEngineShowdown, Altavista indexes over 3,000 news sources and updates hourly. Altavista News presents a search box and, directly beneath it, four drop-down boxes for tailoring the search by topic, region, source, and date range. The region drop-down lists continental regions, then a divider, then individual countries; it defaults to all regions. The source drop-down lists eleven mainstream publications, including The New York Times, the BBC, and CNN. This list is clearly not exhaustive, but it includes some common sources that are likely popular destinations; it defaults to all sources. The date drop-down allows you to search only the past 48 hours, week, two weeks, or 30 days, or to specify a date range. It defaults to "Last 7 days."
Daypop is the only "independent" search engine here: rather than being produced by a large company, it is run by Dan Chan. As a result, it is much more transparent than the other search engines, and you can even read about its daily issues on his weblog. Daypop purports to index 17,000 current-events-related sources, including news sites, weblogs, and RSS feeds. Major news sources are indexed more frequently (every 3 hours) than "lesser news sites," which are indexed only once a day. Daypop lets you search news and weblogs, only news, only weblogs, or only RSS feeds. Although Daypop indexes and uses link-rich weblogs, it does not use an authority metric to rank results. The Daypop technology page states that this prevents popularity from outweighing relevance in results, and that because the source list is human-edited, poor sources are generally not a problem.
Google claims to index 4,500 sources "continuously"; SearchEngineShowdown lists the update frequency as every 5 minutes. Google News clusters related articles in its results. No definitive list of its sources is available, and some of the sources included in the index, such as press releases, have drawn controversy, as discussed by Declan (2003).
Yahoo is a little different from the other search engines listed: as a portal, it aggregates news content from many providers and searches only those providers. These sources are listed on Yahoo itself. The number of sources is in the dozens, not the hundreds or thousands indexed by the other engines. Unlike the other search engines, which by default list results by "relevance" and offer an option to sort by date, Yahoo by default sorts by "date and relevance," with options to choose either individually. For this experiment the default of "date and relevance" was used, although brief testing indicated no substantive difference between "date and relevance" and "relevance" alone.
Three search queries were devised. Their subjects were chosen by examining the top stories on the previous night's "NewsHour with Jim Lehrer" on PBS. Choosing topics this way ensured that a noncommercial entity recognized them as among the most newsworthy stories of the day, and that they were of national or international importance. While this does limit the scope of the experiment, news of only local importance presents a number of problems. First, the local news sources that covered a story must be indexed by the news search engines, and second, depending on the location, there may be very few sources. Even in a major metropolitan area, it is probably more effective to go to a known local news provider to ensure coverage. Also, since one of the things this experiment examines is the range of sources in the results, local news stories would likely show little, if any, range.
Searches were made with default settings to emulate a normal user rather than an experienced power user, and, for the same reason, only the first ten results were analyzed. Altavista, Daypop, and Yahoo show ten results per page by default, which also makes this a reasonable cutoff. Google shows twenty of what it considers results, but each of those results can be a cluster of pages. For the purposes of this experiment, a result is any news page listed in the results. Thus, if a Google News search returned one "result" with five clustered pages followed by another with three, each of those eight pages was analyzed as a distinct result. Daypop also does some limited source clustering, and such pages would also be counted as individual results, but no Daypop clusters turned up in these particular queries.
Results were first analyzed for relevance on a three-point scale (0 to 2). Two points were awarded for direct subject relevance and pertinence to the query; that is, the result actually contained the information the query asked for. One point was awarded for general subject relevance without pertinence, for example a result that merely mentioned the subject of the query but was not focused on it. Zero points were awarded if there was no general subject relevance or the page was not accessible.
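The counting rule and relevance average described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure actually used to collect the data: the cluster contents and page names are hypothetical placeholders, and `flatten_results` and `average_relevance` are invented names.

```python
# Hypothetical Google News results page: each inner list is one clustered
# "result" and holds the individual pages grouped under it.
google_clusters = [
    ["page-a1", "page-a2", "page-a3", "page-a4", "page-a5"],  # 5 clustered pages
    ["page-b1", "page-b2", "page-b3"],                          # 3 clustered pages
    ["page-c1"],
    ["page-d1", "page-d2"],
]

def flatten_results(clusters, cutoff=10):
    """Count every page inside a cluster as a distinct result, in the order
    listed, and keep only the first `cutoff` pages."""
    pages = [page for cluster in clusters for page in cluster]
    return pages[:cutoff]

def average_relevance(scores):
    """Mean relevance on the 0-2 scale:
    2 = relevant and pertinent, 1 = general subject relevance only,
    0 = no relevance or page not accessible."""
    for s in scores:
        if s not in (0, 1, 2):
            raise ValueError(f"relevance score must be 0, 1, or 2, got {s}")
    return sum(scores) / len(scores)

top_ten = flatten_results(google_clusters)
print(len(top_ten))  # 10: the first two clusters alone supply eight pages
print(average_relevance([2, 2, 1, 0, 2, 1, 2, 1, 0, 1]))  # hypothetical scores
```

Flattening before applying the cutoff is what makes a clustered Google result comparable to a single result on the other engines.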
The date of publication was recorded, as well as the publication the result came from. The publication was also rated on a three-point scale: two points for an exemplary, well-known, reliable news source, such as The New York Times, The Washington Post, or CNN; one point for a seemingly credible news source; and zero points for a source with some problem, either because it was not clearly a news source or because it was a news source with perceived severe bias. These ratings are discussed in more detail in the results.
The country of publication was also noted. All results were in English, which limited the range of countries represented but was a necessity given my lack of foreign language knowledge. I decided not to use automated translation to rate foreign-language documents. A possible area of future study would be a similar experiment with results in all languages.
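The attributes recorded for each result can be pictured as a simple record. The sketch below assumes a hypothetical `ResultRecord` type and `source_summary` helper (this report does not specify any data format) and shows how per-engine source statistics, such as the average source score, the number of problematic sources, or the share of non-US sources, might be computed. The publication names are placeholders, not data from this experiment.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ResultRecord:
    """One analyzed result (hypothetical field names)."""
    publication: str
    published: date
    relevance: int      # 0, 1, or 2
    source_score: int   # 2 = exemplary, 1 = seemingly credible, 0 = problematic
    country: str        # country of publication, e.g. "US"
    reprint: bool

def source_summary(records):
    """Return (average source score, number of 0-rated sources,
    share of results published outside the United States)."""
    n = len(records)
    avg_source = sum(r.source_score for r in records) / n
    problematic = sum(1 for r in records if r.source_score == 0)
    non_us_share = sum(1 for r in records if r.country != "US") / n
    return avg_source, problematic, non_us_share

# Placeholder records, not data gathered in this experiment.
records = [
    ResultRecord("Example Daily", date(2004, 3, 3), 2, 2, "US", False),
    ResultRecord("Example Wire Reprint", date(2004, 3, 2), 1, 1, "UK", True),
    ResultRecord("Example Opinion Site", date(2004, 3, 1), 1, 0, "US", False),
    ResultRecord("Example Tribune", date(2004, 3, 3), 2, 1, "CA", False),
]
print(source_summary(records))
```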
A more complicated attribute was whether the story was a reprint. A story that originated with another source, such as a news wire, but was published in a different publication was considered a reprint. All Yahoo! News stories are, in a sense, "republished" on Yahoo.com, but these were not considered reprints. The attribute was intended to measure original reporting. Thus, a Reuters news story presented as a Reuters story was not marked as a reprint, but if the same Reuters story was returned as a result in, for example, "The Guardian," it was considered a reprint.
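The reprint rule can be expressed as a small decision function. This is one possible, simplified encoding: `is_reprint` is a hypothetical name, `origin` stands for the organization that produced the story, and `listed_as` for the publication under which the engine listed the result.

```python
def is_reprint(origin, listed_as, engine):
    """Return True if a result should be counted as a reprint.

    origin    -- organization that produced the story (e.g. a wire service)
    listed_as -- publication under which the search engine listed the result
    engine    -- name of the search engine that returned the result
    """
    # Yahoo! News republishes every story on Yahoo.com, so none of its
    # results were counted as reprints.
    if engine == "Yahoo! News":
        return False
    # A story listed under the organization that produced it (a Reuters
    # story presented as a Reuters story) counts as original reporting;
    # the same story listed under another publication is a reprint.
    return origin != listed_as

print(is_reprint("Reuters", "Reuters", "Google News"))       # False
print(is_reprint("Reuters", "The Guardian", "Google News"))  # True
print(is_reprint("AP", "AP", "Yahoo! News"))                 # False
```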
A complete copy of the data gathered is available
The first query was run on the afternoon of Wednesday, March 3, 2004, based on Tuesday evening's news. Its general subject was the exile of the Haitian president: specifically, where he was in exile and the circumstances of his exile. The search terms chosen were "Haiti president exile" (entered without quotes), with all default settings except on Daypop, where the language was restricted to English.
Altavista had the lowest relevance in this query, averaging 1.1. Particularly disappointing was that the first result came from the "Chippewa Falls Herald." In fact, nine of the ten results were reprints. The only one that was not came from the "Abilene Christian University Optimist," and it had little to no relevance to the query: it reported that the school's trip to Haiti had been cancelled as a result of the events. All college newspapers were noted as possibly problematic sources. While the actual qualitative difference between the mainstream media and college newspapers is beyond the scope of this paper, most users of these search engines probably do not expect to see such outlets in their results. Altavista did perform decently on freshness, with results on average only slightly more than a day old.
Daypop tied with Google for the highest average relevance in this query, although it also had the least fresh articles on average. Notably, the third result came from the "World Socialist Web," a source marked as problematic and biased. On average, however, Daypop had enough highly credible sources to offset it, giving Daypop the highest average source score.
Google's results were notable for having the largest proportion of non-United States sources: seven out of ten. Although a wide range of countries was represented, some of the sources were problematic, the most striking example being the second result, from Al Jazeera. This was all the more surprising since Google usually customizes its offerings on a per-country basis. I hesitate to judge the worth of Al Jazeera as a source, but it is clearly controversial enough to be noted. Google's results were also the freshest, on average barely a day old.
Yahoo's results, although not marked as reprints, consisted primarily of AFP and AP wire reports, and their relevance was only slightly lower than Google's and Daypop's. Because the first two results were three days old, the Yahoo results seemed worse at first glance, but overall they were probably on par in relevance.
The second query was also run on Wednesday, March 3, 2004. Its general subject was the results of the Democratic presidential primaries held on Tuesday, commonly called "Super Tuesday." The search terms chosen were "Super Tuesday results" (entered without quotes). Relevance was easier to measure in this case: a result that actually specified who won the Super Tuesday primaries scored 2, one that mentioned the terms but not the actual results scored 1, and one with no subject relevance at all scored 0.
Altavista had a much higher relevance score than the other search engines, with nearly all of its top ten results containing the required information. However, the top two results were essentially the same story, published by the same source within 45 minutes of each other. My methodology did not anticipate this, so the duplication was not penalized, but perhaps it should have been; a metric for uniqueness might be helpful in evaluation.
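A uniqueness metric of the kind suggested above might, for example, flag a result as a near-duplicate when it comes from the same publication as an earlier result, was published within a short window of it, and has a very similar headline. The sketch below is hypothetical: the one-hour window and the 0.8 similarity threshold are assumptions, the headlines are placeholders, and headline similarity uses `difflib.SequenceMatcher` from the Python standard library.

```python
from datetime import datetime, timedelta
from difflib import SequenceMatcher

def flag_near_duplicates(results, window=timedelta(hours=1), min_similarity=0.8):
    """Return the indices of results that look like an alternate version of
    an earlier result. Each result is (publication, published_at, headline)."""
    flagged = []
    for i, (pub_i, time_i, title_i) in enumerate(results):
        for pub_j, time_j, title_j in results[:i]:
            similar = SequenceMatcher(None, title_i.lower(), title_j.lower()).ratio()
            if pub_i == pub_j and abs(time_i - time_j) <= window and similar >= min_similarity:
                flagged.append(i)
                break  # one earlier match is enough
    return flagged

def uniqueness(results, **kwargs):
    """Share of results not flagged as near-duplicates (1.0 = all unique)."""
    return 1 - len(flag_near_duplicates(results, **kwargs)) / len(results)

# Placeholder results: the first two mimic one source posting two versions
# of a story 45 minutes apart.
results = [
    ("Example Post", datetime(2004, 3, 3, 1, 0), "Primary results: candidate A leads"),
    ("Example Post", datetime(2004, 3, 3, 1, 45), "Primary results: candidate A leads (updated)"),
    ("Example Times", datetime(2004, 3, 3, 2, 0), "Voters turn out across ten states"),
]
print(flag_near_duplicates(results))  # [1]
print(uniqueness(results))
```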
Daypop and Google again had nearly identical relevance scores, but Daypop clearly had the better results overall. Almost all of the Daypop sources were considered highly credible, while Google's results were filled with questionable sources, including Fox News and college publications. Two of the results from "Penn Live" were noted as unavailable because they required filling out a form with age and location information before they could be accessed. The Google results overall were of low quality, and three of ten were unavailable from a direct click on the results page.
Yahoo's results had a lower relevance score partly because of their age: an article published on Monday cannot report the results of Tuesday's primaries. However, the first two results were directly pertinent, and for a specific search like this, that is probably sufficient for most users.
The third query sought to find who at Worldcom had recently been charged, and with what. Two points were scored for having both pieces of information, one point for general relevance to Worldcom without the pertinent information, and zero points for no relevance. The search terms used were "Worldcom criminal charges."
This time Altavista and Google had the highest relevance scores, both 1.7. Google, however, again had issues with its choice of sources, returning two results from the questionable "BizReport" and one from "AccountingWeb," decidedly not a news organization.
Daypop had real problems with this query and scored the lowest relevance. Its first result was the RSS newsfeed of the "Weekly Standard," likely some kind of bug related to Daypop indexing weblogs and RSS feeds in addition to news sources. Daypop also returned a "Forbes NewsScan" page, which apparently carries short blurbs about recent stories. While the page may have had relevant content when it was indexed, it did not when I visited it. Interestingly, this was one of the few instances where the age of a page actually made it completely irrelevant.
Altavista and Google had the top overall relevance scores, with Altavista just slightly higher. Daypop and Yahoo were third and fourth, respectively; Yahoo's poor results in the Super Tuesday query are probably responsible for its last-place ranking.
In terms of page age, Google and Altavista both averaged pages less than a day old, Google with a slightly better average. Daypop and Yahoo both averaged slightly more than a day old, with Yahoo's results slightly fresher.
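Average page age of this kind can be computed as the mean difference between the time a query was run and each result's publication time. The sketch below assumes each result has a publication timestamp (where only a date is known, a time of day would have to be assumed) and treats "afternoon" as 3 p.m.; `average_age_days` is a hypothetical name and the times are invented for illustration.

```python
from datetime import datetime

def average_age_days(query_time, publication_times):
    """Mean age of the results, in days, at the moment the query was run."""
    ages = [(query_time - t).total_seconds() / 86400 for t in publication_times]
    return sum(ages) / len(ages)

query_time = datetime(2004, 3, 3, 15, 0)  # assumption: "afternoon" taken as 3 p.m.
published = [
    datetime(2004, 3, 3, 3, 0),    # 12 hours old
    datetime(2004, 3, 2, 15, 0),   # 1 day old
    datetime(2004, 2, 29, 15, 0),  # 3 days old (2004 is a leap year)
]
print(average_age_days(query_time, published))  # 1.5
```

A single stale result, like the three-day-old item above, pulls the average well above a day even when most results are fresh, which is worth keeping in mind when comparing averages this close together.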
The reputation of sources, although a difficult metric to evaluate, yielded very interesting results. Google's results included by far the most questionable sources, and its average reputation score below 1 reflects this. On the other hand, Google also exhibited the widest geographical range of sources in its results. Daypop yielded the best ratio of high-quality sources, although, again, this was probably one of the more subjective metrics.
Because of the relatively small sample size (only thirty pages from each engine), the conclusions drawn from these results should be viewed with caution. Additionally, many human interface factors influence the overall utility and worth of a search engine from a user's perspective. (Although these are outside the scope of this paper, Altavista's interface immediately exhibited some shockingly bad tendencies.)
It would be difficult to conclude from these results that any one search engine is clearly superior to the others tested. However, a few features of each search engine are worth noting. Each engine's selection of sources, and the frequency with which those sources are reindexed, greatly influence its results. While Google seems to index the most news sources, many of them are questionable as reliable news sources. (Although not present in these particular results, Google also indexes some weblogs, press releases, and even some satirical news sources.)
However, if a user is looking for the broadest possible range of sources, Google seems to be superior, especially for foreign news sources. If one is concerned about the quality of sources, the more restricted set from Yahoo would seem to be best, although from these particular queries Daypop actually performed slightly better in this category.
The quality of sources does not matter much if the results are irrelevant or too old to be useful. On these grounds, Altavista and Google both seemed to perform better than Yahoo and Daypop in overall average relevance. This does not seem to match my subjective view of which engines returned the best or most useful results, in part because the relevance number alone does not take into account the placement of a result (those in the top three tend to color my view of the results more than the rest). Nor does it account for the uniqueness of the results: whether a result was a reprint of a newswire article in a small-time newspaper, and whether it was an alternate version of a story listed earlier in the results. Although reprint status and country of origin were noted during data collection, it is hard to draw any clear conclusions from them; other than some of Google's results, almost all of the results returned came from US sources.
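One standard way to account for placement, which was not used in this report, is discounted cumulative gain (DCG): each result's relevance is divided by log2(rank + 1), so higher-ranked results count more. The sketch below applies it to two hypothetical rankings that contain the same 0-2 scores in different orders; both have an average relevance of 1.0, but DCG favors the ranking that puts the relevant results at the top.

```python
import math

def dcg(scores, k=10):
    """Discounted cumulative gain over the first k results; rank is 1-based."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(scores[:k], start=1))

# Hypothetical rankings with identical scores in different orders.
relevant_first = [2, 2, 2, 0, 0, 0, 1, 1, 1, 1]
relevant_last = [1, 1, 1, 1, 0, 0, 0, 2, 2, 2]

print(sum(relevant_first) / 10, sum(relevant_last) / 10)  # same average
print(dcg(relevant_first) > dcg(relevant_last))           # True
```

DCG addresses placement only; a uniqueness adjustment for reprints and alternate versions of the same story would still have to be applied separately.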
Also important to note is that each engine returned information pertinent to the query somewhere within its top ten results for every query, so on the most basic level each of the search engines succeeded to some extent. Certainly none of them performed so poorly as to be unusable, and in fact there may be many instances where the best results could be gleaned by using multiple news search engines.