Hacettepe University Department of Library Science

 

DOK 324 Principles of Information Retrieval (Spring 2001) Yaşar Tonta

 

HOMEWORK: WEB SEARCH ENGINES

(Due Date: 25 May 2001 09:15)

 

Web search engines (Yahoo!, AltaVista, Excite, Lycos, HotBot, etc.) may permit several different kinds of searches, from a general search for documents with words in a given list, to searches using a Boolean expression, to searches constrained within some hierarchy of documents.  For each of the following queries, investigate the response of six different Web search engines: AltaVista, Excite, HotBot, Infoseek, Northern Light, and Google. For each query performed on each search engine, you are to calculate the precision ratio (for example, if you obtained 100 documents from AltaVista for the first query, reviewed the first 10 of them and found 5 of them relevant, the precision ratio for that search on AltaVista will be 50%) and show your relevance judgments on the printout by marking each document.  Then, note all the unique relevant documents retrieved for each search query by all search engines and count them.  Based on all the relevant documents retrieved by all search engines, calculate the recall ratio for each query performed on each search engine.  Note the duplicates and broken links.  You are to repeat this for the same query on the other five search engines, too.  Also note the following information:

 

·         name of the search engine,

·         the type of search done (simple, advanced, Boolean),

·         any special features used (e.g., truncation),

·         the number of documents or document surrogates (i.e., abstracts or summaries) identified by the search engine,

·         the number of documents or document surrogates you examined (you should examine at least 10 surrogates for each search),

·         the number of relevant documents you found within the first 10 documents (and thus precision),

·         the number of relevant documents that each search engine found among all relevant documents retrieved by all six search engines,

·         your performance evaluation of each search engine along with your impression of the search engine (user satisfaction).
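The precision ratio noted in the list above can be worked out by hand, but a small script helps to check the arithmetic. The following is only a minimal sketch: the function name precision_at_k and the list of True/False relevance judgments are illustrative assumptions, not part of the assignment.

```python
def precision_at_k(judgments, k=10):
    """Return the share of the first k examined results judged relevant.

    judgments is a list of booleans in rank order (True = relevant).
    The denominator is the number of results examined (10 in this
    homework), not the total number of hits the search engine reports.
    """
    examined = judgments[:k]
    if not examined:
        raise ValueError("no results were examined")
    return sum(examined) / len(examined)


# Hypothetical judgments for one query on one search engine:
# 5 of the first 10 results were marked relevant, i.e. 50%.
judgments = [True, False, True, True, False, False, True, False, True, False]
print(precision_at_k(judgments))
```

If a search engine returns fewer than 10 results, this sketch divides by the number actually examined; that is an assumption, since the handout only covers the case of 10 examined results.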

 

Do you think that other (better) documents were not found?  Should the search have been done without using the Web, and why?  Attach the printed copies of the searches that you performed along with the relevance judgments and precision/recall ratios for each query and search engine.

 

Here is a step-by-step explanation of what you are required to do:

 

  1. Perform the first query on AltaVista (use your skills to determine which keywords would be most "retrieval-worthy");
  2. Look at the first 10 results and determine which ones are relevant (i.e., on the same subject as your query) by marking on the printout; if needed, click on the URL address given to determine if the site is relevant to your query;
  3. Print or download the first 10 results;
  4. Calculate the precision ratio for the first query (the denominator should be 10, not the number of all documents retrieved by the search engine);
  5. Repeat the first four steps for the rest of the queries on AltaVista;
  6. Repeat the first five steps for Excite, HotBot, Infoseek, Northern Light, and Google, respectively;
  7. Now, identify the total number of unique relevant sites retrieved for each query by all the search engines and mark them on the printouts (to do this, you need to compare all relevant sites retrieved by the six search engines and remove the duplicate ones; this will be the denominator to use in calculating the recall ratio);
  8. Calculate the recall ratio for each query for each search engine (for example, suppose that for the first query all six search engines identified a total of 15 unique relevant sites.  Further suppose that AltaVista retrieved 3 of them.  Then the recall ratio for the first query performed on AltaVista will be 20% (3/15));
  9. Calculate the average precision ratio for each search engine for all queries;
  10. Calculate the average recall ratio for each search engine for all queries;
  11. Based on the average precision and recall ratios, evaluate the performance of each search engine;
  12. Turn in the printouts containing the first 10 results for each query for each search engine along with relevant markings and the overall performance evaluation.
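
Steps 7 through 10 can be sketched in a few lines of code. This is an illustration under stated assumptions, not a required tool: the function names pooled_recall and average, the example.org URLs, and the sample figures are all hypothetical, and two results count as duplicates only when their URLs match exactly.

```python
def pooled_recall(relevant_by_engine):
    """Return the recall ratio of each engine for one query.

    relevant_by_engine maps an engine name to the set of relevant URLs
    it retrieved. The denominator is the pool of unique relevant URLs
    found by all engines together (duplicates collapse in the set union).
    """
    pool = set().union(*relevant_by_engine.values())
    if not pool:
        return {engine: 0.0 for engine in relevant_by_engine}
    return {engine: len(urls) / len(pool)
            for engine, urls in relevant_by_engine.items()}


def average(ratios):
    """Return the arithmetic mean of a sequence of ratios."""
    ratios = list(ratios)
    return sum(ratios) / len(ratios)


# Hypothetical data for one query: 15 unique relevant sites in the pool,
# 3 of them retrieved by AltaVista, so its recall is 3/15 = 20%.
sites = ["http://example.org/site%02d" % i for i in range(1, 16)]
relevant = {
    "AltaVista": set(sites[:3]),
    "Google": set(sites[2:]),  # overlaps with AltaVista on one site
}
recall = pooled_recall(relevant)
print(recall["AltaVista"])

# Hypothetical per-query precision ratios for one engine (steps 9 and 10
# use the same averaging for precision and for recall).
print(average([0.5, 0.3, 0.7]))
```

Note that this recall is relative to the pooled results of the six search engines only; relevant documents that no engine retrieved are not counted.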

 

 

Queries

 

1.       I am looking for information on the British musical band called "Divine Comedy".  I am not interested in the famous book with the same title.

2.       My professor asked me to find sites and documents on the Internet that have information on performance evaluation of search engines.  Can you help?

3.       I am writing a paper on the "Internet and ethics". Relevant papers, sites, documents, etc. are most welcome.

 

Good hunting!