THE ANALYSIS OF SEARCH FAILURES IN CHESHIRE

by

Yasar Tonta

A research proposal

(Draft 1.0)

12 March 1991

Berkeley

CONTENTS

1. Introduction

2. Overview of a Document Retrieval System

3. Relevance Feedback Concepts

4. Failure Analysis in Document Retrieval Systems

5. The Present Study

5.1 Objectives of the Study

5.2 Hypotheses

6. The Experiment

6.1 CHESHIRE Experimental Online Catalog

6.1.1 Document Retrieval Process in CHESHIRE

6.1.2 Relevance Feedback Process in CHESHIRE

6.1.3 Relevance Feedback Formula to be Used in the Experiment

6.1.4 Relevance Judgments

6.1.5 Retrieval Effectiveness Measures

6.2 Subjects

7. Data Gathering, Analysis, and Evaluation Methodology

8. Expected Results

9. Select Bibliography

1. Introduction

A perfect document retrieval system should retrieve all and only relevant documents. Yet online document retrieval systems are not perfect. They retrieve some non-relevant documents while missing, at the same time, some relevant ones.

This observation, which is based on the results of several information retrieval experiments, also summarizes the two general types of search failures that frequently occur in online document retrieval systems: (1) non-relevant documents are retrieved, and (2) relevant documents are not retrieved. The former type of failure is known as precision failure: the system fails to retrieve only relevant documents. The latter is known as recall failure: the system fails to retrieve all relevant documents.

The two concepts, precision and recall, come from information retrieval experiments and are the most widely used measures for evaluating retrieval effectiveness in online document retrieval systems. Precision is defined as the proportion of retrieved documents that are relevant, whereas recall is defined as the proportion of relevant documents that are retrieved (Van Rijsbergen, 1979).
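To make the two measures concrete, the following minimal sketch (in Python, with hypothetical document identifiers) computes precision and recall for a single query from the set of retrieved documents and the set of relevant documents:

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    retrieved -- set of document ids returned by the system
    relevant  -- set of document ids judged relevant to the query
    """
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Example: 4 documents retrieved, 6 relevant documents in the collection, 3 in common.
print(precision_recall({"d1", "d2", "d3", "d4"},
                       {"d1", "d2", "d3", "d5", "d6", "d7"}))   # (0.75, 0.5)
```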

In order for a document retrieval system to retrieve documents from the database, two conditions must be satisfied. First, documents must be assigned appropriate index terms by indexers. Second, users must correctly guess what the assigned index terms are and enter their search queries accordingly (Maron, 1984). These two conditions also point to the main causes of search failures in document retrieval systems, namely, the problems encountered during the indexing and query formulation processes.

Recall failures occur mainly because certain index terms do not get assigned to certain documents even though some users would look under those index terms in order to retrieve the kinds of documents they want. When this happens, the system fails to retrieve all the relevant documents in the database. Recall failures are especially difficult to detect in large-scale document retrieval systems.

Precision failures, on the other hand, are more complicated than recall failures, although they are much simpler to detect. Precision failures occur when the user finds some retrieved documents non-relevant even though those documents were assigned the very index terms the user asked for in his/her search query. In other words, users do not necessarily agree with indexers as to the relevance of certain documents just because indexers happened to have assigned the terms selected by users. Relevance is defined as a "relationship between a document and a person in search of information," and it is a function of a large number of variables concerning both the document (e.g., what it is about, its currency, language, date) and the person (e.g., the person's education, beliefs, etc.) (Robertson, Maron and Cooper, 1982).

Other factors, such as ineffective user-system interfaces, index languages, and retrieval rules, can also cause search failures in document retrieval systems. In a landmark study, Lancaster (1968) provided a detailed account of the search failures that occurred in MEDLARS (Medical Literature Analysis and Retrieval System) along with the status of MEDLARS' retrieval effectiveness. More recently, Blair and Maron (1985), having conducted a retrieval effectiveness test on a full-text document retrieval system, explicated the probable causes of the recall failures observed in their study.

Although the causes of precision and recall failures can be explained in relatively straightforward terms, a detailed intellectual analysis of why these two kinds of failures occur in document retrieval systems is nonetheless rarely conducted.

Users' knowledge of controlled vocabularies and query languages (or lack thereof) also causes a great many search failures and much frustration. Most users are not aware of the role of controlled vocabularies in document retrieval systems. They do not seem to understand (why should they?) the structure of rigid indexing and query languages. Consequently, users' natural language-based search query terms often fail to match the titles and subject headings of the documents, thereby causing some search failures. The "brittle" query languages based on Boolean logic tend to exacerbate this situation, especially for complicated search queries requiring the use of Boolean operators.

Notwithstanding the circumstantial evidence gathered through various online catalog studies in the past, studies examining the match between users' vocabulary and that of online document retrieval systems are scarce. Moreover, the probable effects of such a mismatch on search failures are yet to be fully explored.

Natural language query interfaces are believed to improve search success in document retrieval systems because users are able to formulate search queries in their own terms. Search terms chosen from natural language are more likely to match the titles of the documents in the database. Nevertheless, the role of natural language-based query interfaces in reducing search failures in document retrieval systems needs to be thoroughly studied.

There appears to be some relationship between users' perception of retrieval effectiveness and the effectiveness obtained through precision and recall measures. That is to say, users might think that they retrieved most of the relevant documents even though document retrieval systems tend to miss a great many relevant ones (i.e., recall failures). For instance, Blair and Maron (1985) observed that users involved in their retrieval effectiveness study believed that "they were retrieving 75 percent of the relevant documents when, in fact, they were only retrieving 20 percent" (p.295). As mentioned above, users might also disagree with indexers as to the relevance of a document to which user-selected index terms have been assigned (i.e., precision failures). It seems, then, that the relationship between "user designated" ineffective searches and ineffective searches identified by retrieval effectiveness measures deserves further investigation.

2. Overview of a Document Retrieval System

What follows is a brief overview of a document retrieval system and its major components.

The principal function of a document retrieval system is to retrieve, for each search request, all and only relevant documents from a store of documents. In other words, the system should be capable of retrieving all relevant documents while rejecting all the others.

Maron (1984) provides a more detailed description of the document retrieval problem and depicts the logical organization of a document retrieval system as in Fig. 1.

[Fig. 1 is a block diagram with the following components: incoming documents feed into document identification (indexing), and the inquiring patron feeds into query formulation; both consult a thesaurus/dictionary; indexing produces index records and query formulation produces a formal query, which are matched by the retrieval rule.]

Fig. 1. Logical Organization of a Conventional Document Retrieval System

Source: Maron (1984), p.155.

As Fig. 1 suggests, the basic characteristics of each incoming document (e.g., author, title, subject) are identified during the indexing process. Indexers may consult thesauri or dictionaries (controlled vocabularies) in order to assign acceptable index terms for each document. Consequently, an index record is constructed for each document for subsequent retrieval purposes.

Likewise, users can identify their information needs by consulting the same index tools during the query formulation process. That is to say that a user can check to see if the terms he/she intends to use in his/her formal query are also recognized by the document retrieval system. Ultimately, he/she comes up with the most promising query terms (from the retrieval point of view) that he/she can submit to the system as his/her formal query.

As mentioned before, most users do not know about the tools that they can utilize to express their information needs, which results in search failures in view of a possible mismatch between users' vocabulary and the system's vocabulary. As Maron (1984) points out, "the process of query-formulation is a very complex process, because it requires that the searcher predict (i.e., guess) which properties a relevant document might have" (p.155). Finally, "the actual search and retrieval takes place by matching the index records with the formal search query. The matching follows a rule, called "Retrieval Rule," which can be described as follows: For any given formal query, retrieve all and only those index records which are in the subset of records that is specified by that search query" (Maron, 1984, p.155).
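As a concrete (and deliberately simplified) illustration of such a retrieval rule, the sketch below applies Maron's "all and only those index records in the subset specified by the query" rule to a handful of hypothetical index records; it is not the ranking rule CHESHIRE itself uses:

```python
# Hypothetical index records: document id -> set of assigned index terms.
index_records = {
    "doc1": {"information retrieval", "online catalogs"},
    "doc2": {"indexing", "thesauri"},
    "doc3": {"online catalogs", "subject searching"},
}

def retrieve(formal_query, records):
    """Return all and only the records whose index terms include every query term."""
    return [doc_id for doc_id, terms in records.items() if formal_query <= terms]

print(retrieve({"online catalogs"}, index_records))   # ['doc1', 'doc3']
```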

It follows, then, that a document retrieval system consists of (a) a store of documents (or, representations thereof); (b) a population of users each of whom makes use of the system to satisfy their information needs; and (c) a retrieval rule which compares representation of each user's query with the representations of all the documents in the store so as to identify the relevant documents in the store.

In addition, there should be some kind of user interface which allows users to interact with the system. A user interface mechanism has several functions: (1) to accept users' query formulations (in natural language or otherwise); (2) to transmit queries to the system for processing; (3) to bring the results back to users for evaluation; (4) to make various forms of feedback possible between the user and the document retrieval system.

The feedback function deserves further explanation. On the one hand, the system may prompt users as to what to do next or suggest alternatives by way of system-generated feedback messages (i.e., help screens, status of the search, actions to take). On the other hand, users should be able to modify or change their search queries in the light of a sample retrieval so as to improve search success in subsequent retrieval runs (Van Rijsbergen, 1979). Moreover, some systems may automatically modify the original search query after the user has made relevance judgments on the documents retrieved in the first try. This is known as "relevance feedback," and it is the relevance feedback process that concerns us here.

3. Relevance Feedback Concepts

Swanson (1977) examined the well-known information retrieval experiments and the measures used therein. He suggested that the design of document retrieval systems "should facilitate the trial-and-error process itself, as a means of enhancing the correctability of the request" (p.142).

Van Rijsbergen (1979) shares the same view when he points out that: "a user confronted with an automatic retrieval system is unlikely to be able to express his information need in one go. He is more likely to want to indulge in a trial-and-error process in which he formulates his query in the light of what the system can tell him about his query" (p.105).

Van Rijsbergen (1979) also lists the kind of information that could be of help to users when reformulating their queries such as the occurrence of users' search terms in the database, the number of documents likely to be retrieved by a particular query with a small sample, and alternative and related search terms that can be used for more effective search results.

"Relevance feedback" is one of the tools that facilitates the trial-and-error process by allowing the user to interactively modify his/her query based on the search results obtained during the initial run. The following quotation summarizes the relevance feedback process very well:

"It is well known that the original query formulation process is not transparent to most information system users. In particular, without detailed knowledge of the collection make-up, and of the retrieval environment, most users find it difficult to formulate information queries that are well designed for retrieval purposes. This suggests that the first retrieval operation should be conducted with a tentative, initial query formulation, and should be treated as a trial run only, designed to retrieve a few useful items from a given collection. These initially retrieved items could then be examined for relevance, and new improved query formulations could be constructed in the hope of retrieving additional useful items during subsequent search operations" (Salton and Buckley, 1990, p.288).

Relevance feedback was first introduced over 20 years ago during the SMART information retrieval experiments. Early relevance feedback experiments were performed on small collections (e.g., 200 documents) where the retrieval performance was unusually high (Rocchio, 1971; Salton, 1971; Ide, 1971).

It was shown that relevance feedback markedly improved retrieval performance. Recently, Salton and Buckley (1990) examined and evaluated twelve different feedback methods "by using six document collections in various subject areas for experimental purposes." The collection sizes varied from 1,400 to 12,600 documents. The relevance feedback methods produced improvements in retrieval performance ranging from 47% to 160%.

The relevance feedback process offers the following main advantages:

"- It shields the user from the details of the query formulation process, and permits the construction of useful search statements without intimate knowledge of collection make-up and search environment.

- It breaks down the search operation into a sequence of small search steps, designed to approach the wanted subject area gradually.

- It provides a controlled query alteration process designed to emphasize some terms and to deemphasize the others, as required in particular search environments" (Salton and Buckley, 1990, p.288).

 

4. Failure Analysis in Document Retrieval Systems

Various studies have shown that users experience several problems when searching online document retrieval systems and that they often fail to retrieve relevant documents (Lancaster, 1968). The problems users frequently encounter, especially when searching online library catalogs, are well documented in the literature (Bates, 1986; Borgman, 1986; Cochrane and Markey, 1983; Hildreth, 1989; Kaske, 1983; Kern-Simirenko, 1983; Larson, 1986, 1991b; Lawrence, Graham and Presley, 1984; Markey, 1984, 1986; Matthews, 1982; Matthews, Lawrence and Ferguson, 1983; Pease and Gouke, 1982). Very few researchers, however, have studied search failures directly (Lancaster, 1968; Peters, 1989).

Hildreth (1989) considers the "vocabulary" problem as the major retrieval problem in today's online catalogs and asserts that "no other issue is as central to retrieval performance and user satisfaction" (p.69). As suggested earlier, this may be due to the fact that controlled vocabularies are far more complicated than users can easily grasp in a short period of time. In fact, several researchers have found that the lack of knowledge concerning the Library of Congress Subject Headings (LCSH) is one of the most important reasons why users fail in online catalogs (see, for instance, Bates, 1986; Borgman, 1986; Gerhan, 1989; Lewis, 1987; Markey, 1986). Larson (1986) found that almost half of all subject searches on MELVYL (University of California Library System) retrieved nothing. More recently, Larson (1991b) analyzed the use of MELVYL over a longer period of time (6 years) and found that there is a significant positive correlation between the failure rate and the percentage of subject searching. This confirms the findings of an earlier formal analysis of factors contributing to success and satisfaction: "problems with subject searching were the most important deterrents to user satisfaction" (University, 1983, p.97).

Carlyle (1989) studied the matching between users' vocabulary and LCSH and found that "single LCSH headings match user expressions exactly about 47% of the time" (p.37). The study conducted by Van Pulis and Ludy (1988) showed that 53% of the user entered terms matched subject headings used in the online catalog. Findings as such suggest that some of the search failures can be attributed to controlled vocabularies in current online catalogs.

From the point of view of users it is certainly preferable to be able to express their information needs in their own natural language terms. However, most, if not all, online catalogs today cannot accommodate search requests submitted in natural language form. Yet it is believed that natural query languages may reduce search failures in online catalogs by improving the match between users' search terms and the system's controlled vocabulary. Nevertheless, the role of natural query languages in search success in online catalogs is yet to be thoroughly investigated.

Markey (1984) discusses several different data gathering methods that were used in online catalog use studies such as questionnaires, interviews, controlled experiments and transaction monitoring. Transaction monitoring was "designed to permit detailed analysis of individual user transactions and system performance characteristics. The individual transaction records provide enough information for analysts to reconstruct the events of any user session, including all searches, displays, help requests, and errors, and the system responses" (Larson, 1991b, p.7).

Cochrane and Markey (1983) point out that different data gathering methods have different strengths. For instance, questionnaires and interviews can provide insight on the user's attitude toward the online document retrieval system while transaction log analysis can reveal the actual user behavior at online catalogs (Tolle, 1983).

Although the methods discussed above are among the most useful tools for gathering data on online catalog use, they do not necessarily help fully explain the causes of search failures that occur in online catalogs. Transaction logs, for instance, can document search failure occurrences but cannot explain why a particular search failed. A variety of reasons may cause search failures in online catalogs: simple typographical errors, a mismatch between the user's search terms and the vocabulary used in the catalog, the database (i.e., the requested item is not in the system), the user interface, and the search and retrieval algorithms, to name but a few. In order to find out why a particular search failed, one needs further information regarding the user's needs and intentions, which, obviously, is not recorded in transaction logs.

Data regarding user needs and intentions can be gathered through a technique known as "critical incident technique." This technique "consists of a set of procedures for collecting direct observations of human behavior in such a way as to facilitate their potential usefulness in solving practical problems" (Flanagan, 1954, p.327). The critical incident technique has been widely used in the analysis of the specific failures in learning to fly, in measuring typical performance, and in the identification of critical requirements for certain professions. The major advantage of this technique is to obtain "a record of specific behaviors from those in the best position to make the necessary observations and evaluations" (Flanagan, 1954, p.355). In other words, it is the observed behaviors that count in critical incident technique, not the opinions, hunches and estimates.

Recently, critical incident technique has been used to analyze and evaluate search failures in MEDLINE (Wilson and Starr-Schneidkraut, 1989). Users were asked to comment on the effectiveness of online searches which they performed in MEDLINE. The user designated reasons as to why a particular search failed (or succeeded) were recorded through a questionnaire used during the interviews. These "incident reports" were later matched against MEDLINE transaction log records corresponding to each search in order to find out the actual reasons for search failures (and search success). It is these incident reports that provide much sought after data concerning user needs and intentions, and put each transaction record in context by making transaction logs no longer "anonymous."

The user designated ineffective MEDLINE searches were further analyzed by Cooper and Campbell (1989).

It appears that the critical incident technique can successfully be used in the analysis of search failures in online catalogs as well. Matching incident reports against transaction logs is especially promising. Since the analyst will, through incident reports, gather contextual data for each search query, more informed relevance judgments can be made when evaluating retrieval effectiveness. Furthermore, this technique can also be utilized to compare user designated search effectiveness with that obtained through traditional retrieval effectiveness measures.

5. The Present Study

The present study will attempt to investigate the probable causes of search failures in a "third generation" experimental online catalog system. The rigorous analysis of retrieval effectiveness and search failures will be based on transaction log records and the critical incident technique. The former method allows one to study users' search behaviors unobtrusively, while the latter helps gather information about user intentions and needs for each query submitted to the system.

The findings to be obtained through this study will shed some light on the probable causes of search failures in online catalog systems. The results will help improve our understanding of the role of natural query languages and indexing in online catalogs. Furthermore, the findings might provide invaluable insight that can be incorporated in future retrieval effectiveness and relevance feedback studies.

5.1 Objectives of the Study

The purpose of the present study is to:

1. analyze the search failures in online catalogs so as to identify their probable causes and to improve the retrieval effectiveness;

2. ascertain the extent to which users' natural language-based queries match the titles of the documents and the Library of Congress Subject Headings (LCSH) attached to them;

3. compare user designated ineffective searches with the effectiveness results obtained through precision and recall measures;

4. measure the retrieval effectiveness in an experimental online catalog in terms of precision and recall;

5. identify the role of relevance feedback in improving the retrieval effectiveness in online catalogs;

6. identify the role of natural query languages in improving the match between users' vocabulary and the system's vocabulary along with their retrieval effectiveness scores in online catalogs.

5.2 Hypotheses

Main hypotheses of this study are as follows:

1. Search failures occur in online catalog systems;

2. The match between users' vocabulary and titles of, and LCSH assigned to, documents will help reduce the search failures and improve the retrieval effectiveness in online catalogs;

3. The relevance feedback process will reduce the search failures and enhance the retrieval effectiveness in online catalogs;

4. User designated ineffective searches in online catalogs do not necessarily coincide with system designated ineffective searches.

 

 

6. The Experiment

In order to test the hypotheses of this study and address the research questions raised above, an experiment will be conducted on an online catalog. The major objective of the experiment will be to gather data on the use of an experimental online catalog for a specified period of time for further analysis and evaluation. Examples of the types of data to be collected include users' actual search queries submitted to the online catalog, the records retrieved and displayed to the users, users' relevance judgments for each record displayed, and the records retrieved and displayed after the relevance feedback process. Such data will be analyzed in order to determine the retrieval effectiveness attained in the experimental online catalog. Search failures will be documented and their causes investigated in detail. Further data will be collected from the users about their information needs and intentions when they performed their searches in the online catalog. As pointed out earlier, a detailed analysis will be performed to find out whether there is some correspondence between user designated ineffective searches and the search failures identified by the system.

6.1 CHESHIRE Experimental Online Catalog

The experiment will be conducted on CHESHIRE (California Hybrid Extended SMART for Hypertext and Information Retrieval Experimentation), an experimental online catalog system "designed to accommodate information retrieval techniques that go beyond simple keyword matching and Boolean retrieval to incorporate methods derived from information retrieval research and hypertext experiments" (Larson, 1989, p.130). The test database for the CHESHIRE system consists of some 30,000 MARC records representing the holdings of the Library of the School of Library and Information Studies at the University of California at Berkeley. CHESHIRE uses a modified version of Salton's SMART system as its "retrieval engine" and index manager, and it runs on a Sun workstation with 320 megabytes of disk storage. Larson (1989) provides more detailed information about CHESHIRE and the characteristics of the collection.

The CHESHIRE system uses the "classification clustering" technique, which is based on the presence of identical LC classification numbers and which brings documents with the same LC classification number together along with the most frequently used LC subject headings in a particular cluster. At present, some 8,400 classification clusters have been created for the above collection. For this experiment, a version of the CHESHIRE database will be mounted on (Sun) workstations in the Computer Laboratory of the School of Library and Information Studies. It can be accessed through the network file system should the needed modifications be made on the network.

6.1.1. Document Retrieval Process in CHESHIRE

CHESHIRE accommodates queries in natural language form. The user describes his/her information need using words taken from natural language and submits this statement to the system. First, a retrieval function within the system analyzes the query, eliminates the "buzz" words (using a stop list), processes the query using the stemming and indexing routines, and weights the terms in the query to produce a vector representation of the query. Second, the system compares the query representation with each of the some 8,400 document cluster representations in order "to retrieve and rank the cluster records by their probabilistic 'score' based on the term weights stored in the inverted file... The ranked clusters are then displayed to the user in the form of a textual description of the classification area (derived from the LCC [the Library of Congress Classification] summary schedule) along with several of the most frequently assigned subject headings within the cluster" (Larson, 1991a, p.17). (For the theoretical basis of, and the probabilistic retrieval models used in, the CHESHIRE online catalog system, see Larson (1991c).)
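The following sketch gives a rough picture of this first retrieval stage. It is not the SMART-based code CHESHIRE actually runs: the stop list, the crude suffix stripper, the tf-idf-style weights, and the inner-product score below are simplified stand-ins for the system's stop list, stemming and indexing routines, and probabilistic cluster ranking.

```python
import math
import re

STOP_WORDS = {"the", "of", "and", "in", "on", "a", "an", "for", "about", "to"}

def stem(word):
    # Crude suffix stripping; a stand-in for a real stemming routine.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def query_vector(natural_language_query, cluster_term_freqs, num_clusters):
    """Turn a natural-language query into weighted terms (tf x idf style)."""
    terms = [stem(w) for w in re.findall(r"[a-z]+", natural_language_query.lower())
             if w not in STOP_WORDS]
    vector = {}
    for term in terms:
        df = sum(1 for freqs in cluster_term_freqs.values() if term in freqs)
        idf = math.log((num_clusters + 1) / (df + 1))
        vector[term] = vector.get(term, 0.0) + idf
    return vector

def rank_clusters(vector, cluster_term_freqs):
    """Score each classification cluster by a simple inner product and rank."""
    scores = {cid: sum(weight * freqs.get(term, 0)
                       for term, weight in vector.items())
              for cid, freqs in cluster_term_freqs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```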

Once the system finds the "would-be" relevant clusters, the user will then be able to judge some of the clusters as being relevant by simply identifying the relevant clusters on the screen and pushing a select key. "After one or more clusters have been selected, the system reformulates the user's query to include class numbers for the selected clusters, and retrieves and ranks the individual MARC records based on this expanded query" (Larson, 1991a, p.17).

Larson (1991a) describes how this tentative relevance information for the selected clusters can be utilized in ranking the individual MARC records:

"In the second stage of retrieval in CHESHIRE, we still have no information about the relevance of individual documents, only the tentative relevance information provided by cluster selection. In this search, the class numbers assigned to the selected clusters are added to the other terms used in the first-stage query. The individual documents are ranked in decreasing order of document relevance weight calculated, using both the original query terms and the selected class numbers, and their associated MARC records are retrieved, formatted, and displayed in this rank order... In general documents from the selected classes will tend to be promoted over all others in the ranking. However, a document with very high index term weights that is not from one of the selected classes can appear in the rankings ahead of documents from that class that have fewer terms in common with the query" (p.17).

Although the identification of relevant clusters can, quite rightly, be thought of as a type of relevance feedback, we consider it rather a form of system help offered before the user's query is run against the entire database.
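A compressed sketch of the second-stage ranking described in the quotation above is given below. The record representations, term weights, and the fixed weight added for each selected class number are hypothetical; they merely illustrate how adding class numbers to the query tends to promote documents from the selected clusters without excluding strongly matching documents from other classes.

```python
def expand_query(query_vector, selected_class_numbers, class_weight=1.0):
    """Add the class numbers of the user-selected clusters to the query terms."""
    expanded = dict(query_vector)
    for class_number in selected_class_numbers:
        expanded[("class", class_number)] = class_weight
    return expanded

def rank_records(expanded_query, records):
    """records: record id -> dict mapping index terms (and ("class", ...) keys) to weights."""
    scores = {rid: sum(weight * terms.get(term, 0.0)
                       for term, weight in expanded_query.items())
              for rid, terms in records.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```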

After all of the above re-weighting and ranking processes, which are based on the user's original query and the selection of relevant clusters, are done, CHESHIRE will eventually come up with individual MARC records. This time the user is able to judge each individual record (rather than the cluster) retrieved as being relevant or nonrelevant, again by simply pushing the appropriate key. He/she can examine several records, making relevance judgments along the way, until he/she decides that there is no point in continuing to display records as the probability of relevance gets smaller and smaller. The user's relevance judgment for each document in this stage is recorded.

CHESHIRE has a set of both vector space (e.g., coordination level matching, term frequency-inverse document frequency (tf.idf) matching, cosine matching) and probabilistic retrieval models available for experimental purposes. The XXXXX retrieval algorithm will be used for retrieving documents from the database. [I do not want to discuss all these algorithms, especially if I am going to use only one of them in my experiment. Please advise! I also need advice as to which one to choose.]

6.1.2. Relevance Feedback Process in CHESHIRE

When the user decides to quit, because he/she is either satisfied or frustrated with the documents he/she has seen in the course of the first retrieval, the system asks whether he/she wants to perform a relevance feedback search. If the user decides to do so, the system further revises and modifies the original query based on the documents the user has already judged relevant or nonrelevant in the previous stage. The relevance feedback step, then, enables the system to "understand" the user's query better: documents that are similar to the query are rewarded by being assigned higher ranks, while dissimilar documents are pushed farther down in the ranking. As a result of the relevance feedback process, the system comes up with more documents (that is, records).

Once again, the searcher will see the new documents, one after another, that were retrieved from the database as a result of the relevance feedback process, and judge them as being relevant or nonrelevant to his/her query. Relevance judgments, again, are automatically recorded for each record the user scans.

The relevance feedback search can be iterated as many times as the user desires until he/she is satisfied with the search results.

6.1.3. Relevance Feedback Formula to be Used in the Experiment

The relevance feedback process helps refine the original query and find more relevant materials in subsequent runs. The true advantage gained through the relevance feedback process can be measured in two different ways:

1) By changing the ranking of documents and moving the documents that the user judged relevant up in the ranking. With this method, documents that have already been seen (and judged relevant) by the user will still be retrieved in the second try, although they are ranked somewhat higher this time. "This occurs because the feedback query has been constructed so as to resemble the previously obtained relevant items" (Salton and Buckley, 1990, p.292). This effect is called the "ranking effect" (Ide, 1971), and it is difficult to distinguish the artificial ranking effect from the true feedback effect (Salton and Buckley, 1990). Note that the user may not want to see these documents a second time because he/she has already seen them during the initial retrieval.

2) By eliminating the documents that have already been seen by the user in the first retrieval and "freezing" the document collection at this point for the second retrieval. In other words, documents that were judged relevant (or nonrelevant) during the initial retrieval will be excluded in the second retrieval, and the search will be repeated only on the frozen part of the collection (i.e., the rest of the collection from which the user has seen no documents yet). This is called the "residual collection" method, and it "depresses the absolute performance level in terms of recall and precision, but maintains a correct relative difference between initial and feedback runs" (Salton and Buckley, 1990, p.292).

The different relevance feedback formulae are based on variations of these two methods. More detailed information on relevance feedback formulae can be found in Salton and Buckley (1990). For mathematical explications of the relevance feedback process, see Rocchio (1971), Ide (1971), and Salton and Buckley (1990).
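As a small illustration of the second ("residual collection") method, the sketch below simply removes the records the user has already seen before the feedback run is evaluated; the document identifiers are hypothetical.

```python
def residual_collection(collection, seen_in_first_run):
    """Return the frozen part of the collection on which the feedback run is evaluated."""
    return [doc_id for doc_id in collection if doc_id not in seen_in_first_run]

collection = ["d1", "d2", "d3", "d4", "d5", "d6"]
seen = {"d1", "d2"}                             # judged (relevant or not) in the first run
print(residual_collection(collection, seen))    # ['d3', 'd4', 'd5', 'd6']
```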

In this experiment, the feedback weight for an individual query term i will be computed according to the following probabilistic relevance feedback formula:

log [ p_i (1 - q_i) / ( q_i (1 - p_i) ) ]

where

p_i = ( rel_ret + (freq / num_doc) ) / ( num_rel + 1.0 )

q_i = ( freq - rel_ret + (freq / num_doc) ) / ( num_doc - num_rel + 1.0 )

and where

freq is the frequency of term i in the entire collection;

rel_ret is the number of relevant documents term i occurs in;

num_rel is the number of relevant documents that are retrieved;

num_doc is the number of documents.

This formula takes into account only the "feedback effect," not the artificial "ranking effect" (i.e., documents retrieved in the first run are not included in the second run).
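Purely for illustration, the formula above can be transcribed directly into code as follows (variable names follow the definitions just given; this is a sketch, not CHESHIRE's actual implementation, and the example counts are invented):

```python
import math

def feedback_weight(freq, rel_ret, num_rel, num_doc):
    """Probabilistic relevance feedback weight for a single query term i.

    freq    -- frequency of term i in the entire collection
    rel_ret -- number of relevant documents term i occurs in
    num_rel -- number of relevant documents that are retrieved
    num_doc -- number of documents
    """
    p_i = (rel_ret + freq / num_doc) / (num_rel + 1.0)
    q_i = (freq - rel_ret + freq / num_doc) / (num_doc - num_rel + 1.0)
    return math.log((p_i * (1.0 - q_i)) / (q_i * (1.0 - p_i)))

# Invented example: a term occurring in 50 of 30,000 documents,
# 3 of them among the 5 retrieved relevant documents.
print(feedback_weight(freq=50, rel_ret=3, num_rel=5, num_doc=30000))
```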

6.1.4. Relevance Judgments

Relevance judgments for each document that is retrieved for a given query will be recorded for further analysis and for computation of the precision and recall ratios. Similarly, relevance judgments for documents retrieved in subsequent runs will also be recorded for the same purposes.

The procedure for recording relevance judgments is as follows: for each and every record retrieved in response to the user's search request, the user is required to take some action. If the record he/she scans is relevant to his/her query, then he/she simply needs to push the "relevant" key. If the record retrieved is not relevant, then the user simply needs to press the "return" key. This tells CHESHIRE that the record retrieved is not relevant. Records thus identified as relevant or non-relevant by the user will be taken into account should he/she wish to perform a relevance feedback search later.

It should be noted that relevance judgments will be done by the users themselves for search queries that are based on real information needs.

For the purposes of further testing the retrieval effectiveness of CHESHIRE, some search queries will be repeated on the system. Relevance judgments in this stage will be performed by the researcher. However, this will be done after the data concerning user needs and intentions have been gathered through the critical incident technique. It is believed that, based on the contextual feedback gained from users for each query, the researcher can make objective relevance judgments that reflect the actual users' decision-making processes as closely as possible.

6.1.5. Retrieval Effectiveness Measures

Precision and recall are the most commonly used retrieval effectiveness measures in information retrieval research. As given before, precision is defined as the proportion of the retrieved documents that are relevant. Recall is the proportion of the relevant documents that are retrieved.

The above measures will be calculated in this experiment as follows:

Precision will be taken as the ratio of the number of documents that a user judged relevant (by pressing the "relevant" key) for a particular query to the total number of records he/she scanned before deciding either to quit or to do a relevance feedback search. There is a slight difference between the original definition of precision and the one that will be used in this experiment: instead of taking the total number of records retrieved in response to a particular query, we will take the total number of records scanned by the user, no matter how many records the system retrieves for that query. For instance, if the user stops after scanning 2 records and judges one of them relevant, then the precision ratio will be 50%.

Precision ratios for retrievals during the relevance feedback process will be calculated in the same way.
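A minimal sketch of this modified precision measure is given below; the judgment marks are hypothetical ("R" for a "relevant" keystroke, "N" for the return key):

```python
def scanned_precision(judgments):
    """Precision over the records the user actually scanned, not over all records retrieved.

    judgments -- list of 'R'/'N' marks, one per record displayed to the user
    """
    if not judgments:
        return 0.0
    return judgments.count("R") / len(judgments)

print(scanned_precision(["R", "N"]))   # 0.5, as in the two-record example above
```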

Recall is considerably more difficult to calculate than precision since it requires finding relevant documents that will not be retrieved in the course of users' initial searches (Blair and Maron, 1985). In this experiment, recall will be calculated for each search against a set of relevant documents identified in advance using various techniques, such as taking samples from rich subsets of the database (Blair and Maron, 1985; Larson, 1991c). Familiarity with the database (i.e., records mainly about Library and Information Science) is thought to facilitate the researcher's task in this respect.

It is worth repeating that the relevance judgments when calculating recall will be made by the researcher based on the data to be obtained from the users through the critical incident technique. As mentioned earlier, incident reports will provide feedback about users' information needs and put each search statement in perspective, which will facilitate relevance judgments further.

In addition to finding out retrieval effectiveness through precision and recall measures, retrieval effectiveness will also be evaluated by gathering data from the users. In other words, users will be consulted as to what they think about the effectiveness of specific searches that they performed on CHESHIRE. Although it is not possible to quantify user designated retrieval effectiveness in mathematical terms, it will nonetheless be interesting to compare user designated ineffective searches with precision and recall ratios for corresponding search queries.

6.2 Subjects

Doctoral and master's students in the School of Library and Information Studies will be approached, and their agreement will be sought for data collection, analysis, and evaluation purposes in conducting this experiment. The CHESHIRE online catalog will be accessible to them for online searches throughout the Fall 1991 semester. The experiment will be publicized through appropriate channels in order to increase the number of participants. It is believed that the CHESHIRE catalog will be utilized because its database contains the holdings of the Library of the School of Library and Information Studies along with full bibliographic information, including call numbers, for each document.

Demographical data will be collected about the subjects who will participate in this experiment.

Each subject will be issued a password in order to get access to the CHESHIRE online catalog. Passwords will identify subjects for data gathering purposes and trigger the transaction log programs to record each subject's entire session on CHESHIRE.

The number of search queries to be collected is expected to be around 200. This figure is thought to be appropriate for evaluation purposes, as most information retrieval experiments in the past were conducted with far fewer queries.

7. Data Gathering, Analysis, and Evaluation Methodology

Data gathering and analysis will be the next step in the experiment.

A variety of data gathering techniques will be used throughout the experiment, the most important ones being transaction logs and interviews to collect incident reports about search failures.

Transaction logs will capture data about the entire session for each search conducted on CHESHIRE. A number of data elements can be recorded in transaction logs. The following elements represent the kind of data that can be captured for each search request: user's password, logon time and date (to the nearest second), terminal type, the search statement(s), number of index terms per search statement, records retrieved and displayed to the user, number of records displayed for each search, user's relevance judgment on each record displayed, number of records displayed and judged relevant by the user (from which the precision ratio can be computed), relevance feedback requests, number of times the user requests a relevance feedback search for the same query, number of search terms matching the title and subject headings of each document, response time, the total time the user spent on the system, etc.
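As a concrete (and hypothetical) illustration, one transaction-log record might be represented as follows; the field names below mirror the data elements listed above but are not the fields of CHESHIRE's actual logging programs:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TransactionRecord:
    password: str                     # identifies the subject
    logon_time: str                   # date and time, to the nearest second
    terminal_type: str
    search_statement: str
    num_index_terms: int
    records_displayed: List[str] = field(default_factory=list)
    relevance_judgments: List[bool] = field(default_factory=list)  # one per record displayed
    feedback_requests: int = 0
    matching_title_terms: int = 0
    matching_subject_terms: int = 0
    response_time_seconds: float = 0.0
    session_seconds: float = 0.0

    def precision(self):
        """Relevant-judged records over records displayed (see section 6.1.5)."""
        if not self.relevance_judgments:
            return 0.0
        return sum(self.relevance_judgments) / len(self.relevance_judgments)
```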

A number of programs which can be used to record transaction logs are available on CHESHIRE. Search statements can be stored as text files at present. Additional programs to capture transaction data will be written by the researcher as and when required.

A letter asking for subjects' permission to review their transactions will be sent to all participants at the beginning of the experiment.

Further data will be gathered through the critical incident technique. A questionnaire will be developed for use during the structured interviews with the subjects concerning their experience and search success in CHESHIRE. [I am thinking of modifying the one in Wilson and Starr-Schneidkraut (1989).] Questions regarding users' recent searches in CHESHIRE, along with the information needs that triggered those particular searches, will be included in the questionnaire. Also to be included are users' views about the effectiveness of their searches.

"All search activity...for each user...[will be] identified, and an attempt...[will be] made to match each incident report with a corresponding search recorded in the...[transaction log]....The resultant incident reports and corresponding logs...[will be] compared...on a number of objective features in order to determine the relationship between the features as reported by the respondent and those reflected in the log" (Wilson and Starr-Schneidkraut, 1989, pp.14-15).

The analysis of transaction log data will reveal quantitative data about the use of the CHESHIRE online catalog system during the period of experiment. For instance, such statistical data as the number of searches conducted, number of different users, number of records displayed and judged relevant, average number of terms in search statements, average number of matching terms between search statements and titles and subject headings of the documents, and system usage statistics can be easily gathered. (This will, of course, depend on the transaction log programs that will be used in the experiment.)

Transaction logs will later be analyzed for qualitative purposes. An attempt will be made to identify search failures along with their causes by making use of a variety of methods: analyzing search statements, comparing the match between search terms and titles and subject headings, analyzing the user supplied incident report, and analyzing the records retrieved and displayed.

System-assigned term weights might cause search failures as well. It is possible to go back and see the assigned weights and determine if the search failure occurred because of system-assigned term weights.

Incident reports and logs of user designated ineffective searches will be further analyzed.

Search terms in the queries and the titles of documents and LCSH assigned to them will be compared so as to find out the match between users' vocabulary and that of the system. Such a comparison may furnish further evidence to help explain search failures. The results can be tabulated for each query to see if there is a correlation between the success rates obtained through matching and non-matching search queries.
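One way such a comparison might be carried out is sketched below: for each query, count the overlap between the user's search terms and the words of the document title and the assigned LCSH strings. The tokenization is deliberately crude (no stemming or cross-reference handling), and all data shown are hypothetical.

```python
import re

def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def vocabulary_match(query, title, subject_headings):
    """Count query terms matching the title and the assigned LCSH strings."""
    query_terms = tokens(query)
    title_terms = tokens(title)
    lcsh_terms = set()
    for heading in subject_headings:
        lcsh_terms |= tokens(heading)
    return {"query_terms": len(query_terms),
            "title_matches": len(query_terms & title_terms),
            "lcsh_matches": len(query_terms & lcsh_terms)}

print(vocabulary_match(
    "failure analysis in online catalogs",
    "Information Systems: Failure Analysis",
    ["Online library catalogs", "Information storage and retrieval systems"]))
# {'query_terms': 5, 'title_matches': 2, 'lcsh_matches': 2}
```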

In order to identify the role of natural language-based user interfaces in retrieval effectiveness, some queries can be searched on MELVYL using detailed search tactics. Although the results will not be directly comparable to those obtained in CHESHIRE, the individual records can be compared so as to see if additional records are retrieved by either of the systems.

Evaluation of retrieval effectiveness will be based on precision and recall measures. For each search query, precision and recall will be calculated. The same measures will be used to evaluate the retrieval effectiveness of the relevance feedback process as well. Precision/recall graphs can be drawn for all the search queries to illustrate the retrieval effectiveness attained in CHESHIRE. The improvement in precision/recall ratios, should there be any, due to the relevance feedback effect can be observed from such graphs.

[Insert a P/R graph near here in the next draft]

The relationship between user designated retrieval effectiveness and precision/recall measures will be studied. In order to make user designated retrieval effectiveness more explicit for the purpose of comparison, a question can be added to the critical incident questionnaire asking users to grade the retrieval effectiveness of their recent searches on a Likert-type scale. The results can be compared with the precision/recall ratios found for the corresponding search queries recorded in the transaction logs.

Although Larson (1989) points out that "experience with the CHESHIRE system has indicated that the ranking mechanism is working quite well, and the top ranked clusters provide the largest numbers of relevant items" (p.133), determining the role of relevance feedback in improving the retrieval effectiveness in CHESHIRE will be a most difficult problem to tackle. Nevertheless, it is possible in CHESHIRE to see the top ranked documents and their ranking scores for each search query.

It is expected that the movements of, say, top 30 records retrieved in the first retrieval can be monitored during the relevance feedback retrieval.

[Sir, how important is it to chart the movements of top 30 records for relevance feedback purposes? If your answer is "important enough," I have another suggestion to possibly solve this problem: Since I have chosen a relevance feedback algorithm that throws the previously displayed records (relevant or not) to the bottom of the list, how about charting the movements of "retrieved but not judged/not displayed" records in the relevance feedback run? It would go something like this: say, user stops after judging 5 records and requests a relevance feedback search. (But, then, what if he/she decides to display all of them?) Even if user has not gone through the full list of all retrieved records (say, 20) in the first run, they are still there. How about monitoring the movements of the "undisplayed 15" records in the first run during the relevance feedback run? Let's suppose that the user would have judged some of the 15 "undisplayed" records relevant had he scanned through the full list. Let's further suppose that user has judged 2 of the five displayed records relevant. Since, at least, some of the "15 undisplayed" records would be similar to those 2 records that were judged relevant, they should be retrieved in top ranks during the relevance feedback retrieval (supposing, of course, that the retrieval algorithm is working reasonably well). We can then observe the improvement by monitoring the movements of undisplayed records! One caveat: I need to reconstruct each and every search query in order to display the "undisplayed" records to record them in transaction logs! Further, I need to compare originally "undisplayed" records with the records that are retrieved in the second run to see the improvement.. I hope I am making myself clear. Is this strategy worth further thoughts???]

Various statistical tests will be used to measure the significance of the average difference in retrieval effectiveness values between the two retrievals. Significance tests will measure the probability that "the two sets of values obtained from two separate runs are actually drawn from samples which have the same characteristics" (Ide, 1971, p.344). The t test and the Wilcoxon signed rank test will be used for the evaluation of findings.
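To illustrate how these paired tests might be applied to per-query effectiveness values, the sketch below uses the SciPy library (an assumption for illustration only; no particular statistical software is named in this proposal) on invented precision values for the initial and feedback runs:

```python
from scipy import stats

# Invented per-query precision values for the initial run and the feedback run.
initial_run  = [0.50, 0.33, 0.25, 0.60, 0.40, 0.20, 0.50, 0.00]
feedback_run = [0.66, 0.50, 0.40, 0.72, 0.45, 0.40, 0.60, 0.25]

t_statistic, t_p = stats.ttest_rel(feedback_run, initial_run)   # paired t test
w_statistic, w_p = stats.wilcoxon(feedback_run, initial_run)    # Wilcoxon signed rank test

print("paired t test:        t = %.2f, p = %.3f" % (t_statistic, t_p))
print("Wilcoxon signed rank: W = %.1f, p = %.3f" % (w_statistic, w_p))
```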

The correlation between the search failures and matching of users' natural language query terms with the titles of documents and LCSH will also be sought.

8. Expected Results

First and foremost, the causes of search failures in online catalogs will be identified. The detailed analysis of search failures will help improve the training programs for online library catalogs.

Based on the results, the design of CHESHIRE and other online catalogs can be improved so as to accommodate user preferences. For instance, if we find that users rely more on subject searching in online catalogs, more weight can be assigned to query terms matching the LCSH assigned to the records.

It is expected that users will find it easier to use online catalogs with natural query languages than online catalogs based on strict Boolean logic. Accordingly, more helpful online catalog user interfaces can be designed.

It is also to be expected that the relevance feedback process will improve retrieval effectiveness in online catalogs, and that users will find the relevance feedback technique useful and will use it.

The retrieval effectiveness values to be found for "third generation" online catalogs with relevance feedback and natural language query-based user interfaces, such as CHESHIRE, will be comparable to those attained in "second generation" online catalogs.

A pool of search queries stemming from real information needs will be gathered for CHESHIRE. This will allow further testing and comparison of advanced retrieval techniques in CHESHIRE.

The critical incident technique will be used for the first time in studying search failures in online catalogs. If proved useful and practical, the technique can be utilized in other online catalog studies as well. It is expected that the critical incident technique will add value to the data gathered through transaction logs.

9. Select Bibliography

Bates, Marcia J. 1972. "Factors Affecting Subject Catalog Search Success," Unpublished Doctoral Dissertation. University of California, Berkeley.

___________. 1977a. "Factors Affecting Subject Catalog Search Success," Journal of the American Society for Information Science 28(3): 161-169.

___________. 1977b. "System Meets User: Problems in Matching Subject Search Terms," Information Processing and Management 13: 367-375.

__________. 1986. "Subject Access in Online Catalogs: a Design Model," Journal of the American Society for Information Science 37(6): 357-376.

___________. 1989a. "The Design of Browsing and Berrypicking Techniques for the Online Search Interface," Online Review 13(5): 407-424.

__________. 1989b. "Rethinking Subject Cataloging in the Online Environment," Library Resources and Technical Services 33(4): 400-412.

Besant, Larry. 1982. "Early Survey Findings: Users of Public Online Catalogs Want Sophisticated Subject Access," American Libraries 13: 160.

Blair, David C. and M.E. Maron. 1985. "An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System," Communications of the ACM 28(3): 289-299, March 1985.

Blazek, Ron and Dania Bilal. 1988. "Problems with OPAC: a Case Study of an Academic Research Library," RQ 28:169-178.

Borgman, Christine L. 1986. "Why are Online Catalogs Hard to Use? Lessons Learned from Information-Retrieval Studies," Journal of the American Society for Information Science 37(6): 387-400.

Borgman, Christine L. "End User Behavior on an Online Information Retrieval System: A Computer Monitoring Study," in: International Conference on Research and Development in Information Retrieval. 6th Annual International ACM SIGIR Conference. Edited by Jennifer J. Kuehn. New York: ACM, 1983. pp.162-176.

Buckley, Chris. (1987). Implementation of the SMART Information Retrieval System. Ithaca, N.Y.: Cornell University, Department of Computer Science.

Byrne, Alex and Mary Micco. 1988. "Improving OPAC Subject Access: The ADFA Experiment," College & Research Libraries 49(5): 432-441.

Campbell, Robert L. 1990. "Developmental Scenario Analysis of Smalltalk Programming," in Empowering People: CHI '90 Conference Proceedings, Seattle, Washington, April 1-5, 1990. Edited by Jane Carrasco Chew and John Whiteside. New York: ACM, 1990, pp.269-276.

Carlyle, Allyson. 1989. "Matching LCSH and User Vocabulary in the Library Catalog," Cataloging & Classification Quarterly 10(1/2): 37-63, 1989.

Chan, Lois Mai. 1986a. Library of Congress Subject Headings: Principles and Application. 2nd edition. Littleton, CO: Libraries Unlimited, Inc.

_________. 1986b. Improving LCSH for Use in Online Catalogs. Littleton, CO: Libraries Unlimited, Inc.

Cochrane, Pauline A. and Karen Markey. 1983. "Catalog Use Studies -Since the Introduction of Online Interactive Catalogs: Impact on Design for Subject Access," Library and Information Science Research 5(4): 337-363.

Cooper, Michael D. 1991. "Failure Time Analysis of Office System Use," Journal of the American Society for Information Science (to appear in 1991).

Cooper, Michael D. and Cristina Campbell. 1989. "An Analysis of User Designated Ineffective MEDLINE Searches," Berkeley, CA: University of California at Berkeley, 1989.

Dale, Doris Cruger. 1989. "Subject Access in Online Catalogs: An Overview Bibliography," Cataloging & Classification Quarterly 10(1/2): 225-251, 1989.

Flanagan, John C. 1954. "The Critical Incident Technique," Psychological Bulletin 51(4): 327-358, July 1954.

Frost, Carolyn O. 1987a. "Faculty Use of Subject Searching in Card and Online Catalogs," Journal of Academic Librarianship 13(2): 86-92.

___________. 1987b. "Subject Searching in an Online Catalog," Information Technology and Libraries 6: 61-63.

___________. 1989. "Title Words as Entry Vocabulary to LCSH: Correlation between Assigned LCSH Terms and Derived Terms From Titles in Bibliographic Records with Implications for Subject Access in Online Catalogs," Cataloging & Classification Quarterly 10(1/2): 165-179, 1989.

Frost, Carolyn O. and Bonnie A. Dede. 1988. "Subject Heading Compatibility between LCSH and Catalog Files of a Large Research Library: a Suggested Model for Analysis," Information Technology and Libraries 7: 292-299, September 1988.

Gerhan, David R. 1989. "LCSH in vivo: Subject Searching Performance and Strategy in the OPAC Era," Journal of Academic Librarianship 15(2): 83-89.

Hancock, Micheline. 1987. "Subject Searching Behaviour at the Library Catalogue and at the Shelves: Implications for Online Interactive Catalogues," Journal of Documentation 43(4): 303-321.

Hartley, R.J. 1988. "Research in Subject Access: Anticipating the User," Catalogue and Index (88): 1, 3-7.

Hildreth, Charles R. 1989. Intelligent Interfaces and Retrieval Methods for Subject Searching in Bibliographic Retrieval Systems. Washington, DC: Cataloging Distribution Service, Library of Congress.

Holley, Robert P. 1989. "Subject Access in the Online Catalog," Cataloging & Classification Quarterly 10(1/2): 3-8, 1989.

Hays, W.L. and R.L. Winkler. 1970. Statistics: Probability, Inference and Decision. Vol. II. New York: Holt, Rinehart and Winston, 1970. (pp.236-8 for Wilcoxon sign tests in IR research.)

Ide, E. (1971). "New Experiments in Relevance Feedback." in Salton, Gerard, ed. The SMART Retrieval System: Experiments in Automatic Document Processing. Englewood Cliffs, N.J.: Prentice-Hall. pp. 337-354.

Kaske, Neal K. 1988a. "A Comparative Study of Subject Searching in an OPAC Among Branch Libraries of a University Library System," Information Technology and Libraries 7: 359-372.

___________. 1988b. "The Variability and Intensity over Time of Subject Searching in an Online Public Access Catalog," Information Technology and Libraries 7: 273-287.

Kaske, Neal K. and Nancy P. Sanders. 1980. "Online Subject Access: The Human Side of the Problem," RQ 20(1): 52-58.

__________. 1983. A Comprehensive Study of Online Public Access Catalogs: An Overview and Application of Findings. Dublin, OH: OCLC. (OCLC Research Report # OCLC/OPR/RR-83-4)

Kern-Simirenko, Cheryl. 1983. "OPAC User Logs: Implications for Bibliographic Instruction," Library Hi Tech 1: ??, Winter 1983.

Kinsella, Janet and Philip Bryant. 1987. "Online Public Access Catalog Research in the United Kingdom: An Overview," Library Trends 35: ??.

Klugman, Simone. 1989. "Failures in Subject Retrieval," Cataloging & Classification Quarterly 10(1/2): 9-35.

Kretzschmar, J.G. 1987. "Two Examples of Partly Failing Information Systems," in: Wise, John A. and Anthony Debons, eds. Information Systems: Failure Analysis. Berlin: Springer Verlag, 1987.

Lancaster, F.W. 1968. Evaluation of the MEDLARS Demand Search Service. Washington, DC: US Department of Health, Education and Welfare, 1968.

Lancaster, F.W. 1969. "MEDLARS: Report on the Evaluation of Its Operating Efficiency," American Documentation 20(2): 119-142, April 1969.

Larson, Ray R. 1986. "Workload Characteristics and Computer System Utilization in Online Library Catalogs." Doctoral Dissertation, University of California at Berkeley, 1986. (University Microfilms No. 8624828)

___________. 1989. "Managing Information Overload in Online Catalog Subject Searching," in: ASIS '89 Proceedings of the 52nd ASIS Annual Meeting, Washington, DC, October 30-November 2, 1989. Ed. by Jeffrey Katzer et al. Medford, NJ: Learned Information. pp. 129-135.

___________. 1991a. "Classification Clustering, Probabilistic Information Retrieval and the Online Catalog," Library Quarterly 61, April 1991. [in press]

___________. 1991b. "The Decline of Subject Searching: Long Term Trends and Patterns of Index Use in an Online Catalog," Journal of the American Society for Information Science (Submitted for publication) [1991].

___________. 1991c. "Evaluation of Advanced Information Retrieval Techniques in an Experimental Online Catalog," Journal of the American Society for Information Science (Submitted for publication) [1991].

Larson, Ray R. and V. Graham. 1983. "Monitoring and Evaluating MELVYL," Information Technology and Libraries 2: 93-104.

Lawrence, Gary S. 1985. "System Features for Subject Access in the Online Catalog," Library Resources and Technical Services 29(1): 16-33.

Lawrence, Gary S., V. Graham and H. Presley. 1984. "University of California Users Look at MELVYL: Results of a Survey of Users of the University of California Prototype Online Union Catalog," Advances in Library Administration 3: 85-208.

Lewis, David. 1987. "Research on the Use of Online Catalogs and Its Implications for Library Practice," Journal of Academic Librarianship 13(3): 152-157.

Markey, Karen. 1980. Analytical Review of Catalog Use Studies. Dublin, OH: OCLC, 1980. (OCLC Research Report # OCLC/OPR/RR-80/2.)

_________. 1983. The Process of Subject Searching in the Library Catalog: Final Report of the Subject Access Research Project. Dublin, OH: OCLC.

_________. 1984. Subject Searching in Library Catalogs: Before and After the Introduction of Online Catalogs. Dublin, OH: OCLC.

_________. 1985. "Subject Searching Experiences and Needs of Online Catalog Users: Implications for Library Classification," Library Resources and Technical Services 29: 34-51.

_________. 1986. "Users and the Online Catalog: Subject Access Problems," in Matthews, J.R. (ed.) The Impact of Online Catalogs pp.35-69. New York: Neal-Schuman, 1986.

_________. 1988. "Integrating the Machine-Readable LCSH into Online Catalogs," Information Technology and Libraries 7: 299-312.

Maron, M.E. 1984. "Probabilistic Document Retrieval Systems," in:

Matthews, Joseph R. 1982. A Study of Six Public Access Catalogs: A Final Report Submitted to the Council on Library Resources, Inc. Grass Valley, CA: J. Matthews and Assoc., Inc.

Matthews, Joseph R., Gary S. Lawrence and Douglas Ferguson (eds.) 1983. Using Online Catalogs: A Nationwide Survey. New York: Neal-Schuman.

Mitev, N.N., G.M. Venner and S. Walker. 1985. Designing an Online Public Access Catalogue. (Library and Information Research Report 39) London: British Library, 1985.

Naharoni, A. 1980. "An Investigation of W.T. Grant as Information System Failure," Ph.D. Dissertation, University of Pittsburgh, Pittsburgh, PA, 1980.

Nielsen, Brian. 1986. "What They Say They Do and What They Do: Assessing Online Catalog Use Instruction Through Transaction Monitoring," Information Technology and Libraries 5: 28-34, March 1986.

Norman, D.A. 1980. Errors in Human Performance. San Diego, CA: University of California, 1980.

Norman, D.A. 1983. "Some Observations on Mental Models," in: Stevens, A.L and D. Gentner, eds. Mental Models. Hillsdale, NJ: Erlbaum, 1983.

Pease, Sue and Gouke, Mary Noel. 1982. "Patterns of Use in an Online Catalog and a Card Catalog," College and Research Libraries 43(4): 279-291.

Penniman, W.D. and W.D. Dominick. 1980. "Monitoring and Evaluation of On-line Information System Usage," Information Processing & Management 16(1): 17-35.

Penniman, W. David. 1975. "A Stochastic Process Analysis of On-line User Behavior," Information Revolution: Proceedings of the 38th ASIS Annual Meeting, Boston, Massachusetts, October 26-30, 1975. Volume 12. Washington, DC: ASIS, 1975. pp.147-148.

Peters, Thomas A. 1989. "When Smart People Fail: An Analysis of the Transaction Log of an Online Public Access Catalog," Journal of Academic Librarianship 15(5): 267-273, November 1989.

Reason, J. and K. Mycielska. 1982. Absent-Minded? The Psychology of Mental Lapses and Everyday Errors. Englewood Cliffs, NJ: Prentice Hall, 1982.

Rocchio, Jr., J.J. 1971. "Relevance Feedback in Information Retrieval," in Salton, Gerard, ed. The SMART Retrieval System: Experiments in Automatic Document Processing. Englewood Cliffs, NJ: Prentice-Hall. pp. 313-323.

Salton, G. 1971. "Relevance Feedback and the Optimization of Retrieval Effectiveness," in Salton, Gerard, ed. The SMART Retrieval System: Experiments in Automatic Document Processing. Englewood Cliffs, NJ: Prentice-Hall. pp. 324-336.

Salton, Gerard, ed. 1971. The SMART Retrieval System: Experiments in Automatic Document Processing. Englewood Cliffs, NJ: Prentice-Hall.

Salton, Gerard and Chris Buckley. 1990. "Improving Retrieval Performance by Relevance Feedback," Journal of the American Society for Information Science 41(4): 288-297.

Shepherd, Michael A. 1981. "Text Passage Retrieval Based on Colon Classification: Retrieval Performance," Journal of Documentation 37(1): 25-35, March 1981.

Shepherd, Michael A. 1983. "Text Retrieval Based on Colon Classification: Failure Analysis," Canadian Journal of Information Science 8: 75-82, June 1983.

Svenonius, Elaine. 1986. "Unanswered Questions in Controlled Vocabularies," Journal of the American Society for Information Science.

Svenonius, Elaine and H. P. Schmierer. 1977. "Current Issues in the Subject Control of Information," Library Quarterly 47: 326-346.

Swanson, Don R. 1977. "Information Retrieval as a Trial-and-Error Process," Library Quarterly 47(2): 128-148.

Tague, J. and J. Farradane. 1978. "Estimation and Reliability of Retrieval Effectiveness Measures," Information Processing and Management 14: 1-16.

Tolle, John E. 1983. Current Utilization of Online Catalogs: Transaction Log Analysis. Dublin, OH: OCLC, 1983.

Tolle, John E. 1983. "Transaction Log Analysis: Online Catalogs," in: International Conference on Research and Development in Information Retrieval. 6th Annual International ACM SIGIR Conference. Edited by Jennifer J. Kuehn. New York: ACM, 1983. pp.147-160.

Users Look at Online Catalogs: Results of a National Survey of Users and Non-Users of Online Public Access Catalogs. 1982. Berkeley, CA: The University of California.

University of California Users Look at MELVYL: Results of a Survey of Users of the University of California Prototype Online Union Catalog. 1983. Berkeley, CA: The University of California.

Van der Veer, Gerrit C. 1987. "Mental Models and Failures in Human-Machine Systems," in: Wise, John A. and Anthony Debons, eds. Information Systems: Failure Analysis. Berlin: Springer Verlag, 1987.

Van Pulis, N. and L.E. Ludy. 1988. "Subject Searching in an Online Catalog with Authority Control," College & Research Libraries 49: 523-533.

Van Rijsbergen, C.J. 1979. Information Retrieval. 2nd ed. London: Butterworths.

Walker, Stephen and R. de Vere. 1990. Improving Subject Retrieval in Online Catalogues. 2: Relevance Feedback and Query Expansion. (British Library Research Paper, no. 72) London: British Library, 1990.

Wilson, Patrick. 1983. "The Catalog as Access Mechanism: Background and Concepts," Library Resources and Technical Services 27(1): 4-17.

Wilson, Sandra R. and Norma Starr-Schneidkraut. 1989. Use of the Critical Incident Technique to Evaluate the Impact of MEDLINE. (Final Report) Draft August 11, 1989. Contract No. N01-LM-8-3529. Bethesda, MD: National Library of Medicine.

Wise, John A. and Anthony Debons, eds. 1987. Information Systems: Failure Analysis. Berlin: Springer Verlag, 1987.