CHAPTER VII

ANALYSIS OF RETRIEVAL PERFORMANCE IN CHESHIRE

7.0 Introduction

Quantitative findings regarding CHESHIRE's retrieval performance as determined by precision and recall measures were discussed in Chapter VI. One of the primary objectives of the present study is to examine the retrieval performance of CHESHIRE more comprehensively. The analysis of CHESHIRE's retrieval effectiveness to be presented in this chapter is based on the results obtained from transaction logs, questionnaire forms and structured interviews with the participating users.

7.1 Determining Retrieval Performance

It was noted in earlier chapters that no single measure is sufficient to determine the retrieval performance of an online catalog. The multiple linear regression analysis discussed in Chapter VI showed no strong correlation between the traditional retrieval performance measures (precision and recall) on the one hand and user characteristics and users' assessments of search performance on the other. Furthermore, the performance of the retrieval system for each query as measured by precision and recall ratios varied a great deal. This suggests that a qualitative analysis of search effectiveness for each query will help explain the variations in the retrieval performance of the system.

The qualitative analysis of retrieval performance in CHESHIRE presented below makes use of several pieces of data that we gathered by means of transaction logs, questionnaires, structured interviews, and comprehensive searches.

As discussed in detail in Chapter V, the participating users filled out a pre-search questionnaire that included questions on user type, frequency of catalog use, and knowledge of computer software packages and online searching. This data was presented in the previous chapter. Users then performed catalog searches throughout the data collection period. All the searches they performed were recorded in transaction logs along with the users' relevance judgments on retrieved records. Precision ratios were calculated from the data recorded in the transaction logs. Next, we performed successive searches on CHESHIRE and MELVYL® in order to determine the recall base for each search query submitted to the system. Recall ratios were calculated from this data.

Once the data collection period was over, participating users were asked to fill out a questionnaire form for the search queries they had performed on the system. Data on whether users found what they wanted in the catalog, along with their perceived search success in terms of precision and recall both before and after relevance feedback searches, came from this questionnaire. Some of the quantitative findings obtained through the questionnaire were presented in Chapter VI.

After the users filled out the questionnaire, we interviewed them to learn their views of the retrieval performance of the system and audiotaped their comments. The users' assessments of retrieval effectiveness thus came from the questionnaires and the structured interview results.

The overall retrieval performance of the system for a given search query was then determined on the basis of three pieces of information. A search query was considered effective if: a) the user found what he or she was looking for (as recorded in the questionnaire form); b) the user judged the search results as being effective (as recorded in the critical incident form and the script of the structured interview); and c) precision and recall ratios were, to a certain extent, commensurate with the user's judgment. It is difficult, however, to come up with a formula that would indicate to what extent each piece of information contributed to the final decision on the effectiveness of a given search query, although, it should be emphasized, users' own assessments of their search queries were weighted more heavily. To put it somewhat differently, the retrieval performance of CHESHIRE for a given search query was judged effective unless there was a considerable unaccounted-for discrepancy between the answers supplied by the user and the precision and recall ratios.

In the qualitative analysis, "out-of-domain" search queries and search queries that retrieved nothing (i.e., zero retrievals) were identified from the transaction logs. Searches that retrieved nothing were later analyzed to determine the causes of failure by examining the search statement and the collection make-up. Similarly, queries in which users selected no clusters as relevant were identified from the logs, and all such queries were repeated on CHESHIRE in order to determine the causes of these incidents. Search queries for which precision and recall ratios were available were also analyzed to determine retrieval effectiveness and to corroborate the findings obtained from questionnaires, critical incident report forms, and structured interview scripts.

7.2 Retrieval Performance in CHESHIRE

The retrieval performance of CHESHIRE as measured by traditional precision and recall ratios was given in Chapter VI. On average, half the records CHESHIRE retrieved were judged relevant by the users (precision) before relevance feedback searches. On the other hand, CHESHIRE retrieved only about 25% of all the relevant documents in the database (recall). As should be expected, precision ratios went down (18%) while recall ratios increased (45%) as users performed relevance feedback searches. To put it differently, successive relevance feedback searches improved the recall ratios to the point where almost half the relevant records in the database were retrieved.
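Both measures were computed per query in the standard way: precision is the fraction of retrieved records judged relevant, and recall is the fraction of all relevant records in the database that were retrieved. The following minimal Python sketch illustrates the calculation; the numbers are made up for illustration and are not actual study data.

    def precision(relevant_retrieved, total_retrieved):
        """Fraction of retrieved records judged relevant by the user."""
        return relevant_retrieved / total_retrieved if total_retrieved else 0.0

    def recall(relevant_retrieved, relevant_in_database):
        """Fraction of all relevant records in the database that were retrieved."""
        return relevant_retrieved / relevant_in_database if relevant_in_database else 0.0

    # Illustrative query: 20 records retrieved, 10 judged relevant,
    # 40 relevant records in the recall base.
    print(precision(10, 20))  # 0.50, comparable to the pre-feedback average
    print(recall(10, 40))     # 0.25, comparable to the pre-feedback average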

What follows is a comprehensive analysis of retrieval performance in CHESHIRE, which incorporates not only precision and recall measures but also feedback gathered from the users on their assessments of search effectiveness. The analysis consists of two parts: 1) analysis of search failures that occurred in CHESHIRE; and 2) examination of search effectiveness in CHESHIRE. The first part concentrates on the analysis of the causes of search failures in CHESHIRE, the main theme of this dissertation. In the second part, we will emphasize CHESHIRE's strengths as a third generation online catalog and compare it with other catalogs.

7.2.1 Analysis of Causes of Search Failures in CHESHIRE

Altogether users performed 228 search queries on CHESHIRE. A total of 107 search queries (46.9%) failed for a wide variety of reasons, including collection failures. Table 7.1 summarizes the causes of search failures for those 107 search queries.

Table 7.1 Causes of Search Failures (N=107)

    Causes of search failures                   N       %
    ------------------------------------------------------
    Collection failure                         42    39.3
    User interface problem                     13    12.1
    Search statement                           11    10.3
    Known-item search                          11    10.3
    Cluster failures                            8     7.5
    Library of Congress Subject Headings        5     4.7
    Stemming algorithm                          4     3.7
    No apparent reason                          3     2.8
    Specific query                              2     1.9
    Cluster selection                           2     1.9
    Communication problem                       2     1.9
    Scope                                       2     1.9
    False drops                                 1     0.9
    Call number search                          1     0.9
    ------------------------------------------------------
    TOTAL                                     107   100.1

Notes: (1) Percentages do not sum exactly to 100% due to rounding.
(2) Definitions of the categories of search failures can be found in Chapter IV.

As can be seen from Table 7.1, collection failure was the primary cause of almost 40% of all unsuccessful search queries. This was followed by the problems users experienced with CHESHIRE's user interface (12.1%). Flaws in the search statements caused failures in more than 10% of search queries. Another 10% of the queries failed because some users tried to perform known-item searches, which CHESHIRE does not support. (The online catalog supports subject searching only.) Users found the retrieved clusters nonrelevant for eight (7.5%) search queries; they discontinued their searches upon seeing the retrieved, but not-so-promising, cluster records. The Library of Congress Subject Headings (LCSH) was the primary cause of almost 5% of all search failures. CHESHIRE's stemming algorithm was the cause of four (3.7%) search failures. A total of thirteen (12.1%) search queries failed due to, among other causes, telecommunication (telnet) problems, cluster selection, and false drops.

The detailed findings with regard to each type of search failure are presented below.

7.2.1.1 Analysis of Collection Failures

Some 42 search queries (39.3%) failed as there were no relevant sources in the database. (See Chapter V for a detailed description of how relevant sources were found.) This type of failure is commonly called "collection failure" and it constitutes, generally speaking, a considerable percentage of all search failures in online catalogs.

More than two-thirds (30 out of 42) of all collection failures in this study were coupled with specific search queries. For instance, the Library School Library (LSL) collection simply lacked sources that could have satisfied specific search queries such as "virtual reality cyberspace" (#65), "classification of materials on gay and lesbian studies" (#222), "hypermedia" (#20), "hypertext" (#21), "indexes for information resources on or in networks like Internet and Bitnet" (#15 and #16), "minitel" (#108), "novell" (#125), "cheshire" (#191), "xerox windows" (#80), and "project mercury" (#186).

Some of the collection failures cited above occurred because the search topics were relatively new. Monographic literature on, say, "virtual reality cyberspace," "indexes for information resources on or in networks like Internet and Bitnet," and "project mercury" came into being very recently, and the database contains records only up to 1989. Therefore, search queries for those relatively new topics failed without retrieving any promising bibliographic records. Similarly, even though the system retrieved promising items, search queries for the most recent publications (i.e., published since 1989) were judged ineffective due to collection failures (#66 and #73). Some of the search queries were very specific in nature and the database lacked specific sources to satisfy them (#191, #62, #175, #56). For some other search queries, the literature simply did not exist in published form (#141, #193). Four search queries in this group retrieved nothing at all (zero retrievals) due to collection failures (#13, #20, #21, #108).

Out-of-domain search queries are not treated as collection failures in this study. Several users were apparently unaware of the domain of the CHESHIRE database and issued out-of-domain search queries on, say, ancient Chinese poetry, romance novels, and Alfred Hitchcock films.

7.2.1.2 Analysis of the Causes of User Interface Problems

CHESHIRE is an experimental online catalog that was made accessible to the users who participated in this study. It was developed by Larson (1989, 1991a, 1992) for his theoretical research on advanced information retrieval techniques. It is fair to suggest that during the design and implementation stages more emphasis was given to its functionality than to its user interface. Yet the user interface was the primary cause of only 13 (12.1%) unsuccessful search queries. In eight cases the users indicated that they simply did not know how to use the system or how to proceed once they entered their search queries (#27, #29, #40, #41, #53, #72, #189, #190). Some "got lost" and "couldn't tell from the interface how to select an item." The user interface was "just too foggy" for some others and it "didn't give enough user clues." One user was desperately seeking help (#43, #44) while two others could not figure out how to quit the system (#79, #94); their help and quit requests were treated as legitimate search queries by the natural language user interface. Another user experienced problems when editing her search statement and could not backspace to previous lines (#98).

Of the 13 search queries that failed due to user interface problems, seven occurred when users attempted to search CHESHIRE for the first time. This would seem to suggest that some first-time users were not well served by the user interface. It should also be mentioned that CHESHIRE has no help screens of any significance to guide novice users.

7.2.1.3 Analysis of Failures Caused by Search Statements

A total of 11 (10.3%) search queries failed due to major flaws in the users' search statements. Vocabulary problems (#28, #32, #38, #178, #184, #194, #223, #225), incomplete search queries (#45), misspellings (#227), truncated search terms (#184), and indecipherable query statements (#64) are classified under this group.

Several factors caused vocabulary problems: some search queries contained abbreviated or truncated terms, while others were broad, did not describe the user's information need adequately, or contained search terms that were not retrieval-worthy. For instance, the abbreviated search term "cip" retrieved a few records but missed many that were listed under the spelled-out form ("Cataloging-in-Publication"). Similarly, some queries contained truncated search terms ("librar"), although truncation is not supported by the system. The stemming algorithm failed to recognize such terms because they were not listed in the system's dictionary in that form, and they were thus ignored during retrieval.

Some search queries did not describe users' real information needs. For instance, one user entered "freedom of information" (#28) as her search query even though she was looking for information on "national security issues and classification of documents."

Some users qualified their search queries and entered phrases such as "subject search" or "title search" as part of their complete search statements. However, such phrases were treated as legitimate search terms, thereby causing some false drops. Some others simply described what they wanted (e.g., "alternatives to traditional subject headings" (#223)) and expected the system to handle the rest. However, the system cannot handle such queries successfully as it has no natural language understanding capabilities.

7.2.1.4 Analysis of the Causes of Known-item Search Failures

It appears that a few users were unaware of the fact that CHESHIRE allows subject searching only. Four users entered a total of 11 known-item search queries: five personal author searches (#90, #91, #92, #93, #97) and six title searches (three for book titles (#200, #201, #204) and three for periodical titles (#78, #202, #203)). All eleven known-item search queries failed in one way or another.

The stemming algorithm did not recognize personal author names (e.g., "marcia tuttle," "katz," and "patrick wilson") as legitimate query terms because author names are not taken into account during retrieval. One of the personal author searches was in fact a factual query: "how many books by patrick wilson does the library have?" As the system performs no semantic analysis on the search statement, this query could not be satisfied.

The rest of the known-item searches were for periodical and monographic titles (three search queries each). Needless to say, the system treated all six searches as subject searches and retrieved some items accordingly, although not necessarily the ones sought by the users.

7.2.1.5 Analysis of the Causes of Cluster Failures

As pointed out earlier, CHESHIRE expands the user's original query on the basis of a classification clustering process in which users are asked to indicate whether retrieved cluster records seem relevant. The query expansion is largely based on the title words, LC subject headings, and classification numbers present in the clusters judged relevant by the user. However, if, for some reason, the user happens to select no cluster as relevant, the search ends without retrieving any bibliographic records.
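A minimal Python sketch may help fix the idea; the field names, sample data, and unweighted pooling of terms below are illustrative assumptions, not CHESHIRE's actual implementation:

    def expand_query(original_terms, clusters, relevant_ids):
        """Add title words, LC subject heading words, and the class number
        from each cluster the user marked relevant to the original query."""
        expanded = list(original_terms)
        for cluster in clusters:
            if cluster["id"] in relevant_ids:
                expanded += cluster["title_words"]
                expanded += cluster["subject_heading_words"]
                expanded.append(cluster["class_number"])
        return expanded

    clusters = [{"id": 1, "class_number": "Z699",
                 "title_words": ["information", "retrieval"],
                 "subject_heading_words": ["information", "storage",
                                           "retrieval", "systems"]}]
    # If the user selects no cluster (relevant_ids is empty), the query is
    # never expanded and the search ends without bibliographic records.
    print(expand_query(["relevance", "feedback"], clusters, relevant_ids={1}))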

As briefly explained in Chapter V, when the user selects no cluster as relevant, cluster records do not get recorded in the transaction files. In order to see which clusters the users did not like, the search queries were re-created just to record the clusters in the transaction log file. Search queries were re-entered exactly as they were and the clusters were then displayed one by one. In order to record the clusters in the transaction file, we selected the first cluster in each search as relevant and then quit. We repeated this process for all queries that retrieved clusters none of which the user had chosen as relevant.

The idea was to see the clusters which the user judged nonrelevant, thereby ascertaining how efficiently the classification clustering process in CHESHIRE brings the relevant clusters (i.e., LCSH and class numbers) together. This process also allowed us to record the bibliographic records as if the user had selected the very first cluster as relevant, showing the kinds of records the user would have retrieved. One shortcoming of this process was that we did not know how many clusters the user had seen before deciding that the search was not worth pursuing further. In other words, the user may have abandoned the search after seeing only one cluster or all 20 clusters; we simply do not have this information recorded in the transaction logs. In fact, some search queries suggest that the user, for instance, noticed a spelling mistake or wanted to broaden or narrow the query, quickly abandoned the search, and re-issued a similar one. We did not classify such queries as cluster failures.

This section analyzes cluster failures that were primarily caused by the retrieval of cluster records the user judged nonrelevant.

There were eight (7.5%) search queries that were abandoned by the users because the system failed to retrieve relevant clusters (#3, #18, #68, #70, #151, #166, #172, #173). One user issued a search query on "a general history of the Library of Congress" (#151) but did not like the clusters retrieved by the system. In fact, none of the 20 cluster records included the specific LC subject heading Library of Congress; they were all general. It appears that CHESHIRE's weighting formula underweighted the most important words ("Library" and "Congress") in the search query. The user re-issued her query as "library of congress" (#152) and retrieved relevant clusters and bibliographic records.

Some users found the retrieved clusters not specific enough and thus selected none as relevant. For instance, one user was looking for collection development in law libraries. He repeated his search query twice ("law libraries -- collection development from 1935" (#68), "collection development law libraries only" (#70)). The two most promising clusters he retrieved in his first search were Collection development -- Libraries and Law libraries. As the user was not satisfied with these somewhat general clusters, he re-issued his query by adding the word "only" after "collection development law libraries." CHESHIRE retrieved, among others, the same two clusters again. Eventually, the user gave up, thinking that there was nothing in the collection that could answer his query.

A similar situation occurred when another user was looking for reference sources in art. Again, the user repeated his query twice ("library reference material on art" (#172), "library resource materials on art" (#173)). The former retrieved a few clusters on reference services and reference books. Yet none of them was specific enough to be selected as relevant by the user. The latter was worse: it retrieved nothing whatsoever on either reference sources or reference services. The choice of the term "resource" in the query may have affected the retrieval results negatively because it is not interchangeable with "reference."

Queries on "information policy" (#166) and "cost-effectiveness of library services" (#3) also failed to retrieve relevant clusters. The best cluster CHESHIRE retrieved for the former query was Information services; there was no specific cluster on "information policy." The second query was more specific, and none of the clusters retrieved had anything to do with it.

The cluster failures summarized above include those search queries for which CHESHIRE failed to retrieve any relevant clusters. We did not count as cluster failures cases in which users entered out-of-domain search queries and then, upon seeing unpromising clusters or failing to retrieve anything at all, chose not to continue their searches. For instance, users abandoned 17 out-of-domain search queries without selecting any cluster as relevant.

Similarly, search queries that were abandoned before selecting any clusters as relevant because the users simply wanted to revise their queries and resubmit them were not considered as cluster failures, either. There were 11 such search queries.

CHESHIRE's classification clustering mechanism usually helps users get close to their subject areas by displaying promising subject headings that users are likely to find relevant. However, there appear to be some cases in which the classification clustering mechanism did not help users identify their subject areas.

This is what happened in the majority of the search queries summarized above. Basically, the user found none of the clusters promising or specific enough and did not select any. However, it is highly likely that, had the user selected at least one cluster as relevant, CHESHIRE would have retrieved some relevant records. A similar case occurred for a query on "library tours" (#113), for which CHESHIRE picked up some general clusters on libraries and some others on "tours, France"! Again, as the user selected no clusters as relevant, the search failed. (Selecting some general clusters as relevant just because there are no specific ones available may also cause failures, especially for very specific queries. For instance, both queries on "indexes for information resources on or in networks like Internet and Bitnet" (#15 and #16) failed to retrieve any relevant bibliographic records in spite of the fact that the user judged some clusters as relevant.)

7.2.1.6 Analysis of Search Failures Caused by the Library of Congress Subject Headings

Subject headings assigned to bibliographic records in the CHESHIRE database were taken from the Library of Congress Subject Headings (LCSH) vocabulary. The terminology used in the headings and the specificity and exhaustivity of the assigned subject headings were therefore determined by LCSH.

LC subject headings assigned to documents caused a total of five (4.7%) search failures in our study (#127, #128, #174, #181, #192). Retrieved clusters (hence assigned subject headings) were fairly broad in all but one case. The LC subject headings presented in those clusters were not specific enough to describe users' search topics. Yet users felt compelled to select broad LC subject headings as relevant in order to retrieve bibliographic records.

Three search queries on censorship of children's literature (#127, #128, #181) failed because LC subject headings provided were not specific enough. CHESHIRE retrieved some general clusters on censorship for the first search query ("censorship of children's books"). Yet the most specific cluster on censorship of children's literature was not displayed. CHESHIRE successfully retrieved two titles relevant to the user's query. Yet both titles were cataloged under general LC subject headings Censorship and Censorship -- United States.

The second query was worded slightly differently ("censorship of children's literature"). The match between the query terms and the LC subject headings was better for this query. CHESHIRE retrieved three relevant sources that were cataloged under the specific LC subject heading Children's literature -- Censorship. On the other hand, this query missed two relevant records cataloged under the broader LC subject headings given above.

The third search query retrieved only one relevant source on censorship of children's literature. Retrieved sources were mostly on either children's literature or censorship, but not necessarily the combination of the two. The user's selection of broad clusters on censorship as relevant helped very little in terms of CHESHIRE's ability to pinpoint more specific items in the database.

Another search query, on "children's book reviewing" (#174), also failed because no specific LC subject heading was provided. The user was looking for theoretical works on children's book reviewing. The majority of the sources CHESHIRE retrieved were on the history of book reviewing, book reviews, and children's literature in general.

The last search query that failed because of the lack of specific LC subject headings was on "relevance" (#192). The user was trying to find sources on relevance feedback in information retrieval systems. None of the sources CHESHIRE retrieved was assigned "relevance" as a specific LC subject heading. Rather, broader LC subject headings of Information storage and retrieval systems -- Testing and Information storage and retrieval systems -- Evaluation were assigned to relevant titles.

The lack of specific LC subject headings appears to have affected the outcome of some other search queries in an unfavorable way as well, although such searches failed for other reasons. For instance, one user was looking for sources on "letterpress printing" (#29); there was no specific LC subject heading, which caused the system to retrieve some general clusters on private presses and little presses. Another user was interested in "greek typefaces in Paris in fifteenth and sixteenth centuries" (#180). None of the assigned LC subject headings was that specific.

7.2.1.7 Analysis of Search Failures Caused by CHESHIRE's Stemming Algorithm

The function of a stemming algorithm is to reduce the search terms in the user's query to their root forms so that they match more records in the database, thereby increasing recall. Reducing search terms to their roots also means that less storage space is needed to accommodate the dictionary of all the terms occurring in the document database.
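The following toy suffix-stripping rules in Python illustrate the intended conflation of variant word forms; they are a deliberately crude sketch, not CHESHIRE's actual algorithm:

    def stem(term):
        """Reduce a term to a crude root form using toy rules."""
        if term.endswith("ies"):
            return term[:-3] + "i"
        if term.endswith("y"):
            return term[:-1] + "i"
        if term.endswith("s") and not term.endswith("ss"):
            return term[:-1]
        return term

    # Variant forms conflate to a single dictionary entry:
    print(stem("libraries"), stem("library"))  # both reduce to "librari"
    # Overly aggressive rules produce the overstemming described below,
    # e.g., a rule collapsing double letters would turn the brand name
    # "novell" into "novel".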

The stemming algorithm used to parse query terms in CHESHIRE caused four (3.7%) search queries to fail completely (#74, #75, #118, #209).

The search query on "C" (#74) retrieved nothing because the stemming algorithm disregarded the term completely. The user revised his query and entered "programming C" (#75). However, the revision did not improve the results much because the algorithm recognized only the first term and disregarded "C" again. This caused CHESHIRE to retrieve several clusters on programming, but not necessarily C programming. The user abandoned the search upon not finding any relevant clusters.

The other two stemming failures were similar. The search queries on "r&d" (#118) and "e-journal" (#209) retrieved nothing because the algorithm failed to recognize the abbreviated terms "r&d" and "e" ("r&d" for "research and development," and "e-journal" for "electronic journal").

There were a few more search queries that the stemming algorithm failed to evaluate properly, although those queries failed for other reasons (e.g., collection failures, user interface failures). A personal author search query for "marcia tuttle" (#90) was reduced to "marc" by the stemming algorithm, which caused the system to pick up several sources on Machine Readable Cataloging (MARC). Similarly, "novell" (#125) (a local area network brand name) was reduced to "novel," which resulted in the retrieval of such clusters as Santa Maria Novella Dominican Monastery, American fiction, and the like. The system would have retrieved bibliographic records on, among others, Victorian novelists had the search not been abandoned.

7.2.1.8 Analysis of Search Failures Caused by No Apparent Reason

Three (2.8%) search queries failed for no apparent reason (#157, #176, #219). Retrieved records for a search query on "storytelling" (#176) were all relevant, as were the relevance feedback search results. Yet the user judged this search ineffective. She said she had been asked a reference question about storytelling in her job, but she could not remember the details very well.

A search query on the "history of Library of Congress Subject Headings" (#219) was abandoned by the user although retrieved clusters were relevant. During the interview the user did not recall performing this search. Similarly, a search query on the "history of printing" (#157) was also abandoned by the user even though the system retrieved some excellent clusters.

It is difficult to classify these three searches under any particular category of search failure. Clearly, the users had some difficulty recalling their search queries. However, none of the queries was judged ineffective or abandoned because of system problems.

7.2.1.9 Analysis of Search Failures Caused by Specific Queries

Although users submitted several specific search queries to the system, only two (1.9%) search queries failed primarily due to the specificity of the queries. In fact, some search queries submitted to the system were formulated as "research questions" rather than online catalog search requests. For instance, one user was trying to find sources to support his thesis that "law librarianship is a product of the 1929 stock market crash." He was also interested in whether "the federal depository legislation of the early '30s. . . had a major impact on law libraries." His search query was relatively broad ("history of law libraries, history of federal depositories, personal narratives of law librarians, law libraries") (#7). Yet when the system retrieved some general sources on law libraries, he selected none of them as relevant. He was after specific sources that could prove his thesis, and such sources simply did not exist in the database. It is likely that the user's query could be answered only after an extensive study of the literature. Yet he was expecting to find specific titles referring directly to his research question.

Another user was looking for information "on the public image of librarians through history" (#162). The system retrieved, among others, two general titles on her topic. Yet none of the items was as specific as the user would have liked; hence she selected none as relevant.

7.2.1.10 Analysis of Search Failures Caused by Imprecise Cluster Selection

Two (1.9%) search queries failed due to somewhat imprecise cluster selection by users (#11, #195). A search query on "electronic mail" (#11) retrieved a very promising cluster on "electronic mail systems," yet the search was abandoned by the user. In the second case the user was interested in reference sources on art and entered her query simply as "art" (#195). Yet she selected some general clusters on reference services in libraries as relevant. Based on the user's cluster selection, the system expanded the original search query in that direction and retrieved some sources on reference services rather than reference sources on art. The user said she did not remember finding anything useful.

7.2.1.11 Search Failures Caused by Telecommunication Problems

One of the users experienced telecommunication problems when accessing CHESHIRE, which caused her to abandon two search queries (1.9%) in the middle of the sessions (#17, #111). In both cases she managed to re-establish the connection immediately afterwards and carried out her searches. The exact cause of the disconnections is not known; several other users accessed the system and experienced no telecommunication problems.

7.2.1.12 Analysis of Failures Caused by Users' Unfamiliarity with the Scope of the CHESHIRE Database

One of the users was unaware of the scope of the database and was looking for periodical literature on collection development and acquisition practices in law libraries. He carried out two searches (1.9%) and found both of them unsuccessful. He thought the database contained bibliographic records of articles published in law library journals. Yet the database contains no references to periodical literature; the bibliographic records in it represent the monographic holdings of the LSL collection. The user suggested that our presentation of the system as a "third generation" online catalog during the classroom demonstrations led him to believe that the database also indexed periodical literature. He maintained that he was "relying entirely too much on CHESHIRE to come up with the definitive answer."

7.2.1.13 Analysis of Search Failure Caused by False Drops

The primary cause of one of the search failures (0.9%) was false drops that occurred during retrieval. The user was looking for sources on "library tours" (#113) of the kind given to users as part of a bibliographic instruction or library orientation program. The top cluster the system retrieved included the following LC subject headings: Plantin, Christophe -- ca. 1520-1589, Printing -- France -- History, Printing -- France -- Touraine -- History. The rest of the clusters retrieved were general.

Apparently, the retrieval algorithm attached more weight to the term "tours" in the user's query than the term "library" (for "library" is the most frequently occurring term in the database). Also, sources on library tours generally bear different title words (e.g., library orientation, bibliographic instruction). Furthermore, the user's query matched none of the LC subject headings in the database completely.

False drops occurred in a few other search queries as well. Yet they affected the outcome very little as they were presented toward the end of the retrieval list. For instance, a search query on "cd-rom databases" (#10) retrieved several bibliographic records on CD-ROMs. Yet it also retrieved six records that had nothing to do with CD-ROMs, such as The early editions of the Roman de la Rose and Operai tipografi a Roma, 1870-1970. Fortunately, all six titles ranked lower than the relevant titles on CD-ROMs and were displayed at the end of the list. The reason the system picked up such titles is that "CD-ROM" was treated as two separate words, "CD" and "ROM." Thus, after all the records in the database on CD-ROMs were exhausted, the system retrieved the next best matching records. (Apparently, the stemming algorithm reduced "Roma" to "rom.")
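The hyphen handling behind this false drop can be reproduced with a toy tokenizer; the splitting rule is an assumption for illustration, not CHESHIRE's actual indexing code:

    import re

    def tokenize(text):
        """Split on any non-letter characters and lowercase, so
        hyphenated terms break apart into separate words."""
        return [t for t in re.split(r"[^a-zA-Z]+", text.lower()) if t]

    print(tokenize("cd-rom databases"))         # ['cd', 'rom', 'databases']
    # If the stemmer also reduces "roma" to "rom", a title containing
    # "Roma" becomes a partial match for a CD-ROM query:
    print(tokenize("Operai tipografi a Roma"))  # ['operai', 'tipografi', 'a', 'roma']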

Similarly, the search term "quit" (#94), which was intended as a quit command but entered in the query description screen, retrieved two clusters on the history of printing in Ecuador because the title of one of the books happened to include the word Quito! A search query on "Dr. Seuss" (#57) retrieved several items with "Dr." in their titles! One other query, on the CHESHIRE system itself (#191), retrieved a cluster with the LC subject heading Libraries -- England -- Cheshire -- Directories.

7.2.1.14 Analysis of Search Failure Caused by Call Number Search

After displaying cluster records, one of the users took the call number to be another access point and entered "1. Call Number Z00699," apparently trying to retrieve all the items in that call number range. The search failed because the system has no call number searching facility.

7.2.2 Analysis of Zero Retrievals

In the previous section we analyzed the causes of search failures and referred to zero retrievals from time to time in the context of collection failures, misspellings, and so on. In this section we will briefly look at zero retrievals that occurred in CHESHIRE separately. Note that we do not categorize zero retrievals as a separate factor causing search failures, for we have already analyzed some search queries that retrieved no records in the previous section.

A total of 18 search queries retrieved nothing in CHESHIRE. The causes of these zero retrievals are presented in Table 7.2.

Table 7.2 Causes of Zero Retrievals (N=18)

    Causes of zero retrievals        N       %
    --------------------------------------------
    Collection failure               4    22.2
    Out-of-domain search query       4    22.2
    Stemming algorithm               2    11.1
    Personal author search           2    11.1
    Call number search               1     5.6
    Misspelling                      1     5.6
    Help request                     1     5.6
    Quit                             1     5.6
    Incomplete search query          1     5.6
    Gibberish                        1     5.6
    --------------------------------------------
    TOTAL                           18   100.2

Note: Percentages do not sum exactly to 100% due to rounding.

As can be seen from Table 7.2, of those 18 search queries, four retrieved no clusters due to collection failures ("z39.50," "hypermedia," "hypertext," and "minitel"). Four search queries retrieved nothing because they were out-of-domain ("nanotechnology," "syrian asceticism," "asceticism in syria," and "blood transfusion"). The stemming algorithm was the cause of two zero retrievals ("C" and "r&d"). Two search queries failed to retrieve any records because the user attempted to perform a personal author search ("tuttle" and "katz"). One search query retrieved nothing because the user attempted to perform a call number search on CHESHIRE ("1. Call Number Z00699"). Another query failed because it was incomplete ("the"). One search query failed to retrieve any records due to a misspelling ("vctorian"), and an indecipherable query ("ljkdsf g") retrieved nothing, either. In one case the user needed help ("how do I use this systenm [sic]"); another user entered "Bquit [sic]," both in the query description screen. Both of these queries retrieved nothing due to misspellings.

We examined whether the users who got zero retrieval results pursued their searches further by issuing new searches. The user who was looking for sources on "Z39.50" issued a broader search query on "user interface studies." The user who tried "hypermedia" and "hypertext" issued a new search query on a completely different topic. The user who issued the search queries "nanotechnology" and "C" was browsing to see if there was anything on these topics in the database; he was also testing CHESHIRE's user interface. When his search query on "C" failed, he renewed his query as "programming C." The user who misspelled her query as "vctorian" renewed her query with the correct spelling. The user who attempted to perform a call number search understood the limitations of the system and issued a topical search next.

The user who performed the out-of-domain searches on Syrian asceticism abandoned his search after two attempts; he seemed to have been unaware of the database limitations. The user who performed personal author searches ("tuttle" and "katz") decided to quit after one more attempt ("katz reference"). The users who were looking for sources on "minitel" and "r&d" stopped using the system afterwards, as did the user who was searching for sources on "blood transfusion." The user who requested help ("how do I use this systenm") stopped searching after entering an incomplete query ("the").

7.2.3 Discussion on Search Failures

Our analysis shows that almost 40% (42 search queries) of the search failures that occurred in CHESHIRE were mainly due to collection failures. An additional 10% (11 queries) of the search queries failed because users attempted to perform known-item searches (author or title searches). Two search queries failed due to one user's unawareness of the scope of the CHESHIRE database (i.e., periodical articles are not indexed in the database). Two more search queries failed due to telecommunication problems. One user attempted to perform a call number search, which is not supported by the system. These figures suggest that more than half the search failures (58 out of 107) were caused by factors outside the control of the CHESHIRE system. As the number and variety of search queries increase, collection failures become inevitable no matter how large the database. Furthermore, some 14 search queries failed because users were not well informed about the limitations of the CHESHIRE system (e.g., the lack of known-item and call number search features), despite our efforts to provide demonstrations and documentation.

The rest of the search failures were primarily due to user interface problems (13 queries), search statement flaws (11 queries), cluster failures (8 queries), LCSH (5 queries), and the stemming algorithm (4 queries). Specific queries, imprecise cluster selection, and false drops caused a total of five search failures.

Several users complained that they had experienced a multitude of difficulties with the user interface. Interviews with users indicate that CHESHIRE's user interface might have affected the outcome of several search queries indirectly even though only 13 search queries failed primarily due to interface problems.

When one of the users observed that the user interface "looked very much like something invented for an experimental catalog," she was obviously referring to the limited help features available in CHESHIRE. Another user described the interface as "inattentive" when help was not available. Some users thought the interface was hard to understand intuitively; for some others the experience was simply frustrating.

Others compared CHESHIRE's user interface with that of second-generation online catalogs. They said they felt more comfortable with, and in control of, the process of searching in traditional online catalogs where the Boolean operators AND, OR, and NOT are available. Some thought the user interface was inflexible because it was menu-driven and they had to "plough through [records] screen by screen."

More often than not, users issued detailed, descriptive, and yet specific search queries, which sometimes resulted in failures. Users appear to have had high expectations of a system that accepts natural language queries. Several users seem to have assumed that CHESHIRE is able to "understand" their search queries completely and retrieve the relevant records.

This assumption led to poor retrieval in CHESHIRE in some cases because the system has no natural language understanding capabilities. As explained earlier, all the system does is "parse" the search statement, determine the retrieval-worthy search terms in the query, match the query terms with those in the database, and bring back the results using probabilistic retrieval algorithms. In fact, there were some queries in which the system attributed undue weight to search terms that should not have been taken into account at all. For instance, the term "books" in the search statement "some books on history of libraries and classification" (#38) is useless for retrieval purposes. Yet it was taken into account by the retrieval algorithm, which cluttered the search results. Similarly, the search request "find all library literature concerning the history and publication of the Federal Register" (#95) contains two words ("library" and "literature") that were useless for retrieval purposes. There were other such examples ("want to find a small set of books on historical treatment of mathematics" (#147), "I want information on the public image of librarians through history" (#162), and "subject search japanese novelists" (#182)).
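The weighting problem can be illustrated with an inverse document frequency formula of the kind commonly used in probabilistic retrieval; the formula and figures below are illustrative and are not CHESHIRE's actual weighting scheme:

    import math

    def idf_weight(doc_freq, num_docs):
        """Terms that occur in many records get weights near zero."""
        return math.log(num_docs / doc_freq)

    N = 30000  # hypothetical database size
    for term, df in [("library", 12000), ("books", 8000),
                     ("federal", 150), ("register", 90)]:
        print(f"{term:10s} {idf_weight(df, N):5.2f}")
    # "library" and "books" receive small but nonzero weights, so unless
    # such terms are excluded outright they still influence the ranking
    # and clutter the results.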

In some cases users added qualifiers to their search queries, presumably thinking that the system would be able to figure out from their search statements exactly what they wanted. For instance, period qualifiers were introduced in the following examples: 1) "I want books about letterpress printing published after 1950" (#29); 2) "law libraries -- collection development from 1935" (#68); and 3) "banned books after 1980" (#83). Language and publication form qualifiers were also used in some queries ("I'd like to see recent books, in english, about library automation" (#99), "periodical literature on the development of law library collections" (#69)). Such queries can be handled with the Boolean operator AND in second-generation online catalogs (e.g., FIND SUBJECT LIBRARY AUTOMATION AND LAN ENGLISH). One user was looking for sources on collection development in law libraries only (#70), which requires a Boolean NOT operator in second-generation online catalogs.

None of the above conditions can be satisfied by CHESHIRE since, as pointed out earlier, it has no natural language understanding capabilities. The system cannot distinguish records by date, language and form. Nor can it deal with Boolean operators. It is interesting to note that users carried over some of their previous search experience from other online catalogs to CHESHIRE.

Users also issued more complicated search queries that neither second- nor third-generation online catalogs can satisfy. Examples are as follows: 1) "alternatives to traditional subject headings" (#223); 2) "how many books by patrick wilson does the library have?" (#97); 3) "projected salaries for special and academic librarians on the west coast" (#190); and 4) "looking for a humorous book about librarianship with cartoons" (#105).

These examples illustrate some very interesting points. Clearly, those search statements were difficult to parse and they all require natural language understanding capabilities. The first two examples were already discussed earlier. The third user was expecting the system not only to interpret her query as "projected salaries for special and academic librarians on the West Coast of the United States" but also to determine which states constitute the West Coast (e.g., California, Washington) and thus to expand her query by adding the state (or even city) names automatically. Parsing this query requires not only sound natural language understanding capabilities but also an extensive system vocabulary to convert (or expand) the user-supplied query terms to the system's vocabulary.

The fourth example also exhibits similar difficulties. In addition, the question of how one would describe a humorous book and whether such a book would be labeled in its title as a "humorous book" remains to be answered. Without such labels (or "handles") it is difficult to imagine how online catalogs could possibly retrieve records. As far as LC subject headings are concerned, some of the relevant titles (e.g., Bibliologia comica, Bizarre books) were cataloged under such headings as Library science -- Humor, Literary curiosia, Bibliography -- Miscellanea, Bibliography -- Anecdotes, facetiae, satire, etc.

The examples given above also show how specific users' queries can get when they are not bound by Boolean operators. It should be stressed that such specific queries would most likely fail in Boolean online catalogs. Whether the ability to submit search queries to CHESHIRE in natural language form encouraged users to be more specific is open to conjecture. Subject search statements in online catalogs that require Boolean set construction tend to be shorter, whereas several search queries submitted to CHESHIRE contained more than five searchable terms (the maximum was 24). (In this study the average number of searchable terms in search queries was 3.5 (see Chapter VI).)

The classification clustering process caused some false drops, examples of which were given earlier. CHESHIRE retrieves and ranks records, generally speaking, on the basis of how closely they match the user's search terms. If there are items in the database that fully match the user's query terms, such items are listed at the top. If, however, few items fully match the user's query terms, the system lists the best matches at the top and then the partial matches. It was those partial matches that confused users most.
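This best-match-then-partial-match behavior can be sketched as a simple coordination-level ranking; it is a simplification, since CHESHIRE's actual ranking is probabilistic:

    def rank(query_terms, records):
        """Order records by the number of query terms they contain:
        full matches first, then progressively weaker partial matches."""
        q = set(query_terms)
        scored = [(len(q & set(r["terms"])), r["title"]) for r in records]
        return [title for score, title in sorted(scored, reverse=True) if score > 0]

    records = [
        {"title": "A title matching both query terms", "terms": ["electronic", "mail"]},
        {"title": "A title sharing only one term", "terms": ["mail", "coaches"]},
    ]
    # With no further full matches available, weak partial matches
    # still surface at the bottom of the list.
    print(rank(["electronic", "mail"], records))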

It was confusing when CHESHIRE's classification clustering mechanism failed to retrieve the most promising clusters, and bibliographic records, at the top of the list. When the system came up with nonrelevant records, users wondered why CHESHIRE retrieved what it retrieved. For instance, one of the users was interested in library book boycotts against South Africa and entered her query as "cultural boycott of south africa" (#62). There were no relevant items in the database on this topic (i.e., a collection failure). The system evaluated the search terms and retrieved some items as usual. But because there were no records that fully matched the user's query terms (i.e., cultural, boycott, south, africa), it came up with the next best matches, such as Morphotaxonomic studies of the South African representatives of the genus Codicum (Chlorophycophyta) and Research materials in South Carolina. It is not too difficult to see that CHESHIRE retrieved those two records, among others, because they happened to contain some of the terms in the user's query ("south africa" and "south," respectively). The user said she "could not figure out what the system was doing."

In addition to the ones summarized earlier (e.g., "dr. seuss" (#57), "cd-rom databases" (#10)), several examples of such partial matches can be given. For instance, a query on "libraries in mexico" (#47) also retrieved items on libraries in New Mexico. A query on "berkeley library school history" (#140) came up with titles on the history of the University of California Berkeley Library. (Note the incorrect term relationships in the retrieved items for this query.) A query on "computer conferencing" (#34) retrieved general sources on computers because the term "conferencing" was not recognized. Similarly, the query "programming C" (#75) brought back general items on programming but not necessarily C programming. One user was surprised to see that her search query on "history of printing in Paris" (#81) retrieved titles "not really connected with printing in Paris." Another user was playing with the system and wanted to see what it would do with a query like "please find books on children, basball [sic], and animals" (#30). He said that he should not have retrieved anything. Yet he indicated that he would prefer retrieving some records, even if they do not make much sense, to retrieving nothing.

Several users found relevance feedback search hard to understand and confusing. Some did not know what to do with relevance feedback search while others indicated that "there is no indication of the point at which you should stop performing the relevance feedback." A few users found the relevance feedback feature in CHESHIRE not very helpful in some circumstances. For instance, one of the users indicated that "a system which will always attempt to give the user something is a system with a problem. The system has to be smart enough to know and inform the user that there is no good information." He added that CHESHIRE doesn't.

As summarized above, some retrieval results puzzled the users. They became curious and wanted to know "how CHESHIRE retrieves what it retrieves." The following excerpts from the interview scripts illustrate, to some extent, their uneasiness:

I would like to learn more about CHESHIRE and how it does what it does.

I couldn't figure out what it [CHESHIRE] was doing or why; it seems strange...

I don't quite get CHESHIRE. I don't quite get what it's doing and I don't quite know what to do with it.

Personally I would have thought it helpful to understand it a little bit better why it was retrieving what it was retrieving. It wasn't always clear to me why something had come up. . . Personally I find I can use a system better if I have some sense of why it does what it does.

People don't want to interface with optimized retrieval algorithms and data structures. They want something they can work with.

It appears that some users feel less confident with their searching skills when they cannot figure out how the system interprets their commands. Furthermore, the outcome of such ineffective search results may cultivate "distrust" between the system and its users. As Buchanan (1992) pointed out, users may have very little patience when the system presents bibliographic records that should not have been retrieved in the first place.

The number of users who experienced such problems when they performed searches on CHESHIRE was relatively low, however. Several users found the system very helpful and effective. The next section concentrates on retrieval performance in CHESHIRE in terms of success. It examines the search effectiveness in CHESHIRE and summarizes the strengths of the system based on the retrieval results and users' assessments of the system.

7.2.4 Search Effectiveness in CHESHIRE

Many users who participated in the experiment expressed their opinions of CHESHIRE in the following words:

I enjoyed searching CHESHIRE. It was fun.

I think this kind of system is a great idea.

I think it's a marvelous idea.

. . .refreshingly useful. . .intuitively easy to learn to use.

I guess making things a little bit more user-friendly in terms of what was happening in the program would have made things easier to use. But even as it was I found it [CHESHIRE] more effective than other online systems on campus.

It was just great.

I enjoyed using it, actually.

It is really intriguing stuff. . .This type of activity is I think clearly what patrons are looking for. . .People are going to get used to searching in this particular way.

Although an overwhelming majority of the participating users was not familiar with probabilistic online catalogs, they quickly became proficient in searching CHESHIRE once they figured out how the system works, as the quotes from the interview scripts show. In fact, several users compared CHESHIRE to second-generation online catalogs with Boolean searching capabilities and said they would prefer CHESHIRE-like online catalogs.

The analysis of retrieval results shows that CHESHIRE's performance was well above average. If the search failures caused by collection failures and the user interface are excluded from the analysis, it becomes clear that search effectiveness in CHESHIRE was much higher than in many second-generation online catalogs. Take, for instance, the zero retrieval rate in CHESHIRE. Eighteen queries failed to retrieve any records, which constitutes 7.9% (18/228) of all search queries submitted. Compared with the much higher zero retrieval rates in first- and second-generation online catalogs, this low percentage represents a remarkable achievement for CHESHIRE. For instance, Markey (1984) found that percentages of zero retrievals in subject searching range from a low of 35% to a high of 57.5%. Similar findings have been reported in several other online catalog studies (e.g., Larson, 1986; Peters, 1989; Hunter, 1991).

The enormous difference between the zero retrieval rates in CHESHIRE and other online catalogs may be due to a number of factors. First, the classification clustering mechanism in CHESHIRE seems to decrease the number of zero retrievals tremendously, for CHESHIRE automatically checks both the titles and the subject headings of the documents in the database for possible matches during the classification clustering process. If a match is found in either titles or subject headings (or both), CHESHIRE retrieves the clusters and, subsequently, bibliographic records. In other words, a search query in CHESHIRE fails only when neither the title words nor the subject headings match the user's query term(s).
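Stated as code, the zero-retrieval condition reduces to the following test (a schematic restatement, not code from the system):

    def zero_retrieval(query_terms, database):
        """True only when no query term matches any record's title words
        or subject heading words anywhere in the database."""
        q = set(query_terms)
        return all(not (q & (set(r["title_words"]) | set(r["subject_words"])))
                   for r in database)

    database = [{"title_words": ["library", "automation"],
                 "subject_words": ["libraries", "automation"]}]
    print(zero_retrieval(["hypertext"], database))             # True
    print(zero_retrieval(["hypertext", "library"], database))  # False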

Second, the stemming algorithm used in CHESHIRE might have helped decrease the number of zero retrievals. In second-generation online catalogs the same effect can be achieved by truncating search terms. Yet, unlike in CHESHIRE, where search terms are reduced to their roots automatically, the user has to initiate the truncation. As we have seen earlier, the stemming algorithm in CHESHIRE caused false drops on rare occasions (e.g., "novell," "cheshire," "marcia tuttle"). Yet such false drops occur more frequently in second-generation online catalogs.

The zero retrieval rate in CHESHIRE could be even lower with the availability of a spell-checker. Scanning search queries for misspelled or mistyped words and informing the users about potential errors before retrieval could have prevented some zero retrievals before they occurred (see, for instance, "vctorian," "Bquit," "systenm," "ljkdsf g").
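A rudimentary checker of this kind could be built against the system's own term dictionary using the Python standard library; the dictionary below is a hypothetical fragment, and no such feature existed in CHESHIRE:

    import difflib

    DICTIONARY = {"victorian", "system", "quit", "library", "printing"}

    def suggest(term, dictionary=DICTIONARY):
        """Offer the closest dictionary entries for an unrecognized term."""
        if term in dictionary:
            return []
        return difflib.get_close_matches(term, dictionary, n=3, cutoff=0.7)

    print(suggest("vctorian"))  # ['victorian']
    print(suggest("systenm"))   # ['system']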

In addition to the relatively low zero retrieval rate, the number of search failures caused by vocabulary mismatch in CHESHIRE was also small. That is to say, users were generally able to match their search queries with the system's vocabulary (i.e., the titles and Library of Congress subject headings assigned to bibliographic records). Only five search queries out of 228 (2.2%) failed due to a mismatch between the user's vocabulary and that of CHESHIRE, namely the lack of specific LC subject headings. However, it is not appropriate to compare this figure directly with those obtained in second-generation online catalogs, where studies have consistently shown that users' search terms exactly match the subject headings only about half the time (Carlyle, 1989, p.37; Van Pulis & Ludy, pp.528-529; Vizine-Goetz & Markey Drabenstott, 1991, p.157).

The reason why users were able to match their search statements with CHESHIRE's vocabulary, which is also one of the reasons why the figures cited above are not comparable, is the availability of the classification clustering process in CHESHIRE. As mentioned earlier, the classification clustering method used in CHESHIRE is the first step in the retrieval process. The user's query is processed first to determine if the query terms match the titles or subject headings of the items in the database. CHESHIRE then retrieves and ranks cluster records on the basis of the degree of match between the query terms and the titles and subject headings, and displays them to the user. Each cluster record display contains the classification number under which most or all of the bibliographic records are listed, the broad topic of the books in the cluster (the description of which is taken from the LC classification scheme), and the three most often assigned LC subject headings for the books in that particular cluster. The user can then select one or more clusters as relevant, primarily by checking the most frequently assigned LC subject headings. This information is then used to expand the user's original search query.
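
In outline, the expansion step can be sketched as follows. This is only an illustration with an assumed field name ("top_headings" for the three headings shown in the cluster display); CHESHIRE's probabilistic term weighting is considerably more refined than this:

    def expand_query(query_terms, selected_clusters):
        """Add the words of the most frequently assigned LC subject
        headings of the user-selected clusters to the original query."""
        expanded = set(query_terms)
        for cluster in selected_clusters:
            for heading in cluster["top_headings"]:
                expanded |= set(heading.lower().split())
        return expanded

    # For a query such as "knowledge utilization," the expanded term
    # set would pick up heading words (e.g., "sociology,"
    # "communication") from whichever clusters the user selected,
    # broadening the subsequent match against bibliographic records.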

Larson (1991a, p.158) suggests that "[t]he information in the cluster display usually provides a good indication of the general topics of books under a particular classification number." Furthermore, the utilization of both title words and subject headings during the classification clustering process increases the users' chances of matching their terminology with that of the system. The display of LC subject headings in the cluster record seems to facilitate the matching process, as users are better at recognizing relevant search terms than at remembering them.

The classification clustering technique used in CHESHIRE evidently helped decrease both the number of zero retrievals and the number of search failures caused by vocabulary mismatch. CHESHIRE's classification clustering process worked remarkably well, especially for specific search queries. Despite the fact that there were several specific queries and that the database did not contain many records that could answer such queries, CHESHIRE usually managed to retrieve the relevant ones. It successfully retrieved clusters from different parts of the classification scheme, thereby providing the user with an opportunity to view his or her query in different contexts. For instance, one of the users was "interested in works that either were directly in the interdisciplinary area of knowledge utilization or that were tangential to the area of knowledge utilization". He submitted his query as "knowledge utilization" (#119). CHESHIRE's classification clustering mechanism did an excellent job of pulling together several clusters from different parts of the LC classification schedule: theory of knowledge (BD161), communication (P91), sociology of knowledge (BD175), social science research (H62), and classification of sciences (BD241). Subsequently, the system retrieved several sources on knowledge creation, production, and utilization, which the user was "satisfied to see that they came up." Another user was trying to find the classification sections pertaining to "graphic display of thesauri in electronic format," and CHESHIRE's classification clustering process successfully pulled out records from different areas of the LC classification schedule (Z695, Z699, TK7882). He found that "there is a section in TK. . .that deals specifically with visual display on computers."

The classification clustering technique also helped provide more specific LC subject headings as part of the cluster records for specific search queries. It brings together records from different parts of the LC classification schedules, which enables the user to retrieve relevant records that are cataloged under slightly different but nonetheless related LC subject headings. For instance, one of the users was interested in "library services for ethnic minorities" (#217). We performed several searches in order to determine the recall base for this search query, and found that the LC subject heading most commonly assigned to such books (Library services to minorities) retrieved fewer than half of the relevant records in the database. The unique relevant records were indexed under 16 different LC subject headings! By expanding the user's query on the basis of cluster selection and relevance judgments, CHESHIRE successfully collocated and retrieved most of those records cataloged under different LC subject headings.

This is one of the reasons why search failures due to vocabulary mismatch occurred much less frequently in CHESHIRE than in second generation online catalogs. One cannot expect an ordinary end-user to come up with all the possible LC subject headings under which sources on library services to ethnic minorities are indexed. The user would have missed all the records cataloged under, inter alia, Minorities -- Information services, Libraries -- Services to Hispanic Americans, Mexican Americans and libraries, and Library services to Chicanos. Furthermore, a user "looking for a humorous book on librarianship with cartoons" (#211) would be hard-pressed to remember the LC subject heading Libraries -- Anecdotes, facetiae, satire, etc., under which many such books were cataloged.

It is no exaggeration, then, to suggest that many specific queries submitted to CHESHIRE would have produced zero results in second generation online catalogs with Boolean search capabilities. The availability of automatic query expansion in CHESHIRE, which is based on feedback from the user by means of classification clustering and relevance feedback techniques, helps alleviate the search failures that might otherwise have occurred.

One CHESHIRE feature that some users found especially useful was the relevance feedback search capability. As explicated in Chapter II, the relevance feedback process enables users to refine their search queries by making relevance judgments on the retrieved records. The system then incorporates this relevance information and retrieves more records that are similar to the ones the user has already judged as being relevant. Users tried the relevance feedback option of CHESHIRE for 91 search queries in this study. Relevance feedback usually improved the results by retrieving more relevant records from the database (see Chapter VI).
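
The general idea can be conveyed with a Rocchio-style sketch. CHESHIRE itself employs a probabilistic retrieval model, so the vector-space formulation below is only an assumed stand-in that illustrates how a query is moved toward the records the user judged relevant:

    from collections import Counter

    def feedback(query_vec, relevant_docs, alpha=1.0, beta=0.75):
        """Blend the original query vector with the average vector of
        the records judged relevant. All vectors are Counter objects
        mapping terms to weights; alpha and beta are illustrative."""
        new_query = Counter()
        for term, weight in query_vec.items():
            new_query[term] += alpha * weight
        for doc in relevant_docs:
            for term, weight in doc.items():
                new_query[term] += beta * weight / len(relevant_docs)
        return new_query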

Users, in general, seemed to like CHESHIRE's relevance feedback search capability, although some admitted that they were "overwhelmed by it." One of the users commented that relevance feedback search "seemed to get her what she wanted." Another user shared the same view, saying that relevance feedback search results "get more specific into exactly what he wanted." Yet another user remembered relevance feedback as "being a very nice feature." However, several users found the concept of relevance feedback search hard to understand and confusing.

One of the features of CHESHIRE that users especially liked was being able to describe their search queries in natural language. They thought that entering search statements in natural language, without worrying about syntactic rules and Boolean operators, was most helpful. The availability of a natural language interface seems to have improved users' search statements and made the queries more descriptive.

To conclude, some of the advanced information retrieval techniques available in CHESHIRE help decrease search failures in online catalogs while at the same time increasing search success. Classification clustering and relevance feedback techniques tremendously improve retrieval results. Users can enter very specific search queries in natural language and still retrieve some relevant records thanks to the availability of classification clustering and relevance feedback. Furthermore, zero retrieval rates and failures caused by vocabulary mismatch are much lower than in second generation online catalogs.

7.3 Summary

In this chapter the causes of the search failures that occurred in CHESHIRE were analyzed qualitatively. Types of search failures (e.g., collection failures, failures due to user interface problems, cluster failures) were classified and several examples were given in each category. The likely causes of search failures were examined through the analyses of transaction logs, questionnaires, and structured interview scripts. Then, search effectiveness was examined. The strengths of CHESHIRE, such as the availability of classification clustering and relevance feedback techniques and its success in decreasing search failures, were discussed and the findings were recapitulated.

We found that collection and user interface failures constituted more than half of all the search failures that occurred during the experiment. These were followed by failures due to, among other things, faulty search statements and known-item search queries (which were not supported by the system). To put it differently, well over half of the search failures were caused by factors outside the control of the retrieval system. On the other hand, failures due to zero retrievals and vocabulary mismatch occurred much less frequently in CHESHIRE than in second generation online catalogs. Similarly, even though users submitted detailed yet specific search queries in many cases, the system still managed to retrieve some relevant records. This is due, in part, to the fact that probabilistic systems attempt to match the user's search terms with both the titles and the subject headings of the items in the database. In addition, we found that users tend to submit longer search statements (with more search terms) to probabilistic online catalogs with natural language interfaces than they would submit to second generation online catalogs with command language user interfaces, for they are not constrained by the syntax rules of the command languages and can describe their information needs with more words.

Nonetheless, parsing natural language queries proved to be difficult because some search terms were useless for retrieval purposes but nevertheless matched records in the database. In addition, some search queries contained Boolean operators as well as language, date and form qualifiers, which suggests that users carried over some of the expertise they had gained using second generation online catalogs. Although the number of such cases was not high, it is likely that such mismatches will persist as the size of the database grows and the collection make-up becomes multi-disciplinary. That is to say, the lack of natural language understanding capabilities in user interfaces will continue to cause search failures in online catalogs.

The classification clustering and relevance feedback techniques available in CHESHIRE appear to have played significant roles in decreasing search failures. The classification clustering technique provides users with an opportunity to expand their search queries by selecting some cluster records, thereby increasing their chances of retrieving relevant documents. Similarly, users' relevance judgments on retrieved records are used to expand the original search query automatically so that documents "similar" to the ones already judged as being relevant can be retrieved from the database.

However, the way the classification clustering technique has been implemented in the system prevented a few users from continuing their searches. In order to continue a search, the user has to select at least one cluster record as relevant. Yet some users were unaware of this, and their searches ended prematurely because they selected no clusters.

Users should be able to continue their searches even if they select no clusters as being relevant. One can think of two solutions to this problem. The user can simply be asked to select at least one cluster as relevant, presumably the most promising one. This is a rather crude and simplistic solution; besides, there may be cases where none of the clusters seems relevant. The second, and more elegant, solution would be to execute the query without the classification clustering mechanism, rather than forcing the user to choose at least one cluster as relevant (when there is none) against his or her will. If the user does not like any of the clusters, the system would go ahead and execute the query based on the simple frequency distributions of the query term(s) contained in titles and subject headings.

This may require some changes in the way the system works. At present, the classification clustering mechanism as implemented in the system gets its input from the user: whenever the user chooses one or more clusters as relevant, the system goes back and promotes the records listed under the selected clusters. If no clusters are chosen, however, the search ends there. The implicit assumption here is that if the system is unable to bring back possibly relevant clusters, it is highly unlikely that the collection has anything useful for that particular user and search query. This assumption may well hold for most, if not all, users and search queries. Nonetheless, there would still be merit in not ending the search there, even though the user chose no clusters. The system could go ahead and execute the query by "bypassing" the classification clustering step.
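
The proposed fallback might look like the sketch below, which ranks bibliographic records directly by how often the query terms occur in their titles and subject headings. The record fields and the raw frequency score are assumptions made for illustration:

    def fallback_search(query_terms, records):
        """Rank records by the frequency of query terms in their
        titles and subject headings, bypassing cluster selection."""
        scored = []
        for record in records:
            words = (record["title"] + " " +
                     " ".join(record["headings"])).lower().split()
            score = sum(words.count(term) for term in query_terms)
            if score > 0:
                scored.append((score, record))
        scored.sort(key=lambda pair: -pair[0])
        return [record for score, record in scored]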

Such an improvement would benefit some users. First, some users may find the individual records relevant even if they did not like the clusters. This would cut down the number of searches that end abruptly because no clusters were selected. Second, some users may not be aware of the fact that they must choose at least one cluster as being relevant in order to retrieve individual records. There is some evidence that some users expected to get to individual records without selecting any clusters; in fact, CHESHIRE's user interface gives no clue to the users that they "have to" select clusters. Third, some users come to the system just to test it and see how it works (so-called "tourists"). They do not necessarily want to follow the instructions; rather, they want to explore the system. When reminded during the interviews that they probably did not like the clusters they had seen, several users stated that they were "just exploring." Those users who want to explore the system without selecting any clusters would never get to individual records.

It can be argued that providing access to individual records by bypassing the classification clustering step would mean that users would not be able to use CHESHIRE to its full strength. This is certainly true. Larson's research indicates that the classification clustering mechanism helps users match their vocabulary with that of the system, which, in turn, improves the quality of the searches. Nevertheless, it should still be possible to retrieve records based on simple frequency distribution counts. At present, the classification clustering algorithm is not closely tied to the retrieval process. That is to say, the system checks the cluster "centroids" only after the user selects some clusters, so that the records in promising clusters are considered more important for retrieval than the others. Bypassing classification clustering would mean that the system need only evaluate the bibliographic records containing the query terms. This, according to Larson, would actually decrease the overall processing needed for each query, as there would be fewer terms to consider for retrieval purposes.

The relevance feedback technique helped retrieve more relevant sources from the database, yet the search results tended to deteriorate quickly after the second relevance feedback iteration. It appears that the search query drifts from its original form as too many nonrelevant terms are added during the relevance feedback cycles. Similar findings have been reported for other probabilistic online catalogs with relevance feedback search techniques (e.g., Okapi).

Some of the advanced retrieval techniques that constitute the strengths of the CHESHIRE system confused some users. For instance, most users liked the natural language interface and found the relevance feedback feature useful. Yet some users were bewildered by the availability of these very techniques, as they had apparently never used a probabilistic online catalog with classification clustering and relevance feedback capabilities. A few users indicated that they would prefer to use a Boolean command language to interact with the system rather than a natural language user interface.
