II. Gateway User Study: Multi-Database Search and Common User Interface
There is great interest in library communities in designing and implementing digital library systems that conceal the complexities of an information landscape characterized by numerous, disparate information resources. Users often encounter frustration in their efforts to discover relevant sources, negotiate connections, learn resource-specific user interfaces, and search using a variety of inconsistent query languages and semantic conventions. This work is typically done in an isolated section of a much larger sphere of information, and often users are left with the feeling that they have overlooked some important items.
Although librarians, computer scientists, and other professionals are currently conceiving and designing systems to alleviate some of this complexity for the user, they must remain sensitive to the reality that even the best ideas may inadvertently inhibit, rather than enhance, the user's experience. Effective digital library designs should be based on an understanding of users' actual work experiences and behavioral tendencies. Increasingly, user studies are being conducted as part of digital library projects (Peterson, 1995; Lloyd, 1996). Most recent studies have taken a user-centric, iterative approach in which the major focus is on how users do their work, not just how they respond to a particular interface design or system feature (Van House, 1996).
Consistent with this user-centric approach, Cornell University's Albert R. Mann Library conducted a comprehensive user study to inform the design of the next generation "Gateway" <http://www.mannlib.cornell.edu>, an electronic library system that provides access to over 600 information resources, including bibliographic indexes and catalogs, full-text documents, statistical datasets, and spatial data. Although the Mann study explored a wide range of issues relevant to digital libraries in general (including discovery of relevant resources, searching, navigation, and general comprehension of the information landscape), the current article focuses on a sub-section of the study designed to explore user perspectives on a common user interface for searching bibliographic resources and on the ability to search multiple resources simultaneously. The goals of the larger Gateway User Study, its methodology, findings, and implications are discussed elsewhere (Payette and Rieger, 1997).
1. Purpose of the Study
Mann Library considered using the Z39.50 search protocol in the Gateway to enable a common user interface for bibliographic searching, and to give users the option to search several databases simultaneously. A component of the larger user study was devoted to validating Mann Library staff's assumptions on the effective use of Z39.50 for these purposes. The user's perspective is critical for identifying barriers to effective implementation and for setting criteria for evaluating existing Z39.50 clients and servers. The knowledge we obtained from our users in this study will influence our decision to acquire an existing Z39.50 client or develop our own. Also, user feedback will dictate the extent to which we enhance the Gateway to support simultaneous searching of multiple databases.
Nearly twenty-one percent of the Gateway resources are bibliographic databases, such as Agricola, BIOSIS, Medline, Periodical Abstracts, PsycInfo, and Carl. These citation databases contain secondary information and provide users with references to journal articles, book chapters, and conference presentations. They contain core information for each citation (such as journal title, article title, year of publication, volume number, and article abstract) to assist users in identifying and evaluating primary materials. The Gateway also includes bibliographic utilities, including the Cornell University Online Catalog, RLIN, and OCLC. Each resource has its own distinct search interface, such as NOTIS, BRS, Dialog, Carl, RLIN Eureka, OCLC, or Cambridge.
With the rapidly increasing number of bibliographic databases in the Gateway, the Mann Library reference desk staff witnessed users' frustration in dealing with different search interfaces and syntax conventions. Users were unable to carry search statements from one database to another. Except for several InfoShare databases that utilize the NOTIS interface, the Gateway bibliographic databases do not contain hooks to the Cornell University Online Catalog. Users would benefit from such links, which eliminate the additional search step of checking the availability of a desired publication in one of the Cornell Libraries.
The Gateway User Study employed survey research methods including focus groups with Mann Library staff and interviews with a selected group of faculty and students. Using non-probability techniques (purposive and quota sampling methods), we recruited twenty-seven faculty, and sixteen undergraduate and graduate students who currently use the Gateway. Two survey instruments were used in the study: a questionnaire and an interview schedule. We administered the questionnaire to collect profile data on participants prior to meeting them for interviews.
The faculty and student interviews were semi-structured and organized into three sections. The first two sections were designed to gain an understanding of the users' conceptions of digital libraries, the Gateway, and its navigation tools. The final section of the study focused on the desirability of a common user interface and multi-database search. Our intent was to explore users' opinions on implementing bibliographic file features such as cumulative search, common user interface, and links from bibliographic databases to the Cornell University Online Catalog.
3. Interview Questions
The final section of the interview explored solutions to the problems highlighted above. We introduced interviewees to the concepts of multi-database searching and a common interface through discussions and demonstrations using prototype interfaces developed to demonstrate a Z39.50-based bibliographic search (see below, Figures 1 and 2). We provided the interviewees with some search examples to help them understand how a multi-database search could be conducted. Throughout this exercise, we solicited users' comments and reactions to several open-ended questions including:
1. Users' Perceived Benefits of Common User Interface
As anticipated, the interviewees, especially the students, were very supportive of incorporating a common interface for searching bibliographic databases. 89% of the faculty and 100% of the students expressed the view that a common user interface would significantly improve their bibliographic database search experiences. Some of the benefits the interviewees perceived were:
2. Comparison of Common and Database-Specific Interfaces
Users identified a common user interface that can function across multiple databases as preferable to customized, database-specific interfaces. A majority of the faculty and students interviewed favored a common interface. Only two of the interviewees preferred database-specific interfaces that enabled them to fine-tune their searches. These faculty members, both biologists, identified themselves as frequent users of the BIOSIS database, which indexes and abstracts journal articles in the area of biology. They said that, because their research areas are very specific, they often refine their searches by using BIOSIS concept codes.
Only four out of the forty-two interviewees (10%) said that they needed both database-specific and common interfaces. These users thought a common interface for searching multiple databases would be valuable for identifying those databases that were relevant to their specific search topic. Once they found the best sources, they would search databases individually to take advantage of database-specific search features. For example, one interviewee said that he might conduct a multi-database search to find which databases have the greatest number of citations on public health policy before narrowing down his search to a few databases that are highly relevant to his topic.
3. Users' Search Strategies
Most of the interviewees said that their typical search statements consisted of author's last name and subject-related keywords. They indicated that the ability to conduct these types of searches was their critical requirement for a common user interface. All the faculty and students interviewed identified keyword searching as their primary search method. Author searching was the second most heavily used method for limiting a search, with all the faculty and 13% of the students reporting it as one of their most frequently used search strategies. Most of the faculty relied on "backward and forward chaining" in conducting their literature reviews; typically, they started this process by searching for publications authored by someone they deemed an authority in the subject of interest.
We observed a significant difference between the faculty's and students' preferred search strategies. While 54% of the faculty sometimes used search techniques such as limiting by subject headings, update codes, publication date or journal title, none of the students took advantage of these features. Although we were initially concerned that users would be reluctant to sacrifice advanced search features, we found that most faculty and students in this study used very simple keyword searches for most of their research. The majority of those interviewed felt that the value of a common user interface would outweigh the potential loss of database-specific features.
4. Users' Perceived Benefits of Multi-Database Searching
In introducing the concept of multi-database searching to the interviewees, we initiated a discussion of artificial boundaries in the information space. Currently, resources are segregated in individual databases and collections, often bounded by the scope of a publisher's coverage, a library's holdings, or an information provider's specialized collections. We sought to understand users' perception of and sensitivity to these boundaries when conducting searches for their research.
To demonstrate ways in which these boundaries could be traversed, we exposed users to two variations of the multi-database search concept, both of which could be enabled using Z39.50. First, we described the simultaneous search, where the user could access a virtual database that is distributed and accessible via a single user interface with a single query. We indicated that simultaneous multi-database searching could break down the "stove pipe" approach to information by enabling a wide-area, parallel search of disparate databases or collections through a single query from a single interface. We distinguished this from sequential, or serial, multi-database searching which would enable the user to cast the same query against a set of databases, one at a time, by repeating the query in each successive database.
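The difference between the two modes can be made concrete with a small sketch. The Python fragment below is illustrative only and is not the prototype shown to interviewees: it does not speak Z39.50, the in-memory `DATABASES` dictionary and the `search_one` helper are hypothetical stand-ins for remote targets, and the titles are invented sample data. It assumes only that each database can answer a keyword query independently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for remote bibliographic databases.
# The titles are invented sample data, not real citations.
DATABASES = {
    "Agricola": ["Soil nitrogen in maize", "Dairy herd health"],
    "BIOSIS": ["Nitrogen fixation in legumes", "Bird migration patterns"],
    "Medline": ["Public health policy review", "Nitrogen dioxide exposure"],
}


def search_one(db_name, keyword):
    """Return the titles in one database that contain the keyword (case-insensitive)."""
    return [t for t in DATABASES[db_name] if keyword.lower() in t.lower()]


def simultaneous_search(db_names, keyword):
    """Cast one query against all databases in parallel; return hits per database."""
    with ThreadPoolExecutor(max_workers=max(1, len(db_names))) as pool:
        results = pool.map(lambda name: search_one(name, keyword), db_names)
        return dict(zip(db_names, results))


def serial_search(db_names, keyword):
    """Yield (database, hits) one database at a time, so the caller can stop early."""
    for name in db_names:
        yield name, search_one(name, keyword)


if __name__ == "__main__":
    names = list(DATABASES)
    # Simultaneous: one call, all databases queried together.
    print(simultaneous_search(names, "nitrogen"))
    # Serial: same query, same interface, one database after another.
    for name, hits in serial_search(names, "nitrogen"):
        print(name, hits)
```

Writing the serial variant as a generator reflects the incremental style of working described in this study: the user sees one database's results before deciding whether to repeat the query against the next.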
Multi- vs. single-database searching
After introducing the concept of multi-database searching, and exposing users to prototype interfaces, we asked interviewees to consider the value of this type of searching in the process of conducting their research. The majority of the sample responded favorably:
Among the faculty who reacted favorably to the multi-database approach, most felt that the multi-database search would enhance their research by increasing the breadth of information brought to their attention. They acknowledged a sense that they may be missing important information in their more limited searching of a few selected bibliographic databases. Many indicated that they tended to stick with a few known, reliable sources because they felt they could not devote time to seeking out new sources, learning how to use them, and adding additional steps to their existing information gathering process. Many faculty felt that the multi-database search could expand their horizons without requiring them to invest additional time. In short, the users in this study perceived both efficiency and increased access as major benefits to multi-database searching.
Simultaneous vs. serial multi-database searching
Faculty were interested in both the simultaneous and serial options for executing a multi-database search. When asked to compare the two, most faculty (52%) wanted to have both options available to them, and most indicated that they would like to choose the approach based on the situation or the context of the problem they were solving. The students were very clear on their preference, with 93% preferring simultaneous searching. Among the faculty, only 20% said they would prefer the simultaneous search to the sequential. None of the faculty or students preferred the sequential approach exclusively.
5. Users' Concerns with Multi-Database Searching
Slow response time
At first glance, it seemed that the faculty's interest in serial searching of multiple databases was somewhat inconsistent with their interest in efficiency. However, upon further investigation, we concluded that most faculty were interested in this feature because they feared poor system performance from a widely cast, parallel search of multiple sources. While finding the simultaneous approach conceptually appealing, a significant number of interviewees worried about slow response time. The users perceived serial searching as a way to control their session, while still offering more efficiency than the traditional method of querying individual resources characterized by distinct interfaces and query syntaxes. For instance, if a user anticipated that a particular query would take a long time to process in a simultaneous search of several databases, she might choose to work incrementally, by casting the search in one database, evaluating the results, and then launching the same search from the same interface against another database, possibly refining it slightly to limit the result set.
A significant number of faculty (46%) reported a desire to be able to pick and choose databases for inclusion in their multi-database search. This might suggest that users valued the existing partitioning of information into discrete databases (e.g., Biological Abstracts) or collections (e.g. a particular publisher's set of electronic journals). By probing further into the users' interest in the source databases, we were able to ascertain that faculty were not expressing an interest in preserving this model of information organization, but in minimizing information overload and the slow response time they associated with searches cast over numerous databases.
Among the faculty, 46% said they did not want to be overwhelmed with citations, revealing that their primary interest in distinguishing individual databases was to reduce information overload by limiting the search to known and reliable sources. Nonetheless, the faculty expressed comfort with the idea of searching a more abstract information space, as long as they had the ability to control for the following: (1) the general type of information (they wanted to distinguish scholarly from popular material); (2) the time they would have to wait (they assumed searches of large information spaces would take a long time); (3) the quality of the results (they assumed that searches of large information spaces would yield many irrelevant hits). In the absence of these overt controls, users felt it was necessary to control which databases were to be included in a multi-database search.
Irrelevancy in the result set
Although 77% of faculty did not require the identification of the source database for individual references in a multi-database result set, many were interested in statistical feedback to help them determine which databases were most fertile and relevant for the problem at hand. For instance, many faculty expressed interest in a brief report of what percentage of their results came from each source database, rather than a source listed for each citation in the results. With this information, users could opt to focus their efforts on certain databases, exclude databases from their multi-database search session, or refine their query to expand or limit the search results.
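A brief report of this kind is straightforward to compute once each hit carries the name of its source database. The following Python sketch is one possible approach, not a feature of the Gateway; the function name `source_report` and the shape of the input (a list of `(source_database, citation)` pairs) are assumptions made for illustration.

```python
from collections import Counter


def source_report(hits):
    """Return each source database's share of the hits, as a percentage.

    `hits` is a list of (source_database, citation) pairs. The result maps
    each database name to its percentage of all hits, rounded to one decimal.
    """
    counts = Counter(source for source, _ in hits)
    total = sum(counts.values())
    if total == 0:
        return {}  # no hits, so there is nothing to report
    return {source: round(100 * n / total, 1) for source, n in counts.most_common()}


if __name__ == "__main__":
    # Placeholder citations; only the source names matter for the report.
    sample = [
        ("Agricola", "citation 1"),
        ("Agricola", "citation 2"),
        ("BIOSIS", "citation 3"),
        ("Medline", "citation 4"),
    ]
    for source, share in source_report(sample).items():
        print(f"{source}: {share}%")
```

A summary like this gives users the feedback they asked for without attaching a source label to every citation in the result set.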
6. User Interface Issues with Multi-Database Searching
Initiating the Multi-Database Search Feature
We asked the faculty and students who expressed an interest in the multi-database search option how this feature should be presented to them in the Gateway. The faculty were aware that, in theory, all databases could be included, but that only some could be made Z39.50 compliant at this time. We exposed the interviewees to several prototype interfaces that took different approaches to initiating the multi-database search feature:
In response to these scenarios, 60% of the faculty and 93% of the students were interested in having the system automatically activate the multi-database search feature and, accordingly, were attracted to the first and third scenarios. Generally, these users wanted the multi-database search to be the default search mode, or they wanted the system to automatically make them aware of this option at an appropriate time. Students expressed little interest in the user-activated approach; however, 24% of the faculty reported interest in activating the multi-database search as an option, not as a default. Accordingly, these faculty preferred the third scenario since it let them connect to individual databases in a manner they were accustomed to, but provided the option to extend the search. This group also found scenario two to have some appeal since it gave them total control over which databases would be searched together. The remaining faculty (16%) and one student expressed no preference on the means of invoking this option in the system. Both the "system activated" group and the "user activated" group expressed an interest in the system helping them know which databases would work best together, either through default groupings of databases by subject, or through a customized recommendation based on some other "behind the scenes" analysis of their query.
Presentation of the search result set
We exposed the faculty and students to a very rudimentary screen design that depicted a result set from a multi-database search. Using paper and pencil, interviewees modified this design while discussing the implications of receiving "hits" from multiple sources. Their first issue centered on the management of duplicate records. Approximately 70% of faculty wanted to receive a merged result set with duplicate responses suppressed, meaning that records that were found in more than one source database would be reported only once. It should be noted that although 46% of faculty wanted the ability to pick which databases were included in a search, 77% did not require knowledge of the source database(s) for individual citations in the result set. As previously mentioned, students expressed minimal interest in choosing databases to be included in a search, and consistent with this, 87% were not interested in having the databases of origin reported with individual hits in the result set.
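To illustrate what a merged result set with duplicates suppressed might involve, the Python sketch below combines per-database result lists and reports each citation once. It is a minimal sketch under stated assumptions: the record fields (`title`, `author`, `year`), the `normalize` and `merge_results` helpers, and the matching rule (same normalized title, author, and year) are all hypothetical, and matching records across real databases is considerably harder because they describe the same item inconsistently.

```python
def normalize(text):
    """Lowercase the text and reduce it to letters and digits separated by single spaces."""
    kept = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(kept.split())


def merge_results(result_sets):
    """Merge per-database result lists into one list with duplicates suppressed.

    `result_sets` maps a database name to a list of records, where each record
    is a dict with "title", "author", and "year" keys (an assumed shape).
    Records with the same normalized title, author, and year are treated as
    duplicates. The first copy encountered is kept, and every database that
    returned it is recorded under "sources".
    """
    merged = {}
    for source, records in result_sets.items():
        for rec in records:
            key = (normalize(rec["title"]), normalize(rec["author"]), rec["year"])
            if key not in merged:
                merged[key] = {**rec, "sources": [source]}
            elif source not in merged[key]["sources"]:
                merged[key]["sources"].append(source)
    return list(merged.values())
```

Keeping the list of sources internally, even if it is not displayed with each hit, would still allow a system to summarize which databases contributed to the result set.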
During our discussions on presentation of result sets, we encountered many users who said it would be beneficial to be able to check the library's holdings for items retrieved from a multi-database search. Many recognized the potential of including the Cornell Online Catalog in the multi-database search scenario and expressed interest in creating a button or a link on the results presentation screen to view holdings information for items encountered in the result set.
Generally, users reported an interest in viewing abstracts and in having the result set sorted in multiple ways, including by publication date, by author, and by relevance to their query. Although users were interested in relevance ranking, they did not address how results should be ranked across multiple databases.
In the current study, we found that users would be very interested in a common user interface for searching disparate bibliographic databases. Most were willing to sacrifice special features and advanced functionality found in native database interfaces in favor of a more generic and simple user interface to support their typical searches. A small percentage of our sample was interested in maintaining "back doors" to native database interfaces if a common interface could not support database-specific features such as searching by concept codes and identifiers (e.g., BIOSIS, ERIC) or browsing a specialized thesaurus (e.g., Medline).
This study indicates that the implementation of simultaneous multi-database searching should be approached with caution. Although users were very interested in the ability to search multiple databases together, they were already anticipating slow response time and being overwhelmed with information, particularly irrelevant information. More work needs to be done in the area of increasing relevancy of responses from queries executed against multiple, disparate resources. In the interim, users felt that irrelevancy could be minimized by a system that presented default groupings of databases that tend to work well together, or that are related by subject.
To satisfy user requirements for the presentation of results from a multi-database search, a system will have to support merged result sets, suppression of duplicates, and cross-database relevance ranking. Since databases will typically reside on different computers, often using different Z39.50 servers, client software will have to manage the integration of records into a single, non-redundant result set. If these capabilities are not available in an existing Z39.50 client, or cannot be effectively developed in a custom-made client, libraries may want to introduce multi-database searching in a limited manner.
Lloyd, C., "A new digital library project on delivery of copyright materials in electronic format: The Decomate user study,"
Payette, S.D. and Rieger, O.Y., "Supporting scholarly inquiry: incorporating users in the design of the digital library," submitted to Journal of Academic Librarianship for review, 1997.
Peterson Bishop, A., "Working toward an understanding of digital library user: a report on the user research efforts of the NSF/ARPA/NASA DLI projects," D-Lib Magazine, October 1995.
Van House, N.A. et al., "User centered iterative design for digital libraries: the Cypress experience," D-Lib Magazine, February 1996.
Van House, N.A., "User needs assessment and evaluation for the UC Berkeley electronic environmental library project: a preliminary report," Digital Libraries '95: The Second Annual Conference on the Theory and Practice of Digital Libraries (June 11-13, 1995 - Austin, Texas).