
D-Lib Magazine
January 2000

Volume 6 Number 1

ISSN 1082-9873

Evaluating Website Modifications at the National Library of Medicine through Search Log Analysis


Aaron Redalen
NLM Associate Fellow
National Library of Medicine
[email protected]

Naomi Miller
Systems Librarian, Public Services Division
National Library of Medicine
[email protected]


Introduction

The effective design, presentation and functioning of search facilities are integral parts of website design, particularly for the sites of information-rich organizations such as libraries. Libraries face particular challenges because many of the traditional sources of bibliographic information, such as online databases, may not be integrated into the Web environment. As Clifford Lynch points out, "The Net is not a digital library" (Lynch, 1997). Since search functions are an important and heavily used facet of the National Library of Medicine website, and since NLM produces many databases that are not fully integrated into its website, we have undertaken a series of analyses and modifications to guide users in effective site navigation and use of the search functions.

Little published literature addresses the use of World Wide Web search tools and their impact on users' navigation. However, our review of the literature indicated that many web users tend to follow any available "search" link immediately, without first determining what other links are available or what database or collection of documents will be searched. In a 1997 article on search interfaces, web usability expert Jakob Nielsen observes that "Our usability studies show that more than half of all users are search-dominant... The search-dominant users will usually go straight for the search button when they enter a website: they are not interested in looking around the site; they are task-focused and want to find specific information as fast as possible" (Nielsen, 1997). Although many users gravitate toward search features when navigating websites, another study of searching behavior found that less experienced users frequently misunderstand how search engines work (including what is being searched), and easily become frustrated when they are unable to locate relevant information (Pollock & Hockley, 1997). These findings are consistent with our experiences in the work described below.

Website Development and Analysis

NLM established its presence on the World Wide Web in October 1993, with a collection of HTML pages called HyperDOC. At that time, the NLM website offered information about the Library and its programs and services, but no search functions were available. The only database in HyperDOC was a browsable collection of images from NLM's History of Medicine section. Users were not able to search MEDLINE, NLM's premier bibliographic database of biomedical citations and abstracts, from HyperDOC; a page of information about MEDLINE explained how to obtain the user ID and password needed to search it, for a fee. Since that time, NLM has developed, introduced and redesigned a variety of web-based resources to better serve the information needs of its users.

A search feature was made available on the NLM website in March 1996. At that time, the home page linked to seven major categories of information. Users who followed a "Search Index" link from the front page were presented with a text box that searched the website for information on NLM and its products, projects, and services. The website's pages then mainly offered descriptions of library services, such as hours of operation and interlibrary loan, or of programs such as the Visible Human Project. Examination of the search logs revealed that a large proportion of the searches were for medical topics rather than for information about NLM, and it was suspected that users might have been confusing the website index with MEDLINE. To help users find medical information, NLM website designers created an intermediate search page with one link to a new "Medical Topics" page and another to the searchable website index. The hope was that users following the initial search link in pursuit of health information would see the link to the Medical Topics page and follow it, rather than searching the NLM site index.


Intermediate search page, National Library of Medicine website. August, 1996.

Naomi Miller studied the effectiveness of this strategy by capturing one week of searches before these changes were made and one week after, then counting the medical topics searches in each week (Miller, 1997). During June 16-22, 1996, before the changes, 90% of NLM website searches were for medical information. By the week of August 13-19, after the intermediate search and Medical Topics pages had been added to the website, the proportion of searches for medical topics had dropped substantially, from 90% to 79%. Encouraged by these results, NLM then added a link from the front page directly to the Medical Topics page in an attempt to reduce the proportion of medical topics searches further. Unfortunately, a subsequent analysis by Miller showed that for the week of September 1-7, 1996, the proportion of medical topics searches remained unchanged at 79%. This may lend support to the hypothesis that search-dominant users tend to follow "search" links first, regardless of other available links. Nonetheless, by presenting two search options on an intermediate page, NLM was able to temporarily reduce the proportion of medical topics searches performed on the NLM website.

The NLM website was redesigned in June 1997 to coincide with the availability of free web-based access to MEDLINE (NLM NEWSLINE, 1997). The intermediate search page was eliminated, and new buttons directed users to "Free MEDLINE," "Health Info," and "Search NLM Site." "Free MEDLINE" directed users to a page describing the two free methods of access, PubMed and Grateful Med. "Health Info" described other NLM sources of health information, such as the DIRLINE database of health-related organizations, and provided links to selected government health web sites. The "Search NLM Site" button led to the site index.

The website was modified again in October 1998, to coincide with the public release of MEDLINEplus, NLM's web-based guide to authoritative consumer health information (NLM NEWSLINE, 1998). Originally offering 22 health topics, MEDLINEplus has since expanded to include over 300, in addition to preformulated search strategies designed to help users search MEDLINE for relevant literature.


National Library of Medicine homepage. October 1998-July 1999.

The ease of online access to MEDLINE and the growing range of consumer health information on MEDLINEplus might have been expected to help users retrieve medical information and to reduce the proportion of medical topics searches on the NLM website. In a paper presented at the 1999 Annual Symposium of the American Medical Informatics Association, Alexa McCray and other NLM researchers examined NLM website searches "to understand the nature and scope of these queries in order to understand how to improve users' access to the information they are seeking on our site" (McCray et al., 1999). The researchers processed search log entries from August, September, and October 1998, mapped them against the Unified Medical Language System (UMLS) Knowledge Source Server, and found that 94% of the queries were medical in nature. Clearly, the earlier reductions in the proportion of medical topics searches had not lasted. Similarly, Blecic et al. (1999), in a study of OPAC screen changes, found that some improvements in searching behavior were not sustained over time.

The most recent redesign of the NLM website went public in late July 1999, when the site's appearance was updated and its pages and links were restructured to provide quick and efficient access to the most commonly sought information about NLM and its services. Among other changes, a new intermediate search page was introduced: users who follow the "Search Our Web Site" link in the headers of the site's pages are now presented with an intermediate page that simply asks whether they are searching for "information on a health topic" (with a link to MEDLINEplus) or "information about NLM's programs and services" (a link to the NLM website search page). Prominent links to MEDLINE and MEDLINEplus were also added to the front page of the website.


Intermediate search page, National Library of Medicine website. July 1999.

NLM website designers hoped that the reintroduced intermediate search page would better guide users searching for medical topics to MEDLINEplus or MEDLINE and, consequently, reduce the number of medical topics searches performed on the NLM website. The design team reviewed and analyzed the website's search logs to determine whether the proportion of health topics searches through the NLM site search page had decreased following the redesign.

Methods

Ht://Dig, NLM's current search engine, is used in three different locations to search three collections of pages on the NLM website: MEDLINEplus, NLM's intranet, and the NLM website in its entirety. Searches are logged, with each log entry including a date/time stamp, the collection searched, search options, the query itself, and the number of hits retrieved. Two weeks of search log files were retrieved from NLM's web servers: 6/10/99-6/16/99, several weeks before the most recent website redesign, and 7/29/99-8/4/99, immediately after it. The semicolon-delimited text files were imported into a pair of Microsoft Excel ® workbooks, where the data were examined and edited to correct translation errors and to remove log entries generated by known search engine errors. The search logs were then filtered to isolate entries generated by NLM site searches.

The website design team decided to study a random sample from each week's log, from which statistics about the whole week could be inferred. A sample of 400 search log entries per week was judged large enough to keep the standard error acceptably small. A pair of Excel macros was written and used to draw 400 random entries from each of the two week-long search logs.

Each sampled entry was then characterized as either a medical search (searches for diseases, anatomy, physiology, diagnoses, therapies, etc.) or a non-medical search. Medical dictionaries, MeSH headings, and drug and chemical databases were consulted before any obscure or difficult query was characterized as non-medical. Summary inferential statistics for the two weeks were then calculated. Finally, a chi-square test was used to determine whether the difference between the proportions of medical NLM site searches in the two weeks was statistically significant.
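The log-preparation and sampling steps described above can be sketched in a few lines of Python. This is a minimal sketch, not the Excel macros the team actually used: it assumes each log line holds exactly the five fields listed above, in that order, separated by semicolons, and the collection label `nlm_main`, the helper names, and the sample records are hypothetical placeholders rather than real Ht://Dig output.

```python
import csv
import io
import random
from dataclasses import dataclass


@dataclass
class LogEntry:
    # Assumed field order, taken from the fields the article lists:
    # date/time stamp; collection; search options; query; number of hits.
    timestamp: str
    collection: str
    options: str
    query: str
    hits: int


def parse_log(text: str) -> list[LogEntry]:
    """Parse semicolon-delimited log lines, skipping malformed ones."""
    entries = []
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if len(row) != 5:
            continue  # wrong field count: treat as a garbled entry and drop it
        fields = [field.strip() for field in row]
        try:
            hits = int(fields[4])
        except ValueError:
            continue  # non-numeric hit count: also treated as garbled
        entries.append(LogEntry(fields[0], fields[1], fields[2], fields[3], hits))
    return entries


def sample_site_searches(entries, collection, k=400, seed=None):
    """Keep one collection's searches, then draw up to k of them without replacement."""
    site = [e for e in entries if e.collection == collection]
    rng = random.Random(seed)
    return rng.sample(site, min(k, len(site)))


# Hypothetical log lines; "nlm_main" is a placeholder collection label.
log = """1999-06-10 09:14:02;nlm_main;and;breast cancer;12
1999-06-10 09:15:40;medlineplus;and;asthma;30
1999-06-10 09:16:11;nlm_main;and;interlibrary loan;8
garbled line"""

entries = parse_log(log)
sample = sample_site_searches(entries, "nlm_main", k=400, seed=1)
print(len(entries), len(sample))  # prints: 3 2
```

Because `random.sample` draws without replacement, no log entry can appear twice in a week's sample; passing a fixed `seed` makes a draw repeatable for later checking.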

Results

The results of the search log analysis are presented in Table 1 below. The total number of searches of the NLM Website (including main Website, MEDLINEplus, and intranet searches) dropped approximately 40%, while the share of all searches made through the main Website search plummeted from 46% to 17%. The proportion of main Website searches that were medical decreased from 76% to 61% (+/-4.26% and +/-4.87%, respectively) between the two periods studied. A chi-square test indicates that whether a search was medical is not independent of the study period; the difference between the two proportions is therefore statistically significant.
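The chi-square comparison can be approximated with Python's standard library. Because the article reports percentages rather than counts, this sketch assumes 304 medical and 96 non-medical searches for the June week (76% of 400) and 244 and 156 for the July/August week (61% of 400); these reconstructed counts and the function name `chi_square_2x2` are illustrative assumptions. The sketch computes Pearson's statistic without a continuity correction and uses the fact that, for one degree of freedom, the upper-tail probability equals `erfc(sqrt(x / 2))`.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom, the statistic is the
    square of a standard normal variable, so P(X > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # count expected under independence
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


# Counts reconstructed from the rounded percentages (76% and 61% of n = 400);
# the article reports only percentages, so these are approximations.
june = (304, 96)   # (medical, non-medical)
july = (244, 156)  # (medical, non-medical)
stat, p = chi_square_2x2(*june, *july)
print(f"chi-square = {stat:.2f}, p = {p:.1e}")
```

With these reconstructed counts the statistic comes out near 20.9, well above 3.84, the 5% critical value for one degree of freedom, which is consistent with the significant difference reported above.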

The decrease in the total number of searches of the NLM Website is striking, particularly because other measures of use of NLM's Web information services (e.g., page hits on PubMed/MEDLINE and on MEDLINEplus) increased from the first study period to the second.

Table 1. Search Log Statistics

                                                    6/10/99-6/16/99     7/29/99-8/4/99
Main Website, Intranet, and MEDLINEplus Searches    31,650              18,872
Main Website Searches Only                          14,514              3,240
Main Website Searches, % of All Searches            46%                 17%
Main Website Searches - Medical                     76% (+/-4.26%)      61% (+/-4.87%)

n=400 sampled main Website searches per week; p=.05
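The derived figures in Table 1 follow from simple arithmetic, sketched here in Python. The article does not state how its +/- values were computed, so the sketch assumes the common normal-approximation margin for a proportion, z * sqrt(p(1 - p) / n), with z = 1.96 and n = 400; `margin_of_error` is a hypothetical helper, not part of the authors' workbooks.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Normal-approximation half-width of a confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


# Figures taken from Table 1.
all_searches = {"June": 31_650, "July/Aug": 18_872}
main_searches = {"June": 14_514, "July/Aug": 3_240}
medical_share = {"June": 0.76, "July/Aug": 0.61}
n = 400  # sampled main Website searches per week

drop = 1 - all_searches["July/Aug"] / all_searches["June"]
print(f"Drop in all searches: {drop:.0%}")
for period in all_searches:
    share = main_searches[period] / all_searches[period]
    moe = margin_of_error(medical_share[period], n)
    print(f"{period}: main site {share:.0%} of all searches; "
          f"medical {medical_share[period]:.0%} +/- {moe:.2%}")

# Worst case: the margin is largest when the true proportion is 0.5.
print(f"Worst case (p = 0.5): +/- {margin_of_error(0.5, n):.2%}")
```

This reproduces the 40% drop and the 46% and 17% shares. Under the stated assumption the margins come out at about 4.19 and 4.78 percentage points, slightly narrower than the 4.26 and 4.87 reported, so the authors may have used a somewhat different formula or critical value. The worst case, p = 0.5, gives 4.90 points, which illustrates why a sample of 400 keeps the error acceptably small for any observed proportion.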

Conclusions

Many web users seem predisposed to search-based navigation, and they are easily frustrated when their searches are not successful. On a site such as NLM's, which contains many underlying databases of information that are not accessible via Web search engines, leading users to the search features and tools that will return the type of information they are seeking is an important part of website design. Analyses of search logs suggest that NLM has become increasingly successful in directing users to the appropriate components of its Website. In 1996, the addition of an intermediate search page that offered users a link to a page of medical topics effected a substantial reduction (from 90% to 79%) in inappropriate searches of NLM's website for medical topics. Unfortunately, this initial reduction did not hold over time: by late 1998, 94% of site searches were for medical topics. The most recent redesign of NLM's website, which combined a drastically streamlined homepage with a new intermediate search page, has resulted both in an overall decline in use of the NLM website search feature and in a decline in its inappropriate use for health topic searches. It remains to be seen whether this reduction will hold over time.

Although this report presents a fairly simple application, the examination of website search logs is clearly an important part of evaluating the impact of website changes. As software tools become more widely available, search log analysis could be augmented with web page hit analysis to reveal common user navigational paths through websites and to highlight trends in the use of website resources.

References

Blecic, D., et al. (1999). A longitudinal study of the effects of OPAC screen changes on searching behavior and searcher success. College & Research Libraries, 60(6), 515-530.

Lynch, C. (1997). Searching the Internet. Scientific American, 276(3), 52-56.

McCray, A., et al. (1999). Terminology issues in user access to web-based medical information. American Medical Informatics Association 1999 Annual Symposium. Available: http://www.amia.org/pubs/symposia/D005626.PDF [1999, December 8].

"MEDLINEplus" website launched. (1998). NLM Newsline [Online]. 53(3-4). Available: http://www.nlm.nih.gov/pubs/nlmnews/juldec98.html#MEDLINEplus [1999, December 16].

Miller, N. (1997). Improving the NLM home page: from logs to links [Online]. Available: http://www.nlm.nih.gov/psd/web_poster/web_poster.html [1999, December 16].

Nielsen, J. (1997). Search and you may find. Alertbox [Online]. Available: http://www.useit.com/alertbox/9707b.html [1999, December 16].

Pollock, A. & Hockley, A. (1997). What's wrong with internet searching. D-Lib Magazine, March 1997. Available: http://www.dlib.org/dlib/march97/bt/03pollock.html [1999, December 16].

Vice President Al Gore Launches Free MEDLINE. (1997). NLM Newsline [Online]. 52(2-4). Available: http://www.nlm.nih.gov/pubs/nlmnews/maraug97.html#Gore [1999, December 16].


(At the request of the authors, the URL for Ht://Dig was corrected to Http://www.htdig.org, email addresses for the authors were changed, and reference to Excel was changed to Microsoft Excel ®. These changes were made on 1/18/00 at 8:08 am. Larger images added 1/18/00, 9:24 am.)


DOI: 10.1045/january2000-redalen