Summary of Stanford's Digital Library Testbed Design and Status

Andreas Paepcke
Stanford University

D-Lib Magazine, July/August 1996

ISSN 1082-9873

The Stanford Digital Library testbed focuses on interoperability among existing collections and publication-related services. Users view electronically accessible collections and services offered by third parties as global resources which together comprise a library to use in solving their tasks.

Stanford's user interface lets users drag and drop collections and services onto a canvas, based on the particular activity they are engaged in. For example, sales managers and marketing personnel may need to stay abreast of the competition in the product lines they represent. To do this, they can benefit from Internet-based selective dissemination services, which automatically monitor frequently updated information sources and extract material relevant to the user. Such services are available now, and need to be viewed as part of digital library infrastructure.

As information is delivered continuously to the user, it needs to be examined, trimmed, organized and collated. Services in addition to collections, or repositories, are needed for these activities. For example, summarization services which accept documents and return an abbreviated version are beginning to be available and need to be integrated into digital libraries to help users with many aspects of their tasks, not just with search.

The challenge for a testbed which integrates existing repositories and services is that these resources are already being developed independently by different providers. This means they are accessible through various mechanisms, such as telnet and HTTP. Beyond these low-level differences in communication channels, the access models also differ widely. Some services require login, others require simple identification, and others are completely open. Conventions for delivering the required information frequently differ as well.

In addition, many services in libraries are not free. Even in today's research libraries, access to services such as Nexis or Knight-Ridder's Dialog Information Service is for-pay. Sometimes these costs are not visible to the end user, but usually interdepartmental accounting is involved.

As for-pay services become more prevalent, interoperability among online payment schemes is becoming a problem. The Stanford testbed includes facilities for mediating between library users and for-pay services which use different payment schemes.
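The mediation idea can be illustrated with a minimal sketch. It is not the testbed's implementation: the class names (`PaymentScheme`, `PrepaidAccountScheme`, `InvoiceScheme`, `PaymentMediator`) and the two example schemes are assumptions made for illustration. The point is only that the user's client calls one `pay` operation, and the mediator selects whichever scheme the service requires.

```python
class PaymentScheme:
    """Hypothetical base class: one subclass per provider-side payment scheme."""
    name = "abstract"

    def charge(self, account, cents):
        raise NotImplementedError


class PrepaidAccountScheme(PaymentScheme):
    """Assumed scheme: the charge is deducted from a prepaid balance."""
    name = "prepaid"

    def __init__(self, balances):
        self.balances = dict(balances)

    def charge(self, account, cents):
        if self.balances.get(account, 0) < cents:
            return False  # insufficient funds
        self.balances[account] -= cents
        return True


class InvoiceScheme(PaymentScheme):
    """Assumed scheme: the charge is recorded and billed later."""
    name = "invoice"

    def __init__(self):
        self.ledger = []

    def charge(self, account, cents):
        self.ledger.append((account, cents))
        return True


class PaymentMediator:
    """Sits between library users and for-pay services with differing schemes."""

    def __init__(self):
        self.schemes = {}          # scheme name -> PaymentScheme
        self.service_scheme = {}   # service name -> scheme name it requires
        self.user_accounts = {}    # (user, scheme name) -> account identifier

    def register_scheme(self, scheme):
        self.schemes[scheme.name] = scheme

    def register_service(self, service, scheme_name):
        self.service_scheme[service] = scheme_name

    def register_account(self, user, scheme_name, account):
        self.user_accounts[(user, scheme_name)] = account

    def pay(self, user, service, cents):
        """Charge the user through whatever scheme the service demands."""
        scheme_name = self.service_scheme[service]
        account = self.user_accounts.get((user, scheme_name))
        if account is None:
            return False  # the user has no account under this scheme
        return self.schemes[scheme_name].charge(account, cents)
```

With this shape, supporting a further payment scheme means registering one more `PaymentScheme` subclass; client code that calls `pay` stays unchanged.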

Related to payment is the problem of authentication and authorization. Many sources will eventually require users to authenticate themselves, and they will need to determine authorization based on the nature of the information and the relationship of the provider with the customer. The means for authenticating customers is also a potential source of interoperability problems. If a customer interacts with many sources and services, each with a different authentication scheme and authorization policies, a user-friendly digital library needs to help with appropriate infrastructure.

At the user interface level, all the interoperability problems need to be made as transparent as possible. While users do need to keep track of information provenance, the details of how information is retrieved from a particular repository, or of the interaction model for a particular service, must be hidden if the large number of sources and services is to remain manageable and effective. The testbed is beginning to include mechanisms to help with this. Efforts which address the problems of heterogeneity at the user interface level include various experimental user interfaces. For example, DLITE uses a drag-and-drop metaphor to help the user treat all the resources in as uniform a way as possible. The SenseMaker interface allows users to recursively cluster information into categories.

Figure 1 shows the InfoBus, the architecture of the Stanford testbed. It is based on a hardware bus metaphor to suggest that services, repositories, and clients are 'plugged in', and interoperate by taking advantage of interoperability mechanisms built into the testbed. The architecture is implemented with CORBA distributed object technology using ILU, an implementation supplied free of charge by Xerox PARC.

Figure 1: InfoBus
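The 'plugged in' metaphor can be sketched in a few lines of Python. This is a local, in-process illustration only: the actual testbed uses CORBA objects via ILU, and every name here (`RepositoryProxy`, `WebSearchProxy`, `TelnetCatalogProxy`, `InfoBus`, `plug_in`, `search_all`) is hypothetical. Each proxy hides its own access model behind one `search` method, and each result carries its source so that provenance remains visible.

```python
from abc import ABC, abstractmethod


class RepositoryProxy(ABC):
    """Hypothetical uniform interface that every plugged-in repository exposes."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def search(self, query):
        """Return a list of {'source': ..., 'title': ...} records."""


class WebSearchProxy(RepositoryProxy):
    """Stands in for an open Web search service; no login is needed."""

    def __init__(self, name, pages):
        super().__init__(name)
        self._pages = list(pages)

    def search(self, query):
        q = query.lower()
        return [{"source": self.name, "title": t}
                for t in self._pages if q in t.lower()]


class TelnetCatalogProxy(RepositoryProxy):
    """Stands in for a session-based service that requires login first."""

    def __init__(self, name, records, user):
        super().__init__(name)
        self._records = list(records)
        self._user = user
        self._logged_in = False

    def _login(self):
        # A real proxy would open a terminal session and authenticate here.
        self._logged_in = True

    def search(self, query):
        if not self._logged_in:
            self._login()
        q = query.lower()
        return [{"source": self.name, "title": t}
                for t in self._records if q in t.lower()]


class InfoBus:
    """Clients talk to the bus; the bus fans requests out to all proxies."""

    def __init__(self):
        self._proxies = {}

    def plug_in(self, proxy):
        self._proxies[proxy.name] = proxy

    def search_all(self, query):
        results = []
        for proxy in self._proxies.values():
            results.extend(proxy.search(query))
        return results
```

A client calls `search_all` without knowing which repositories need a login; that detail lives entirely inside the individual proxy.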

Resources Available to DLI Members

Several of our testbed facilities are available to DLI sites which run CORBA infrastructure. The testbed allows access to Web services such as Lycos, WebCrawler, Inktomi, Excite, and AltaVista. At the same time, access to non-Web collections is provided through an identical programmatic and user-level interface. One example is Dialog, one of Stanford's industrial partners, which provides a collection of information on news, literature, business and other areas that can be accessed through Telnet. Another example is the Stanford Libraries' offering of IEEE's INSPEC bibliographic database on computer science, physics and related areas.

The testbed also provides access to various collections at other DLI sites. For example, it allows InfoBus clients to search over University of California, Santa Barbara's database of maps and several University of Michigan collections.

Services offered include Oracle's ConText summarization tool, as well as InterPay, which manages online payment interoperability. Another testbed service is SCAM, which was developed as part of Stanford's Digital Library project. SCAM is a document authentication tool which allows users to maintain large collections of documents and check a document efficiently for similarity with any of the items contained in the collection. FAB is a service which selectively retrieves Web pages of potential interest to a user and learns from user feedback to improve its performance.
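To make the idea of checking a document against a registered collection concrete, here is a minimal sketch. It is not SCAM's actual method, which this summary does not describe; it swaps in plain cosine similarity over word counts as a stand-in, and the names `word_counts`, `cosine_similarity`, `DocumentCollection`, and the `threshold` parameter are all assumptions.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its alphanumeric words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class DocumentCollection:
    """Hypothetical registry of documents to compare new documents against."""

    def __init__(self):
        self._docs = {}

    def add(self, doc_id, text):
        # Store only the word counts, not the full text.
        self._docs[doc_id] = word_counts(text)

    def similar_to(self, text, threshold=0.8):
        """Return (doc_id, score) pairs at or above threshold, best first."""
        query = word_counts(text)
        scored = [(doc_id, cosine_similarity(query, counts))
                  for doc_id, counts in self._docs.items()]
        hits = [pair for pair in scored if pair[1] >= threshold]
        return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

This sketch compares the query against every stored document in turn; a service meant for large collections would need an index to stay efficient, which is omitted here.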

Testbed Services Available for Non-DLI Access

Several of the repositories and services we provide access to are for-pay, and we cannot make them publicly available at this time. However, InterBib, a tool for managing bibliography-related tasks, is available on the Web. This service accepts bibliographies in the Refer and BibTeX formats and returns simple or annotated bibliographies in HTML or MIF, FrameMaker's interchange format. It allows links to be specified in the bibliographies. These are included in the results as active hypertext links.

InterBib also allows testbed users to submit FrameMaker or (soon) Microsoft Word documents that have citation keys distributed throughout the text. The user also submits relevant BibTeX or Refer bibliographies which InterBib uses to resolve the citations. The service returns a new document with the citations resolved and a reference list appended at the end.
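The resolution step can be sketched as follows. This is an illustration under stated assumptions, not InterBib's code: citation keys are assumed to appear as `\cite{key}` markers, the bibliography is assumed to be already parsed into a dictionary from key to formatted entry, and numbered references in order of first use are one possible output style. The function name `resolve_citations` is hypothetical.

```python
import re

# Assumed marker syntax for a citation key embedded in the text.
CITE = re.compile(r"\\cite\{([^}]+)\}")


def resolve_citations(text, bibliography):
    """Replace citation keys with numbers and append a reference list.

    `bibliography` maps each key to its already formatted entry.
    Keys missing from the bibliography are left in the text untouched.
    """
    order = []  # keys in order of first appearance

    def number_for(match):
        key = match.group(1)
        if key not in bibliography:
            return match.group(0)  # unresolved: keep the original marker
        if key not in order:
            order.append(key)
        return "[%d]" % (order.index(key) + 1)

    body = CITE.sub(number_for, text)
    if not order:
        return body  # nothing was resolved, so no reference list is added
    refs = ["[%d] %s" % (i, bibliography[key])
            for i, key in enumerate(order, 1)]
    return body + "\n\nReferences\n" + "\n".join(refs)
```

Leaving unknown keys in place, rather than dropping them, keeps unresolved citations visible to the author in the returned document.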

Testbed Plans

We plan to maintain the testbed as much as possible after the project ends. However, university academic departments are not equipped to develop and run commercial-grade operations. Libraries everywhere are notoriously stressed financially, so they may not be effective transfer partners for the technology either. Many are struggling to maintain even their traditional functions.

A promising alternative route for transfer is through new start-up companies founded by graduating students, and through the industrial partners that are part of the DLI model. We are working closely with several of our partners and hope to transfer pieces of the testbed to them for eventual commercialization.

Groundwork for Testbed Continuation

Since Stanford's focus is on interoperability among third-party repositories and services, maintenance infrastructure is only one piece of the foundation that needs to be in place for the testbed to persist.

The work to be done now to ensure a smooth transition in many cases includes negotiation with information providers who have donated information, but who are understandably unwilling to allow access to the general public. Most of the reluctance stems from the absence of pervasive online payment facilities. A major piece of work to be done, therefore, is the deployment of payment schemes that allow information providers to be reimbursed.

Other work includes, of course, the continued development of standards at all levels, including low-level communications access, information exchange, payment, copyright and more.

We have already transferred one service to a startup company, so we are very hopeful that this route to technology transfer will be open to us for other pieces of the testbed.

Copyright © 1996 Andreas Paepcke

