Federating Repositories of Scientific Literature

An Update on the Digital Library Initiative at the University of Illinois at Urbana-Champaign

Susan L. Harum, William H. Mischo, and Bruce R. Schatz
University of Illinois at Urbana-Champaign
s-harum@uiuc.edu, w-mischo@ux1.cs.uiuc.edu, schatz@uiuc.edu

D-Lib Magazine, July/August 1996

ISSN 1082-9873

(1) What is available in the testbeds for experimentation among the Digital Library Initiative (DLI) members?

The UIUC DLI project will be able to give fellow members of the Digital Library Initiative access to its Testbed, which currently holds the full text of SGML documents from selected engineering journals. These journals, which currently number 20, are being furnished by the American Institute of Physics (AIP), the American Physical Society (APS), the American Society of Civil Engineers (ASCE), the Institution of Electrical Engineers (IEE), the Institute of Electrical and Electronics Engineers (IEEE), the IEEE Computer Society, and the American Society of Agricultural Engineers (ASAE). Over the next year, journals from the American Association for the Advancement of Science (AAAS), the American Institute of Aeronautics and Astronautics (AIAA), and John Wiley & Sons will be added to the Testbed.

The UIUC DLI is using OpenText Corporation's OpenText Index Search engine. The search engine, which indexes and accesses DLI project documents, is a robust and expandable system that supports phrase, Boolean, and proximity searching and is tailored to SGML processing and retrieval. One of the cornerstones of the distributed repository model is the separation of metadata and index data from the full-text articles themselves, which allows efficient retrieval from subsets of the full-text repositories. To this end, the UIUC DLI is working to identify the optimum bibliographic metadata to assist in full-text retrieval. The metadata elements presently being specified include: URL, basic bibliographic information, author information (including affiliation), abstract, figure and table information, and bibliography information (including all citations).
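To illustrate how metadata is kept apart from the full text, the following minimal Python sketch models a record carrying the metadata elements listed above and builds a small index over the metadata alone. It is not the OpenText engine or the Testbed's actual schema: the class name, field names, and functions are assumptions made for illustration, and only Boolean AND matching is shown.

```python
from dataclasses import dataclass, field


@dataclass
class ArticleMetadata:
    """Metadata held apart from the full-text SGML article (hypothetical field names)."""
    url: str                      # locator of the full text in its repository
    title: str
    authors: list[str]
    affiliations: list[str]
    abstract: str
    figure_table_captions: list[str] = field(default_factory=list)
    citations: list[str] = field(default_factory=list)


def build_metadata_index(records):
    """Map each lowercased word in the title, abstract, and captions to article URLs."""
    index = {}
    for rec in records:
        text = " ".join([rec.title, rec.abstract, *rec.figure_table_captions])
        for word in text.lower().split():
            word = word.strip(".,;:()\"'")
            if word:
                index.setdefault(word, set()).add(rec.url)
    return index


def boolean_and(index, terms):
    """Return the URLs whose metadata contains every query term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()
```

In this sketch a query touches only the metadata index; the full-text article is fetched from its repository by URL once a record has matched.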

The UIUC DLI is collaborating with the University of Michigan DLI Interface team to incorporate Search Tree retrieval techniques first developed by Karen Drabenstott for the ASTUTE Project into the UIUC Testbed interface design. These techniques serve as search navigation aids, by suggesting to the user various modifications that will broaden or narrow search results, addressing the frequent problems users encounter with too few or too many retrievals. This collaboration will provide a means for testing the efficacy of various interface designs across several DLI projects.
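The general idea of suggesting broader or narrower searches can be sketched as follows. This is not the Search Tree technique itself, whose details are not given here; the function name, thresholds, and suggestion wording are all assumptions chosen only to show the shape of such a navigation aid.

```python
def suggest_modifications(terms, hit_count, too_few=5, too_many=200):
    """Suggest query changes when a search returns too few or too many hits.

    The thresholds and the suggestions are illustrative assumptions.
    """
    if hit_count < too_few:
        suggestions = ["broaden: match ANY of the terms (" + " OR ".join(terms) + ")"]
        if len(terms) > 1:
            suggestions.append("broaden: drop a term, e.g. " + " AND ".join(terms[:-1]))
        return suggestions
    if hit_count > too_many:
        return [
            'narrow: search as a phrase ("' + " ".join(terms) + '")',
            "narrow: restrict the search to titles",
        ]
    return []  # result size is manageable; no change suggested
```

For example, `suggest_modifications(["fatigue", "crack"], 2)` returns two "broaden" suggestions, while a hit count between the two thresholds returns an empty list.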

Semantic concept space technology, which uses co-occurrence term matrices, will be transferred to the University of California at Santa Barbara DLI. The UIUC DLI focuses on methods that interactively provide the user with conceptual maps that offer alternative search terms.
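A co-occurrence term matrix can be sketched in a few lines of Python. The version below counts how often two terms appear in the same document and uses the raw counts to propose alternative search terms. It is a simplification under stated assumptions: the function names are hypothetical, and the project's concept spaces may weight and normalize co-occurrence in ways not described here.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each pair of terms, the number of documents containing both."""
    counts = defaultdict(Counter)
    for terms in documents:
        # Use each term once per document and count every unordered pair both ways.
        for a, b in combinations(sorted(set(terms)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def suggest_terms(counts, term, k=3):
    """Return up to k terms that co-occur most often with `term` (ties broken alphabetically)."""
    related = counts.get(term, Counter())
    ranked = sorted(related.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:k]]
```

Given documents represented as lists of index terms, `suggest_terms(counts, "bridge")` returns the terms most often found alongside "bridge", which a user could then try as alternative search terms.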

(2) What is available in the testbeds for experimentation among other groups?

Due to the nature of the legal agreement between the UIUC DLI and our partners, the UIUC Testbed will not be available for experimentation by groups outside of the DLI until the third and fourth years of the grant (1997-1998). At that time, access will be provided to the CIC (Committee on Institutional Cooperation) community, which comprises the University of Chicago, the University of Illinois, Indiana University, the University of Iowa, the University of Michigan, Michigan State University, the University of Minnesota, Northwestern University, Ohio State University, Pennsylvania State University, Purdue University, and the University of Wisconsin at Madison.

DLI research involving concept spaces for scalable semantic retrieval will be shared with semantic researchers, e.g. with the group at Rutgers University. The large collections generated by supercomputers are available as well, e.g. concept spaces across 1,000 areas of engineering.

The UIUC DLI project is emphasizing the collaborative effort needed between university researchers and our publishing partners. A large part of the project will be to look at the future roles of publishers, libraries, A & I services, and authors, and to extend the model of distributed publisher repositories developed in the DLI project. We now have a viable retrieval system and are in the process of setting up a framework for a system of federated repositories. The DLI will be in a position to advise our publishing partners in setting up their own repositories, and we will continue to work on the research needed for effective indexing, retrieval, and display of the full text of scientific and engineering journals.

(3) What are the plans for the testbeds at the end of the project?

The University Library at the University of Illinois is developing an industrial partnership program to augment the research and development work going on in the DLI project, and to extend the Testbed past the four-year NSF/ARPA/NASA grant period. In addition to continuing to collect and maintain large-scale digital collections, we hope to expand the scope and breadth of the Testbed to include other full-text journals, preprints, and conference papers. The UIUC Library hopes to establish a collaborative environment with DLI publishing partners in order to explore more fully the potential of SGML retrieval and display and of the distributed repository model, linking to other digital projects, including image databases, numerical databases, and A & I service databases.

We expect that during the course of the DLI project many of our publisher partners will create their own repositories. The DLI role will continue to emphasize the gateway or front-end access to multiple information repositories, including full-text repositories. This will help the Testbed evolve into a multiple-view reference system to a federated collection of distributed repositories. The repository management package will let other organizations and individuals make their organized collections searchable via a multiple-view interface.
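The gateway role described above, one front end sending a query to several repositories and presenting a single merged result, can be sketched as follows. This is an illustration only: each repository is modeled as a plain Python function, and the names and example URLs are assumptions rather than part of the Testbed or the repository management package.

```python
from typing import Callable, Dict, List, Tuple

# A repository is modeled as a function that maps a query string to article URLs.
Repository = Callable[[str], List[str]]


def federated_search(query: str, repositories: Dict[str, Repository]) -> List[Tuple[str, str]]:
    """Send one query to every repository and merge the hits, dropping duplicate URLs."""
    merged = []
    seen = set()
    for name, search in repositories.items():
        for url in search(query):
            if url not in seen:
                seen.add(url)
                merged.append((name, url))  # remember which repository supplied the hit
    return merged


# Two in-memory stand-ins for publisher repositories (hypothetical data).
def publisher_a(query: str) -> List[str]:
    return ["http://example.org/a/1", "http://example.org/shared/9"] if "steel" in query else []


def publisher_b(query: str) -> List[str]:
    return ["http://example.org/shared/9", "http://example.org/b/4"] if "steel" in query else []


results = federated_search("steel fatigue", {"A": publisher_a, "B": publisher_b})
```

A real gateway would also have to reconcile different query syntaxes and metadata formats across repositories, which this sketch leaves out.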

The UIUC DLI project is providing major input to the next-generation server that NCSA is building. The server will move from a WWW document server using HTTP to a distributed repository host using multiple protocols. Version 2.0 of the server, due in the summer of 1996, will feature a modular protocol design and integrated security. We will later incorporate the work on stateful gateways into the server on the output end. The input end will incorporate the work on collection development, so that the new server will eventually support session history and metadata checking. Later versions will also support security measures such as token passing.

(4) What can be done now to anticipate the continued operation of the testbeds beyond the current project?

Industrial technology transfer between DLI partners and other companies is key to the advancement of information retrieval for large collections. The structure of the DLI projects, with large testbeds and many partners, is set up to encourage technology transfer of new developments. For effective search of multimedia objects across multiple repositories on the Web, software developed by the research teams must be shared with companies with the means to advance research.

To expand and continue the research efforts, such as the scalable semantics research, additional funding for research must be available. This might take the form, for example, of a follow-on initiative to the DLI in another area of information management, such as analysis environments.

Copyright © 1996 Susan L. Harum, William H. Mischo, and Bruce R. Schatz

