Digital Initiatives of the Research Libraries Group

Ricky L. Erway
Member Services Officer
Research Libraries Group
Mountain View, California
[email protected]

D-Lib Magazine, December 1996

ISSN 1082-9873

The Research Libraries Group (RLG) is a not-for-profit membership corporation of 150 universities, independent research libraries, archives, historical societies, museums and other institutions devoted to improving access to information that supports research and learning. Since its founding in 1974, RLG has been a pioneer in developing cooperative solutions to the acquisition, access, delivery, and preservation challenges those institutions face. RLG provides a forum for the collegial sharing of resources, costs, and experience, and for shaping the future. RLG membership is open to any not-for-profit institution with an educational, cultural, or scientific mission and a commitment to improving access to research materials. Access to the wealth of information resources, however, is not restricted to RLG members.

RLG successes are based on collaboration among research repositories in two areas. First, they have collaborated to build a union catalog, the Research Libraries Information Network (RLIN), of more than one hundred million bibliographic records from over 250 sources describing books, journal articles, dissertations, and rare materials in over 365 languages. The RLIN databases form a one-of-a-kind resource for technical services librarians as well as for researchers.

Second, RLG members have seized the opportunities RLG creates for them to get involved in collective projects, both to preserve and make accessible collections of valuable materials and to jointly develop best practices that can be used in other endeavors. RLG's value is closely tied to its capacity to identify and implement cooperative structures. Shared cataloging programs, interlibrary loan pacts, records surveys, resource-sharing agreements, staff exchanges, and multi-institutional preservation microfilming projects are just a few instances.

Collaboration

The nature of digital conversion activities makes them particularly well-suited to collaborative endeavors. First, digitization is costly. No matter how compelling projections about potential cost-savings are, no matter how low the cost of storage becomes, and no matter how much conversion expertise is developed, the truth remains that digitizing is an additional cost libraries and archives are being asked to bear in the midst of budget cuts and decreased staffing. Digitizing does not replace the need to perform all the traditional tasks like acquiring, organizing, cataloging, and preserving materials. Further, in successfully responding to user needs, digitization results in increased expectations. So, digital conversion is not likely to be a short-term activity. But by working together, institutions can attract funding, minimize costs, and prevent duplicative efforts.

Second, conversion is well-suited to collaboration because bringing analog materials into the digital domain makes possible new kinds of access and resource sharing -- benefits that extend far beyond the institution doing the conversion.

Collaboration occurs within the local institution as well as in the larger community. Although recruiting project staff from across traditional departments means that staff don't always share common understandings and experiences at the outset of the project, it also means that the benefits of newly-acquired expertise and newly-forged partnerships will be realized throughout the institution, as well as across institutions, regions, and even nations. Fortunately, today's technologies bring with them a number of tools that actually facilitate collaboration. Tools like teleconferencing, electronic mail, and listservs are making it much easier for project staff to coordinate work, discuss challenges, and keep each other aware of progress.

Collaboration means sharing work and expenses, as well as benefits -- thereby getting the most out of the money and effort each player expends. RLG projects enable staff from libraries, archives, and museums to establish models that build on their institutional strengths while furthering collective goals.

Education

In November 1995, RLG hosted a symposium on Selecting Library and Archive Collections for Digital Reformatting. Participants were exposed to the evolving environment in which selection decisions are made and presenters provided them with strategies for making appropriate decisions that benefit their own institutions and the larger research community. The speakers and audience took part in a hypothetical selection process, sharing with each other their thinking about which of thirteen proposed collections would be appropriate for digital reformatting.

The interchange of ideas at educational events helps keep staff in sync and influences RLG's agenda. Other symposia held over the past three years include "Scholarship in the New Information Environment," May 1995; "RLG Digital Image Access Project," April 1995; "Digital Imaging Technology for Preservation," March 1994; and "Electronic Access to Information: A New Service Paradigm," July 1993. The published proceedings of the symposia extend the value beyond the events' participants.

Digital Collections Projects

The first in a series of RLG Digital Collections Projects, "Studies in Scarlet" is a project to digitize and make widely available legal research materials on a common theme, marriage and sexuality in the US and UK, from 1815 to 1914. The goal of the project is not merely to digitize, but to develop a new resource for teaching and research -- one that could not have existed in a paper-based environment.

Any one of the participating institutions (Harvard University, New York Public Library, New York University, North Carolina State Archives, Princeton University, University of Leeds, and University of Pennsylvania) could have, on its own, worked through the processes and made its materials available on the Internet. Together, however, they combine their materials into a resource far greater than the sum of its parts as they collaboratively develop procedures that they and others can apply to other projects.

A task force of professionals in the subject area helped to guide the development of the content of the virtual collection. Another task force of experts with experience in the technical aspects helped to design a framework upon which the participants could build. Key project participants are taking the lead on image capture and on SGML-encoding to smooth the way for others for whom this may be their first experience in digital reformatting.

Working together, participants are collecting and will analyze time and cost data for all steps in the process at each of the institutions. In so doing, they can identify areas where one institution's costs are inconsistent with those of its colleagues.

In this project, RLG has encouraged participants to outsource the actual digitization work. There are many procedures involved in the digitization for the Studies in Scarlet project: imaging from bound volumes, loose sheets, and microfilm and fiche; text conversion; and SGML-encoding using as many as three SGML document type definitions. Any one of these activities would be difficult to get up and running in-house within a reasonable project timeframe.

Outsourcing also facilitates learning from experts in the field. Each participating institution identified and negotiated provisionally with vendors for each of its conversion tasks. Then the results were pooled to compare notes on pricing and vendor experience and to identify the most promising vendors. These then were presented with the combined quantities of all the project participants to get the best pricing possible. This activity acquainted several service providers with the special care and handling of archival collections and the special requirements for digital reformatting. Not only did this process present a solution for the Studies in Scarlet project, but it also begins to develop an information resource for others to use and to which they may add.

The Studies in Scarlet collection will be the focus of a user evaluation exploring the research utility of such collections. Planning has begun for a second RLG Digital Collections Project, focusing on materials reflecting the theme of international migration; it will build on what is learned from the first project. Both these and future Digital Collections Projects are intended to develop collections to which others can continue to contribute.

Archiving Digital Information

The Digital Collections Projects not only allow RLG to address reformatting and access issues, but will also serve as an example digital archive for another consortial effort to develop recommendations for ensuring that access to the data continues to be available far into the future. The recommendations in the final report of the Task Force on Archiving Digital Information, co-sponsored by RLG and the Commission on Preservation and Access, provide the springboard for RLG's work in this area. The draft report of the task force was widely distributed and the final version incorporated contributions from all over the world.

RLG and its members will pursue many of the task force's recommendations, primarily in developing standards, "best practices," and guidance for managing digital archives. Specific areas of investigation are the longevity of culturally valuable digital information, migration paths, means for authenticating documents, requirements and standards for describing digital information, and coordination with international digital preservation initiatives.

Metadata Projects

While the topic of metadata, the much-maligned term for information about information, has lately dominated many library conferences and publications, it has always been the stock-in-trade of the library world. With digital reformatting of research materials, there are many layers of metadata; in addition to describing the intellectual content, metadata may include details about the capture process and technical specifications, information about other versions, and information about getting permission for the reuse of digital versions.

Preservation Information. RLG is forming working groups to address preservation-specific metadata. One working group is being formed to look at the preservation information that should accompany digital images of paper documents or photographs.

Discussions at the Selection for Digital Reformatting Symposium led to a recommendation that RLG create a registry for the exchange of information about the approaches taken in digitizing projects and about the resulting converted materials. To address that need, an international working group, formed by RLG in October, is investigating how to include information in bibliographic records about digitization efforts planned or underway, to help coordinate selection decisions, reduce duplication, and optimize reformatting and archiving efforts.

Finding Aids. One of the most significant metadata undertakings is underway in the archival community. Making SGML-encoded archival finding aids available over the Internet is widely acclaimed as one of the most effective uses of the technologies at our disposal. Daniel Pitti and his colleagues at Berkeley spent two years proving the concept and getting buy-in from other institutions and organizations. Their work is being carried forward by the Bentley Fellows, the Society of American Archivists, the Library of Congress, and RLG.

RLG has launched a project called "the FAST track to improving access to archival collections." This project is training staff in the field in the use of SGML to encode their finding aids and will result in a central point of access to internationally held archival collections. Again, there are benefits to a single institution mounting its own finding aids -- but the greatest gain will be realized when thousands of finding aids from hundreds of institutions are made accessible to researchers in a seamless manner.

International Digital Projects

RLG is pursuing international collaboration by embracing globally scattered partners and learning how to incorporate international initiatives that build collections of research materials without regard for their physical location. There are now RLG members in Canada, Ireland, Italy, the Netherlands, Russia, Spain, Switzerland, and the United Kingdom. The first focused membership effort has been in the UK and Ireland. More than twenty institutions are now on board and it looks likely that more will join this year. A major goal of international membership is to bring together the resources and coordination needed to support today's increasingly international research.

WebDOC. Collaboration within the RLG community must in turn combine with other efforts to realize maximal gains. RLG's WebDOC project is one such effort. Due to be launched as a pilot operation early in 1997, WebDOC addresses the issues of access to materials from a variety of sources for which a fee or licensing may be required. Requests for materials are made by activating links that funnel requests through an accounting/licensing server. After verification that the user has permission to access the material (either because the user's institution has licensed it or because the user has a means of paying "by the drink"), a "golden URL" is issued. Pica, the Dutch bibliographic network, is partnering with RLG in this effort, together with a variety of document suppliers that include RLG member institutions and commercial publishers.
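
The request flow described above can be illustrated with a minimal sketch. It is not RLG's implementation: the class name, the data layout, the supplier.example host, and the idea that a "golden URL" is a link carrying a one-time token are all assumptions made for illustration only.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class LicensingServer:
    """Hypothetical accounting/licensing server (all names are illustrative)."""
    licenses: dict = field(default_factory=dict)  # institution -> set of licensed document ids
    balances: dict = field(default_factory=dict)  # user -> prepaid balance for per-use fees
    prices: dict = field(default_factory=dict)    # document id -> per-use fee
    issued: dict = field(default_factory=dict)    # token -> document id it unlocks

    def request(self, user, institution, doc_id):
        """Return a tokenized URL if access is permitted, otherwise None."""
        licensed = doc_id in self.licenses.get(institution, set())
        if not licensed:
            # No institutional license: fall back to paying "by the drink".
            fee = self.prices.get(doc_id)
            balance = self.balances.get(user, 0.0)
            if fee is None or balance < fee:
                return None
            self.balances[user] = balance - fee
        # Assumption: the "golden URL" is modeled as a link with a one-time token.
        token = secrets.token_urlsafe(16)
        self.issued[token] = doc_id
        return f"https://supplier.example/docs/{doc_id}?token={token}"
```

In this sketch a user at a licensing institution receives a URL at no charge, an unaffiliated user with sufficient funds is debited the per-use fee, and any other request returns None.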

Virtual Collections. RLG and its members have been discussing how similar resources distributed across a number of repositories might be joined together as "virtual collections." This wouldn't always require a formal project, but might rely on communication about making certain holdings accessible. One of the virtual collections in the planning stages involves reassembling the Cairo Genizah. The fragments of Hebrew and Jewish literature and documents rescued from the Ben Ezra Synagogue in Cairo cover every aspect of life in the Mediterranean area a thousand years ago. Active research is leading to all manner of exciting discoveries about Jewish religious, communal, and personal life; Jewish and Islamic culture; settlement in the land of Israel; and relations with Muslims and Christians from as early as the ninth and tenth centuries.

The University of Cambridge's Taylor-Schechter Genizah Research Unit has made great headway in conserving, describing, and providing access to the majority of Genizah fragments in its collections. Many remaining pieces reside in other repositories. Taking a different approach, the Computer Geniza Project of the Department of Near Eastern Studies at Princeton University is creating a full-text retrieval text base of transcribed historical documents, mostly in Judeo-Arabic. The documents are searchable by word with a new interface that links digitized images to the text. Taking advantage of these existing efforts can reduce the lengthy start-up time involved in beginning a project from scratch. Perhaps one institution holds only a few fragments and a second owns dozens, while Cambridge holds about 150,000. When these unique resources are pulled together into something that has the outward appearance of a single entity, the researcher benefits by finding much of what he or she wants in a single place, and the institutions benefit from having put their materials into a broader research context where they will likely be valued more highly and used more often.

Archival Server And Test Bed

The infrastructure that will support all these initiatives is called Arches. The name Arches is a contraction of "archival server," but also suggests multiple doorways into a variety of information resources and the bridges that connect catalog records, other metadata like archival finding aids, and the actual content described by metadata.

The Arches infrastructure will include solutions to issues of user and document authentication, version control, compensation to rights holders, the impermanence of URLs, efficient management of storage media, and the refreshment and migration of information and the ability to access it. Arches will also provide powerful searching tools for getting the most out of full texts and SGML-encoded information, as well as tools for navigation within image-only documents. Arches will serve as a test bed for collaborative investigation of digital capture and preservation practices.

The Work Ahead

Developing the standards needed for responsible digitization is daunting. Quality requirements, indexing approaches, compression algorithms, file architecture, and storage and access methodologies all need to undergo extensive discussion -- and those who have a stake in the outcome will want to reach consensus every step of the way. And, of course, as the way keeps changing, we have to continually reassess and update those conclusions. RLG's commitment to sharing what is learned not only provides value to RLG members, but contributes to innovative thinking in the larger community as well.

RLG led the way in developing standards for preservation microfilming and best practices for preparation of the materials and the information that should accompany the materials, developing technical specifications, performing quality assurance, and storing the master negative film. Applying the same approaches, RLG is beginning to develop best practices for preserving digital archives. In the digital environment, the long-term storage requirements are greater, the risks enormous, and the payoffs less certain. But vast amounts of data are finding their way into digital form and we need to ensure that they will survive into the foreseeable future.

An RLG task force has assembled a number of high priorities for small-scale projects to test assumptions, expose problems, and propose solutions for using digital technology as a preservation strategy. These projects include investigating scanning over-sized materials, determining what preservation information should accompany images, and ascertaining from researchers' perspectives which types of materials would be most useful to them in digital form.

RLG is contracting with a newly formed R&D service at Cornell University to conduct research that will benefit collective efforts. Some of the materials proposed for preparation under this contract are a generic Request for Information, Request for Proposals, and Contract for Services, for use by library or archive staff members responsible for selecting vendors able to perform digital reformatting work. Other proposed work includes developing materials to assist a librarian/archivist in developing a budget for a digital reformatting project; a cost-benefit analysis for preservation-based digital reformatting that compares traditional reformatting methods with page imaging; documents on selecting material in various formats, which address such issues as condition of original, potential options in scanning and text conversion, and hardware and software considerations; and a workshop on how to manage digital reformatting projects.

Summary

To continue to serve in their traditional roles of interpreters and guides along the information pathways, librarians and archivists need to work together, continuing to harness often heroic collective energies in order to solve problems that no institution can solve alone. By providing opportunities to join in consortial projects with substantive results, developing recommended practices, and providing the infrastructure to develop and support access and preservation imperatives, RLG not only benefits its participating members but also adds to the collective wisdom of the profession in supporting the needs of researchers.

Copyright © 1996 The Research Libraries Group, Inc.

hdl:cnri.dlib/december96-erway