A Consortial Approach to Digital Library Management

David Barber
Director, New Service Development
Columbus, Ohio

D-Lib Magazine, April 1997

ISSN 1082-9873


I. Introduction 
II. Background 
III. A Technical and Managerial Approach to Information Service Implementation 

A. A Centralized Model For Information Storage and Dissemination 
B. A Commercial Approach To Technology 
C. Research Is Still Important 
D. Partnerships Are Essential to Success

IV. OhioLINK's Technological Future

I. Introduction

OhioLINK is an Ohio-based consortium of academic institutions which offers a wide variety of information services to its member institutions. It is currently making significant efforts to extend its content offerings beyond traditional citation databases and library catalogs to include electronic journals, images, GIS data, and other forms of content. OhioLINK's efforts to develop the foundations for these new services are one centralized model of the way that the technological and management problems of digital library service development can be approached in a large, consortial environment. 

OhioLINK's approach to technology is founded on the decision to centralize content. This has been done to ensure a high-quality technical infrastructure for information management and delivery. "High quality" has been defined by OhioLINK as using commercial tools to supply the building-blocks for customized information management solutions, rather than using technologies that might be adopted from current research projects.  Instead of deploying  research products directly, OhioLINK has focused on its ability to create research opportunities for Ohio faculty in computer science and related disciplines as a result of its new services. In addition to this approach to technology, OhioLINK has sought to ensure the quality and value of its services by developing partnerships with centers and faculty at its institutions that are specialists in particular types of content. OhioLINK has also partnered with the Ohio Supercomputer Center (OSC) to help address operational issues like mass storage and high performance computing, and thereby allow OhioLINK to focus on information management. 

The result of this approach to technology has been the selection by OhioLINK of a strategy similar to that of commercial organizations faced with the problem of managing various multimedia resources and disseminating them on the Internet. OhioLINK has chosen object-oriented technologies to provide the software for its new content management efforts. Object-oriented and object-relational databases are seen as the enduring mechanisms for information management. Distributed object technologies, in particular the Object Management Group's Common Object Request Broker Architecture (CORBA), provide the foundation by which OhioLINK information collections may be integrated. These tools are exploited through the development of custom or customized solutions to the problems posed by OhioLINK's unique and diverse set of information collections. Custom interfaces are developed which change more frequently than the underlying information management systems and evolve to reflect improvements in World Wide Web (WWW) interface technologies. 

II. Background 

OhioLINK is a state-funded consortium of more than 50 public and private universities, colleges, and community and technical colleges in the State of Ohio. The consortium includes widely varying types of institutions, from large research universities like Ohio State University or the University of Cincinnati to small rural technical colleges like Belmont Technical College. Thus, the clientele for OhioLINK services includes most types of learning and teaching situations. 

OhioLINK provides information services to more than 500,000 faculty, staff and students. It currently offers a central catalog of library holdings for these institutions representing more than 6,000,000 titles. Each member institution maintains a local library catalog whose holdings are replicated in this central catalog. Library patrons at any institution can request items from this central catalog, and the requested items will be delivered to the library at their home institution. Each institution, no matter how small, thus has a virtual library collection worthy of a research university. 
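
The relationship between the local catalogs and the central catalog can be sketched roughly as follows. This is a minimal illustration only, not a description of OhioLINK's catalog software: the function names, the record layout, and the rule for choosing a supplying library are all assumptions made for the example.

```python
from collections import defaultdict

def build_central_catalog(local_catalogs):
    """Merge per-institution holdings into one index: title id -> holding institutions.

    `local_catalogs` is a hypothetical mapping of institution name to the
    title ids held in that institution's local catalog.
    """
    central = defaultdict(set)
    for institution, title_ids in local_catalogs.items():
        for title_id in title_ids:
            central[title_id].add(institution)
    return dict(central)

def request_item(central, title_id, home_institution):
    """Choose a supplying library for a patron request.

    Delivery always goes to the patron's home library. The supplier rule
    (home library first, otherwise alphabetical) is an assumption.
    """
    holders = central.get(title_id)
    if not holders:
        return None
    supplier = home_institution if home_institution in holders else sorted(holders)[0]
    return {"title_id": title_id, "supplier": supplier, "deliver_to": home_institution}
```

A patron at a small college that does not hold a title would still receive it at the home library, supplied by whichever member institution does hold it.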

More than 57 research databases have also been licensed by OhioLINK. These include many standard citation databases, e.g. ISI citation indexes, and a large collection of SGML texts. The SGML texts include scholarly text collections like those published by Chadwyck-Healey and reference works like the OED. OhioLINK's full-text offerings are supplemented by electronic journal collections including a statewide license for Academic Press journals and the UMI PowerPage journal collections. License negotiations with other publishers are nearly complete and will increase the number of OhioLINK's electronic journals by 1200 titles before the end of 1997. 

OhioLINK also has significant plans under way to expand its range of electronic content offerings beyond these types of materials. New services to provide additional content in the form of electronic journals, numeric data (e.g. social science surveys), images, and vector geographic data, are planned for implementation during the next year. A central repository to store images produced at its member institutions will be provided for images of all formats and types, including satellite images, scanned art slides, digitized x-rays, and manuscript images. This content is expected to exceed several terabytes within the next few years.  This new content -- its size, heterogeneity, and diverse user constituencies -- poses significant and interesting management issues and is shaping OhioLINK's approach to organizing and implementing technology at the system level.

III. A Technical and Managerial Approach to Information Service Implementation 

A. A Centralized Model For Information Storage and Dissemination 

OhioLINK has chosen to manage content centrally rather than as a set of distributed databases at its member institutions, which must otherwise be searched in parallel or in sequence to find relevant resources. Only library catalog information is located at OhioLINK member institutions, but even in that case a central database is maintained with the collective contents of all the local library catalogs. 

There are three key reasons why this centralized model has been selected: 

  1. Institutions realize from past experience that a centralized collection is richer than a number of smaller distributed collections. It can be difficult to search a large number of local databases simultaneously. It is also unlikely that all local databases will share the same schema and have the same kind of interface.

  2. Institutions face many burdens which reduce their desire to take on the task of managing additional local content. By centrally locating content, institutions are able to spend staff time on digitization of collections and supporting instruction and research, rather than on database administration. Even if institutions had a desire to manage content locally, it is unlikely that most OhioLINK institutions could manage digital library projects in all subject areas: few institutions can provide staff to manage online collections of images, spatial data, text, and other types of content.

  3. A central organization is better able to maintain a sophisticated technical environment than would be possible if information were located at individual institutions. OhioLINK can purchase software that most local institutions could not afford individually and implement better data security measures than those institutions.

OhioLINK's management of content still leaves local institutions with the option of managing additional content locally if they choose. OhioLINK cannot license and manage everything everyone would want. Consequently, our central efforts make it possible for local institutional resources to include everything that OhioLINK offers and whatever is acquired locally. Still, these local efforts have been diminishing over time and have focused on limited types of content. 

B. A Commercial Approach To Technology 

To provide the building blocks for its information systems and to determine what information management tools are available, OhioLINK looks to the commercial sector. It looks both to identify commercially sold software and to identify what tools commercial organizations with similar problems are using.  This is in contrast to looking at computer science and digital library research projects to provide this  information.  The commercial software tools are then customized or integrated in such a way as to meet OhioLINK's unique requirements.    

The causes for this choice of approaches include: 

  1. Our mission is production, not research. We provide production services to a large number of students, staff, researchers, and faculty. Research software is often not required to meet the quality standard needed to support a production service.

  2. Support for commercial software is generally better. Commercial software vendors have support staff. They sell to a large number of sites so there is better documentation available as well as support in the form of Usenet groups and user groups. Support is also better because it is easier to find training services, or to hire people who already have experience with products.

  3. There is better long-term availability of products and ports to new platforms. Using Oracle or another mainstream DBMS (Database Management System) for an information management project means that a tool is being used where there is substantial continued interest by the company and its customers in long-term support for the product and in migration strategies to new versions when needed.

  4. Less expenditure of effort on code maintenance. Increased expenditure on software maintenance means less expenditure on direct service delivery.

  5. Research projects often continue long after the basic ability to offer a type of content online has been demonstrated sufficiently to justify development of OhioLINK services. For example, many projects are ongoing to study the dissemination of spatial information on the Internet. These can investigate many worthwhile issues like database integration, query optimization, and human-computer interaction with spatial information. That research may lead to higher quality spatial data systems; however, the basic point that spatial data may usefully be offered on the Internet was demonstrated by the response to the first individual who connected a Geographic Information System (GIS) to a World Wide Web (WWW) form. This occurred shortly after HTML forms were developed.

  6. None of the parts of OhioLINK's problems are really unique, although the combination of those problems may be. There are commercial organizations that also face OhioLINK's problems: storing images, GIS data, text, and other forms of content; integrating distributed information services; performing various transformations on information content, e.g. TIFF to GIF. The scale of the economic impact of the commercial instances of these problems is the key factor in shaping the availability of tools to solve OhioLINK's problems.

  7. With more than 500,000 users, the end user's desktop is largely beyond our control. It is next to impossible to deliver software effectively to these users; consequently, we must work with the kind of commercial packages and Internet software likely to be found on the remote user's workstation.

A corollary of the focus on mainstream commercial approaches to technology management is a dissatisfaction with vertical market solutions, e.g. library-specific solutions to problems. These products generally have limited markets, are slower to incorporate new technologies, and use technical standards unique to vertical markets, e.g. Z39.50. Generally, vertical market solutions diminish the benefits attributed to commercial software above. Vertical market solutions also tend to be insufficient for organizations with as diverse a set of information management problems as OhioLINK. For example, OhioLINK's need to offer access to many types of images would require the purchase of a large number of software products if vertical image management solutions for particular types of images, e.g. art images or satellite images, were used. 

C. Research Is Still Important 

Where does this situation leave research? First, research projects must influence commercial industry to affect the kind of services OhioLINK offers. Second, OhioLINK is probably more important as a generator than as a consumer of research. OhioLINK is working to create opportunities for computer science and digital library researchers to take advantage of our collections. This content will be deployed in an environment that offers research opportunities to test new algorithms or approaches to computation on OhioLINK content. Third, OhioLINK will be offering support for archiving the results of research programs at its member institutions. This content will enrich OhioLINK's collections and ensure that these important assets of the State of Ohio are preserved. This effort may also enhance research because OhioLINK may be able to create a better, more productive research environment through the quality of its information management solutions. 

D. Partnerships are Essential to Success 

For the OhioLINK project to succeed, expertise is required on computer systems, information management, and on the content managed. This is more than OhioLINK can provide on its own given the need to support all forms of content for any conceivable subject area. As a result we are pursuing a strategy of partnering with key research institutes and organizations at our member institutions. 

To assist with the analysis of information management solutions for particular types of content or subject domains, we rely on support from key groups at member institutions. For example, OhioLINK has partnered with OhioGISNet, a collection of GIS experts from nine OhioLINK schools, to identify needed spatial data collections and a spatial data management solution. OhioLINK is also working with social scientists at a number of schools as it plans to make statistical data sets available. Further, a partnership is being developed with faculty at Miami University to make Landsat images available to all OhioLINK schools. 

OhioLINK has entered into an agreement with the Ohio Supercomputer Center (OSC) to provide the location for this software and content. Its cutting-edge environment is the means by which OhioLINK can focus attention on the problem of software and information management rather than related hardware issues. OSC will provide OhioLINK with a large scale storage structure to which additional capacity can be added incrementally. A multi-terabyte capacity will be available to OhioLINK. In addition, this storage capacity is placed in the context of excellent storage management facilities. A hierarchical storage management system is in place to allow automatic migration of OhioLINK content to nearline tape. 
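
The idea of automatic migration to nearline tape can be illustrated with a small sketch. This is not OSC's storage management system: the 90-day idle threshold, the record fields, and the function name are assumptions chosen only to show the kind of policy such a system applies.

```python
SECONDS_PER_DAY = 86400

def plan_migration(files, now, max_idle_days=90):
    """Return names of disk-resident files idle longer than max_idle_days.

    `files` is a hypothetical list of dicts with "name", "tier" ("disk" or
    "tape"), and "last_access" (seconds since the epoch). The least recently
    used files are listed first so they would be migrated first.
    """
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    idle = [
        (f["last_access"], f["name"])
        for f in files
        if f["tier"] == "disk" and f["last_access"] < cutoff
    ]
    return [name for _, name in sorted(idle)]
```

Under a policy of this kind, rarely used content moves to cheaper storage without staff intervention, while frequently requested content stays on disk.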

At OSC, OhioLINK servers may be connected to this nearline storage system and other OSC computers through an internal Gigabit network which is itself directly connected to the state Internet backbone. This will make it possible to get OhioLINK content out to member institutions as fast as is possible. The involvement of the OSC in the Internet 2 project and ATM testbed programs will further ensure that network infrastructure for OhioLINK activities will continue to be improved. 

IV. OhioLINK's Technological Future 

A technological strategy has evolved from these forces and attitudes. At a general level, OhioLINK is building its future on object-oriented technology. This technology in general, and CORBA in particular, are seen as providing a strong basis from which OhioLINK can benefit from new Internet and resource integration technologies produced by a growing object-oriented software industry. Distributed objects will provide OhioLINK with a richer, more flexible, and more widely supported set of software products for integrating diverse resources than would library-specific distributed searching technologies like Z39.50. 
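
The integration role described here can be sketched in miniature. The example below is not CORBA and uses no object request broker; a plain Python abstract class stands in for a shared interface definition, and the class names, method name, and sample data are all hypothetical.

```python
from abc import ABC, abstractmethod

class Collection(ABC):
    """Hypothetical common interface that every collection would expose."""

    @abstractmethod
    def search(self, query):
        """Return the ids of items matching the query."""

class ImageCollection(Collection):
    def __init__(self, captions):
        self.captions = captions  # image id -> caption text

    def search(self, query):
        q = query.lower()
        return [i for i, caption in self.captions.items() if q in caption.lower()]

class TextCollection(Collection):
    def __init__(self, documents):
        self.documents = documents  # document id -> full text

    def search(self, query):
        q = query.lower()
        return [d for d, text in self.documents.items() if q in text.lower()]

def search_all(collections, query):
    """Send one query to every collection through the shared interface."""
    return {name: coll.search(query) for name, coll in collections.items()}
```

The caller of `search_all` never needs to know how each collection stores or matches its content; that separation is the property distributed object technologies are expected to provide across machines and vendors.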

In addition, object models can provide OhioLINK with a means by which an enduring understanding of the structure of its content can be developed. Even if technologies change, the information gained about OhioLINK content and its relation to certain information processing tasks that is derived from object modeling will be beneficial. 

OhioLINK is about to issue a Request for Proposal (RFP) for a DBMS which will provide the underlying storage for its objects. An object-relational or object-oriented DBMS is being sought. While there have been shifts in the fundamental architectures for DBMS, these do not occur very often. A well-chosen DBMS should provide a long-term solution for information storage. New types of DBMS will allow OhioLINK to add modules to the DBMS to handle particular object types. Thus, the traditional limitations of relational DBMS (RDBMS) for storing many types of information, such as text and images, will be overcome. It will be possible to store textual metadata attributes of OhioLINK content together with that content and utilize appropriate retrieval strategies for each. 
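
As a rough illustration of storing metadata together with content and applying a retrieval strategy suited to each object type, consider the sketch below. It is plain Python, not a DBMS; the registry stands in for type-specific DBMS modules, and every name, field, and matching rule in it is an assumption made for the example.

```python
from dataclasses import dataclass

@dataclass
class StoredObject:
    object_id: str
    content_type: str  # e.g. "text" or "image"
    metadata: dict     # textual attributes stored alongside the content
    content: bytes

# Registry of per-type matching functions (a stand-in for DBMS extension modules).
RETRIEVERS = {}

def register(content_type):
    """Decorator that installs a matching function for one content type."""
    def decorator(func):
        RETRIEVERS[content_type] = func
        return func
    return decorator

@register("text")
def match_text(obj, query):
    # Text objects are matched against the content itself.
    return query.lower() in obj.content.decode("utf-8").lower()

@register("image")
def match_image(obj, query):
    # Image bytes are not searched here; the caption metadata is used instead.
    return query.lower() in obj.metadata.get("caption", "").lower()

def find(objects, query):
    """Return ids of objects that match under their own type's strategy."""
    return [o.object_id for o in objects if RETRIEVERS[o.content_type](o, query)]
```

Supporting a new type of content then means registering one more matching function rather than redesigning the store.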

Object models and DBMS are seen as the stable foundation for OhioLINK's long-term management of information resources. On top of this foundation will be constructed less enduring software. The software which constructs the user's interface and manages user interaction is continually changing; currently it is shifting from creating HTML to creating Java user interfaces. It is at this level and at the integration level that OhioLINK experiences the need to have custom software created. OhioLINK finds that no commercial off-the-shelf software exists to provide an interface for its broad collection of types of information. Commercial software technologies thus provide the stable foundation for OhioLINK's information management plans and the building blocks for its systems. But integrating these building blocks and specializing them for OhioLINK's needs requires custom software development. OhioLINK thus cannot totally avoid the need to spend money on software development. However, by taking advantage of new user interface technologies and integrating resources, the highest return can be obtained for this investment. 

This choice of software technologies makes OhioLINK very similar to commercial organizations which are attempting to make their information collections WWW accessible. Hopefully, the commercial sector's interest in object technologies will be sustained over the long-term and continue to grow so that OhioLINK may continue to enjoy the benefit of their further development. 

Copyright © 1997 David Barber 
