Repository Interoperability Workshop

Towards a Repository Reference Model

William L. Scherlis
Carnegie Mellon University
Pittsburgh, Pennsylvania
[email protected]

D-Lib Magazine, October 1996
ISSN 1082-9873


Repository Interoperability

The rapid proliferation of digital libraries and digital information resources creates greater challenges for digital library users in locating and accessing particular library assets. How can a user gain "transparency of access" to a broad range of digital information resources? This proliferation of information resources also creates significant challenges for intellectual property owners, who are entrusting networked information resources such as digital libraries to oversee increasingly valuable information assets. How can a library user interact through a single interface with an aggregate of diverse digital library facilities, and in a manner that provides assurance that the rights-in-data of intellectual property owners are protected?

Interoperation is a principal challenge in digital library research. The workshop reported here focuses on an aspect of interoperation that is beginning to receive increasing attention: the management of actual digital library information assets. Other dimensions of interoperability include, for example, bibliographic metadata, support for browsing, management of payments and related terms and conditions, access to task-specific meta-data, and annotation. For example, the July/August issue of D-Lib Magazine included several articles devoted to the difficulties of identifying areas of commonality among an increasingly diverse array of meta-data. These commonalities provide a means to support effective resource description and search.

A sign of maturation of digital library technology and its application is the greater value of information assets being managed. In addition, there is increasing diversity and complexity of kinds of assets. This maturation suggests new priorities for digital library interoperation technology development. For example:

Repositories

These changes in the environment motivate explicit consideration of the most effective means for digital libraries to interoperate at the level of asset management. This kind of interoperation is referred to as "repository-level interoperation" to distinguish it from, for example, efforts focused primarily on meta-data. Repository interoperation thus deals primarily in actual digital library information assets, and so may initially seem straightforward, certainly as compared with the interoperation challenges faced at the level of domain-specific meta-data. But there are considerable challenges. Here are a few examples:

Many of these problems are familiar to information technologists in other contexts, suggesting that existing research and solutions can be exploited to support digital library applications. For example, distributed object mechanisms such as CORBA and OLE provide means for distributed management of objects and type information. With respect to naming, most digital library researchers are aware of the ongoing discussion in the World Wide Web community about URLs and URNs. The issue for the digital library community is how to assimilate these emerging solutions, matching them to the particular challenge of managing digital library information objects. An appropriate repository framework could enable exploitation of these technologies while providing a scalable approach to digital library interoperation at the level of information objects.

The CNRI Exploratory Workshop

To understand this challenge, the Corporation for National Research Initiatives (CNRI), as part of the D-Lib program, convened an exploratory Repository Interoperability Workshop in March 1996. This workshop, held in Reston, Virginia, brought together a group of about 20 researchers, with the intent of better defining the issue and understanding the challenges associated with it. The workshop consisted of four parts: a review of related research efforts, an identification of issues, three separate working groups to explore the repository concept and consider approaches to interoperability, and a plenary discussion to coalesce results and assess consensus. Preliminary results were presented at the March 1996 ACM Digital Library Conference. The sections below summarize some of the points raised at the workshop and the resulting conclusions. However, this report has not been coordinated with all attendees.

A Note on Interoperability, Reference Models, and Architectures

As noted above, there are many dimensions of digital library interoperation, including user interaction, search and presentation, more general meta-data, and asset management. Achieving a workable engineering approach to interoperation entails understanding the specific utility to be provided by the aggregate service. This leads to an identification of specific points of commonality (for example, the Dublin Core metadata elements) and diversity (task-specific metadata elements not covered in the Dublin Core).

To this end, much of the workshop activity was focused on identifying some specific elements of a reference model for repository function. A reference model is a conceptual framework that identifies characteristics of repositories that need to be common in order to achieve interoperation, but which does not specify a particular implementation approach or a system interface. At a later stage, there may be agreement (or not) on the details of specific system interface elements or overall architecture, but note that this is not as important for interoperation as the reference model itself: Consistency with the reference model assures feasibility of interoperation, though potentially elaborate wrappers and mediators may be required in an actual implementation.

Summary of Results

Repository Service Interface. There was a strong consensus at the workshop that the concept of a digital library repository should be defined in terms of a Repository Service Interface. That is, the repository function is defined in terms of requirements on the protocol for interaction with a client, rather than in terms that are more prescriptive and implementation-oriented. At this early stage of technology development, it is unacceptable for repository interoperability requirements to overly constrain the range of digital library implementation choices. The Repository Service Interface concept enables a separation of decisions regarding repository architecture and implementation from decisions concerning base functionality, and thus permits accommodation of a variety of new and legacy approaches to repository implementation.

A Layered Model. The next issue is where to place the Repository Service Interface in the "hierarchy" of function from raw storage management to "full library function." There was agreement on two points. First, the Repository Service Interface operates above the level of the privileged storage management operations that store, retrieve, and delete individual objects. These low level operations need to be privileged since they provide full capability to alter and access the contents of a collection. A Repository Service Interface would require a client to present explicit access tickets before a "performance" of an object can be delivered. This implies that the repository must itself be a trusted entity, operating at a layer higher than that of raw storage management. Thus, the Repository Service Interface represents a kind of "fiduciary interface" that encapsulates the core of trust that asset holders place in a digital library (or other network information system), separating that core from other value-added services.

The second point of agreement concerned meta-data and traditional library functions. A repository should interpret only those meta-data elements that directly pertain to functions such as object storage, object performance, rights-in-data, and terms and conditions for access. For example, meta-data relating to cataloging, search, location, and other traditional library functions is not interpreted at the repository layer. That is, this latter kind of meta-data is stored in a repository as another class of object. This separates repository function, concerning management of access to and performance of objects, from higher level library functions relating to cataloging, search, browsing, and so on. The meta-data that supports these functions is managed in repositories, but as independent objects with their own separate access- and performance-related meta-data. Thus, the repository operates at a lower level than library functions supporting search and presentation.
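The separation described above can be illustrated with a minimal Python sketch. Every name in it (RepositoryMetadata, StoredObject, Repository.deposit, the example handles) is hypothetical and does not come from the workshop. The sketch shows only one thing: a catalog record is deposited as an ordinary object whose content the repository stores but never parses, while the repository interprets only the access-related metadata attached to each object.

```python
from dataclasses import dataclass, field


@dataclass
class RepositoryMetadata:
    """The only metadata the repository layer interprets (hypothetical fields)."""
    allowed_performances: frozenset  # e.g. {"abstract", "full-text"}
    terms: str = ""                  # terms and conditions, free text in this sketch


@dataclass
class StoredObject:
    handle: str                      # opaque identifier
    content: bytes                   # stored as-is, never parsed by the repository
    repo_meta: RepositoryMetadata


@dataclass
class Repository:
    objects: dict = field(default_factory=dict)

    def deposit(self, obj: StoredObject) -> None:
        self.objects[obj.handle] = obj


repo = Repository()

# The information asset itself.
repo.deposit(StoredObject(
    handle="hdl:example/article-1",
    content=b"...article text...",
    repo_meta=RepositoryMetadata(frozenset({"abstract", "full-text"})),
))

# Its catalog record is just another object, with its own access metadata.
repo.deposit(StoredObject(
    handle="hdl:example/article-1-catalog",
    content=b'{"title": "An Example", "subject": "digital libraries"}',
    repo_meta=RepositoryMetadata(frozenset({"record"})),
))
```

A search service in a higher layer would retrieve and parse the catalog object; the repository itself treats both deposits identically.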

These considerations lead to a functional model with four layers: (1) A bottom layer that supports storage management and operates in a privileged mode. This layer would provide the usual features associated with data management systems, such as support for availability, reliability, persistence, versioning, and so on. (2) A trusted repository layer that supports client access to objects based on permission tickets and service requests. (3) A library functional layer or layers that support search and other library functions (for example, z39.50 service, though z39.50 also includes some elements of layer (2)). (4) A user or client layer corresponding, for example, to a z39.50 client. This model identifies the repository as a well-defined island of managed information, potentially corresponding to a legal entity with respect to its job of assuring respect for the rights-in-data of the intellectual property owners associated with the managed objects.
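The four layers can be sketched as a chain of Python classes, each of which talks only to the layer directly beneath it. This is an illustrative sketch under assumed names (StorageLayer, RepositoryLayer, LibraryLayer, a string-valued ticket, a keyword index); the workshop did not specify any of these, and a real ticket would carry far more than a string.

```python
class StorageLayer:
    """Layer 1: privileged store/retrieve/delete, with no access checks."""

    def __init__(self):
        self._blobs = {}

    def store(self, handle, data):
        self._blobs[handle] = data

    def retrieve(self, handle):
        return self._blobs[handle]

    def delete(self, handle):
        del self._blobs[handle]


class RepositoryLayer:
    """Layer 2: trusted; the only path from outer layers to storage."""

    def __init__(self, storage, valid_tickets):
        self._storage = storage
        self._valid_tickets = set(valid_tickets)

    def perform(self, handle, ticket):
        # A ticket must be presented before any performance is delivered.
        if ticket not in self._valid_tickets:
            raise PermissionError("access ticket rejected")
        return self._storage.retrieve(handle)


class LibraryLayer:
    """Layer 3: search and other library functions, built on layer 2."""

    def __init__(self, repository, index):
        self._repository = repository
        self._index = index  # keyword -> handle (assumed, simplistic index)

    def search_and_fetch(self, keyword, ticket):
        handle = self._index[keyword]
        return self._repository.perform(handle, ticket)


# Layer 4: the client, which sees only the library layer.
storage = StorageLayer()
storage.store("hdl:example/report", b"report body")
repository = RepositoryLayer(storage, valid_tickets={"ticket-123"})
library = LibraryLayer(repository, index={"interoperability": "hdl:example/report"})
result = library.search_and_fetch("interoperability", "ticket-123")
```

The client never holds a reference to the storage layer, which is the point of placing the trusted repository between the two.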

Repository functions. There was limited consensus at the workshop concerning the specific operations supported by the Repository Service Interface. Several efforts in this area (Kahn and Wilensky; Lagoze and his collaborators at CNRI, NCSA, and Cornell; Garcia-Molina, Winograd, Paepcke, et al.; Arms; among others) are developing abstract models that would contribute to a Repository Service Interface definition. The Kahn and Wilensky work, in particular, draws a sharp distinction between layers (1) and (2) and the outer layers. Workshop results concerning concepts for objects, names (handles), performances, and service requests generally follow the results of these efforts.

A service request to the Repository Service Interface, for example, would include an object handle (a unique opaque identifier), an access ticket (that embeds information about client privileges), and a service request (specifying a particular presentation for an object). The result would be a particular performance (or "dissemination") of the object. Meta-data relating, for example, to content, interpretation, and interlinkage of objects is stored but not interpreted at the repository level.
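As a rough illustration of the request shape just described, the following Python sketch models the three parts of a request and the resulting dissemination. The class and method names (AccessTicket, ServiceRequest, Repository.register, Repository.disseminate) and the representation of privileges as a set of performance names are assumptions made for this example, not part of any agreed interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class AccessTicket:
    holder: str
    privileges: FrozenSet[str]  # performance kinds the holder may receive


@dataclass(frozen=True)
class ServiceRequest:
    handle: str           # unique opaque identifier of the object
    ticket: AccessTicket  # embeds information about client privileges
    performance: str      # the particular presentation requested


class Repository:
    def __init__(self):
        # handle -> {performance name -> function producing that performance}
        self._performers: Dict[str, Dict[str, Callable[[], bytes]]] = {}

    def register(self, handle: str, performance: str,
                 performer: Callable[[], bytes]) -> None:
        self._performers.setdefault(handle, {})[performance] = performer

    def disseminate(self, request: ServiceRequest) -> bytes:
        performers = self._performers.get(request.handle)
        if performers is None:
            raise KeyError(f"unknown handle: {request.handle}")
        if request.performance not in performers:
            raise ValueError(f"no such performance: {request.performance}")
        if request.performance not in request.ticket.privileges:
            raise PermissionError("ticket does not grant this performance")
        return performers[request.performance]()


repo = Repository()
repo.register("hdl:example/paper", "abstract", lambda: b"A short summary.")
repo.register("hdl:example/paper", "full-text", lambda: b"The complete paper.")

ticket = AccessTicket(holder="reader-1", privileges=frozenset({"abstract"}))
abstract = repo.disseminate(ServiceRequest("hdl:example/paper", ticket, "abstract"))
```

The same object yields different performances depending on the request, and the ticket determines which of them the repository is willing to deliver.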

It is important to note that agreement on a reference model for the Repository Service Interface does not force definition of a specific repository-level protocol. Indeed, it is possible (though not necessarily desirable) for there to be a multiplicity of repository-level protocols. But agreement on a suitably defined reference model would enable interoperation, even if through a set of wrappers (i.e., repository proxies) and/or mediators and other aggregation points.
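One way to picture interoperation through wrappers and a mediator is the following Python sketch. The two "native" repositories (LegacyFileRepo, ObjectRepo) and their protocols are invented for illustration, as are the RepositoryService interface and the Mediator class; the sketch assumes only that each native protocol can be mapped onto the shared concepts of handle, ticket, and performance.

```python
from abc import ABC, abstractmethod


class RepositoryService(ABC):
    """Common concepts from the reference model (hypothetical rendering)."""

    @abstractmethod
    def holds(self, handle: str) -> bool: ...

    @abstractmethod
    def perform(self, handle: str, ticket: str, performance: str) -> bytes: ...


class LegacyFileRepo:
    """Invented native protocol A: path plus token, one raw view."""

    def __init__(self, files):
        self._files = files

    def fetch(self, path, user_token):
        if user_token != "secret":
            raise PermissionError("bad token")
        return self._files[path]


class ObjectRepo:
    """Invented native protocol B: dictionary request, dictionary reply."""

    def __init__(self, objects, tickets):
        self._objects, self._tickets = objects, set(tickets)

    def get(self, request):
        if request["ticket"] not in self._tickets:
            return {"status": "denied"}
        views = self._objects.get(request["id"], {})
        if request["view"] not in views:
            return {"status": "not-found"}
        return {"status": "ok", "body": views[request["view"]]}


class LegacyFileWrapper(RepositoryService):
    def __init__(self, repo, handle_to_path):
        self._repo, self._paths = repo, handle_to_path

    def holds(self, handle):
        return handle in self._paths

    def perform(self, handle, ticket, performance):
        if performance != "raw":  # this repository offers a single performance
            raise ValueError("only the 'raw' performance is available")
        return self._repo.fetch(self._paths[handle], ticket)


class ObjectRepoWrapper(RepositoryService):
    def __init__(self, repo, handles):
        self._repo, self._handles = repo, set(handles)

    def holds(self, handle):
        return handle in self._handles

    def perform(self, handle, ticket, performance):
        reply = self._repo.get({"id": handle, "ticket": ticket, "view": performance})
        if reply["status"] == "denied":
            raise PermissionError("ticket rejected")
        if reply["status"] != "ok":
            raise KeyError(handle)
        return reply["body"]


class Mediator(RepositoryService):
    """Aggregation point: routes each request to the wrapper holding the handle."""

    def __init__(self, wrappers):
        self._wrappers = list(wrappers)

    def holds(self, handle):
        return any(w.holds(handle) for w in self._wrappers)

    def perform(self, handle, ticket, performance):
        for wrapper in self._wrappers:
            if wrapper.holds(handle):
                return wrapper.perform(handle, ticket, performance)
        raise KeyError(handle)


mediator = Mediator([
    LegacyFileWrapper(LegacyFileRepo({"/data/a.txt": b"file bytes"}),
                      {"hdl:example/a": "/data/a.txt"}),
    ObjectRepoWrapper(ObjectRepo({"hdl:example/b": {"abstract": b"short summary"}},
                                 tickets={"t-1"}),
                      handles={"hdl:example/b"}),
])
```

A client written against RepositoryService can call mediator.perform for either handle without knowing which native protocol sits behind it. Neither native repository had to change, which is why a shared reference model matters more here than a single shared protocol.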

Recommendations

While many issues were left open in the discussion (such as mutability of objects, details concerning terms and conditions, details of the reference model, and so on), several clear recommendations emerged:

Interoperability. Users, librarians, intellectual asset owners, and other stakeholders all benefit from interoperation among digital libraries.

Layered Model. The interoperation problem has many facets, many of which are already being addressed. But if high-value assets are to be shared, then an approach to interoperation at the repository level needs to be identified. The repository operates as a trusted entity.

Repository Reference Model. Many repositories are already in operation, and there are many architectural and implementation approaches being explored for digital libraries and the repositories they contain. Therefore, repository-level interoperation will not come about in the near term through adherence to a single specific repository-level protocol. A reference model, however, can provide a conceptual basis for describing and comparing repositories and can lead to common concepts.

Repository Service Interface. The most important common concept is the Repository Service Interface, which is a set of requirements for the protocol by which a client interacts with a repository. In this context, the "client" includes the outer layers of the digital library model. Since different repositories may have different realizations of the Repository Service Interface, interoperation would be accomplished through wrappers. (This approach is already being explored in several of the major digital library research projects.)

Experimentation. Existing digital libraries and digital library research efforts may benefit from exploring commonalities that might lead to a common reference model.

Working Group on Terms and Conditions. Repositories interpret meta-data pertaining to rights-in-data, performance, and associated terms and conditions. These meta-data enable a repository, for example, to assess what kinds of performances can be granted to a requester on the basis of the specific access tickets that are presented. Many kinds of information assets are now being managed using digital library technology, with many distinct traditions of management of rights-in-data. A working group should be initiated to assess how terms and conditions associated with information assets can be represented and managed at the repository level and beyond.


Copyright © 1996 William L. Scherlis

hdl:cnri.dlib/october96-scherlis