
D-Lib Magazine
July/August 2001

Volume 7 Number 7/8

ISSN 1082-9873

Generalizing the OpenURL Framework beyond References to Scholarly Works

The Bison-Futé Model

 

Herbert Van de Sompel
Cornell University
herbertv@cs.cornell.edu

Oren Beit-Arie
Ex Libris (USA), Inc.
oren@exlibris-usa.com


Abstract

This paper introduces the Bison-Futé model, a conceptual generalization of the OpenURL framework for open and context-sensitive reference linking in the web-based scholarly information environment.

The Bison-Futé model is an abstract framework that identifies and defines components that are required to enable open and context-sensitive linking on the web in general. It is derived from experience gathered from the deployment of the OpenURL framework over the course of the past year. It is a generalization of the current OpenURL framework in several aspects. It aims to extend the scope of open and context-sensitive linking beyond web-based scholarly information. In addition, it offers a generalization of the manner in which referenced items -- as well as the context in which these items are referenced -- can be described for the specific purpose of open and context-sensitive linking.

The Bison-Futé model is not suggested as a replacement of the OpenURL framework. On the contrary: it confirms the conceptual foundations of the OpenURL framework and, at the same time, it suggests directions and guidelines as to how the current OpenURL specifications could be extended to become applicable beyond the scholarly information environment.

Introduction

This paper introduces the Bison-Futé model, which can be regarded as a conceptual generalization of the OpenURL framework for open and context-sensitive reference linking in the web-based scholarly information environment [Van de Sompel and Beit-Arie 2001, Van de Sompel and Hochstenbach 1999a, Van de Sompel and Hochstenbach 1999b, and Van de Sompel and Hochstenbach 1999c]. This paper is of an abstract nature, and it should be considered a report on thinking in progress; it will leave some questions unanswered and issues untouched. The paper starts with a brief summary of the OpenURL framework; for a deeper understanding, a reading of [Van de Sompel and Beit-Arie 2001] is recommended.

The paper attempts to provide a conceptual basis for the NISO standardization process of the OpenURL. The OpenURL specifications [Van de Sompel, Hochstenbach, Beit-Arie 2000] were submitted to NISO by the authors and were accepted as a fast-track work item towards development as an American National Standard. NISO Standards Committee AX will also look into the applicability of OpenURL concepts beyond the scholarly information environment. Referring to the generic OpenURL concepts, the NISO AX Committee Charge puts it this way:

" we have to keep in mind other information communities where the generic mechanism for making identifiers and metadata available to service components may be applicable."

The OpenURL framework

In the web-based scholarly environment, a user interacts with an information service, and as a result of that interaction retrieves references to scholarly works. Typically, the information service also attempts to deliver reference links or extended service-links along with each of those references. It has been shown that in many cases, these default links are not adequate, because they are not sensitive to the context of the user clicking the link [Van de Sompel and Beit-Arie 2001; Van de Sompel and Hochstenbach 1999a]. The OpenURL framework has been proposed as an architecture that addresses this problem, by creating the possibility for third parties to provide additional, appropriate [Caplan and Arms 1999; Van de Sompel and Beit-Arie 2001] links for a referenced item, upon explicit request of a user. The OpenURL architecture is based on the following fundamental concepts:

  • The collaboration of information providers: Information providers that support the OpenURL framework insert a hook with each reference that is delivered to the user. The hook is an HTTP request -- called an OpenURL for the referenced work -- the purpose of which, when clicked by the user, is to deliver metadata and identifiers of the referenced work to a third party.
  • The existence of third party service components: These service components are the targets of the hook provided by information services. The task of the service components is to deliver -- upon request of a user -- extended services that relate to the referenced work, using metadata and identifiers delivered via the hook.
  • The existence of a specification that describes the format of the hook: In the current deployment of the OpenURL framework, this specification is the OpenURL draft [Van de Sompel, Hochstenbach, Beit-Arie 2000]. The OpenURL specification:
    • Describes the syntax for encoding a reference to a scholarly work as a concatenation of name=value pairs;
    • Describes the syntax for encoding elements that describe the context in which the reference is provided as a concatenation of name=value pairs;
    • Describes the way to turn this encoded information into a link (HTTP request). For instance, in case OpenURL is encoded as an HTTP GET request, the name=value pairs become the <searchpart> of a URL, of which the http://<host>:<port>/<path> is the location of a third party service component (e.g., http://<host>:<port>/<path>?<searchpart> format of URL in [Berners-Lee, Masinter, et al. 1994]).
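The three steps above can be illustrated with a minimal Python sketch. It is not taken from the OpenURL draft: the host resolver.example.org, the key names (genre, title, volume, issue, spage, date, sid) and the helper names build_openurl and read_openurl are assumptions chosen for illustration only. The sketch encodes a reference and one contextual element as name=value pairs, appends them as the <searchpart> of a service component's URL, and then reads them back as a service component might.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

# Hypothetical location of a third-party service component.
BASE_URL = "http://resolver.example.org/menu"


def build_openurl(base_url, reference, context):
    """Concatenate name=value pairs into the <searchpart> of an HTTP GET URL."""
    pairs = list(context.items()) + list(reference.items())
    return base_url + "?" + urlencode(pairs)


def read_openurl(openurl):
    """Recover the name=value pairs, as a service component might on receipt."""
    return dict(parse_qsl(urlsplit(openurl).query))


# Illustrative key names only; the real ones are defined by the OpenURL draft.
reference = {
    "genre": "article",
    "title": "J Biol Chem",
    "volume": "275",
    "issue": "44",
    "spage": "34826",
    "date": "2000",
}
# Contextual element: the information service that provided the reference.
context = {"sid": "ExampleProvider:ExampleDB"}

openurl = build_openurl(BASE_URL, reference, context)
received = read_openurl(openurl)
print(openurl)
```

Because the pairs travel in the query string, any service component that knows the key conventions can recover the reference without access to the page on which it appeared.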

Since the OpenURL framework allows third parties to deliver service-links for references in web-documents they do not own, it has been called an open linking framework for the web-based scholarly information environment [Van de Sompel and Hochstenbach 1999a]. Here, the word open stresses the fact that the framework gives users the freedom to request reference links and extended services relating to a referenced work from a party other than the party that delivers the reference.

It is important to note that many OpenURLs can refer to the same scholarly work:

  • The actual reference to the work can be expressed in many different ways by different information services;
  • Since the context in which the work is referenced can be very different, the description of the contextual elements on the OpenURL is variable.

Generalizing the OpenURL framework

It may be helpful to visualize the OpenURL framework as an architecture that allows a user to escape from the metadata plane in which default links relating to a referenced scholarly work are delivered by information services. The architecture gives the user the freedom to reach into an overlaying service plane and ask a service component to deliver additional, alternative or more appropriate service-links that relate to the referenced scholarly work (see Figure 1 and Figure 2 of [Van de Sompel and Beit-Arie 2001]). In this architecture, the OpenURL specification is the glue that enables interoperability between information services and service components.

One can easily imagine this architecture to be extended to references made on the web in general, not just for scholarly material, but also to cities, diseases, cars, houses, abstract concepts, etc. The main pre-requisite for this extension is the existence of metadata and/or identifiers that describe the referenced items. This is very commonly the case, as many communities have created identifier or metadata schemes to achieve interoperability between systems.

These references made on the web in general can be regarded to be in the basic web-plane. They can come with -- default -- author-embedded links. One can imagine a user reaching out into an overlaying service plane to ask specialized web-services for alternative service-links related to a referenced item. Such service-links could be thought of as alternative routes across the web that are dynamically provided by third parties -- i.e., not by the actual author of the web-page where the reference is made -- upon request of a user. They are routes in an overlaying service plane that are not available from the web-document in which the reference occurs.

As a matter of fact, several companies/projects have introduced solutions that aim at the delivery of such overlay services. Examples include the now discontinued ThirdVoice search and annotation tool, NBCi's QuickClick, the Dialpad agent, the link session solution of Steve Hitchcock's PeP [Hitchcock and Hall 2001], the hypermedia link service of Microcosm, Microsoft's Smart Tags, Dexter-based hypermedia services for the World Wide Web [Gronbak, Bouvin Niels Olof, et al. 1997], and to a certain extent Netscape's What's Related [Curtin, Ellison, et al. 1998].

As is the case with the OpenURL framework, these solutions rely on the existence of service components that store or generate link information separate from web-documents (e.g., [Halasz and Schwartz 1994]). However, they do not conform to the two other fundamental concepts of the OpenURL framework: the collaboration of authors of web-documents to insert hooks, and the existence of a formal specification for the hook. Rather, these solutions use proxying/screen-scraping techniques building on:

  • Proprietary helper applications that pre-scan web-pages for occurrences of non-formally structured references to certain types of items (for instance using dictionary-matching techniques);
  • The introduction of HTTP requests with a proprietary structure, pointing at the solution's "service component" for every referenced item for which the pre-scanning software detected a match.

Judging by the quick acceptance of the OpenURL approach in the web-based scholarly environment, it is tempting to speculate about the possible acceptance of a generalization of its approach to the web in general. One can imagine that asking the collaboration of web-authors in providing interoperable hooks to allow for third-party provision of link-services might be achievable if those authors use software tools to dynamically deliver web-documents. The task for hands-on authors might be far less trivial, and acceptance might inter alia depend on the simplicity of the specification for the hook. One can also imagine how the adherence to a hook-specification might contribute to a solution to the problem of lack of persistence of links provided for references made on the web (for instance, see [Lawrence, Pennock, et al. 2001] regarding lack of persistence of URL-references to scholarly works; and [Phelps and Wilenski 2000] for an approach to make hyperlinks robust). One can imagine how an extension/generalization of the OpenURL concepts could make the lives of the companies/projects mentioned above easier, make their solutions interoperable, and lead to the emergence of competing innovative services, aimed at dynamically delivering alternative routes across the web. Also, a collaborative approach may be more appealing to authors of web-documents, who may be concerned about the intrusive screen-scraping approaches which blur the authorship of documents. They might feel more comfortable with a model in which the decision regarding which references are subject to the delivery of overlay services remains under their control.

It is not the intention of this paper to further speculate on the chances of the acceptance of an OpenURL-like approach for references made on the web in general. Rather, the purpose is to provide a conceptual framework that allows thinking about such a generalization, regardless of whether or not it will ever be deployed.

The Bison-Futé model

In the remainder of this paper, the Bison-Futé model and its components will be introduced. The Bison-Futé model is a conceptual framework that generalizes the OpenURL concepts. The purpose of the Bison-Futé model is to allow third parties to deliver alternative services that relate to items referenced on the web using an approach that is directly derived from the OpenURL framework. The name Bison-Futé (approximate pronunciation \be-zon-foo-tay\) refers to the name given in France to alternative roads that are recommended by the government for those who prefer not to drive on the main highways. Hence, the author-embedded, default links provided on the web are the parallel of main highways in France, while the alternative services for references made on the web are the parallel of the alternative roads.

The attention of the reader is drawn to the relationships that exist between the concepts introduced below and important ongoing efforts in the realm of the Semantic web [Berners-Lee, Hendler and Lassila 2001], Open Hypermedia research (see the above and, for instance, [Carr, Bechofer and Goble 2001]), and Knowledge Management research (for instance, see www.aktors.org). The notion of the annotation of informal documents by formal concept descriptors for the Semantic web is a relationship that is of special relevance to the ideas described here, even considering that the focus of that work is on querying, not linking. Still, the authors have made the explicit choice to describe the concepts of open and context-sensitive linking for the web using the concrete perspective of an existing, lightweight and successful OpenURL application as the starting point. This approach exploits the experience with deploying that application to try to identify the bare essentials required to facilitate open and context-sensitive linking on the web, and to arrive at a more abstract model derived from the concrete. Also, relationships with languages to describe concepts such as RDF or RDFS inevitably come to mind. Again, the authors have chosen to describe the model independently of such languages, and have tried to focus on the essential tools that a language must offer to be applicable in the realm of open and context-sensitive linking. The intent of this approach is not to ignore the ongoing work. Quite to the contrary, it is hoped that this approach will inform the aforementioned efforts, by means of the addition of an ingredient that is derived from an actually deployed application.

Nature of the generalization

The Bison-Futé model is a generalization of the current OpenURL framework in several aspects:

Scope

In the Bison-Futé model, the concept of open and context-sensitive linking for references made in web-pages is generalized beyond the Web-based scholarly information environment into the realm of references to published works in general (CDs, CD-ROMs, audio files, videos, etc.), objects (cities, cars, people, companies, etc.) and abstract concepts referenced in web-pages.

These referenced items will be called referents in the Bison-Futé model.

Context

The OpenURL specifications allow for the description of only three types of entities: the referenced item (OBJECT-DESCRIPTION), the information service in which the item is referenced (ORIGIN-DESCRIPTION), and the service component that will deliver the extended services (BASE-URL). However, the deployment of the OpenURL showed that other entities should be described as part of the full context in which a request for a contextual provision of services occurs. As a result, the Bison-Futé model introduces the following new entities: the user requesting the services (the requester), the type of service that is requested (serviceType) and the information entity that actually makes the reference to the item (the referring-entity).

For that purpose, the term ContextObject will be introduced in the Bison-Futé model: the ContextObject is a construct that contains a description of all entities that are important for the contextual provision of services for an item that is referenced (Figure 1).

Schemes

In the current OpenURL specifications, an OpenURL for a referenced item need not physically contain the full reference for the item. The OpenURL specification allows for:

  • By value transfer of such information: the information is physically delivered via the HTTP request (OBJECT-METADATA-ZONE; GLOBAL-IDENTIFIER-ZONE);
  • By reference transfer of such information: a pointer to the information is delivered via the HTTP request, in which case the service components must resolve the pointer in order to get hold of the actual information (GLOBAL-IDENTIFIER-ZONE; LOCAL-IDENTIFIER-ZONE).

In Bison-Futé, this property is extended to all entities that describe the context. In addition, the by reference approach is made more flexible: the interpretation of a pointer in the current OpenURL specifications requires intelligence at the end of the service component for the resolution of the pointer into metadata. The Bison-Futé model provides for pointers that can be resolved without a need for additional intelligence at the end of the service component.

Also, an OpenURL allows a referenced item to be described by means of identifiers and/or metadata that complies with a metadata scheme (the OpenURL metadata scheme) which focuses on scholarly works. Again, in the Bison-Futé model, this property is extended to all entities that describe the context. Moreover, the existing ambiguity between identifiers of items and identifiers of metadata about items is resolved. In addition to that, since the scope of Bison-Futé goes beyond the scholarly information domain, other metadata formats will be allowed for the description of entities.

In Bison-Futé, the term descriptor will be introduced to refer to a uniform way to describe the entities that are involved in the contextual provision of services for a referenced item. A descriptor will allow for the specification of entities that are involved in the process of the contextual provision of services, in ways that are specifically designed to optimize that process (Figure 2).

Encoding

The OpenURL specifications describe how to provide information about entities as a sequence of name=value pairs on an HTTP request. However, as the experience with the deployment of OpenURL has shown, other encoding schemes may be desirable, for instance in cases where information about multiple referenced items must be provided. The notions introduced in the Bison-Futé model are, therefore, not tied to any specific approach for encoding. Moreover, the description of entities of the ContextObject (by means of descriptors) is disconnected from the provision of an encoding of those descriptions as an HTTP request. Indeed, the HTTP request will be referred to separately as an OpenResolutionLink.

Bison-Futé concepts

In a manner that parallels the above description of the fundamental concepts of the OpenURL framework, those of the Bison-Futé model are introduced here. At the same time, the terminology used in the Bison-Futé model is introduced. It will be explained in more detail in the remainder of this paper. In order to facilitate a better understanding, Table 1 lists the typical OpenURL-framework terminology along with the corresponding Bison-Futé terms.

OpenURL framework | Bison-Futé model
------------------|-----------------
Web-based scholarly information environment | Web in general
referenced scholarly work | referent
citation to a scholarly work | citation to a referent
hook for citation to scholarly work = OpenURL: | hook for citation to referent = ContextObject:
* standardized reference to a work | * descriptor of a referent
* standardized reference to contextual elements | * descriptors of contextual entities
* hook turned into link = OpenURL | hook turned into link = OpenResolutionLink
service component | resolver
extended services; reference links | services
the referenced scholarly work; the service component which is the target of the OpenURL; the information service providing the OpenURL | entities

Table 1: A comparison between the terms used in the OpenURL framework and the Bison-Futé model.

The Bison-Futé model is based on the following fundamental concepts:

  • The collaboration of authors of web-documents: Authors that want to support the Bison-Futé model insert a hook with each reference to a referent (i.e., the referenced item) that is delivered to the user. The hook is called a ContextObject for the referent. A ContextObject is made up of several descriptors; the core descriptor is the one for the referent, but there are descriptors for entities describing the context in which the referent is referenced as well (Figure 1). Each descriptor can contain metadata and/or identifiers or information that allows for fetching metadata and/or identifiers (Figure 2). Eventually, the ContextObject is turned into an HTTP request, which is called an OpenResolutionLink for the referent. The purpose of the insertion of the OpenResolutionLink for a reference in a web-page is to deliver metadata and identifiers relating to the referent and to the context in which the referent is cited to a third party, when clicked by the user.
  • The existence of third party resolvers: These resolvers are web-services that are the targets of OpenResolutionLinks. The task of the resolvers is to deliver -- upon request of a user -- services that relate to the referent that is at the core of the ContextObject (hence also at the core of the OpenResolutionLink derived from the ContextObject). Basically, a resolver resolves descriptors of referents into services, in the context of the other descriptors that are part of a referent's ContextObject.
  • The existence of specifications for encoding descriptors, ContextObjects and OpenResolutionLinks.

Bison-Futé terms

The remainder of this paper will provide a more detailed description of the terms that have been introduced for the Bison-Futé model.

The scholarly article referenced in Table 2 will be used as the referent in the examples provided in the remainder of this paper. Details about the referenced article can be explored at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=10942764&dopt=SGML

Moll JR, Olive M, Vinson C. Attractive interhelical electrostatic interactions in the proline- and acidic-rich region (PAR) leucine zipper subfamily preclude heterodimerization with other basic leucine zipper subfamilies. J Biol Chem. 2000 Nov 3; 275(44):34826-32.
Table 2: A reference to an article. The article will serve as the referent in the examples of this paper.

Entity, descriptor

An entity is a thing specified by the use of a descriptor, thereby allowing it to be referred to autonomously.

A descriptor is a vehicle defined to specify an entity, using one or more of the following types:

  • entity-id: the combination of a reference to a namespace and an identifier of the entity that is unique within the referenced namespace.
  • metadata-id: the combination of a reference to a namespace and an identifier of metadata of the entity that is unique within the referenced namespace.
  • metadata-description: the combination of a reference to a metadata scheme and a description of the entity expressed according to the referenced metadata scheme.
  • metadata-description-pointer: the combination of a reference to a metadata scheme and a pointer to metadata of the entity expressed according to the referenced metadata scheme.
  • private-zone: unspecified content for community-specific use and extendibility.
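As a rough illustration of the five types, a descriptor can be pictured as a small record holding zero or more elements of each type, with at least one element present overall. The Python sketch below is an assumption made for illustration, not a descriptor format proposed by the paper: the class name Descriptor, its field names and the types_used helper are all hypothetical. The identifier values are those of the article in Table 2.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (reference to a namespace or metadata scheme, value)
Pair = Tuple[str, str]

# Hypothetical field names mapped to the type names used in the text.
TYPE_NAMES = {
    "entity_ids": "entity-id",
    "metadata_ids": "metadata-id",
    "metadata_descriptions": "metadata-description",
    "metadata_description_pointers": "metadata-description-pointer",
    "private_zone": "private-zone",
}


@dataclass
class Descriptor:
    """Specifies one entity using one or more of the five types."""
    entity_ids: List[Pair] = field(default_factory=list)
    metadata_ids: List[Pair] = field(default_factory=list)
    metadata_descriptions: List[Pair] = field(default_factory=list)
    metadata_description_pointers: List[Pair] = field(default_factory=list)
    private_zone: str = ""  # unspecified, community-specific content

    def __post_init__(self) -> None:
        # A descriptor with no element at all specifies nothing.
        if not self.types_used():
            raise ValueError("a descriptor needs at least one element")

    def types_used(self) -> List[str]:
        """Return the names of the types that carry at least one element."""
        return [name for attr, name in TYPE_NAMES.items() if getattr(self, attr)]


# The article of Table 2, specified by a DOI and a PubMed identifier.
article = Descriptor(
    entity_ids=[("doi", "10.1074/jbc.M004545200")],
    metadata_ids=[("pmid", "10942764")],
)
print(article.types_used())
```

Lists are used for each type so that a single descriptor can carry, for example, several entity-ids for the same entity.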

 

The scholarly article referenced in Table 2 is an entity, because it is possible to create a descriptor for it. In fact, one or more of the following could be included in a descriptor for the entity:

  • entity-id: This entity has a digital object identifier associated with it. Therefore, the reference to a namespace can be doi and the identifier within that namespace is 10.1074/jbc.M004545200
  • metadata-id: This entity is indexed in PubMed and its metadata record in PubMed has a unique pubmed identifier. Therefore, the reference to the namespace could be pmid and the identifier in this namespace is 10942764.
  • metadata-description: PubMed provides metadata descriptions rendered according to several schemes for this entity. A reference to a metadata scheme could be PubMedSGML and the PubMed metadata record rendered according to that scheme would be the description of the entity.
  • metadata-description-pointer: Again, the reference to a metadata scheme could be PubMedSGML, in which case the pointer would be http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=10942764&dopt=SGML

Example 1: An illustration of the notion of a descriptor. The article referenced in Table 2 serves as the entity.

 

Regarding the above:

  • The notion of the descriptor is introduced in order to allow referencing an entity by means of a single construct. The descriptor serves the purpose of making metadata and identifiers of the entity available. With the descriptor, this is achieved in an explicit way -- by value -- and/or an implicit way -- by reference. In the explicit way, metadata and identifiers are physically contained within the descriptor, whereas in the implicit way, metadata and identifiers can be fetched based on information available in the descriptor.
  • It is out of the scope of this paper to give a formal definition of a namespace. In the context of this paper, a namespace is considered to be a collection of identifiers, each of which is unique within the collection. However, there are perceivable boundaries to the collection. These boundaries can be formal definitions of a syntax to which all identifiers within the collection comply, a dictionary of terms that make up the collection, a way in which the collection is managed, etc. Some examples should support a better understanding: the doi namespace consisting of all digital object identifiers; the mailto namespace of all e-mail addresses; the http namespace of all http addresses; the MESH namespace of all MESH subject headings; the namespace of all accession numbers of records in an abstracting and indexing database; the UTC namespace of all moments in time that can be expressed by means of the W3C datetime specification.
  • descriptors can be encoded in many different ways. Each encoding technique is referred to as a descriptor format. A descriptor format can be regarded as a general structure that allows describing an entity, by means of identifiers, metadata and pointers to metadata. A descriptor format can allow for multiple elements of the same type, e.g., it can allow for zero or more entity-ids. For a given descriptor format, there can be many descriptors for a single entity.
  • A descriptor format depends on the ability to unambiguously reference namespaces and metadata formats. Certain applications may require that such references be interpreted on a global scale. In this case, descriptor formats may require the existence of an agency charged with the maintenance and public disclosure of such referencing conventions. For other applications, it may be sufficient that references to namespaces and metadata formats are interpreted on a local scale, i.e., in a controlled environment, specific community, etc. In this case, no maintenance agency will be required. In any event, it is important that a descriptor format allows distinguishing between the global and the local case unambiguously.
  • The difference between a metadata-id and a metadata-description-pointer lies in the intelligence required for their resolution into actual metadata. A metadata-id may not be resolvable without the addition of "external intelligence", for instance the addition of the knowledge of where identifiers of the referenced namespace can be resolved. In contrast, a metadata-description-pointer can always be resolved without the addition of "external intelligence".
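The last point can be made concrete with a small Python sketch. It is an illustration under stated assumptions: the registry NAMESPACE_RESOLVERS and both function names are hypothetical, and the registry stands in for the "external intelligence" a resolver would need. A metadata-id can be turned into a location only if the resolver knows where identifiers of that namespace are resolved, whereas a metadata-description-pointer already is the location.

```python
from typing import Dict, Optional

# Hypothetical "external intelligence": where identifiers of a namespace resolve.
NAMESPACE_RESOLVERS: Dict[str, str] = {
    "pmid": (
        "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"
        "?cmd=Retrieve&db=PubMed&list_uids={id}&dopt=SGML"
    ),
}


def location_from_metadata_id(
    namespace: str, identifier: str, registry: Dict[str, str]
) -> Optional[str]:
    """Resolve a metadata-id; returns None when the namespace is unknown."""
    template = registry.get(namespace)
    if template is None:
        return None  # no external knowledge, so no resolution
    return template.format(id=identifier)


def location_from_pointer(scheme: str, pointer: str) -> str:
    """A metadata-description-pointer needs no lookup: it is the location."""
    return pointer


via_id = location_from_metadata_id("pmid", "10942764", NAMESPACE_RESOLVERS)
via_pointer = location_from_pointer(
    "PubMedSGML",
    "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"
    "?cmd=Retrieve&db=PubMed&list_uids=10942764&dopt=SGML",
)
unknown = location_from_metadata_id("pmid", "10942764", {})
```

With the registry in place both routes lead to the same metadata record; without it, only the pointer remains usable.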

In order to obtain a more concrete insight regarding what an entity is, see Table 3, which shows some examples of entities along with elements that can be used to create a descriptor of the entities, and the types of these elements.

entity | descriptor can be constructed based on | type
-------|----------------------------------------|-----
Journal article | DOI | entity-id
Economics journal article | EconLit record | metadata-description
Medical journal article | PubMed identifier | metadata-id
Journal | MARC record | metadata-description
Journal | ISSN number | entity-id
Book | ONIX description | metadata-description
Book | ISBN number | entity-id
Book in a library | Call-number | entity-id
Book in a library | MARC holdings record | metadata-description
Person | Person's e-mail address | entity-id
Person's home address | SRI Indicode identifier | entity-id
Person affiliated with a university | LDAP URL leading into the institutional directory | metadata-description-pointer
Preprint | OAI identifier | metadata-id
Economics author | HoPEc record | metadata-description
Printed music publication | ISMN number | entity-id
Company in the information industry | SAN code | entity-id
Company delivering web-services | UDDI identifier | entity-id
Company involved in EDI | D&B DUNS number | entity-id
Astronomical object | SIMBAD object identifier | entity-id
City in the US | US ZIP code | entity-id
Digital audio file | Relatable TRM fingerprint | entity-id
Economics subject | JEL classification subject descriptor | entity-id
Medical concept | MESH subject heading | entity-id
Web server | HTTP address | entity-id
Region on earth | CSDGM record | metadata-description
Moment in time | UTC datetime | entity-id

Table 3: entities and sample descriptors.

ContextObject, OpenResolutionLink

In the abstract, a ContextObject is a structure for referencing:

  • a core entity called the referent;
  • entities that are part of the context in which the referent is referenced;

by means of a descriptor for each entity. As such, the ContextObject is a container of descriptors, with the descriptor of the referent at its core.

Figure 1 is an illustration of the notion of the ContextObject. Figure 2 shows the relationships between the ContextObject, entities in the ContextObject, descriptors for those entities and types that can be used to create descriptors for the entities.

Figure 1: The ContextObject and its entities

As is the case with descriptors, several methods to encode ContextObjects can exist. It is important to note that in the Bison-Futé model, the ContextObject for a given referent is not an HTTP request. The actual encoding of a referent's ContextObject as an HTTP request is called an OpenResolutionLink for the referent. Again, several ways to generate OpenResolutionLinks from ContextObjects can exist.

A ContextObject can contain a descriptor for the following entities:

  • The referent: The referent is the entity being referenced. It is at the core of the ContextObject, and at least one referent must be present in every ContextObject. It is the referent's descriptor that will eventually be resolved into services.
  • The resolver: The resolver is the web-service that will resolve the referent's descriptor into services. A resolver must understand OpenResolutionLinks. A resolver's descriptor is required in the ContextObject at the point that the ContextObject is encoded into an OpenResolutionLink (i.e., into an HTTP request).
  • The referrer: The referrer is the web-service that provides a reference to the referent.
  • The referring-entity: The referring-entity is the "atomic" entity within the referrer that contains the reference to the referent.
  • The requester: The user or user-agent requesting the resolution of the referent's descriptor. It should be noted that a requester could be a person working from a networked device, or the requester could be a computer program.
  • The serviceType: The serviceType describes the nature of the resolution requested by asking a resolver for the resolution of the referent's descriptor. The serviceType allows for specifying the kind of service the requester aims for when requesting the resolution of a referent's descriptor.
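The six entities can be pictured as slots in a container, with only the referent mandatory and the resolver needed at the moment the ContextObject is turned into an HTTP request. The Python sketch below is one possible reading, not an encoding defined by the model: the class and function names, the query key names and the "namespace:identifier" value syntax are all assumptions. Each descriptor is also simplified to a single namespace/identifier pair. The identifier values are borrowed from the paper's running example.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass(frozen=True)
class Descriptor:
    """Simplified stand-in for a full descriptor: one namespace/identifier pair."""
    namespace: str
    identifier: str


@dataclass
class ContextObject:
    referent: Descriptor                        # always required
    resolver: Optional[Descriptor] = None       # required only when encoding a link
    referrer: Optional[Descriptor] = None
    referring_entity: Optional[Descriptor] = None
    requester: Optional[Descriptor] = None
    service_type: Optional[Descriptor] = None


# Entities carried in the query string (the resolver becomes the URL base).
ROLES = ("referent", "referrer", "referring_entity", "requester", "service_type")


def to_open_resolution_link(ctx: ContextObject) -> str:
    """Encode a ContextObject as an HTTP GET request (hypothetical encoding)."""
    if ctx.resolver is None:
        raise ValueError("a resolver descriptor is needed to build the link")
    pairs = []
    for role in ROLES:
        descriptor = getattr(ctx, role)
        if descriptor is not None:
            pairs.append((role, f"{descriptor.namespace}:{descriptor.identifier}"))
    return f"http://{ctx.resolver.identifier}?{urlencode(pairs)}"


# The ContextObject exists first, without any resolver ...
ctx = ContextObject(
    referent=Descriptor("doi", "10.1074/jbc.M004545200"),
    referring_entity=Descriptor("pmid", "11344333"),
)
# ... and a resolver is supplied only when the link is generated.
ctx.resolver = Descriptor("HTTP", "sfxserv.rug.ac.be/menu")
link = to_open_resolution_link(ctx)
print(link)
```

Keeping the resolver optional until encoding time mirrors the separation between the ContextObject and the OpenResolutionLink derived from it: the same ContextObject could be encoded against different resolvers.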

 

The article referenced in Table 2 can be regarded as the referent of a ContextObject. Indeed, it has already been shown above that a descriptor can be created for it. Making the article the referent of a ContextObject that is provided in a web-document along with an informal reference to the referent paves the way for the delivery of services related to the referenced article. In addition, the ContextObject can contain descriptors of other entities describing the context in which the work is referenced. For instance:

  • The resolver: The resolver is a web-service. Hence, the most straightforward descriptor in the ContextObject for the resolver would be the combination of an identifier of the namespace of all HTTP addresses -- say HTTP -- and the identifier of the web-service within this HTTP namespace, e.g., sfxserv.rug.ac.be/menu, which is an SFX server at Ghent University.
  • The referrer and the referring-entity: The article referenced in Table 2 is cited in a journal article accessible via the Protein Science journal web-service of HighWire Press. If a ContextObject were provided in this citing article, along with the citation of the article referenced in Table 2, then obviously, the latter article would be the referent of the ContextObject. The citing article would be the referring-entity, and since it is a medical journal article, its descriptor could be of the metadata-id type: it could consist of a reference to the PubMed namespace -- pmid -- as well as the PubMed identifier of the citing article -- 11344333. The referrer in the ContextObject indicates the encompassing resource in which the referent is referenced. For instance, the referrer could be the Protein Science journal web-service as well as the HighWire Press web-site.
  • The requester: It is helpful to think about the requester as the user who clicks an OpenResolutionLink in order to request services. A descriptor for the user in the ContextObject could be based upon the user's e-mail address. That descriptor would be of the entity-id type. It would be the combination of an identifier of the namespace of all e-mail addresses -- e.g., mailto -- and the e-mail address itself, e.g., herbertv@cs.cornell.edu. A descriptor for this user could also be built around an entry in an LDAP directory. In this case, the descriptor could be of the metadata-description-pointer type, consisting of the combination of an identifier for a metadata scheme -- e.g., inetperson.org, a common LDAP scheme -- and a pointer into an LDAP directory that uses the scheme, e.g., ldap://ldap.cs.cornell.edu:389/herbertv.
  • The serviceType: The serviceType is introduced to enable identifying the type of service requested upon resolution of the referent's descriptor. Because of issues involved in uniquely describing a type of service, it may prove difficult to assign identifiers to serviceTypes that are understood on a global scale. Still, within a controlled environment it may be possible to assign local identifiers that could be used to build descriptors. For instance, local:scholarly-services could be an identifier of a local namespace, within which the identifier searchweb would be understood to refer to a service that searches for the significant words of the referent's title in web search engines.
Example 2: An illustration of the notion of a ContextObject. The referent is the article referenced in Table 2.
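As a rough illustration of Example 2, the descriptors mentioned above can be assembled into a single XML structure. This is a sketch under assumptions: the element names are borrowed from the sample formats of Appendix A and Appendix B, the DOI of the referent is the one used in Table 5, and the helper names `add_part` and `example_context_object` are invented here.

```python
import xml.etree.ElementTree as ET

# Child element names per descriptor-type, following the sample format of Appendix A.
FIELDS = {
    "entity-id": ("namespace-identifier", "identifier"),
    "metadata-id": ("metadata-namespace-identifier", "metadata-identifier"),
    "metadata-description-pointer": ("metadata-format-identifier", "metadata-pointer"),
}


def add_part(entity, kind, scheme, value):
    """Append one descriptor-type element (kind) to an entity element."""
    part = ET.SubElement(entity, kind)
    first, second = FIELDS[kind]
    ET.SubElement(part, first).text = scheme
    ET.SubElement(part, second).text = value
    return part


def example_context_object():
    """Build a ContextObject holding the descriptors discussed in Example 2."""
    co = ET.Element("ContextObject")
    header = ET.SubElement(co, "header")
    add_part(ET.SubElement(header, "resolver"),
             "entity-id", "http", "sfxserv.rug.ac.be/menu")
    add_part(ET.SubElement(header, "requester"),
             "entity-id", "mailto", "herbertv@cs.cornell.edu")
    block = ET.SubElement(co, "referent-block")
    add_part(ET.SubElement(block, "referent"),
             "entity-id", "doi", "10.1074/jbc.M004545200")
    add_part(ET.SubElement(block, "referring-entity"),
             "metadata-id", "pmid", "11344333")
    add_part(ET.SubElement(block, "serviceType"),
             "entity-id", "local:scholarly-services", "searchweb")
    return co


if __name__ == "__main__":
    print(ET.tostring(example_context_object(), encoding="unicode"))
```

The referrer is left out of the sketch; in the sample format of Appendix B it is optional and would sit between the referent and the referring-entity.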

 

Regarding the above:

  • The notion of the ContextObject is introduced in order to allow referencing a referent as well as the context in which the referent is referenced by means of a single construct. Within that structure, every entity (referent as well as other contextual entities) is described by means of a descriptor. As a result, the purpose of the ContextObject is to make metadata and identifiers of the referent as well as of contextual entities available, either in an explicit or implicit manner.
  • It is assumed that data of an administrative nature -- such as timestamps, versioning, etc. -- may be required in an actual deployment of ContextObjects. This possible requirement is not addressed in this paper.
  • It is conceivable that certain formats for encoding ContextObjects will allow for the inclusion of multiple referents, resolvers, etc.
  • No assumptions are made regarding how the ContextObject is assembled, nor by which party or parties. As a matter of fact, the ContextObject can be assembled in a phased manner, by the referrer, intermediate systems, a user's browser, etc. However, since the Bison-Futé model builds on the collaboration of authors of web-documents, it is assumed that the referrer is the web-service that collaborates with the Bison-Futé model by providing a ContextObject that contains at least a descriptor for the referent.
  • No assumptions are made regarding how a ContextObject is encoded into an HTTP request nor by which party. For instance, the referrer, an intermediate system, the user's browser, etc. could take on that task.

 

ContextObject and its six entities
Figure 2: A ContextObject can contain 6 entities; each entity is specified by a descriptor; a descriptor is compounded from one or more of the 5 descriptor-types.

 

Resolver, Service

A resolver is a web-service that can take an OpenResolutionLink as input and deliver services related to the OpenResolutionLink's referent(s) as output. As such, it resolves the descriptor(s) of the referent(s), in the context of the other descriptors that are provided on the OpenResolutionLink.

For the purpose of this paper, the notion of service is left undefined apart from it being the reply of a resolver to a resolution request. One can imagine that the Bison-Futé model could function in such a mode, leaving it up to resolvers to decide on what they consider to be a service. Actually, that is the way the OpenURL framework currently functions. One can also imagine defining the replies to a resolution request in a formal manner. In this case an OpenResolutionLink encoding scheme might have to move into the realm of a protocol definition. This could, however, restrict applicability.
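Since the notion of service is deliberately left open, any code can only hint at what a resolver does. The sketch below is hypothetical throughout: the `Resolver` class, the handler signature, and the plain-string "services" are assumptions made for illustration. It shows a resolver that resolves the referent's descriptor in the light of the other descriptors (here reduced to a dictionary), and that hands the request to another resolver when it does not handle the referent's namespace.

```python
from typing import Callable, Dict, List, Sequence, Tuple

EntityId = Tuple[str, str]                  # (namespace-identifier, identifier)
Handler = Callable[[str, dict], List[str]]  # (identifier, context) -> services


class Resolver:
    """Hypothetical resolver: maps namespaces to service-producing handlers."""

    def __init__(self, address: str, handlers: Dict[str, Handler]):
        self.address = address
        self.handlers = handlers

    def resolve(self, referent: EntityId, context: dict,
                others: Sequence["Resolver"] = ()) -> List[str]:
        namespace, identifier = referent
        handler = self.handlers.get(namespace)
        if handler is not None:
            return handler(identifier, context)
        # Namespace not handled here: try the other resolvers in turn.
        for other in others:
            services = other.resolve(referent, context)
            if services:
                return services
        return []


def doi_services(identifier: str, context: dict) -> List[str]:
    """Services depend on the context, e.g. on the requested serviceType."""
    services = ["full text via http://dx.doi.org/" + identifier]
    if context.get("serviceType") == "searchweb":
        services.append("web search on the referent's title")
    return services
```

What a resolver returns, and whether it forwards at all, is left to the resolver in this sketch, in line with the way the OpenURL framework is described as functioning.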

Encoding descriptors, ContextObjects, OpenResolutionLinks of the Bison-Futé model

The above description of the Bison-Futé model and its concepts is abstract, and a concrete instantiation may help convey a better understanding. Interested readers are encouraged to explore how some adjustments to the existing OpenURL draft specification [Van de Sompel, Hochstenbach, Beit-Arie 2000] could result in it becoming aligned with the generalized concepts, with the OpenURL becoming a specific technique to encode OpenResolutionLinks for the scholarly environment. It is interesting to note that this encoding technique would be one in which much of the richness available in the Bison-Futé model is being stripped off in order to achieve a fair level of simplicity. As a matter of fact, it can be seen that in the current OpenURL specifications, the referent is specified in the OBJECT-DESCRIPTION, the referrer is specified in the ORIGIN-DESCRIPTION and the resolver is specified in the BASE-URL. Appendix C shows the entities of the ContextObject currently available in the OpenURL specification. It also shows which of the descriptor-types are currently used in the descriptors of each of these entities.
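The correspondence between the parts of an OpenURL and the entities of a ContextObject can be made concrete with a small parser. This is a sketch resting on assumptions: the query keys `sid` (ORIGIN-DESCRIPTION), `id` (global identifiers) and `pid` (local identifier zone) reflect a reading of the draft OpenURL syntax and should be checked against that specification; the function name and the sample URL used to exercise it are invented for illustration.

```python
from urllib.parse import parse_qsl, urlsplit


def openurl_to_entities(openurl: str) -> dict:
    """Split an OpenURL into resolver, referrer and referent information."""
    parts = urlsplit(openurl)
    referrer = None
    identifiers = []   # (namespace, identifier) pairs
    local = []         # opaque local-identifier-zone values
    metadata = {}      # remaining keys are treated as referent metadata
    for key, value in parse_qsl(parts.query):
        if key == "sid":        # assumed key for the ORIGIN-DESCRIPTION
            referrer = value
        elif key == "id":       # assumed key for a global identifier
            identifiers.append(tuple(value.split(":", 1)))
        elif key == "pid":      # assumed key for the local identifier zone
            local.append(value)
        else:
            metadata[key] = value
    return {
        "resolver": parts.netloc + parts.path,   # the BASE-URL
        "referrer": referrer,
        "referent": {
            "identifiers": identifiers,
            "metadata": metadata,
            "private-zone": local,
        },
    }
```

The result has no slot for a referring-entity, a requester or a serviceType, which matches the observation that the current specification covers only part of the ContextObject.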

Below, another scenario is presented in which a richer encoding scheme for descriptors, ContextObjects and OpenResolutionLinks is used. The scenario builds on an example taken from the scholarly environment, but it should be clear that it could be applied to other types of references too.

Table 4 shows an excerpt of an HTML document referencing the article that has been used in all examples so far.


<p>
Moll JR, Olive M, Vinson C. Attractive interhelical electrostatic interactions in the proline- and acidic-rich region (PAR) leucine zipper subfamily preclude heterodimerization with other basic leucine zipper subfamilies. J Biol Chem. 2000 Nov 3 ; 275(44):34826-32. <a href="http://dx.doi.org/10.1074/jbc.M004545200">full text</a>
</p>
Table 4: An excerpt from an HTML document containing a reference to the article of Table 2.

Table 5 shows descriptors for the above reference, expressed according to a rudimentary descriptor format for which an XML Schema is provided in Appendix A. No claims are made regarding the correctness/applicability of that Schema. The Schema is provided only to explain the concept of a descriptor and a descriptor format.

 

<descriptor>
 <entity-id>
   <namespace-identifier>doi</namespace-identifier>
   <identifier>10.1074/jbc.M004545200</identifier>
 </entity-id>
</descriptor>

(Hereby the assumption is made that a maintenance agency publicly records the correspondence between the doi namespace-identifier and the DOI namespace).
<descriptor>
 <metadata-description>
  <metadata-format-identifier>openurl</metadata-format-identifier>
  <metadata>
   <aulast>Moll</aulast>
   <auinit>JR</auinit>
   <issn>0021-9258</issn>
   <volume>275</volume>
   <issue>44</issue>
   <spage>34826</spage>
   <date>2000-11-03</date>
  </metadata>
 </metadata-description>
 <metadata-id>
  <metadata-namespace-identifier>pmid</metadata-namespace-identifier>
  <metadata-identifier>10942764</metadata-identifier>
 </metadata-id>
</descriptor>

(Hereby the assumption is made that a maintenance agency publicly records the correspondence between openurl and http://www.sfxit.com/openurl/openurl.html as well as between pmid and the namespace of PubMed identifiers.)
<descriptor>
 <metadata-description-pointer>
  <metadata-format-identifier>PubMedSGML</metadata-format-identifier>
  <metadata-pointer>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;list_uids=10942764&amp;dopt=SGML</metadata-pointer>
 </metadata-description-pointer>
</descriptor>

(Hereby the assumption is made that a maintenance agency publicly records the correspondence between PubMedSGML and http://www.ncbi.nlm.nih.gov/entrez/query/static/PubMed.dtd)
Table 5: Examples of descriptors for the article referenced in Table 2.
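To show how a consumer might read descriptors such as those of Table 5, the sketch below parses a descriptor and lists its parts. It assumes the rudimentary format of Appendix A without XML namespaces, as in the table; the function name `read_descriptor` and the tuple layout it returns are illustrative choices only.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<descriptor>
 <entity-id>
  <namespace-identifier>doi</namespace-identifier>
  <identifier>10.1074/jbc.M004545200</identifier>
 </entity-id>
 <metadata-description>
  <metadata-format-identifier>openurl</metadata-format-identifier>
  <metadata>
   <aulast>Moll</aulast>
   <volume>275</volume>
  </metadata>
 </metadata-description>
</descriptor>
"""


def read_descriptor(xml_text):
    """Return a list of (descriptor-type, scheme identifier, value) tuples."""
    parts = []
    for part in ET.fromstring(xml_text):
        children = list(part)
        scheme = (children[0].text or "").strip()
        payload = children[1]
        if len(payload):
            # Embedded metadata: collect the child elements as a dictionary.
            value = {child.tag: child.text for child in payload}
        else:
            # Identifier or pointer: plain text content.
            value = (payload.text or "").strip()
        parts.append((part.tag, scheme, value))
    return parts


if __name__ == "__main__":
    for part in read_descriptor(SAMPLE):
        print(part)
```

The sketch relies on every descriptor-type having exactly two children, a scheme identifier followed by a value, which holds for the four public descriptor-types of Appendix A but not necessarily for the private-zone.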

 

Appendix B shows a rudimentary format to encode ContextObjects building on the descriptor format of Appendix A. No claims are made regarding the correctness/applicability of that Schema. The Schema is provided only to explain the concept of a ContextObject and a format to encode ContextObjects.

Table 6 shows the excerpt of the HTML document, in which a collaborating information service (i.e., a referrer) has provided a minimal ContextObject. The ContextObject is expressed by means of the formats of Appendix A and Appendix B, and is introduced in the HTML document, following the reference. Note that the ContextObject is not delivered in a clickable way: it is not yet an OpenResolutionLink.

 


<p>
Moll JR, Olive M, Vinson C. Attractive interhelical electrostatic interactions in the proline- and acidic-rich region (PAR) leucine zipper subfamily preclude heterodimerization with other basic leucine zipper subfamilies. J Biol Chem. 2000 Nov 3 ; 275(44):34826-32. <a href="http://dx.doi.org/10.1074/jbc.M004545200">full text</a>
<ContextObject><referent-block><referent><entity-id>
<namespace-identifier>doi</namespace-identifier>
<identifier>
10.1074/jbc.M004545200</identifier></entity-id></referent>
</referent-block></ContextObject>

</p>
Table 6: An HTML document with a ContextObject provided by a collaborating web-service (referrer).
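A helper application first has to find the ContextObjects embedded in a page. The following sketch suggests one way this could be done; it is not a description of the experimental helper. It assumes that ContextObjects appear as well-formed, un-namespaced XML islands, and the names `PAGE` and `find_referents` are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Non-greedy match so that several ContextObjects on one page stay separate.
CONTEXT_OBJECT = re.compile(r"<ContextObject>.*?</ContextObject>", re.DOTALL)

PAGE = """<p>
J Biol Chem. 2000 Nov 3 ; 275(44):34826-32.
<ContextObject><referent-block><referent><entity-id>
<namespace-identifier>doi</namespace-identifier>
<identifier>
10.1074/jbc.M004545200</identifier></entity-id></referent>
</referent-block></ContextObject>
</p>"""


def find_referents(html_text):
    """Return (namespace-identifier, identifier) pairs for all referents found."""
    found = []
    for match in CONTEXT_OBJECT.finditer(html_text):
        context_object = ET.fromstring(match.group(0))
        for entity_id in context_object.iterfind("referent-block/referent/entity-id"):
            namespace = entity_id.findtext("namespace-identifier", "").strip()
            identifier = entity_id.findtext("identifier", "").strip()
            found.append((namespace, identifier))
    return found


if __name__ == "__main__":
    print(find_referents(PAGE))
```

Whitespace around identifiers is stripped because, as in Table 6, line breaks may occur inside the elements.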

 

In this scenario, it is assumed that the user's browser can call upon a helper application that makes ContextObjects found in HTML pages actionable. Experiments with such a helper application are currently under way. The experimental application allows a user to configure a list of preferred resolvers, an image or words that should be used as anchor for OpenResolutionLinks, preferences regarding the screen that should be opened upon clicking an OpenResolutionLink, etc.

 


<p>
Moll JR, Olive M, Vinson C. Attractive interhelical electrostatic interactions in the proline- and acidic-rich region (PAR) leucine zipper subfamily preclude heterodimerization with other basic leucine zipper subfamilies. J Biol Chem. 2000 Nov 3 ; 275(44):34826-32. <a href="http://dx.doi.org/10.1074/jbc.M004545200">full text</a>
<form name="ContextObject_1" action="http://sfx1.exlibris-usa.com/demo" method="POST" target="_new"><input type="hidden" name="OpenResolutionLink" value="
<?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?>
<ContextObject>
<header>
<resolver>
    <entity-id>
     <namespace-identifier>http</namespace-identifier>
     <identifier>sfx1.exlibris-usa.com/demo</identifier>
</entity-id>
</resolver>
<resolver>
<entity-id>
     <namespace-identifier>http</namespace-identifier>
     <identifier>sfx.rug.ac.be/menu</identifier>
</entity-id>
</resolver>
<requester>
<entity-id>
    <namespace-identifier>mailto</namespace-identifier>
    <identifier>herbertv@cs.cornell.edu</identifier>
</entity-id>
<metadata-description-pointer>
    <metadata-format-identifier>inetorgperson.schema</metadata-format-identifier>
    <metadata-pointer>ldap://ldap.cs.cornell.edu:389/herbertv</metadata-pointer>
   </metadata-description-pointer>
  </requester>
</header>
<referent-block>
<referent>
<entity-id>
     <namespace-identifier>doi</namespace-identifier>
     <identifier>10.1074/jbc.M004545200</identifier>
</entity-id>
  </referent>
</referent-block>
</ContextObject>" />
<a href="javascript:document.forms.ContextObject_1.submit();">more services</a>
</form>

</p>

Table 7: HTML excerpt with an OpenResolutionLink.

 

Table 7 shows the HTML excerpt after the intervention of the helper application. As can be seen, the helper has extended the ContextObject and has turned it into an OpenResolutionLink. The OpenResolutionLink is structured as an HTTP POST targeted at a resolver which is at http://sfx1.exlibris-usa.com/demo. The message body delivered in the POST is an XML document that complies with the XML Schema of Appendix B. As can be seen, the helper application has added some contextual descriptors to the ContextObject. It has included the addresses of two resolvers that were configured in the helper, allowing the resolver that is initially being targeted -- the first one in the ContextObject -- to forward the request for resolution of the referent's descriptor to another resolver if required. That might, for instance, be the case if the initial resolver is not able to resolve descriptors of the doi namespace. The helper application has also added information regarding the requester by including an e-mail address as well as an LDAP URL [Howes and Smith 1996].
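The transformation performed by the helper can be approximated in a few lines. The sketch below is an assumption-laden illustration rather than the experimental helper: the function names are invented, a plain submit button replaces the scripted anchor of Table 7, only an e-mail descriptor is added for the requester, and the incoming ContextObject is assumed to be minimal (no header yet).

```python
import html
import xml.etree.ElementTree as ET

MINIMAL = (
    "<ContextObject><referent-block><referent><entity-id>"
    "<namespace-identifier>doi</namespace-identifier>"
    "<identifier>10.1074/jbc.M004545200</identifier>"
    "</entity-id></referent></referent-block></ContextObject>"
)


def add_entity_id(parent, tag, namespace, identifier):
    """Append an entity such as <resolver> described by a single entity-id."""
    entity_id = ET.SubElement(ET.SubElement(parent, tag), "entity-id")
    ET.SubElement(entity_id, "namespace-identifier").text = namespace
    ET.SubElement(entity_id, "identifier").text = identifier


def to_open_resolution_link(context_object_xml, resolvers, requester_email,
                            anchor="more services"):
    """Extend a minimal ContextObject and wrap it in an HTML POST form."""
    context_object = ET.fromstring(context_object_xml)
    if context_object.find("header") is not None:
        raise ValueError("this sketch only handles ContextObjects without a header")
    header = ET.Element("header")
    context_object.insert(0, header)          # header precedes referent-blocks
    for address in resolvers:                 # first resolver is the POST target
        add_entity_id(header, "resolver", "http", address)
    add_entity_id(header, "requester", "mailto", requester_email)
    payload = ET.tostring(context_object, encoding="unicode")
    return (
        '<form action="http://{target}" method="POST">'
        '<input type="hidden" name="OpenResolutionLink" value="{value}" />'
        '<input type="submit" value="{anchor}" /></form>'
    ).format(
        target=html.escape(resolvers[0], quote=True),
        value=html.escape(payload, quote=True),   # escape markup for the attribute
        anchor=html.escape(anchor, quote=True),
    )


if __name__ == "__main__":
    print(to_open_resolution_link(
        MINIMAL,
        ["sfx1.exlibris-usa.com/demo", "sfx.rug.ac.be/menu"],
        "herbertv@cs.cornell.edu",
    ))
```

Escaping the whole XML payload, rather than only its quotation marks, is a design choice of the sketch that keeps the form valid regardless of what the descriptors contain.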

Conclusion

In this paper, concepts have been introduced that may form the basis of a generalization of the OpenURL framework beyond references to scholarly works. In doing so, the paper tries to meet the desire expressed in the NISO AX charge document "to keep in mind other information communities where the generic mechanism for making identifiers and metadata available to service components may be applicable". The generalization described under the umbrella of the Bison-Futé model extends the notion of open and context-sensitive linking to references on the web in general.

The introduction of the Bison-Futé concepts should not be interpreted as a proposal to start the standardization process from scratch. Quite to the contrary, the aim is to propose an abstract common framework within which the standardization process could flourish. The authors emphasize that for the actual standardization process, the following cannot be ignored:

  • The existence of the OpenURL draft specification;
  • The fact that the draft specifications are already deployed in the scholarly information industry;
  • The fact that the high level of acceptance and adoption of the current draft OpenURL specifications is most likely related to its ease of implementation;
  • The goal of NISO AX to release a draft OpenURL standard applicable in the area of open reference linking for the scholarly information environment in a relatively short timeframe;
  • The perceived desire in the information industry to not introduce dramatic changes to the existing draft specifications in order to guarantee continuity.

Hence, the Bison-Futé model -- as put forth in this paper -- should be viewed as an architectural plan for the construction of a large house with the OpenURL framework being one of its many rooms. The authors suggest that the NISO AX committee focus on designing and constructing that room in accordance with the architectural plan of the house.

The authors hope that the model will inspire other information communities to explore the potential of open linking in practice. The great enthusiasm regarding the OpenURL framework in the scholarly information environment should serve as an encouragement to do so. The authors also hope that this paper will provide inspiration for research that may eventually lead to synergy between the linking efforts in the digital library community and the other communities that are currently working on related efforts.

References

Berners-Lee, Tim, L. Masinter, and M. McCahill. 1994. RFC1738: Uniform Resource Locators (URL). <http://search.ietf.org/rfc/rfc1738.txt?number=1738>.

Berners-Lee, Tim, James Hendler and Ora Lassila. 2001. "The Semantic Web." Scientific American. May 2001. <http://www.sciam.com/2001/0501issue/0501berners-lee.html>.

Caplan, Priscilla and William Y. Arms. 1999. "Reference linking for journal articles." D-Lib Magazine. 5(7/8). <http://www.dlib.org/dlib/july99/caplan/07caplan.html>.

Carr, Leslie, Wendy Hall, Sean Bechhofer and Carole Goble. 2001. "Conceptual linking: ontology-based open hypermedia." Tenth International World Wide Web Conference. May 1-5 2001, Hong Kong. <http://www10.org/cdrom/papers/pdf/p246.pdf>.

Curtin, Matt, Gary Ellison, and Doug Monroe. 1998. "What's Related? Everything but your privacy." <http://www.interhack.net/pubs/whatsrelated/>.

Grønbæk, Kaj, Niels Olof Bouvin, and Lennert Sloth. 1997. "Designing Dexter-Based Hypermedia Services for the World Wide Web." Proceedings of the eighth ACM conference on Hypertext. April 6-11, 1997, Southampton, United Kingdom. ACM, p. 146-56. <http://www.acm.org/pubs/citations/proceedings/hypertext/267437/p146-gronbaek/>.

Halasz, F. and M. Schwartz. 1994. "The Dexter Hypertext Reference Model." Communications of the ACM 37, no. 2 (1994): p. 30-39. <http://www.acm.org/pubs/citations/journals/cacm/1994-37-2/p30-halasz/>.

Hitchcock, Steve and Wendy Hall. 2001. "How dynamic e-journals can interconnect open access archives." Paper prepared for ElPub conference, Canterbury, July 2001. <http://www.ecs.soton.ac.uk/~sh94r/elpub01.pdf>.

Howes, T. and M. Smith. 1996. RFC1959: An LDAP URL Format. <http://www.ietf.org/rfc/rfc1959.txt?number=1959>.

Lawrence, Steve and others. 2001. "Persistence of Web References in Scientific Research." Computer. 34(2). p. 26-31. <http://ieeexplore.ieee.org/iel5/2/19496/00901164.pdf>.

Phelps, Thomas A. and Robert Wilensky. 2000. "Robust Hyperlinks: Cheap, Everywhere, Now." Lecture Notes in Computer Science. Proceedings of Digital Documents and Electronic Publishing, Munich, Germany, 13-15 September 2000. <http://www.cs.berkeley.edu/~phelps/Robust/papers/RobustHyperlinks.html>.

Van de Sompel, Herbert and Oren Beit-Arie. 2001. "Open Linking in the Scholarly Information Environment Using the OpenURL Framework." D-Lib Magazine. 7(3). <http://www.dlib.org/dlib/march01/vandesompel/03vandesompel.html>.

Van de Sompel, Herbert and Patrick Hochstenbach. 1999a. "Reference Linking in a Hybrid Library Environment. Part 1: Frameworks for Linking." D-Lib Magazine. 5(4). <http://www.dlib.org/dlib/april99/van_de_sompel/04van_de_sompel-pt1.html>.

Van de Sompel, Herbert and Patrick Hochstenbach. 1999b. "Reference Linking in a Hybrid Library Environment. Part 2: SFX, a Generic Linking Solution." D-Lib Magazine. 5(4). <http://www.dlib.org/dlib/april99/van_de_sompel/04van_de_sompel-pt2.html>.

Van de Sompel, Herbert and Patrick Hochstenbach. 1999c. "Reference Linking in a Hybrid Library Environment. Part 3: Generalizing the SFX Solution in the "SFX@Ghent & SFX@LANL" experiment." D-Lib Magazine. 5(10). <http://www.dlib.org/dlib/october99/van_de_sompel/10van_de_sompel.html>.

Van de Sompel, Herbert, Patrick Hochstenbach, and Oren Beit-Arie. May 2000. OpenURL Syntax Description. <http://www.sfxit.com/openurl/openurl.html>.

Acknowledgments

Special thanks for significant contributions to the participants of the Chicago NISO AX sub-committee meeting (May 11th 2001): Tony Hammond, Larry Lannom and Oliver Pesch.

Many thanks for feedback and support to Donna Bergmark, Les Carr, Young Fan, Patrick Hochstenbach, Carl Lagoze, Michael Nelson, Jenny Walker and to all the participants at the CNRI NISO AX meeting (June 27-28, 2001): Ann Apps, Mary Alice Ball, Karen Coyle, Susan Devine, Todd Fegan, Eric Hellman, Tony Hammond, Larry Lannom, Justin Littman, Cliff Morgan, Mark Needleman, Eamonn Neylon, Phil Norman, Oliver Pesch, Harry Samuels and Eric Van de Velde.

Appendix A: A sample descriptor format

<schema xmlns="http://www.w3.org/2001/XMLSchema"
 xmlns:desc="http://www.niso.org/descriptor_format"
 targetNamespace="http://www.niso.org/descriptor_format"
 elementFormDefault="qualified"
 attributeFormDefault="unqualified">

<element name="descriptor" type="desc:descriptor-type"/>

<complexType name="descriptor-type">
<sequence>
<element name="entity-id" minOccurs="0" maxOccurs="unbounded" type="desc:entity-id-type"/>
<element name="metadata-id" minOccurs="0" maxOccurs="unbounded" type="desc:metadata-id-type"/>
<element name="metadata-description" minOccurs="0" maxOccurs="unbounded" type="desc:metadata-description-type"/>
<element name="metadata-description-pointer" minOccurs="0" maxOccurs="unbounded" type="desc:metadata-pointer-type"/>

<element name="private-zone" minOccurs="0" maxOccurs="unbounded" type="desc:private-zone-type"/>
</sequence>
</complexType>

<complexType name="entity-id-type">
<sequence>
<element name="namespace-identifier" minOccurs="1" maxOccurs="1" type="desc:identifier-type"/>
<element name="identifier" minOccurs="1" maxOccurs="1" type="string"/>
</sequence>
</complexType>

<complexType name="metadata-id-type">
<sequence>
<element name="metadata-namespace-identifier" minOccurs="1" maxOccurs="1" type="desc:identifier-type"/>
<element name="metadata-identifier" minOccurs="1" maxOccurs="1" type="string"/>
</sequence>
</complexType>

<complexType name="metadata-description-type">
<sequence>
<element name="metadata-format-identifier" minOccurs="1" maxOccurs="1" type="desc:identifier-type"/>
<element name="metadata" minOccurs="1" maxOccurs="1" type="desc:metadata-type"/>
</sequence>
</complexType>

<complexType name="metadata-pointer-type">
<sequence>
<element name="metadata-format-identifier" minOccurs="1" maxOccurs="1" type="desc:identifier-type"/>
<element name="metadata-pointer" minOccurs="1" maxOccurs="1" type="anyURI"/>
</sequence>
</complexType>


<complexType name="private-zone-type">
<sequence>
<any namespace="##other" processContents="lax"/>
</sequence>
</complexType>

<simpleType name="identifier-type">
<restriction base="string">
<pattern value="[a-zA-Z0-9]+"/>
</restriction>
</simpleType>

<complexType name="metadata-type">
<sequence>
<any namespace="##other" processContents="lax"/>
</sequence>
</complexType>

</schema>

Appendix B: A sample ContextObject format

<schema xmlns="http://www.w3.org/2001/XMLSchema"
 xmlns:desc="http://www.niso.org/descriptor_format"
 xmlns:cont="http://www.niso.org/contextobject_format"
 targetNamespace="http://www.niso.org/contextobject_format"
 elementFormDefault="qualified"
 attributeFormDefault="unqualified">

<import namespace="http://www.niso.org/descriptor_format"/>

<element name="ContextObject" type="cont:ContextObject-type"/>

<complexType name="ContextObject-type">
<sequence>
<element name="header" minOccurs="0" maxOccurs="1" type="cont:header-type"/>
<element name="referent-block" minOccurs="1" maxOccurs="unbounded" type="cont:referent-block-type"/>
</sequence>
</complexType>

<complexType name="header-type">
<sequence>
<element name="resolver" minOccurs="0" maxOccurs="unbounded" type="desc:descriptor-type"/>
<element name="requester" minOccurs="0" maxOccurs="1" type="desc:descriptor-type"/>
</sequence>
</complexType>

<complexType name="referent-block-type">
<sequence>
<element name="referent" minOccurs="1" maxOccurs="1" type="desc:descriptor-type"/>
<element name="referrer" minOccurs="0" maxOccurs="1" type="desc:descriptor-type"/>
<element name="referring-entity" minOccurs="0" maxOccurs="1" type="desc:descriptor-type"/>
<element name="serviceType" minOccurs="0" maxOccurs="1" type="desc:descriptor-type"/>
</sequence>
</complexType>

</schema>

Appendix C: Relationship between the current OpenURL specifications and the ContextObject notion of the Bison-Futé model

 

ContextObject entity | available in OpenURL? | 1 or more descriptor | entity-id | metadata-id | metadata-desc | metadata-desc-ptr | private-zone
referent | yes (OBJECT-DESCRIPTION) | more | yes (GLOBAL-IDENTIFIER-ZONE). no distinction between entity-id and metadata-id | (same cell as entity-id) | yes (OBJECT-METADATA-ZONE). single metadata scheme | no | yes (LOCAL-IDENTIFIER-ZONE)
resolver | yes (BASE-URL) | 1 | yes (BASE-URL) | - | - | - | -
referrer | yes (ORIGIN-DESCRIPTION) | 1 | yes (ORIGIN-DESCRIPTION) | - | - | - | -
referring entity | no | - | - | - | - | - | -
requester | no | - | - | - | - | - | (*)
serviceType | no | - | - | - | - | - | (*)

 

(*) Although the notions of requester and serviceType are not explicitly available in the existing OpenURL draft specifications, the local-identifier-zone has been (ab)used to contain such information.

Copyright 2001 Herbert Van de Sompel and Oren Beit-Arie

DOI: 10.1045/july2001-vandesompel