D-Lib Magazine
April 1999

Volume 5 Issue 4
ISSN 1082-9873

Reference Linking in a Hybrid Library Environment

Part 2: SFX, a Generic Linking Solution

Herbert Van de Sompel

Patrick Hochstenbach

Automation Department of the Central Library of the University of Ghent, Belgium

Abstract

This is the second of two articles about reference linking in hybrid digital libraries. The first part, Frameworks for Linking, described the current state of the art and contrasted various approaches to the problem. It identified static and dynamic linking solutions, as well as open and closed linking frameworks. It also included an extensive bibliography. The second part describes our work at the University of Ghent to address these issues. SFX is a generic linking system that we have developed for our own needs, but its underlying concepts can be applied in a wide range of digital libraries.

SFX linking

This section describes the approach to creating extended services in a hybrid library environment that has been taken by the Library Automation team at the University of Ghent. The ongoing research has been grouped under the working title Special Effects (SFX). To explain the SFX concepts comprehensively, the discussion starts with a brief description of pre-SFX experiments. The basics of the SFX approach are then explained, together with the concrete implementation choices made for the Elektron SFX-linking experiment. Elektron was the name of a modest digital library collaboration between the Universities of Ghent, Louvain and Antwerp.

The SFX working environment

The University of Ghent subscribes to a wide variety of electronic information services. They include SilverPlatter’s Electronic Reference Library solution (ERL) and ExLibris’s Aleph 500 Integrated Library System, both of which are important local building blocks. The ERL server hosts a wide variety of mainly secondary data (70+ GB), while the Aleph system hosts the local catalog (500,000+ bibliographic records). Recently, ISI’s Web of Science has been added. The environment also provides access to a collection of about 300 e-editions of scientific journals that are available without additional charge as part of the institutional paper-based subscription. Among those, the Springer, Wiley, HighWire, Institute of Physics and American Physical Society collections are the most noteworthy. For the Elektron SFX-experiment described below, temporary access was granted to the Academic Press, UMI Business Periodicals Online and Blackwell Science collections.

The environment is presented to end-users via a web-based menu-system called the Executive Lounge, which is an easy-to-use interface to the database of databases (Figure 1). The Executive Lounge menu items point to both the traditional library-related sources (typically networked databases and full-text collections) and a limited number of websites with academic relevance. At the user's request, menu items can be presented in different views: by data-type (secondary sources, catalogs, primary sources); by discipline (humanities, medicine, engineering, etc.); via a menu item search screen; or via a display presenting only menu items that can be searched simultaneously. For instance, in the data-type view, the menu-header "secondary sources" gives access to Current Contents as well as to the major Internet search engines. References to most of the e-Lib subject-based gateways are found under the same header. The menu-header "catalogues" points to several important Belgian library catalogs, as well as to a catalog of electronic journals and important Internet bookstores. The menu-header "primary sources" points to established publishers’ e-editions as well as to a selection of free Internet e-journals.

Figure 1: the Executive Lounge interface

Pre-SFX linking experience

The Ghent library automation group has been actively involved in reference linking for several years:

SFX-concepts

The aim of SFX is to provide extended services in the hybrid library environment. The goal is to present information to the user in the context of the entire collection that is available. As discussed in Part 1 of this article, the targets of a link are seen as a combination of the information provider’s and the library’s intentions; they do not rely on a static database of links that are computed in advance.

An overview of the design is shown in Figure 2 and is expanded in the following sections.

Figure 2: the SFX mechanism

SFX, linking from … to …

A reference link leads from one item of information to another. In the following, the term "link-source" will refer to the information unit for which links need to be provided. Link-sources can be records from OPAC systems, records from abstracting and indexing (A&I) databases, or the bibliographic information of a full-text paper as well as each of its citations.

So far, SFX-experiments have concentrated on the link-sources that are shown in Table 1. This set of link-sources has been chosen for research because it contains link-types that have hardly been investigated, but also because it restricts the problem to systems where the link-sources are under local control. Although this choice might seem to limit the scope of the research, it has allowed the development of solutions other than proxying for grabbing link-sources, and has made it possible to concentrate on other aspects of linking that are equally important.

| SOURCE \ TARGET | secondary database | OPAC | primary collection | other web info |
|---|---|---|---|---|
| secondary database | yes | yes | yes | yes |
| OPAC | yes | yes | yes | yes |
| primary collection | no | no | no | no |
| other web info | no | no | no | no |

Table 1: SFX linking from-to

The Colli: a collection of anticipated conceptual links

Static linking solutions are not considered in SFX-linking. Therefore, there is no database containing hardwired links between the data that is involved. Instead, there is a collection of anticipated conceptual links (Colli) that the hybrid library wants to make available to its users. The organization of the Colli is based on the feasibility of actually creating the link at some further stage in the process (i.e. existence of a link-to-service) and in anticipation of users' expectations. Each of the conceptual links is introduced in order to provide a certain service that is thought to be valuable for users of the system.

Each of the anticipated links in the Colli is given a name that corresponds to a procedure designed to resolve the link-to-syntax using parameters extracted from the link-source. The links that have been introduced for the Elektron experiment are shown in Table 2.

There are three links to OPAC systems that are important for interlibrary loan: the Ghent Aleph 500 system, the Belgian union catalogue of serials, and the Dobis/Libis system at the University of Louvain. There are links to secondary databases, such as L-BIP, which is intended to look for the record in Books in Print that corresponds to the link-source. L-ULRICH is similar, with links into Ulrich’s International Periodicals Directory. L-JCR looks up Journal Citation Reports data for the link-source; thus it provides the user with ISI’s notion of the quality of the referenced journal. L-CC is intended to bring up the table of contents (including abstracts) from the Current Contents database for the issue of the journal that is referred to by the link-source. There are several links to primary information collections, whose names are self-explanatory. Finally, the L-AMAZON link leaves the typical academic information environment and searches for the book referred to by the link-source, in order to present the user with book reviews and ordering information.

All conceptual links and related procedures are seen as being independent of:

| type | link name | links to |
|---|---|---|
| to OPAC systems | L-ALEPH | University of Ghent OPAC |
| to OPAC systems | L-ANTILOPE | Belgian union catalogue of serials |
| to OPAC systems | L-LIBIS | University of Louvain OPAC |
| to secondary databases | L-BIP | Books in Print |
| to secondary databases | L-ULRICH | Ulrich’s International Periodicals Directory |
| to secondary databases | L-JCR | ISI’s Journal Citation Reports |
| to secondary databases | L-CC | ISI’s Current Contents |
| to primary information | L-SWETS | SwetsNet collection |
| to primary information | L-SPRINGER | Springer full-text collection |
| to primary information | L-ACADEMIC | Academic Press full-text collection |
| to primary information | L-BPO | UMI Business Periodicals Online collection |
| to others | L-AMAZON | Amazon.com online bookstore |

Table 2: the Colli in the Elektron SFX-experiment
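
The Colli can be pictured as a registry that maps each link name to its type, its target, and the procedure that will resolve it on request. The following Python sketch is an illustration only. The link names, types and targets come from Table 2; the `ConceptualLink` class, the `resolve_placeholder` helper and the strings it returns are hypothetical and are not taken from the SFX implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A parsed link-source is modelled as a plain dict of parameters.
# This is an assumption; the article does not specify the generic format.
LinkSource = Dict[str, str]


@dataclass(frozen=True)
class ConceptualLink:
    name: str                              # e.g. "L-ALEPH"
    link_type: str                         # classification used in Table 2
    target: str                            # service the link points to
    resolve: Callable[[LinkSource], str]   # procedure, run only when requested


def resolve_placeholder(target: str) -> Callable[[LinkSource], str]:
    """Build a stand-in procedure (hypothetical).

    A real procedure would produce the link-to-syntax of the target service.
    """
    def procedure(source: LinkSource) -> str:
        return f"query {target} for {source.get('title', 'unknown title')}"
    return procedure


def build_colli() -> Dict[str, ConceptualLink]:
    rows = [
        ("L-ALEPH", "to OPAC systems", "University of Ghent OPAC"),
        ("L-ANTILOPE", "to OPAC systems", "Belgian union catalogue of serials"),
        ("L-LIBIS", "to OPAC systems", "University of Louvain OPAC"),
        ("L-BIP", "to secondary databases", "Books in Print"),
        ("L-ULRICH", "to secondary databases",
         "Ulrich's International Periodicals Directory"),
        ("L-JCR", "to secondary databases", "ISI's Journal Citation Reports"),
        ("L-CC", "to secondary databases", "ISI's Current Contents"),
        ("L-SWETS", "to primary information", "SwetsNet collection"),
        ("L-SPRINGER", "to primary information", "Springer full-text collection"),
        ("L-ACADEMIC", "to primary information",
         "Academic Press full-text collection"),
        ("L-BPO", "to primary information",
         "UMI Business Periodicals Online collection"),
        ("L-AMAZON", "to others", "Amazon.com online bookstore"),
    ]
    return {
        name: ConceptualLink(name, link_type, target, resolve_placeholder(target))
        for name, link_type, target in rows
    }


colli = build_colli()
# Nothing is resolved until a procedure is called explicitly.
print(colli["L-BIP"].resolve({"title": "An Example Book"}))
```

Keeping the procedure next to the link name mirrors the idea that a conceptual link is a named service rather than a stored URL: adding a service means adding one entry and one procedure, with no link database to recompute.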

The SFX-button: just-in-time linking

SFX takes a "just-in-time" instead of a "just-in-case" approach to linking. When information is presented to the user, potential link-sources are marked with an SFX button. As a means of reducing delays, no links are computed until requested by the user. For each link-source, an identifier is hidden behind an SFX-button (see I in Figure 2). This identifier holds the following information:

A user must explicitly request links for a link-source by clicking the SFX-button. Clicking transfers the identifier to the local target, which uses it to pull the link-source into its environment (see II in Figure 2). The ID of the server gives information not only on its location, but also on the protocol to be used to grab the link-source. In the case of OPAC or abstracting and indexing databases, this might be Z39.50. But it might also be a Lightweight Directory Access Protocol (LDAP) lookup, a Handle resolution, or an HTTP link. Next, the document is parsed into a generic format and essential parameters are extracted (see III in Figure 2). All information is kept on the server side, in relation to the user’s session-ID. The system is now ready to start the next phase in the process: the conceptual verification of potential links from the Colli.
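
The sequence of steps I to III can be sketched as follows. This is a minimal illustration under stated assumptions, not the Elektron code: the two identifier fields, the server table, the host names and the `key=value` record format are all hypothetical, and the fetchers are stubs that perform no network access.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class SfxIdentifier:
    server_id: str   # identifies the originating server
    record_id: str   # identifies the link-source on that server (assumed field)


# Hypothetical server table: server ID -> (protocol, location).
SERVERS: Dict[str, Tuple[str, str]] = {
    "ALEPH": ("z39.50", "opac.example.org:210"),
    "ERL": ("z39.50", "erl.example.org:210"),
    "WEB": ("http", "http://www.example.org/records/"),
}


def fetch_stub(protocol: str) -> Callable[[str, str], str]:
    """Return a stand-in fetcher; a real one would speak the named protocol."""
    def fetch(location: str, record_id: str) -> str:
        return f"id={record_id}\nfetched_via={protocol}\nfetched_from={location}"
    return fetch


# One fetcher per protocol mentioned in the text.
FETCHERS = {p: fetch_stub(p) for p in ("z39.50", "ldap", "handle", "http")}


def parse_generic(raw: str) -> Dict[str, str]:
    """Parse the raw record into a generic dict of parameters (assumed format)."""
    return dict(line.split("=", 1) for line in raw.splitlines() if "=" in line)


# Parsed link-sources are kept server-side, keyed by session ID.
SESSIONS: Dict[str, Dict[str, str]] = {}


def on_sfx_click(session_id: str, ident: SfxIdentifier) -> Dict[str, str]:
    protocol, location = SERVERS[ident.server_id]        # server ID -> protocol
    raw = FETCHERS[protocol](location, ident.record_id)  # step II: grab
    SESSIONS[session_id] = parse_generic(raw)            # step III: parse and keep
    return SESSIONS[session_id]


record = on_sfx_click("session-1", SfxIdentifier("ALEPH", "000123"))
print(record["fetched_via"])
```

Because the identifier only names a server and a record, the same click handler serves every originating system; supporting a new protocol means registering one more fetcher.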

The "just-in-time" approach, requiring an explicit user action to request links, seems to be justified by the following:

SFX-linking approaches the problem of grabbing the link-source by introducing a clickable identifier, containing a small data record, for each link-source. The technique is identical for all systems involved. It is recognized that the implementation of this solution was simplified by the fact that the originating servers used in the experiments were under local control. Both providers of the local systems in Ghent, ExLibris and SilverPlatter, have enabled its straightforward realization. Still, the concept is quite generic, and could also be implemented with systems under remote control, to create a general-purpose "just-in-time" linking solution.

For instance, in the case of the Open Journal Project, journal papers are proxied and parsed before delivery to the user. There, many of the complexities involved with linking from citations could be postponed to a later phase in the process, by initially only identifying link-sources in the HTML or PDF documents and inserting, respectively, SFX-anchors or SFX-named-destinations as unique link-source IDs. Storing the enhanced document in the server environment and simultaneously sending it to the user would create a set-up in which the link-source could be retrieved and processed only upon the user’s request.

Proxying should be considered the hard way to grab the link-source. Cooperation of the authority can clearly lead to more straightforward solutions. One can imagine a situation where the authority inserts the required identifier along with the appropriate address of an institutional SFX-server on a subscription basis. Although this might sound like wishful thinking, such a possibility is almost inherent to the DOI concept, on the condition that:

Under these conditions, an institutional SFX-server could retrieve the link-source from a DOI directory.

Conceptual verification of links from the Colli via the SFX-base

Since there are no "a priori" computed links in this environment, there is no initial certainty on the relevance of a specific conceptual link from the Colli for a specific link-source. Meanwhile, that link-source resides in a parsed format in the server’s environment (see III in Figure 2). In order to prevent irrelevant links from being presented to the user, the SFX-base is introduced (see IV in Figure 2). The SFX-base describes the relationship between the conceptual links from the Colli and the parameter values of link-sources for which the conceptual links are valid. Matching parameters of a link-source with the SFX-base filters out irrelevant links. The matching process fulfills a conceptual verification for each of the links from the Colli. Once a link has been selected in this process, it will be included in the bundle of links that will be presented to the user (see V in Figure 2).

It should be emphasized that this selection does not guarantee that following the link at a later stage will succeed. The conceptual verification only minimizes the number of predictable failures. For instance, when the active document refers to a journal article, the anticipated link to Amazon.com will be filtered out. When the active document originates from the Current Contents database, or when the journal referred to by the active document is not indexed in Current Contents, the L-CC link will receive a negative flag. A link to Springer will only be selected when the active document refers to a paper published in a Springer journal with a publication year that makes electronic availability nearly certain.

A limited number of parameters have been defined for the SFX-base of the Elektron experiment:

This information is brought together in a relational database, with the Colli as the central table (Figure 3). In addition to the parameters described, the SFX-base contains a link-type table. It allows the relevant links to be presented in a structured way that follows the classification in Table 2 and reflects the organization of the database of databases in the Executive Lounge menu-system (Figure 1).

Figure 3: the SFX-base
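
Since Figure 3 is not reproduced here, the role of the link-type table can be illustrated with a small in-memory database. The sketch below uses Python's standard `sqlite3` module. All table and column names are assumptions made for the example, not the actual SFX-base schema; only the link names and type labels come from Table 2.

```python
import sqlite3

# Hypothetical schema: the Colli as central table, plus a link-type table.
# Further parameters (source database, ISSN lists) would hang off the colli
# table in tables of their own; they are omitted here.
SCHEMA = """
CREATE TABLE link_type (
    id            INTEGER PRIMARY KEY,
    label         TEXT NOT NULL,
    display_order INTEGER NOT NULL
);
CREATE TABLE colli (
    link_name      TEXT PRIMARY KEY,
    link_type_id   INTEGER NOT NULL REFERENCES link_type(id),
    material_type  TEXT NOT NULL,
    threshold_year INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO link_type VALUES (?, ?, ?)", [
    (1, "to OPAC systems", 1),
    (2, "to secondary databases", 2),
    (3, "to primary information", 3),
    (4, "to others", 4),
])
conn.executemany("INSERT INTO colli VALUES (?, ?, ?, ?)", [
    ("L-ALEPH", 1, "all", None),
    ("L-JCR", 2, "serials", None),
    ("L-SPRINGER", 3, "serials", 1997),
    ("L-AMAZON", 4, "books", 1971),   # "> 1970" stored as an inclusive bound
])


def links_by_type(connection, link_names):
    """Return (type label, link name) pairs in presentation order."""
    placeholders = ",".join("?" for _ in link_names)
    query = (
        "SELECT t.label, c.link_name FROM colli c "
        "JOIN link_type t ON t.id = c.link_type_id "
        f"WHERE c.link_name IN ({placeholders}) "
        "ORDER BY t.display_order, c.link_name"
    )
    return connection.execute(query, list(link_names)).fetchall()


grouped = links_by_type(conn, ["L-SPRINGER", "L-ALEPH", "L-JCR"])
for label, name in grouped:
    print(label, name)
```

The join returns the selected links already ordered by type, which is all the presentation layer needs to group them under the same headers as the menu-system.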

A simplified overview of the contents of the SFX-base used in the Elektron experiment is given in Table 3. It is obvious that the design of the SFX-base requires fine-tuning in order to become a production system, but for an experimental set-up, a certain roughness has been tolerated:

| link name | material type | threshold year | source dbase id | ISSN |
|---|---|---|---|---|
| L-ALEPH | all | all | all except ALEPH | all |
| L-ANTILOPE | serials | all | all except ANTILOPE | all |
| L-LIBIS | all | all | all except LIBIS | all |
| L-BIP | books | > 1970 | all except BIP | none |
| L-ULRICH | serials | all | all except ULRICH | all |
| L-JCR | serials | all | all | only journals evaluated in JCR |
| L-CC | serials | >= 1996 | all except CC | only journals abstracted in CC |
| L-SWETS | serials | >= 1997 | all | ISSN numbers of Blackwell journals |
| L-SPRINGER | serials | >= 1997 | all | ISSN numbers of Springer journals |
| L-ACADEMIC | serials | >= 1996 | all | ISSN numbers of Academic Press journals |
| L-BPO | serials | >= 1997 | all | ISSN numbers of UMI BPO journals |
| L-AMAZON | books | > 1970 | all | none |

Table 3: content of the SFX-base
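
The conceptual verification amounts to matching the parameters of a link-source against the rows of Table 3. The Python sketch below encodes those rows and applies them as a filter. It rests on several assumptions: the ISSN sets are invented placeholders, the `Rule` and `LinkSource` classes are hypothetical, ISSN entries of "all" and "none" are both treated as "no ISSN restriction", and a link-source without a known year is not rejected by a year threshold.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

# Placeholder ISSN lists (hypothetical values, not real journal ISSNs).
JCR = frozenset({"1111-1111", "2222-2222"})
CC = frozenset({"1111-1111"})
BLACKWELL = frozenset({"2222-2222"})
SPRINGER = frozenset({"1111-1111"})
ACADEMIC = frozenset({"3333-3333"})
BPO = frozenset({"4444-4444"})


@dataclass(frozen=True)
class Rule:
    link_name: str
    material_type: str                # "all", "serials" or "books"
    min_year: Optional[int]           # inclusive threshold; None means "all"
    excluded_source: Optional[str]    # source database the link is not offered from
    issns: Optional[FrozenSet[str]]   # None means no ISSN restriction


@dataclass(frozen=True)
class LinkSource:
    material_type: str                # "serials" or "books"
    year: Optional[int]
    source_db: str
    issn: Optional[str] = None


# Table 3, row by row ("> 1970" is written as the inclusive bound 1971).
RULES: List[Rule] = [
    Rule("L-ALEPH", "all", None, "ALEPH", None),
    Rule("L-ANTILOPE", "serials", None, "ANTILOPE", None),
    Rule("L-LIBIS", "all", None, "LIBIS", None),
    Rule("L-BIP", "books", 1971, "BIP", None),
    Rule("L-ULRICH", "serials", None, "ULRICH", None),
    Rule("L-JCR", "serials", None, None, JCR),
    Rule("L-CC", "serials", 1996, "CC", CC),
    Rule("L-SWETS", "serials", 1997, None, BLACKWELL),
    Rule("L-SPRINGER", "serials", 1997, None, SPRINGER),
    Rule("L-ACADEMIC", "serials", 1996, None, ACADEMIC),
    Rule("L-BPO", "serials", 1997, None, BPO),
    Rule("L-AMAZON", "books", 1971, None, None),
]


def matches(rule: Rule, src: LinkSource) -> bool:
    if rule.material_type not in ("all", src.material_type):
        return False
    if rule.min_year is not None and src.year is not None and src.year < rule.min_year:
        return False
    if rule.excluded_source == src.source_db:
        return False
    if rule.issns is not None and src.issn not in rule.issns:
        return False
    return True


def verify(src: LinkSource) -> List[str]:
    """Return the names of the conceptual links that survive the filter."""
    return [rule.link_name for rule in RULES if matches(rule, src)]


article = LinkSource("serials", 1998, "ECONLIT", "1111-1111")
book = LinkSource("books", 1995, "ALEPH")
print(verify(article))
print(verify(book))
```

With these placeholder lists, the journal article keeps the OPAC, Ulrich, JCR, Current Contents and Springer links but loses the bookstore link, while the book record taken from the Ghent OPAC keeps only L-LIBIS, L-BIP and L-AMAZON. The filter removes predictable failures only; it makes no claim that a surviving link will return a result.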

The SFX-screen: a bundle of unresolved, functionally unverified links

The result of the conceptual verification process is a buffer of potential link-names, corresponding to links from the Colli that are relevant for the current link-source. The link-names in the buffer are organized in accordance with the classification shown in Table 2, and delivered to the user in a separate browser window (see VI in Figure 2; see Figure 5, Figure 7 and Figure 9). Following the same argument that justified just-in-time linking, links are still not resolved at this stage. The potential links are sent to the user with the link procedure names as parameters. These parameters are used by a server-based link-resolution process that is activated when the user chooses to follow a certain link. At that point, the essential information from the actual document is retrieved from the copy of the link-source that is held on the server side. Next, this information is fed to the chosen procedure in order to resolve the link (see VII in Figure 2). Finally, the user is redirected to the appropriate location (see VIII in Figure 2; Figure 6, Figure 8 and Figure 10).
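
The deferred resolution described in steps VI to VIII can be sketched as two small functions: one that builds the unresolved entries of the SFX-screen, and one that resolves a single entry when it is followed. The sketch is hypothetical throughout: the `/sfx/resolve` path, the `example.org` hosts, the query parameters and the stored record are assumptions for illustration and do not reflect the link-to-syntax of any real service.

```python
from typing import Callable, Dict, List
from urllib.parse import urlencode

LinkSource = Dict[str, str]

# Link-sources parsed earlier and kept server-side per session (placeholder data).
SESSIONS: Dict[str, LinkSource] = {
    "session-42": {
        "material_type": "books",
        "title": "An Example Book",
        "isbn": "0000000000",
    },
}


def resolve_amazon(src: LinkSource) -> str:
    # Hypothetical bookstore search syntax.
    return "http://bookstore.example.org/search?" + urlencode({"isbn": src["isbn"]})


def resolve_aleph(src: LinkSource) -> str:
    # Hypothetical OPAC search syntax.
    return "http://opac.example.org/find?" + urlencode({"title": src["title"]})


# Procedure names as they would appear in the Colli.
PROCEDURES: Dict[str, Callable[[LinkSource], str]] = {
    "L-AMAZON": resolve_amazon,
    "L-ALEPH": resolve_aleph,
}


def sfx_screen(session_id: str, link_names: List[str]) -> List[str]:
    """Step VI: unresolved entries carrying only the session and procedure name."""
    return [
        "/sfx/resolve?" + urlencode({"session": session_id, "link": name})
        for name in link_names
    ]


def follow(session_id: str, link_name: str) -> str:
    """Steps VII and VIII: resolve on request and return the redirect location."""
    src = SESSIONS[session_id]          # copy of the link-source held server-side
    return PROCEDURES[link_name](src)   # feed it to the chosen procedure


screen = sfx_screen("session-42", ["L-ALEPH", "L-AMAZON"])
print(screen)
print(follow("session-42", "L-AMAZON"))
```

Only the link the user actually follows is ever resolved, so the cost of building the screen is independent of how expensive the individual resolution procedures are.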

Figure 4: an OPAC serials record

Figure 5: the SFX screen for the OPAC serials record

As a consequence of this approach, the links in the SFX-screen are not functionally verified, and following them may lead to empty results. This design option is subject to some considerations:

Exploiting this approach and properly designing the procedures to resolve the links can lead to features that are appealing instead of frustrating to end-users, as can be seen from the following examples:

Figure 6: an SFX-link to Current Contents for the OPAC serials record

Figure 7: the SFX screen for an OPAC book record

Figure 8: the SFX-link to Amazon.com followed for the OPAC book record

Figure 9: the SFX screen for a record from the EconLit database

Figure 10: the SFX-link to Journal Citation Reports followed for the record from EconLit

Conclusion

Part 1 of this article discussed a framework for the field of reference linking. SFX was presented as an example of how these concepts could be realized in an actual hybrid library. This second part has gone into greater detail in describing the prototype implementation of SFX. This prototype has demonstrated that dynamic linking is a practical and flexible option. It has also shown that the uncertainty inherent in dynamic linking is not necessarily a disadvantage, so long as it is recognized by the user interface. In a large-scale digital library, reference linking can never expect to be fully determined. SFX, with its just-in-time approach to dynamic linking, provides a straightforward, scalable alternative.

Copyright © 1999 Herbert Van de Sompel and Patrick Hochstenbach


DOI: 10.1045/april99-van_de_sompel-pt2