D-Lib Magazine
April 1999

Volume 5 Issue 4
ISSN 1082-9873

Reference Linking in a Hybrid Library Environment

Part 1: Frameworks for Linking

Herbert Van de Sompel

Patrick Hochstenbach

Automation Department of the Central Library of the University of Ghent, Belgium

Abstract

The creation of services linking related information entities is an area that is attracting ever-increasing interest in the ongoing development of the World Wide Web in general, and of research-related information systems in particular. Currently, both practice and theory point to linking services as a major domain for innovation enabled by digital communication of content. Publishers, subscription agents, researchers and libraries are all looking into ways to create added value by linking related information entities, thereby presenting the information within a broader context estimated to be relevant to the users of the information.

This is the first of two articles in D-Lib Magazine on this topic. This first part describes the current state of the art and contrasts various approaches to the problem. It identifies static and dynamic linking solutions as well as open and closed linking frameworks. It also includes an extensive bibliography. The second part, SFX, a Generic Linking Solution, describes a system that we have developed for linking in a hybrid working environment.

Linking

The creation of services linking related information entities is an area that is attracting ever-increasing interest in the ongoing development of the World Wide Web in general, and of research-related information systems in particular. Although most writings on electronic scientific communication have touted other benefits, such as the increase in communication speed, the possibility to exchange multimedia content and the absence of limitations on the length of research papers, currently both practice and theory point to linking services as a major opportunity for improved communication of content. Publishers, subscription agents, researchers and libraries are all looking into ways to create added value by linking related information entities, thereby presenting the information within a broader context estimated to be relevant to the users of the information.

One of the first people to recognize this potential was Gardner. He expressed the desire to implement a hypertext structure linking scientific articles as a long-term goal of the electronic archive conceived by King and Roderer in 1978 (King and Roderer 1978), which he introduced to the psychology community more than a decade later (Gardner 1990). Hitchcock (Hitchcock et al. 1997a) relates the necessity of links to the associative modus operandi of the human mind. It comes as no surprise that both Gardner and Hitchcock refer to the historic writings by Vannevar Bush, in which he introduces the associative indexing (hypertext) Memex concept (Bush 1945).

But theoretical justification for linking information has become quite superfluous, since many practical illustrations of its importance have become available. Hitchcock attributes the explosive success of the World Wide Web to its linking possibilities (Hitchcock et al. 1997a). In the area of scholarly information, linking solutions have been introduced and have quickly become popular with their users. Initiatives by the Institute of Physics Publishing and BiomedNet spring to mind, where journal articles and their citations are being linked with the corresponding primary and secondary data. Ovid’s linking in its Biomedical collection, SilverPlatter’s SilverLinker, Links between articles in HighWire Press, and ISI’s Links in the Web of Science are other examples. The list of linking initiatives has grown rapidly, driven by expectations for a fully linked scholarly communication environment, created by these early linking-showcases.

Linking in library solutions

The necessity of linking

In the context of networked library services, the necessity to integrate secondary data, catalogues and primary information was expressed quite some time ago (Evans et al. 1989; Van de Sompel 1991). More specifically, librarians have brought to the fore the need to link abstracting databases with library catalogues (Dempsey 1993; Dempsey 1995; Van de Sompel 1993); catalogues with primary information (Van de Sompel 1994); and abstracting databases with full-text primary information (Arms 1993). These specific linking notions have evolved towards a concept of connecting all the available information, in order to arrive at a fully interlinked information environment (Van de Sompel 1997b). Lynch puts it this way (Lynch 1997):

Over time, the set of necessary linkages will expand to include not only A&I databases to primary content and serials holdings and serials holdings to primary content (or, more precisely, to navigational systems for cover-to-cover content of journals, including material not in the scope for the A&I databases), but also from (monographic) catalog bibliographic records to primary content (or to finding aids that assist in the navigation of large collections of primary content) and to secondary materials such as book reviews.

The omnipresence of the World Wide Web has raised users’ expectations in this regard. When using a library solution, the expectations of a net-traveler are inspired by his hyperlinked Web-experiences. To such a user, it is not comprehensible that secondary sources, catalogues and primary sources, that are logically related, are not functionally linked (Van de Sompel 1997a).

Once implemented, such library link services become popular with the target audience and turn out to be an important aspect of integrated library services. There are indications of a strong correlation between user satisfaction and the introduction of linked electronic services. Caswell has shown this regarding the link between A&I databases and library catalogues (Caswell et al. 1995). Users’ reactions to the linking experiments in the Open Journals project -- where article citations and A&I databases have been linked -- were very positive overall (Hitchcock et al. 1998b). In a survey of library users at the Los Alamos National Laboratory, 30 percent of the customers were ‘delighted’ and the majority of the remainder ‘satisfied’ with the highly linked library service (Weislogel 1998). And a public presentation of the link service described in Part 2 of this paper -- held on the occasion of the conclusion of the Flemish Elektron project (a very modest e-Lib look-alike) in December 1998 -- led to very positive feedback from the audience, again emphasizing the desire of users to work in a fully linked environment.

The actual situation

Static and dynamic linking approaches

Linking mechanisms that are in use or are being developed in the scholarly information environment, can be categorized as static or dynamic, depending on the architectural set-up of the information collection:

Given the requirement to control the information collection, in order to be able to interlink the information, the centralized commercial solutions are restricted by the sphere of influence of the information provider. Therefore, the creation of a fully interlinked information environment -- that would result in a true one-stop shop -- would require either an information monopoly or extensive partnerships. Although some publishers call for subject-driven cross-publisher information shops (Kierman 1998) with DOI as an enabling instrument, some industry observers see little tradition in the cooperation required for success. Therefore, the realization of a true one-stop shop under commercial control might not be a reachable goal. But if it is, it will most probably not come from an information monopoly that would support a static linking approach, except in narrowly defined fields. Logical behavior by companies in the information industry would normally prevent a broad monopoly from developing. A dynamic approach seems to be more likely.

In the non-commercial arena, the systems that make up hybrid library environments can be under local control, as is typically the case with OPAC and some secondary data systems. Alternatively, systems may be under the technical control of an external authority, such as a database vendor, a subscription agent, a publisher, or another library. The non-commercial parties -- libraries and consortia -- are in a much better position to build integrated services, since they are not copyright owners. As such, they are neutral enough to potentially receive a green light from a wide variety of information vendors to integrate and interlink their data collections. Therefore, the future reality of hybrid library systems will most probably exclude linking solutions that require the local availability of all data or even important parts of it. Hence, in hybrid library environments too, linking tends toward a dynamic approach.

Closed and open linking frameworks

The frameworks that have been introduced so far feed links based on the collection that the provider of the links -- henceforth referred to as the authority -- has within its reach, and leave no room for adaptation to the environment where the links are consumed. The linking frameworks can be called "closed." The following considerations apply for the closed linking approaches:

Such limitations cause serious problems. Most environments where links are consumed are hybrid libraries, made up of OPAC systems, abstracting databases, e-journals and e-editions as well as web-services. Some of the latter can hardly be classified using traditional library jargon. In this environment, a wide range of services -- that go beyond the initial aims or the possibilities of the authority -- can be delivered by creatively using the available information. The combination of an information unit that a user considers to be of interest and the entire collection that is accessible in the actual environment in which he operates can lead to the provision of a wide range of extended services for that information unit.

The authority cannot anticipate the diversity of information that is available in the local environment. Thus, in order to deliver links that deal with the full richness of the information environment, the authority cannot just autonomously define the target(s) of a link. Rather, linking should be seen as influenced by the environment where the link will be used. It should reflect a combination of the authority’s and the consuming institution’s intentions, and ultimately even the users’ goals.

Although these considerations apply to both commercial and non-commercial authorities, the hindrance resulting from closed linking frameworks is most significant with commercial services that follow a strategy of vertical integration that restricts the freedom to combine information from different vendors in the same environment. In a consortium environment some libraries rely on the hosting authority for all their library services, making the local environment the same as the authority's. As such, integration can be dealt with fully by the authority. But in some consortia, participating libraries may host some information locally that is not relevant to the entire consortium, but still want it to be integrated with the whole. The concrete examples below illustrate the problem. Most apply to commercial services:

The mainstream of the current linking approaches excludes the involvement of the consuming institution that is required to implement such services. The context of the environment in which the de-facto interlinked information is consumed is being ignored.

Design considerations

Given the increasingly distributed nature of the information collection at hand, a dynamic linking approach or at least some combination of static and dynamic linking might prove to be the most realistic path leading to a fully interlinked environment. The desire to act upon information units that are being provided by an authority calls for an open linking framework that is not in place. The alternative is to create extended services -- like the ones mentioned above -- using a dynamic linking approach. In the current closed linking context, this presents some important challenges:

But with many publishers that have online content, no such services are supported. Careful examination of their URL structures may lead to insights that can help when trying to link into their collections. Still, there is no overall uniformity in the approaches taken, and linking can become very complicated due to authentication issues, the level(s) of the links that can be created (journal level, publication year level, volume level, issue level, article level), the information required to create the links, etc. Again, a generic framework, accepted by the scholarly publishing community, would be most welcome. The SLinkS initiative (Hellman 1998) should be seen as a feasible proposal.
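The dependence of the reachable link level on the information available can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the host publisher.example.org, the URL patterns, the field names and the function deepest_link are hypothetical and do not describe any real publisher's scheme or the SLinkS proposal. The sketch simply picks the deepest level for which all required metadata elements are present.

```python
from typing import Optional, Tuple
from urllib.parse import quote

# Hypothetical URL templates, ordered from the deepest link level to the
# shallowest. Real publishers each use their own, non-uniform structures.
BASE = "https://publisher.example.org"
TEMPLATES = [
    ("article", ("issn", "volume", "issue", "spage"),
     BASE + "/{issn}/{volume}/{issue}/{spage}"),
    ("issue", ("issn", "volume", "issue"), BASE + "/{issn}/{volume}/{issue}"),
    ("volume", ("issn", "volume"), BASE + "/{issn}/{volume}"),
    ("journal", ("issn",), BASE + "/{issn}"),
]


def deepest_link(metadata: dict) -> Optional[Tuple[str, str]]:
    """Return (level, url) for the deepest level the metadata supports."""
    for level, fields, template in TEMPLATES:
        # A level is reachable only if every element it needs is present.
        if all(metadata.get(field) for field in fields):
            values = {f: quote(str(metadata[f]), safe="") for f in fields}
            return level, template.format(**values)
    return None  # not even a journal-level link can be built


if __name__ == "__main__":
    # Volume and issue are known, the start page is not: issue-level link.
    print(deepest_link({"issn": "1082-9873", "volume": 5, "issue": 4}))
```

In practice each publisher would need its own set of templates, plus handling for authentication, which is exactly the lack of uniformity described above.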

The feasibility of dynamic, open linking -- the SFX system

This paper has described the need for dynamic, open linking, but has not demonstrated that such systems are practical. Part 2 of this paper describes the SFX service for dynamic linking in a hybrid library. SFX presents a solution to interlink the available information entities in a hybrid library environment, without requiring "a priori" computation of links from the available data. The solution uses concepts drawn from the domain of linking services, without being one in the strict sense of the meaning.

In SFX, the notion of a database containing bundles of links in which each record represents an inter-relationship between documents -- as used in BiomedNet’s BundledLinks (Hitchcock et al. 1997b) (Figure 1) -- is replaced by a concept of potential inter-relationships between documents, expressed at the level of the databases from which they originate (Figure 2). The "a priori" computation of links -- as done in self-supporting environments such as BiomedNet -- is replaced by the "a posteriori" conceptual verification of links via the SFX-base, without any further functional verification. This results in a level of verification that lies between no verification, which is achieved when adding links blindly, and the on-the-fly verification of links for every link-source (if that would be possible). The former requires little computing overhead but offers poor service; the latter offers perfect service, but causes significant delays (Hitchcock et al. 1997a). The proposed design achieves a balance between the extremes, through the introduction of the SFX-base that exploits know-how about the actual hybrid library environment in order to reduce both the number of potential dead links and the required computing time. The more the SFX-base is fine-tuned, the more the risk of dead links can be reduced. Spreading the total required processing time over different phases further reduces delays.

Figure 1: document inter-relationships in BiomedNet's BundledLinks

Figure 2: potential document inter-relationships in SFX
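The idea of potential inter-relationships expressed at the level of databases, checked conceptually rather than functionally, can be pictured with a minimal sketch. It is not the SFX implementation, which Part 2 describes; the rule table, the database and service names, the field names and the function candidate_services are all hypothetical. Each rule states that records from a given source database may be linked to a given target service, provided certain metadata elements are present. No request is sent to the target, so a proposed link can still turn out to be dead.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PotentialLink:
    """A potential relationship between a source database and a target."""
    source_db: str
    target_service: str
    required_fields: Tuple[str, ...]  # elements the target needs to be addressed


# Hypothetical rule table standing in for local know-how about the
# hybrid library environment.
RULES: List[PotentialLink] = [
    PotentialLink("abstracting_db", "library_catalogue", ("issn",)),
    PotentialLink("abstracting_db", "full_text", ("issn", "volume", "spage")),
    PotentialLink("library_catalogue", "book_reviews", ("isbn",)),
]


def candidate_services(source_db: str, record: Dict[str, str]) -> List[str]:
    """Conceptual check only: keep the targets whose rule matches the record.

    Nothing is fetched from the target, so the result is a list of
    plausible links, not of verified ones.
    """
    return [
        rule.target_service
        for rule in RULES
        if rule.source_db == source_db
        and all(record.get(field) for field in rule.required_fields)
    ]


if __name__ == "__main__":
    record = {"issn": "1082-9873", "volume": "5"}
    # The start page is missing, so only the catalogue lookup is proposed.
    print(candidate_services("abstracting_db", record))
```

Refining the rules, for example by restricting a full-text target to the journals actually licensed, corresponds to the fine-tuning that reduces the risk of dead links.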

Interpreting the SFX solution as a searching aid or as a provider of extended services helps to justify the lack of complete verification that can be expected from true linking services. Moreover, such an interpretation can lead to the inclusion of other types of links in the Colli, such as:

The main goals of the SFX-experiment were:

A recommendation

Straightforward progress in all three areas is highly dependent on the cooperation of the information industry. Many established players might be reluctant to embrace such an idea (Hitchcock et al. 1998b) since it requires far-reaching openness of their services. Proprietary solutions are part of a traditional strategy aiming at the minimization of competition (Porter 1979), and a revival of that marketing concept can be found in many parts of the information industry, where the battle for the one-stop shop market has exploded. Linking is considered to be a very important matter by major players in the information industry. Elsevier’s Karen Hunter (Hunter 1998):

In 1996 I said: "One of the key roles a publisher should play in the future is creating links -- adding value by integrating information letting people maneuver through the space and get a full range of information." Amen. My current motto is "the publisher with the best links wins". I don’t lose sleep over this, but it’s a mantra that I keep repeating to all who will listen. No publisher is an island, no information cannot be improved by enriching its context. (Pardon the double negative).

In due time, services of such importance will be subject to differential price setting. Wittingly or unwittingly outsourcing such new information services to commercial parties will lead to a dependency on their integrated solutions. Outsourcing of scholarly publishing to commercial publishers has led to a pricing spiral (Bennett 1998). Although the literature is abundant about the serials crisis, the problem should not be seen as restricted to the area of the journal literature. At the core of the problem lies the notion of total dependency. It comes as no surprise to find recent evidence of a sudden price increase by a factor of 3.5 for a commercial database service, after acquisition by a main commercial player in the information industry (Case 1998). A similar situation may lie ahead for linking services, since closed linking frameworks in the hands of commercial parties will make the academic community completely dependent on those solutions, leaving no room for hybrid libraries to act in this domain. Hunter’s quote not only stresses the importance of linking, it also calls for bridges between publishers, without mentioning libraries. This mirrors the observation that libraries are not involved in the DOI initiative (Scott 1998; International DOI Foundation 1999), although that might be due to their own lack of initiative.

This linking domain opens an opportunity for the subversive initiatives in the area of scholarly communication to become more widely accepted via an integration into library services. Kling and Covi have already brought to our attention that the marginal situation of e-journals (Harter and Kim 1996; Harter 1996) might be overcome by integrating those into the scholarly document system of libraries, indices and abstracting services (Kling and Covi 1995). As such, the adherence to an open framework for interlinking, which would enable libraries to deliver extended services for the alternative e-journals, might be part of the path leading to more general acceptance. A similar remark applies to the e-print servers, which turn out to be very successful in the intended user community (Ginsparg 1994; Luzi 1998). Still, their integration into library services worldwide might be an impulse for a move from a successful subversive communication initiative to a widely accepted publishing model.

Meanwhile, libraries should strive for an alteration of the linking frameworks into a direction that enables them to fully exploit the collection they access, acquire or build. The pursuit of the means that enable the creation of extended services, like the ones described here, should be high on the agenda of libraries worldwide. In the same manner as libraries are uniting in order to formulate guidelines for consortia deals (Turner and Yale University Library 1998), they should bring forward requirements for information systems that enable them to build and control extended services upon the information they license or acquire. At first sight, such services might look like just another bell or whistle for electronic library services. But as argued above, for once things are less innocent than they seem.

References

Arms, William Y. 1993. Keynote address: the virtual library. Networking and the future of libraries: Proceedings of the UK Office for library networking conference. London: Meckler.

Bates, Marcia J. 1998. Indexing and access for digital libraries and the Internet: Human, database and domain factors. Journal of the American Society for Information Science 49, no. 13.

Bennett, Douglas C., et al. 1998. To publish and perish. Policy Perspectives 7, no. 4.

Bide, Mark. 1997. In search of the Unicorn. London: Book Industry Communication, BNBRF 89. [http://www.bic.org.uk/bic/].

Bush, Vannevar. 1945. As we may think. Atlantic Monthly 176, no. 1 (July). [http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm].

Carr, Leslie and others. 1995. The distributed link service: a tool for publishers, authors and readers. Proceedings of the fourth World Wide Web conference. [http://www.w3c.org/pub/Conferences/WWW4/Papers/178/].

Case, Mary M. 1998. ARL Promotes Competition through SPARC: The Scholarly Publishing & Academic Resources Coalition. ARL Newsletter, no. 196. [http://www.arl.org/newsltr/196/sparc.html].

Caswell, Jerry V. and others. 1995. Importance and use of holdings links between citation databases and online catalogs. The Journal of Academic Librarianship 21, no. 2.

Dempsey, Lorcan. 1993. The future of library systems: integrated or insulated? Networking and the future of libraries: Proceedings of the UK Office for library networking conference. London: Meckler.

Dempsey, Lorcan. 1995. The scandal of serials holding data. Catalogue & Index, no. 118.

Evans, Nancy H. and others. 1989. The vision of the electronic library. Mercury technical report series 1. Carnegie Mellon University.

Gardner, William. 1990. The electronic archive: scientific publishing for the 1990s. Psychological Science 1, no. 6.

Ginsparg, Paul. 1994. First steps towards electronic research communication. Computers in Physics 8, no. 4. [http://xxx.lanl.gov/blurb].

Hamilton, Feona J. 1998. Multi-level linking technology by Swets. Information World Review, no. 142 (December).

Harter, Stephen P. 1996. The impact of electronic journals on scholarly communication: a citation analysis. Public-Access Computer Systems Review 7, no. 5. [http://info.lib.uh.edu/pr/v7/n5/hart7n5.html].

Harter, Stephen P. and Hak Joon Kim. 1996. Electronic journals and scholarly communication: A citation and reference study. Proceedings of the midyear meeting of the American Society for Information Science, San Diego, CA. [http://php.indiana.edu/~harter/harter-asis96midyear.html].

Hellman, Eric. 1998. Scholarly Link Specification Framework (SLinkS). [http://www.openly.com/SLinkS/].

Hitchcock, Steve and others. 1997a. Citation linking: improving access to online journals. Proceedings of the 2nd ACM International Conference on Digital Libraries, New York, USA: Association for computing machinery. [http://journals.ecs.soton.ac.uk/acmdl97.htm].

Hitchcock, Steve and others. 1997b. Linking everything to everything: Journal publishing myth or reality? ICCC/IFIP conference on electronic publishing '97: New models and opportunities. [http://journals.ecs.soton.ac.uk/IFIP-ICCC97.html].

Hitchcock, Steve and others. 1998a. Webs of research: putting the user in control. IRISS '98: Institute for learning and research technology, University of Bristol. [http://sosig.ac.uk/iriss/papers/paper42.htm].

Hitchcock, Steve and others. 1998b. Linking electronic journals: lessons from the Open Journal project. D-Lib Magazine, no. December. [http://www.dlib.org/dlib/december98/12hitchcock.html].

Hunter, Karen. 1998. Sleepless nights redux. Against the Grain, no. February.

International DOI Foundation. DOI Foundation Member List. January 1999. [http://www.doi.org/idf-member-list.html].

Kierman, Robert. 1998. The next five years: a publisher's ambition. Serials 11, no. 2.

King, Donald W. and Nancy K. Roderer. 1978. The electronic alternative to communication through paper-based journals. The information age in perspective: Proceedings of the ASIS annual meeting, 1978 White Plains, NY: Knowledge Industry Publications for American Society for Information Science.

Kling, Rob and L. Covi. 1995. Electronic journals and legitimate media in the systems of scholarly communication. The Information Society 11, no. 4. [http://www.ics.uci.edu/~kling/klingej2.html].

Knudson, Frances L. and others. 1997. Creating electronic journal web pages from OPAC records. Issues in Science & Technology Librarianship 15, no. Summer. [http://www.library.ucsb.edu/istl/97-summer/article2.html].

Luce, Rick. 1998. Integrating the Digital Library Puzzle: The Library Without Walls at Los Alamos. International Summer School on the digital library 1997. Tilburg: TICER B.V. [http://lib-www.lanl.gov/lww/tilberg.htm].

Luzi, Daniela. 1998. E-print archives: a new communication pattern for grey literature. Interlending and Document Supply 26, no. 3.

Lynch, Clifford A. 1997. Building the infrastructure of resource sharing: union catalogs, distributed search, and cross-database linkage. Library Trends 45, no. 3.

Pearl, A. 1989. Sun's link service: a protocol for open linking. Hypertext '89 Proceedings. New York: ACM.

Porter, Michael E. 1979. How competitive forces shape strategy. Harvard Business Review, no. March-April.

Scott, Marianne. 1998. Library-Publisher relations in the next millennium: the library perspective. IFLA Journal 22, no. 5/6.

Turner, Bonnie and Yale University Library. International Coalition of Library Consortia. March 1998. [http://www.library.yale.edu/consortia/].

Van de Sompel, Herbert. 1991. Heading towards an electronic library: location independent integration of electronic reference sources in library workstations. 10th Annual meeting of the Dobis/Libis User Group. Leuven: Dobis/Libis User Group Secretary.

Van de Sompel, Herbert. 1993. Optimalisatie van de konsultatieketen aan de Universiteit Gent. Bibliotheekkunde 51. Kris Clara and Julien Van Borm. Antwerpen: VVBAD.

Van de Sompel, Herbert. 1994. Technology and collaboration: creating an effective information environment in an academic context. Online Information 94. Proceedings of the 18th International Online Information Meeting. Oxford and New Jersey: Learned Information (Europe) Ltd.

Van de Sompel, Herbert. 1997a. Integrating CD-ROMs in the digital library. International Summer School on the digital library 1997. Tilburg: TICER B.V.

Van de Sompel, Herbert. 1997b. Tools for the digital library. From database networking to the digital library Padua.

Wang, Peiling and White, Marilyn Domas. 1999. A cognitive model of document use during a research project. Study II. Decisions at the reading and citing stages. Journal of the American Society for Information Science 50, no. 2.

Weislogel, Judy. 1998. Elsevier Science Digital Libraries Symposium. Serials Review 24, no. 2.

Acknowledgements

The authors wish to thank the following parties:

Herbert Van de Sompel wishes to thank:

Copyright © 1999 Herbert Van de Sompel and Patrick Hochstenbach

An additional acknowledgement was added at the request of one of the authors. April 18, 1999, the Editor.

DOI: 10.1045/april99-van_de_sompel-pt1