Volume 10 Number 5
Georeferencing in Digital Libraries
Linda L. Hill
Alexandria Digital Library Project
University of California, Santa Barbara
Georeferencing is relating information (e.g., documents, datasets, maps, images, biographical information, artifacts, specimens) to geographic locations through placenames (i.e., toponyms) and place codes (e.g., postal codes) or through geospatial referencing (e.g., longitude and latitude coordinates). The digital library perspective toward georeferencing is a blend of the focus of geographic information systems (GIS) on geospatial coordinates, data layers, and mapping; of map librarianship on the acquisition, cataloging, and use of cartographic publications and data; and of the traditional library focus on textual representation of location using placenames, administrative unit hierarchies, and other textual forms of spatial reference. This special issue of D-Lib Magazine is not about GIS and it is not about cartography or map libraries. Its focus is beyond traditional library practices. This issue is about the application of georeferencing to all types of information and the integration of geospatial description, searching, and analysis into digital library practices. More broadly, it is about supporting spatial literacy, "meaning the ability to interpret problems and their solutions in spatial terms" (Marley, 2001) in digital library applications.
It is a powerful concept to think about designing library systems so that users can find all types of information about a location simply by identifying that location on a map or a georeferenced image, without supplying all of the relevant placenames or knowing the coordinates for the area. Interfaces with interactive maps provide not only a query interface but also an evaluation canvas where the spatial coverage of a set of objects can be visualized and the user can see how the different objects are spatially related. When the geospatial focus or context of textual statements can be determined by an analysis of place references and conversion of those references to coordinates, a vast store of information can be related to locations on the surface of the Earth and to the maps, photographs, remote sensing images, and datasets that are also about that place.
This issue of D-Lib is focused on digital library implementations that have incorporated geospatial referencing as a key component of collection- and item-level description, search access, and the visualization, analysis, and use of search results. In this issue, the following working definitions have been adopted:
- Georeferencing is the practice of representing and selecting information related to geographic locations by use of placenames (i.e., toponyms) and place codes (e.g., postal codes) or through geospatial referencing.
- Geospatial referencing is the practice of integrating geospatial description (e.g., longitude and latitude coordinates) into digital library systems and services in support of information seeking and user problem solving in geospatial terms (e.g., spatial containment, overlapping areas, nearness, direction and distance references) for all forms of information resources.
- Space is an aspect of objective physical location, which can be specified uniquely and analyzed using the universal scientific language of mathematics. A two-dimensional (2-D) footprint on the surface of the Earth is the basic representation of space in digital libraries.
- Place is a sociocultural expression designating a location, which is typically described in natural language as a placename.
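As a concrete illustration of the definitions above, the following minimal sketch models a 2-D footprint as a longitude/latitude bounding box and implements two of the geospatial relations mentioned (spatial containment and overlapping areas). The class name `Footprint`, its field names, and the sample coordinates are hypothetical and chosen only for illustration; the sketch also assumes boxes that do not cross the 180° meridian. It is not drawn from any particular digital library implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """Hypothetical 2-D footprint: a bounding box in decimal degrees.

    Assumes west <= east and south <= north (no antimeridian crossing).
    """
    west: float
    south: float
    east: float
    north: float

    def contains(self, other: "Footprint") -> bool:
        """True if `other` lies entirely inside this footprint."""
        return (self.west <= other.west and other.east <= self.east
                and self.south <= other.south and other.north <= self.north)

    def overlaps(self, other: "Footprint") -> bool:
        """True if the two footprints share at least one point."""
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)


# Made-up boxes, not real places.
region = Footprint(west=-125.0, south=32.0, east=-114.0, north=42.0)
town = Footprint(west=-120.1, south=34.3, east=-119.4, north=34.6)
neighbor = Footprint(west=-120.0, south=35.0, east=-110.0, north=42.0)

town_in_region = region.contains(town)            # containment test
region_meets_neighbor = region.overlaps(neighbor)  # overlap test
town_meets_neighbor = town.overlaps(neighbor)      # disjoint in latitude
```

A bounding box is the simplest possible footprint; polygons or point-plus-radius representations would support finer distinctions at the cost of more complex comparisons.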
Geospatial access to the information in digital libraries remains an underdeveloped capability today, too often perceived to be a capability exclusively associated with GIS or with special collections that hold geospatial objects such as maps and aerial photographs. This perception is rooted partly in past practices, but it is also based on the technological and intellectual challenges of integrating spatial representation and access into basic digital library practices and the extra dimensions of managing and using geospatial resources.
The goal of this issue of D-Lib is to foster greater awareness of the potential of geospatial referencing in information management implementations, using actual implementations for inspiration and insight. Through the articles included here, the path forward toward ubiquitous geospatial referencing in digital libraries may become clearer. Someday it may not be unusual for us to be able to search for information related to a place on the surface of the Earth by starting with a placename (our preferred method of referring to a geographic location in human discourse) and using the location of the place on a map (shown to us on the library interface) to retrieve relevant information from archives of oral and written histories, biological specimens, photographs, GIS data, and major library catalogs across the world. The longitude and latitude coordinates of geospatial location are the lingua franca that makes this possible.
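The search scenario described above (start with a placename, resolve it to coordinates, then retrieve everything whose footprint falls in that area) can be sketched in a few lines. Everything in this example is an assumption made for illustration: the in-memory `GAZETTEER` and `CATALOG`, the function names, and the coordinates are hypothetical, and footprints are simplified to bounding boxes that do not cross the 180° meridian. A real service would draw on a gazetteer and on collection metadata rather than hard-coded values.

```python
# Hypothetical gazetteer: placename -> (west, south, east, north) in decimal degrees.
GAZETTEER = {
    "example town": (-120.0, 34.0, -119.0, 35.0),
}

# Hypothetical catalog of heterogeneous items, each with a bounding-box footprint.
CATALOG = [
    ("oral history recording", (-119.8, 34.2, -119.6, 34.5)),
    ("aerial photograph", (-119.2, 34.8, -118.5, 35.5)),
    ("specimen record", (10.0, 50.0, 11.0, 51.0)),
]


def boxes_overlap(a, b):
    """True if bounding boxes a and b share at least one point."""
    a_west, a_south, a_east, a_north = a
    b_west, b_south, b_east, b_north = b
    return (a_west <= b_east and b_west <= a_east
            and a_south <= b_north and b_south <= a_north)


def search_by_placename(name):
    """Resolve a placename to a footprint, then return titles of overlapping items."""
    query_box = GAZETTEER.get(name.strip().lower())
    if query_box is None:
        return []  # unknown placename: nothing to search with
    return [title for title, footprint in CATALOG
            if boxes_overlap(query_box, footprint)]


results = search_by_placename("Example Town")
```

The point of the sketch is that the item types need nothing in common except a coordinate footprint; the placename serves only as the entry point to the coordinates.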
Hand-in-hand with the geospatial dimensions of information are the temporal dimensions. Both are well served by visualization techniques showing the distribution of information resources in time and space. Key considerations for both dimensions are the development of tools to translate between the formal (coordinates and time representations) and informal (named places and time periods) methods of representation, understanding the issues of formal representation in continuous space/time, and issues of user services, computer architectures, and conceptual frameworks.
In the last ten years, there have been a number of workshops on georeferencing in digital libraries that have created a legacy of reports and websites that document evolving ideas and specific projects. Particularly worthy of consultation are the following:
- In April 1995, the Clinic on Library Applications of Data Processing, organized by the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign, was on the topic of Geographic Information Systems and Libraries: Patrons, Maps, and Spatial Information (Smith & Gluck, 1996). The thirteen papers in the proceedings volume include one by Ray R. Larson (Berkeley) on "Geographic Information Retrieval and Spatial Browsing" (pp. 69-123) and others on the early activities to meet "the needs of library users for spatial and cartographic data."
- In June 1998, a workshop on Distributed Geolibraries: Spatial Information Resources was organized by the Mapping Science Committee of the National Research Council. The resulting workshop report presents "a vision for distributed geolibraries, not a blueprint" and "a series of technical challenges as well as institutional and social issues, which are addressed relative to the vision" (National Research Council Mapping Science Committee, 1999).
- In October 1999, an NSF-sponsored workshop on Digital Gazetteer Information Exchange (DGIE) was held at the Smithsonian Institution (Hill & Goodchild, 2000). The workshop "was convened to develop an understanding of the potential of indirect spatial referencing of information resources through geographic names and to identify the research and policy issues associated with the development of digital gazetteer information exchange."
- In July 2002, a workshop sponsored by the Networked Knowledge Organization Systems/Services (NKOS) group was held at the Joint Conference on Digital Libraries (JCDL 2002) in Portland, Oregon on the topic of Digital Gazetteers: Integration into Distributed Digital Library Services (Networked Knowledge Organization Systems/Services Group, 2002). Clifford Lynch, executive director of the Coalition for Networked Information, commented in his summation that "simple visualization of documents geographically is a mere party trick; the real value of georeferenced information services lies in finding associated concepts and doing query expansion and assisting in cataloguing and metadata creation."
- In May 2003, a workshop on Analysis of Geographic References was held as part of the North American Chapter of the Association for Computational Linguistics and Human Language Technology Conference (NAACL-HLT 2003) in Edmonton, Alberta (Kornai & Sundheim, 2003). The primary focus of the workshop was "to discuss how existing natural language processing (NLP) techniques can be adapted and new ones developed that will advance core technology in geographic reference analysis."
This D-Lib issue includes six articles by authors who have been early advocates and developers of georeferencing applications in digital libraries and in natural history museums.
- Michael F. Goodchild, UC Santa Barbara, is well-known in the field of geographic information science and the co-PI on the Alexandria Digital Library (ADL) Phase I and II NSF-funded digital library projects. He describes the ideas that led to the research and development of a georeferenced digital library architecture and on to the operational ADL at UC Santa Barbara. He proposes a vision for future research and development based on unfilled early goals and new opportunities.
- Greg Janée, James Frew, and Linda L. Hill, also with the ADL Project at UC Santa Barbara, focus on seven technical issues that challenge the development of georeferenced digital libraries, related to spatially-based discovery and ranking of results, gazetteer integration, data typing beyond text, scalability given the amount of georeferenced data available, spatial visualization and labeling to provide context for user understanding, and resource access issues that arise with geospatial data.
- Gregory Crane, Tufts University, reflects on the conceptualization and recording of time and space in cultural memories and accounts, and on the insights gained through the Perseus Digital Library Project into issues of extracting geospatial data from unstructured textual sources.
- Michael Buckland and Lewis Lancaster, Electronic Cultural Atlas Initiative (ECAI) at UC Berkeley, draw on their experiences with developing space-time referencing methods and technologies for an international community of cultural history scholars, for accessing traditional library catalogs, and for the integrated searching of numeric and textual resources to describe how place and time are pivotal factors for interdisciplinary and multilingual inquiry.
- James S. Reid, Chris Higgins, David Medyckyj-Scott and Andrew Robson, the EDINA National Datacenter at Edinburgh University, use their experience with the development of the EDINA GeoData Services to propose "paths to convergence" for spatial data infrastructures and digital libraries. They introduce the concept of a community specific spatial data infrastructure (CSSDI), using the UK academic digital library community as an example, and argue that the concept of a CSSDI provides "a focus around which other digital libraries might consolidate their efforts to better exploit geographically referenced resources."
- Three authors in the field of natural history informatics (Reed Beaman at the Peabody Museum, Yale; John Wieczorek with the MaNIS project at Berkeley; and Stan Blum with the California Academy of Sciences in San Francisco) describe the challenges of geospatially referencing the specimens held by natural history museums, based on the narrative descriptions of collection locations, so that the vast resources of specimen data can be spatially analyzed and integrated into georeferenced digital library systems.
This all adds up to a selected view of the state-of-the-art of georeferencing related to digital library activities and marks another milestone along the way to the full integration of the geospatial aspects of information into library services.
Leslie Champeny is a student in the Master of Library and Information Science (MLIS) program at UCLA, having previously earned graduate degrees in Anthropology and Linguistics from UC San Diego. She has assisted in getting this issue together and will end up knowing a whole lot more about georeferencing in digital libraries than most MLIS grads. She wanted some practical experience in electronic publishing, which turned out to be a good match with this project. Thank you, Leslie, for all of your efforts.
Thanks are also due to Bonnie Wilson, editor of D-Lib Magazine, for making this special issue possible.
Hill, L. L., & Goodchild, M. F. (2000). Digital Gazetteer Information Exchange (DGIE). Final Report of the Workshop Held October 12-14, 1999. Available at <http://www.alexandria.ucsb.edu/gazetteer/dgie/DGIE_website/DGIE_homepage.htm> [2004, March 31].
Kornai, A., & Sundheim, B. (2003). Workshop on the Analysis of Geographic References, May 31, 2003, Edmonton, Alberta, as part of the North American Chapter of the Association for Computational Linguistics and Human Language Technology Conference (NAACL-HLT 2003). Available at <http://gunsight.metacarta.com/kornai/NAACL/WS9/> [2004, March 31].
Marley, C. (2001). The changing profile of the map user. In R. B. Parry & C. R. Perkins (Eds.), The Map Library in the New Millennium (pp. 12-27). Chicago: American Library Association.
National Research Council Mapping Science Committee. (1999). Distributed Geolibraries: Spatial Information Resources. Summary of a Workshop held June 15-16, 1998. Washington, DC: National Academy Press. Available at <http://www.nap.edu/html/geolibraries>.
Networked Knowledge Organization Systems/Services Group. (2002). Digital Gazetteers - Integration Into Distributed Digital Library Services. Available at <http://nkos.slis.kent.edu/DL02workshop.htm> [2004, March 31].
Smith, L. C., & Gluck, M., eds. (1996). Geographic Information Systems and Libraries: Patrons, Maps, and Spatial Information: Papers presented at the 1995 Clinic on Library Applications of Data Processing, April 10-12, 1995. [Urbana]: Graduate School of Library and Information Science University of Illinois at Urbana-Champaign.
(On May 28, 2004, in the body of the text of this article, the phrase "EDINA project" has been replaced by the wording "EDINA National Datacenter".)
Copyright © 2004 Linda L. Hill