
D-Lib Magazine
October 2001

Volume 7 Number 10

ISSN 1082-9873

A Call to Researchers

Digital Libraries Need Collaboration Across Disciplines

 

Kevin W. Boyack, Brian N. Wylie, and George S. Davidson
Sandia National Laboratories *
Albuquerque, New Mexico 87185, USA
[email protected]

* Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under Contract DE-AC04-94AL85000.


Abstract

Digital libraries stand to benefit from technology contributions from the fields of information visualization, human-computer interaction, and cognitive psychology, among others. However, the current state of interaction between these fields is not well understood. We have used our knowledge visualization tool, VxInsight®, to provide several domain visualizations (science maps) of the overlap between these fields. Relevant articles were extracted from the Science Citation Indexes (SCI and Social SCI) using keyword searches. An article map, a semantic (co-term) map, and a co-author network have been generated from the data. Analysis reveals that while there are overlaps between fields, they are not substantial. However, the most recent work suggests areas where collaboration could have a great impact on future digital libraries.

1. Introduction

The amount of information becoming available in digital form is increasing exponentially. Many institutions, while interested in providing digital information to their users, are only slowly making the shift from paper to digital libraries. There is a pronounced need for advances in techniques and tools to aid the individual user in finding and gleaning knowledge from relevant information.

Many researchers feel that such advances would be enhanced by technology insertion from the fields of information visualization, human-computer interaction, and cognitive psychology. Yet, to date, it is unclear how much interaction there has been between these fields, and how much impact each has had on advances in digital libraries.

The purpose of this study is to explore the history and current state of overlap between these four fields (including digital libraries) by analysis of bibliographic information. We have used our visualization tool, VxInsight® [1], which was developed to build and explore maps of technology using data from the Science Citation Index (SCI). Over the past few years, we have found that VxInsight has broad application to mapping and navigation of many different types of data [2, 3]. In this article, we provide a short overview of related work and tools, some background on the VxInsight tool and process, and several different visualizations of the domain comprised of information visualization, human-computer interaction, cognitive psychology, and digital libraries. We include suggested areas for future collaboration and a call to researchers to collaborate across fields for the benefit of the digital libraries of the future.

2. Related Work

2.1 Literature maps

Various efforts to map the structure of science from literature have been undertaken for many years. The majority of these studies have been performed at the discipline or specialty level. Maps are often based on similarity between journal articles using citation analysis [4], co-occurrence or co-classification using keywords, topics, or classification schemes [5, 6], or journal citation patterns [7, 8]. Latent semantic analysis (LSA) has been used to map papers based on co-occurrence of words (or authors) in titles, abstracts, or full text sources [9, 10]. In addition, domain maps based on author co-citation analysis [11] are becoming more common.

Many of these studies probe the dynamic nature of science, and the implications of changes. At Sandia National Laboratories, our primary use of such maps has been for competitive intelligence; i.e., to see who else is doing what in our intended fields of research, to identify potential collaborations, and the like. However, this study was not done for competitive intelligence reasons. Rather, one of the authors was on the program committee for the first Visual Interfaces to Digital Libraries workshop <http://www.dlib.org/dlib/july01/07inbrief.html>, and it seemed timely to explore the overlap between the fields that were targeted by that workshop.

2.2 Visualization tools

Traditionally, the standard output for literature mapping studies has been a circle plot where each cluster was represented by an appropriately sized circle. Links between circles provide relationship information. Until recently these mapping studies have been paper-based and thus only resolve structure at a few discrete levels. In recent years, several systems have been reported that use a computer display and allow some navigation, browsing, and filtering of the map space.

SENTINEL [12] is a Harris Corporation package that combines a retrieval engine using n-grams and context vectors for effective query with the VisualEyes visualization system. The visualization tool allows the user to interact with document clusters in a three-dimensional space. Chen [9, 13] uses a VRML 2.0 viewer to display authors (as spheres) and the Pathfinder linkage network based on author co-citation analysis. Citation rates are shown as the 3rd dimension in these VRML maps. Börner [10] uses the CAVE environment at Indiana University to interface with small numbers of documents in a virtual library. Features such as shape, color, and labeling are used to identify features of each document. Document details are available on demand through a hypertext link. Self-organizing maps have been used in many venues, including the organization of document spaces [14]. This technique is used to position documents, and then display them in a two-dimensional, contour-map-like display in which color represents density.

Two packages more similar to VxInsight are SCI-Map developed by ISI [15], and the SPIRE suite of tools that originated at Pacific Northwest National Laboratory [16, 17]. SCI-Map uses a hierarchically-nested set of maps to display the document space at varying levels of detail. This nesting of maps allows drilling down to subsequent levels. Each map is similar to the traditional circle plot, where the size of the circle can indicate the density of documents contained in the circle, or some measure of importance. Relationships at each discrete level are indicated by links between circles.

Like VxInsight, SPIRE maps objects to a two-dimensional plane so that related objects are near each other, and provides tools to interact with the data. SPIRE has two visualization approaches. In the Galaxies view, documents are displayed as a scatter plot. This interface allows drilling down to smaller sections of the scatter plot, and provides some summarization tools. In the Themescape view, a high-level terrain display, similar to that in VxInsight, is used. Themescape visualizes specific themes as mountains and valleys, where the height of a mountain represents the strength of the theme in the document set.

None of the systems mentioned above is interactive in the sense that it would interface to a large digital library in real-time. One system showing promise for the future is a new query-based visual interface at Drexel University [18]. Buzydlowski and coworkers have developed an interface to over 1.2 million records from the Arts & Humanities Citation Index (AHCI). The user types in the name of an author of interest, and a map of the 25 authors most linked to the query author is returned. The user can drill down through an author to find individual works.

3. VxInsight Description

VxInsight® is a powerful and flexible PC-based tool for exploring data collections. It works by providing access to information, such as library articles, in an intuitive visual format that is easy to interpret and that aids natural navigation. VxInsight exploits the human capability to visually detect patterns, trends, and relationships by presenting the data as a landscape, a familiar representation that we are adept at interpreting, and which allows very large data sets to be represented in a memorable way. Maps of science can be created using mathematical tools, where each article is given a position on the (x,y) plane, and then viewed in VxInsight. The process for generating a map is described elsewhere [1-3].

In VxInsight, the (x,y) coordinates for each article are used to generate the mountain terrains. The height of each mountain is proportional to the number of objects beneath it. Labels for peaks are generated dynamically from any attribute in the database (such as the article titles) by showing the two most common words, phrases, numbers, etc. in a peak for that attribute. This reveals the content of the objects that comprise each peak, and provides context for further navigation and query.
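The labeling rule described above can be illustrated with a minimal sketch. It is not VxInsight code: the function name `peak_labels`, the stopword list, and the sample titles are assumptions made for illustration, and the sketch handles only single words from a title attribute.

```python
from collections import Counter

# Hypothetical stopword list; the article does not say how common words are handled.
STOPWORDS = frozenset({"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"})

def peak_labels(titles, n_labels=2):
    """Return the n_labels most common words across the titles under one peak."""
    counts = Counter()
    for title in titles:
        for word in title.lower().split():
            word = word.strip(".,:;()[]\"'")
            if word and word not in STOPWORDS:
                counts[word] += 1
    # Sort by count (descending), then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n_labels]]

# Invented titles standing in for the objects beneath one peak.
titles = [
    "Visualizing digital library search results",
    "Digital library interfaces for browsing",
    "Browsing a digital library collection",
]
labels = peak_labels(titles)
peak_height = len(titles)  # height is proportional to the number of objects
print(labels, peak_height)
```

Because the labels are recomputed from whatever objects fall under a peak, the same routine could be rerun after each zoom to label the finer-grained peaks.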

VxInsight supports multi-resolution zooming into the landscape to explore interesting regions in greater detail, which reveals structure on multiple scales. Following each mouse click, the landscape and labels are recalculated, to give a new, higher resolution view of the desired terrain. Temporal data can be viewed using a time slider to reveal growth and reduction in areas of interest, new emerging areas, and bridged regions that have merged.

A query window allows the user to interrogate the map, resulting in colored markers on the terrain showing those items matching the query. The distribution of query markers in the context of the terrain with its labels can be very meaningful to the user. Various analysis tasks can be accomplished by combining navigation, multiple queries, and time sliding functions.

4. Domain Visualizations

Several different visualizations were prepared to show the domain comprised by the fields of information visualization (IV), human-computer interaction (HCI), cognitive psychology (CP), and digital libraries (DL). One of the advantages of domain visualization is the ability to combine and explore related work from different fields.

The first step in this process was to procure an appropriate set of bibliographic data. Data were retrieved from the Science Citation Index and Social Science Citation Index. A short list of search terms related to the four fields was compiled and was queried against titles, abstracts, and keywords for years 1991-present. The list of search terms is shown in the following table.

Search Term                   Field
information visualization     IV
information exploration       IV
information navigation        IV
information browsing          IV
human-computer interface      HCI
human-computer interaction    HCI
cognitive model               CP
cognitive science             CP
cognitive psychology          CP
cognitive system              CP
digital library(ies)          DL
mental model

A total of 4478 unique articles were retrieved, with approximately 700, 800, 2000, and 370 articles in the IV, HCI, CP, and DL fields, respectively. The term 'mental model' was included in the original search, but was found to have little overlap with fields other than cognitive psychology. Thus, it will not be discussed further.
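As a rough illustration of how retrieved records can be tagged by field, and how direct query overlaps can be counted, consider the sketch below. The actual retrieval was done against the citation indexes; the substring matching, the function name `fields_for`, and the sample text are assumptions for illustration only. The term 'mental model' is omitted because the table assigns it no field.

```python
# Search terms from the table above, mapped to their fields.
SEARCH_TERMS = {
    "information visualization": "IV",
    "information exploration": "IV",
    "information navigation": "IV",
    "information browsing": "IV",
    "human-computer interface": "HCI",
    "human-computer interaction": "HCI",
    "cognitive model": "CP",
    "cognitive science": "CP",
    "cognitive psychology": "CP",
    "cognitive system": "CP",
    "digital library": "DL",
    "digital libraries": "DL",
}

def fields_for(text):
    """Return the set of fields whose search terms appear in the given text."""
    lowered = text.lower()
    return {field for term, field in SEARCH_TERMS.items() if term in lowered}

# Invented record text (title, abstract, and keywords concatenated).
record = "A cognitive model for digital library browsing"
tags = fields_for(record)
is_direct_overlap = len(tags) > 1  # retrieved by queries from more than one field
print(sorted(tags), is_direct_overlap)
```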

Three different domain maps based on these data were produced: an article map, a semantic (keyword) map, and an author map. Each will be described further below.

4.1 Article map

A map of articles was generated based on the number of ISI keywords in common between each pair of articles. 1336 articles were not given positions on the map since they had no keywords in common with any other article. Thus the map contained 3142 articles. Of note, 60% of the DL articles had no keywords, suggesting that additional text-based analyses (e.g. latent semantic analysis) are needed to fully understand the DL overlaps. Figure 1 shows the domain map for three separate 2-year time periods.
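The similarity measure behind the article map can be sketched as follows. This shows only the pairwise counting step, under the assumption that each article carries a set of keywords; the layout step that turns similarities into (x,y) positions is described in [1-3] and is not reproduced here. Function names and sample data are hypothetical.

```python
from itertools import combinations

def shared_keyword_counts(article_keywords):
    """Map each article pair to its number of shared keywords (zero-overlap pairs omitted)."""
    pairs = {}
    for (a, kw_a), (b, kw_b) in combinations(sorted(article_keywords.items()), 2):
        shared = len(set(kw_a) & set(kw_b))
        if shared:
            pairs[(a, b)] = shared
    return pairs

def unplaced_articles(article_keywords, pairs):
    """Articles sharing no keyword with any other article get no map position."""
    linked = {article for pair in pairs for article in pair}
    return sorted(set(article_keywords) - linked)

# Invented example: A4 has no keywords at all, as was true of many DL articles.
articles = {
    "A1": ["information retrieval", "design"],
    "A2": ["design", "interfaces"],
    "A3": ["memory"],
    "A4": [],
}
pairs = shared_keyword_counts(articles)
unplaced = unplaced_articles(articles, pairs)
print(pairs, unplaced)
```

The all-pairs loop is quadratic in the number of articles, which is acceptable for a sketch but would likely need an inverted keyword index for a corpus of several thousand records.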

 

Figure 1. IV/HCI/CP/DL domain for three time periods.

 

An overview of Figure 1 reveals that although there are some overlaps in the four fields, they are not extensive. Peaks in the lower right portion of the terrain are dominated by CP (magenta dots), with few IV and HCI papers in some areas. IV work (green dots) is found near the top center of the terrain, and does show significant overlap with DL (white dots) in one peak. Most of the HCI work (blue dots) is in the peaks at the far left and lower left. These peaks show perhaps more overlap between the four fields than any others. Detailed views of these two HCI peaks are shown in Figure 2, which covers the entire time period from 1991-2000. The far left peak (Figure 2a) contains mostly HCI material with a scattering of IV and CP at the edges. The peak at the lower left (Figure 2b) is actually a ridge of two clusters with bridging material between them, where the left-most peak has more HCI material and the right-most peak has more CP material. Close examination of this structure indicates that the left-most cluster is concerned with interfaces and design, while the right-most cluster is more concerned with systems and cognition. The work in the center deals with relevance of HCI design.

 

Figure 2. Detail on HCI peaks from Figure 1.

 

Figure 1 also shows trends in publishing in the four fields. DL work appears in the 1995-96 time frame, and grows through the 1999-2000 time frame. In addition, some DL work has moved from the core DL work in the center of the terrain to the HCI peaks by year 2000. This indicates that DL may be receiving some benefit or insertion from the HCI/CP work that forms the HCI clusters. This observation is tempered somewhat by the lack of many direct query overlaps. In fact, only 8 papers retrieved with the DL query were also retrieved using any of the other queries. Thus, while DL work may be benefiting indirectly from the HCI/CP work, very little work connects them directly.

Additional trends from Figure 1 include a slight shift in HCI work (lower left peaks) from interface design to a system design with more cognitive modeling input. This is indicated by the shift in peak sizes and query marker distributions in Figure 1.

Perhaps the most exciting overlap from a DL standpoint occurs in the DL peak near the top center of Figure 1, which shows a significant overlap with IV. A detailed view of this region is shown in Figure 3. Here, retrieval design and database retrieval are sandwiched between DL and IV clusters. Several HCI and CP papers are also found in this region of convergence between IV and DL. This topic of retrieval is thus at the overlap of all four fields, and is an area in which DL can benefit from collaborations across the other three disciplines.

 

Figure 3. Detail on the DL / IV peak from Figure 1.

 

The conclusion that future work based on convergence between the IV, HCI, CP, and DL fields should focus on information retrieval may appear elementary. However, this analysis provides a formal basis for that conclusion, and may also suggest specific work that can be built upon for such studies. A list of the articles that appear in Figure 3 is too large for this paper, but may be obtained from the authors.

4.2 Semantic map

In addition to the article map described above, a semantic (co-term) map representing the domain was also created. This map was created using ISI keywords as terms. No words parsed from titles or abstracts were used as terms. This analysis was restricted to terms occurring at least twice in the corpus of documents, comprising 2373 terms in all. Figure 4 shows those terms in the semantic map that occur 30 times or more in the keyword lists of the document corpus. The center map shows the spatial relationship between the clusters of terms, while the other three segments show individual terms and their spatial relationships.
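A minimal sketch of the co-term counting that underlies such a map is given below. It assumes that a term's occurrence count is the number of documents listing it as a keyword, and that term similarity is the number of documents in which two terms appear together; the article does not specify either detail, and the function name and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def coterm_counts(keyword_lists, min_occurrences=2):
    """Keep terms seen in at least min_occurrences documents and count their co-occurrences."""
    term_freq = Counter(term for kws in keyword_lists for term in set(kws))
    kept = {term for term, n in term_freq.items() if n >= min_occurrences}
    cooccurrence = Counter()
    for kws in keyword_lists:
        # Sorting gives each pair a canonical (alphabetical) order.
        for a, b in combinations(sorted(set(kws) & kept), 2):
            cooccurrence[(a, b)] += 1
    return kept, cooccurrence

# Invented keyword lists for three documents.
docs = [
    ["information retrieval", "design", "memory"],
    ["information retrieval", "design"],
    ["design", "perception"],
]
kept_terms, cooccurrence = coterm_counts(docs)
print(sorted(kept_terms), dict(cooccurrence))
```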

 

Figure 4. Semantic map for the IV/HCI/CP/DL domain.

 

The cluster at the upper left contains terms related to three of our four fields: IV, HCI, and DL. The term information retrieval is also there, which indicates its prominence, and which reinforces the conclusion reached above that it should be the focus of future work.

The cluster at the upper right is concerned with the medical side of cognitive psychology, while the large cluster at the bottom of the terrain focuses on cognitive processes (e.g., comprehension, perception, recognition). The left-most part of the lower cluster also shows two computer-related terms (artificial intelligence, neural networks) that have their roots in cognitive processes. This suggests that the cognitive processes of the lower cluster and the IV/HCI/DL-related fields of the upper left cluster can be bridged by modeling of cognitive processes in the computer realm.

This is already happening to some extent (see Figure 5). Terms such as automatic analysis and graph algorithms have recently appeared in the space between the IV/HCI/DL terms and cognitive processes. Significant and continuing advances in these bridging areas are essential to the growth and usability of digital libraries. We call upon researchers in the traditional cognitive fields, human-computer interaction, computer visualization, algorithms, digital libraries, and any other interested parties to break disciplinary bounds and work together. Teaming across these fields has the potential to provide breakthrough technologies to enable the digital libraries of the future.

 

Figure 5. Changes in terms over time in the space between the IV/HCI/DL fields and cognitive processes.

 

4.3 Co-author network

Collaboration across fields will require the formation of many new relationships. Our third domain map, a co-author network based on authors with two or more papers in the document corpus, shows that this is the case (Figure 6).

The co-author network of 885 authors is represented in Figure 6. Authors whose papers were retrieved by queries to the four fields are shown as dots of different colors. There are very few instances where dots of more than one color occur in the same local cluster. In addition, there are a great many clusters in this map, but only one arrow joins more than one cluster. This indicates that there is currently no co-author network established across the four fields. Rather, the majority of researchers have interactions with a small group of others, doing research in one of the four fields. For a convergence in the IV, HCI, CP, and DL fields to truly occur, much more collaboration across fields is needed.
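The kind of check described here, whether any group of co-authors spans more than one field, can be sketched as a connected-components pass over a co-author graph. This is an illustration under stated assumptions, not the procedure used to build Figure 6: the input format, the function name, and the author names are all hypothetical, and connected components stand in for the visual clusters on the map.

```python
from collections import Counter, defaultdict
from itertools import combinations

def coauthor_components(papers, min_papers=2):
    """papers: list of (field, [authors]). Returns [(authors, fields)] per connected group."""
    counts = Counter(a for _, authors in papers for a in set(authors))
    kept = {a for a, n in counts.items() if n >= min_papers}
    neighbors = defaultdict(set)
    fields = defaultdict(set)
    for field, authors in papers:
        present = set(authors) & kept
        for a in present:
            fields[a].add(field)
        for a, b in combinations(present, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    components, seen = [], set()
    for start in sorted(kept):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:  # depth-first walk over co-author links
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(neighbors[node] - members)
        seen |= members
        components.append((members, set().union(*(fields[m] for m in members))))
    return components

# Invented papers; "Clark" has only one paper and is therefore excluded.
papers = [
    ("HCI", ["Adams", "Baker"]),
    ("HCI", ["Adams", "Baker", "Clark"]),
    ("DL", ["Diaz", "Evans"]),
    ("DL", ["Diaz", "Evans"]),
    ("IV", ["Evans", "Ford"]),
    ("IV", ["Ford"]),
]
components = coauthor_components(papers)
cross_field = [c for c in components if len(c[1]) > 1]
print(len(components), len(cross_field))
```

In this toy data one of the two groups bridges DL and IV; the observation in the text is that such cross-field groups are rare in the real corpus.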

 

Figure 6. Co-author network. Arrows show connections.

 

5. Summary

We have produced three visualizations of the domain comprised of the fields of information visualization, human-computer interaction, cognitive psychology, and digital libraries. Analysis based on dynamic views of the maps indicates that there is little current overlap between these fields or collaboration between their researchers. However, the analyses also indicate that there are areas where recent research has occurred, and where future research should be focused to greatly benefit the digital libraries of the future. These areas include information retrieval and algorithms based on modeling of human cognitive processes. We repeat our call to researchers in the traditional cognitive fields, human-computer interaction, computer visualization, algorithms, digital libraries, and other fields to form new collaborations. Such collaborations will not only benefit digital libraries, but will enhance other areas of focus between pairs of disciplines.

References

[1] Davidson, G.S., Hendrickson, B., Johnson, D.K., Meyers, C.E. & Wylie, B.N. (1998). Knowledge mining with VxInsight: discovery through interaction. Journal of Intelligent Information Systems 11, 259-285.

[2] Boyack, K.W., Wylie, B.N., Davidson, G.S. & Johnson, D.K. (2000). Analysis of patent databases using VxInsight. ACM New Paradigms in Information Visualization and Manipulation '00, McLean, VA, Nov. 10, 2000.

[3] Davidson, G.S., Wylie, B.N. & Boyack, K.W. (2001). Cluster stability and the use of noise in interpretation of clustering. Proc. IEEE Information Visualization 2001, 23-30.

[4] Small, H. (1997). Update on science mapping: creating large document spaces. Scientometrics 38, 275-293.

[5] Noyons, E.C.M. & Van Raan, A.F.J. (1998). Advanced mapping of science and technology. Scientometrics 41, 61-67.

[6] Spasser, M.A. (1997). Mapping the terrain of pharmacy: co-classification analysis of the International Pharmaceutical Abstracts database. Scientometrics 39, 77-97.

[7] Leydesdorff, L. (1994). The generation of aggregated journal-journal citation maps on the basis of the CD-ROM version of the Science Citation Index. Scientometrics 31, 59-84.

[8] Bassecoulard, E. & Zitt, M. (1999). Indicators in a research institute: A multi-level classification of scientific journals. Scientometrics 44, 323-345.

[9] Chen, C. (1999). Visualising semantic spaces and author co-citation networks in digital libraries. Information Processing and Management 35, 401-420.

[10] Börner, K. (2000). Extracting and visualizing semantic structures in retrieval results for browsing. ACM Digital Libraries '00, San Antonio, TX, June 2000.

[11] White, H. D. & McCain, K. W. (1998). Visualizing a discipline: An author co-citation analysis of information science, 1972-1995. Journal of the American Society for Information Science 49(4), 327-355.

[12] Fox, K.L., Frieder, O., Knepper, M.M. & Snowberg, E.J. (1999). SENTINEL: A multiple engine information retrieval and visualization system. Journal of the American Society for Information Science 50(7), 616-625.

[13] Chen, C., Paul, R.J. & O'Keefe, B. (2001). Fitting the jigsaw of citation: Information visualization in domain analysis. Journal of the American Society for Information Science and Technology 52(4), 315-330.

[14] Honkela, T., Kaski, S., Kohonen, T. & Lagus, K. (1998). Self-organizing maps of very large document collections: Justification for the WEBSOM method. In I. Balderjahn, R. Mathar & M. Schader (Eds.) Classification, Data Analysis, and Data Highways. Berlin: Springer.

[15] Small, H. (1999). Visualizing science by citation mapping. Journal of the American Society for Information Science 50(9), 799-813.

[16] Hetzler, B., Whitney, P., Martucci, L., & Thomas, J. (1998). Multi-faceted insight through interoperable visual information analysis paradigms. Proc. IEEE Information Visualization 1998, 137-144.

[17] Wise, J.A. (1999). The ecological approach to text visualization. Journal of the American Society for Information Science 50(13), 1224-1233.

[18] Buzydlowski, J. W., White, H. D. & Lin, X. (2001). Term co-occurrence analysis as an interface for digital libraries. Visual Interfaces to Digital Libraries workshop at 1st ACM+IEEE Joint Conference on Digital Libraries, Roanoke, VA, June 28, 2001.

Copyright 2001 Kevin W. Boyack, Brian N. Wylie, and George S. Davidson

DOI: 10.1045/october2001-boyack