D-Lib Magazine
May/June 2007

Volume 13 Number 5/6

ISSN 1082-9873

Creating the Next Generation of Archival Finding Aids

 

Elizabeth Yakel
University of Michigan School of Information
<yakel@umich.edu>

Seth Shaw
University of Michigan School of Information
<seth.e.shaw@gmail.com>

Polly Reynolds
University of Michigan Bentley Historical Library
<pbenes@umich.edu>

1. Introduction

Since the advent of the Internet, access to information about primary sources has improved. Archivists have been early adopters of Internet technologies, first mounting archival inventories on gophers, later employing HTML and most recently using XML, specifically Encoded Archival Description (EAD), to display finding aids online. Despite the transition from paper to electronic form, online finding aids retain much of the look and functionality of their paper counterparts and make only minimal use of available technologies, usually for browsing and searching. Document genres need to evolve in response to changing technological environments and social cultures.

New online collaborative technologies, such as filtering and recommender systems, allow for new methods of interacting with and experiencing primary sources. Using diverse metadata sources and drawing inspiration from social technologies used in websites such as Amazon and Wikipedia, the Next Generation Finding Aids research group at the University of Michigan has developed an archival access system that combines existing archival practice (EAD) with "Web 2.0" features, namely involving user input through social software and collaborative filtering. This article describes a pilot project to reenvision the display and functionality of archival inventories using the "Polar Bear Expedition Digital Collections" as a test collection [1].

2. The Next Generation Finding Aids Research Group

In 2005, faculty and students from the University of Michigan School of Information (SI) started a research group to investigate "Next Generation Finding Aids" with the goal of reimagining traditional finding aid structure and functionality. The idea was to move beyond simply searching and browsing online finding aids and experiment with shared authority and collaboration, as well as collaborative filtering and social navigation mechanisms. Furthermore, although many repositories employ Encoded Archival Description (EAD) to display their online finding aids, none had used the full power of an XML-based system. The Next Generation Finding Aids research group's goals for the pilot project were to exploit the capabilities of EAD, create a collaborative and participatory archival and research experience, and fully utilize the electronic environment to display and connect users with archival content. Planning began in January 2005 and the "Polar Bear Expedition Digital Collections" site went live in January 2006.

3. Identifying the Content: Background on the Polar Bear Expedition Digital Collections

The Next Generation Finding Aids Research Group identified several potential finding aid instances for its first project. In the end, we selected the "Polar Bear Expedition" materials, a group of over 60 interrelated collections held by the Bentley Historical Library at the University of Michigan, as our experimental data set. The "Polar Bear Expedition," formally known as "American Intervention in Northern Russia, 1918-1919," describes an incident during World War I when U.S. military troops were sent to northern Russia to fight the Bolsheviks. This unique event is closely tied to Michigan history, since many of the soldiers serving in this campaign originated from Michigan. The Bentley Historical Library, an archive collecting materials on Michigan and University of Michigan history, has amassed one of the largest and most comprehensive archival collections on this topic, consisting of diaries, correspondence, photographs, maps, architectural drawings, postcards, scrapbooks, and published materials, as well as an oral history and a motion picture. In 2004, the Bentley Historical Library had the Polar Bear Expedition collections digitized to increase access and to preserve the originals, which were in fragile condition.

The Polar Bear Expedition materials proved to be an excellent set of collections for us to use to test our ideas. First, the Polar Bear Expedition collections have an established audience. Researchers approach these materials with a broad range of historical and genealogical questions. Therefore, an interested and devoted community already existed and would most likely use and benefit from online collaboration and participation. Second, the Polar Bear Expedition materials were already digitized and therefore provided a good test bed for experimenting with access and description of digital primary sources, a new area of research in the archival field. Third, the entire collections had been digitized, providing a unique opportunity to experiment with fully digital collections. Finally, the Polar Bear Expedition collections have always been considered one unit since they are highly interrelated, self-contextualizing, and provide different perspectives on a variety of events. As a result, the collections gave us an excellent opportunity to experiment with uniting and interrelating physically separate collections intellectually without affecting provenance and original order, which are central theories of archival practice.

4. Technological Infrastructure

The underlying infrastructure for the Polar Bear Expedition Digital Collections is a dual 2.3 GHz G5 XServe with a 400-gigabyte mirrored hard drive running Mac OS X Server 10.4.7 (Tiger). The application base for the Polar Bear system is Everything2 [2], a web content management system that allows for simplified site navigation, social interaction tools, and management. MySQL serves as the persistent data store for the content management system. This system was selected in January of 2005 when the Polar Bear Project began. Since that time, many other web content management tools have emerged. The range of interface technologies and tools now available to users and implementers of these systems is quite large, and it includes Asynchronous JavaScript and XML (Ajax) technologies that were still in their infancy when we began this project.

5. Metadata Reuse

While metadata is a crucial component of any digital project, its creation can be labor intensive and time-consuming. We were able to reuse three complementary sets of existing metadata: EAD finding aids, MARC records, and a database of information about soldiers who served in the Polar Bear Expedition. Understanding the difficulties of metadata reuse has been one of the major findings of the project.

One of our first decisions was whether to utilize the EAD encoded finding aids as the basis of the project or to develop an alternate structure for delivering archival content online. Although we recognized that EAD had limitations, we opted to adhere to the EAD standard as it is widely accepted and implemented in the archival community. Furthermore, using EAD could potentially allow for future ingest of other collections into our system without extra effort.

From the beginning, we realized the difficulties of dealing with EAD's flexibility: namely its ability to accept a multitude of structures and data inputs. Although EAD's lack of rigidity has allowed it to become a widely accepted archival standard, we found that this flexibility hindered the merging of multiple collections. We discovered that standardizing terms using controlled vocabulary and authorities was necessary to do the data processing we required. Even already controlled terms, such as subject headings, were found to be inconsistent. EAD's flexibility in constructing component levels also proved to be a problem when bringing together multiple collections. Since the digital images were created with minimal metadata, we developed a database based on the EAD to match descriptions to collections. We then used scripts to associate the description with the image during processing. The EAD issues we encountered were the most time-consuming and vexing for the project. For example, we addressed the lack of EAD normalization with a series of Perl scripts to make global changes and by hand-coding parts of the EAD inventories.
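
The kind of global change described above can be illustrated with a short sketch. It is written in Python rather than Perl and is not the project's actual script. The authority list, the sample EAD fragment, and the names `match_key` and `normalize_controlled_terms` are all assumptions made for illustration, and the fragment assumes EAD without an XML namespace. The sketch rewrites variant `<subject>` and `<genreform>` values to one preferred form and reports the terms it could not match, so that they can be corrected by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical authority list of preferred forms (not the project's real list).
AUTHORIZED = ["World War, 1914-1918", "Diaries", "Correspondence"]


def match_key(term):
    """Reduce a term to a comparison key: collapse spaces, drop a final period, ignore case."""
    return " ".join(term.split()).rstrip(".").casefold()


# Look-up table from comparison key to preferred form.
AUTHORITY = {match_key(term): term for term in AUTHORIZED}


def normalize_controlled_terms(ead_xml, tags=("subject", "genreform")):
    """Return (normalized EAD text, list of terms with no authorized form)."""
    root = ET.fromstring(ead_xml)
    unmatched = []
    for element in root.iter():
        if element.tag in tags and element.text:
            preferred = AUTHORITY.get(match_key(element.text))
            if preferred is not None:
                element.text = preferred
            else:
                # Left untouched and reported for human review.
                unmatched.append(element.text)
    return ET.tostring(root, encoding="unicode"), unmatched


# Invented fragment showing the inconsistencies a merge has to absorb.
SAMPLE = (
    "<ead><archdesc><controlaccess>"
    "<subject>World War,  1914-1918.</subject>"
    "<subject>world war, 1914-1918</subject>"
    "<genreform>diaries.</genreform>"
    "<genreform>Scrapbooks</genreform>"
    "</controlaccess></archdesc></ead>"
)

if __name__ == "__main__":
    cleaned, needs_review = normalize_controlled_terms(SAMPLE)
    print(cleaned)
    print("Needs review:", needs_review)
```

Returning the unmatched terms instead of guessing at them keeps the machine pass conservative and leaves the doubtful cases to an archivist.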

Balancing human intervention and optimization of machine processing became a key managerial decision. Had we been aware of the immensity of the normalization problem in EAD, we would have addressed it with existing tools, such as integrated development environments or the newly released Archivists' Toolkit [3], both of which would have generated cleaner, standardized data. While EAD is a huge step forward in standardizing archival description, much more development of it is needed before it can easily be processed by computer systems.

We also reused metadata from existing MARC records for each of the Polar Bear collections. Specifically, we wanted to reuse the subject headings for linking together multiple collections. Current finding aid systems often list static subject headings without providing any type of linking mechanism, even though linking is easily implemented and useful for indicating relationships between different collections. We encountered several issues in reusing the MARC record subject headings. First, the subject headings were originally devised for use in MIRLYN, the main online bibliographic catalog at the University of Michigan. As a result, general subject headings were applied that would relate these collections within a more global context. For example, all of the collections contained the subject headings "World War, 1914-1918" and "Polar Bear Expedition." In the small subset of collections used in our project, these subject headings were not specific enough to be useful, as all collections would be classified under these subjects. Other subject terms were more robust for reuse, but we found that it was necessary to create more consistency between headings. For example, merging genre terms such as 'letters' and 'correspondence' and reordering subfields in MARC subject headings was necessary to tie like materials together.
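
A minimal sketch of this linking idea follows, assuming each collection has already been reduced to a list of heading strings. The collection names, the sample headings, the merge table, and the helper names `build_heading_index` and `related_collections` are hypothetical, and the project's own processing may have worked differently. Headings attached to every collection are dropped because they cannot distinguish one collection from another.

```python
from collections import defaultdict

# Hypothetical merge table: variant genre term -> preferred term.
PREFERRED = {"letters": "correspondence"}


def normalize(heading):
    """Lowercase, collapse whitespace, and merge synonymous terms."""
    key = " ".join(heading.lower().split())
    return PREFERRED.get(key, key)


def build_heading_index(collections):
    """Map each distinguishing heading to the set of collections that carry it."""
    index = defaultdict(set)
    for name, headings in collections.items():
        for heading in headings:
            index[normalize(heading)].add(name)
    total = len(collections)
    # A heading attached to every collection cannot tell them apart, so drop it.
    return {h: names for h, names in index.items() if len(names) < total}


def related_collections(index, name):
    """Return the other collections that share at least one heading with `name`."""
    related = set()
    for names in index.values():
        if name in names:
            related |= names
    related.discard(name)
    return related


# Invented sample data; these are not the Bentley's actual headings.
SAMPLE = {
    "Collection A": ["World War, 1914-1918", "Letters", "Diaries"],
    "Collection B": ["World War, 1914-1918", "Correspondence"],
    "Collection C": ["World War, 1914-1918", "Diaries", "Photographs"],
}

if __name__ == "__main__":
    index = build_heading_index(SAMPLE)
    for name in SAMPLE:
        print(name, "->", sorted(related_collections(index, name)))
```

In this invented data, Collections A and B become linked only after 'letters' is merged into 'correspondence', which is the effect the consistency work was meant to achieve.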

To make the subject headings more meaningful, we enhanced the EAD finding aid, marking up additional subjects and concepts in the scope and content notes as well as in the biographical sections. As a result, researchers can now easily access collections through subjects, geographic locations, genres, individual names, and military units, allowing them to discover new interrelations and links between these collections. The creation of this rich browsing structure has been quite successful. The transaction logs from the first six months of usage show a preference among visitors for the browse function, with a specific interest in the individual names and geographic browse features. This finding adds evidence to recent research about the appropriateness of browse for some websites (Olston and Chi 2003) and user preferences for browse (Katz and Byrne 2003).

The final piece of existing metadata was contained in a Filemaker database compiled by archivists at the Bentley Historical Library listing more than 6,100 soldiers in the campaign with their birth and death dates, military unit, rank, and hometowns. We chose to fully integrate this database into the Polar Bear Expedition Digital Collections site with the primary and secondary sources in order to create a single site offering all information held at the Bentley. Adding a research and analysis product created by archivists from the primary and secondary sources is debatable, particularly since the evidence and information on some of the soldiers is not comprehensive; however, this resource has proved to be a popular part of the site. The soldiers' database has also become a basic architectural piece for the site as it links metadata from disparate sources.

6. Incorporating Social Navigation Features in Archival Access Systems

We wanted to experiment with the use of social software tools, but were initially concerned with the applicability of such tools to archival and manuscript collections: What is the place of shared authority in archival description? Who will be the authoritative voice and how will accuracy be confirmed? How will researchers utilize these tools? Our interest in social software is based on the research (Tibbo, 2003; Anderson, 2004) demonstrating the continued reliance on peers and citations as the primary means of identifying collections. We were also aware of calls within the archival community to share authority for description (Duff and Harris 2002), be more transparent in the descriptive process (Light and Hyry, 2002), and utilize the interactive nature of the web more fully (Yakel, 2003). Social software leverages these assets and attempts to engage the researcher and harness researchers' knowledge about collections.

Despite the many advantages of technology, we have found that many researchers prefer using paper finding aids. We therefore took time at the beginning of the project to examine the behaviors and functionality that paper finding aids support but that online EAD systems typically do not. We decided one of the more interesting features of paper finding aids was signs of use. For example, oft-used paper finding aids might have dog-eared pages or annotations in the margins by both researchers and archivists. In response, we thought about the electronic equivalents of these physical manifestations and chose to incorporate social navigation mechanisms that supported this functionality both explicitly, permitting visitors to contribute comments, and implicitly, capturing their movements within the system.

Social navigation occurs in online situations where one visitor is aware of other visitors or when multiple visitors' paths over time can be used to guide and structure the activities of future users within that space (Dourish and Chalmers 1994). We had many types of mechanisms from which to choose that foster social navigation, and we selected four primary tools: a) commenting, b) collaborative filtering, c) bookmarking, and d) visitor awareness. Our goal was to create a system that fit the culture of the archival setting and that would preserve the "authoritative" voice of the archivist while allowing other voices to be heard.

a. Commenting

We initially considered implementing a wiki-based system, allowing users to directly edit collection and item descriptions. In the end, we chose a discussion-oriented commenting system that would allow users to contribute and interact but would also keep the archival voice intact. The Polar Bear site allows registered users to comment on collections and individual items as well as search others' comments. Interestingly, some of the comments have been done in the context of an optional biographical statement that visitors can provide during registration. Researchers have used the biographical statement to share their interest in this topic, indicate a familial connection to one of the soldiers, ask questions, provide new or correct information about soldiers, and create links to their own Polar Bear Expedition websites.

Visitors have utilized the commenting feature to address "The Archivist" persona. We assess the visitor's question and either respond or forward the inquiry to the appropriate archivist at the Bentley for a response. At any point, visitors to the site can contact and asynchronously dialog with the archivist. Numerous visitors have already identified errors and omissions in the information on the Polar Bear site. As a result, we have had to develop a procedure for making changes to the site in response to user submissions. In general, before making changes we ask for some authoritative source of information, such as a death certificate or discharge papers. Receiving updates and changes from users was not unexpected, and we had discussed beforehand how we would handle such requests. We were, however, unprepared for the other major type of query: donations of materials that visitors wanted to add to the site. Since we do not view ourselves as a collecting repository, we have forwarded these offers to the Bentley Historical Library, resulting in several new additions both to their collections and to our site. While we have found most of the dialog to be between "The Archivist" and site visitors, some interaction between the researchers themselves has occurred. In one exchange, a visitor suggested additional websites and other historical resources to another researcher who was searching for information on her ancestor. We hope to encourage and see more user-to-user interaction in the future.

b. Collaborative Filtering

Collaborative filtering is a means of generating automatic predictions (filtering) about the interests a user might have by collecting usage information from all site visitors (collaborating) (Goldberg et al. 1992). Generally, the more people use the site, the better the filtering will be. Amazon.com uses such a mechanism, offering book suggestions based on purchase and browsing patterns of other customers. For the Polar Bear Expedition Digital Collections, we adapted a collaborative filtering mechanism known as 'soft links' used on the Everything2 site and refer to these as 'link paths' on our site.

Link paths track a user's particular path within the system. Once a certain number of users have followed the same path, the system will begin to display on the bottom of the path's originating page a link to the path's destination page, ranked among the other link paths by use frequency. These paths function as a type of automatic recommender system, relaying immediate feedback to researchers on how others reached a particular item or collection. The link paths feature also allows us to adapt the "signs of use" functionality found in paper finding aids.
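
The mechanism can be sketched in a few lines. This is an illustration in Python, not the Everything2 soft-link code that the site adapts. The class name `LinkPaths`, the threshold of three, and the page identifiers are assumptions chosen for the example. As a simplification, the sketch counts traversals rather than distinct visitors and stores nothing about who made them.

```python
from collections import Counter, defaultdict


class LinkPaths:
    """Count page-to-page moves and surface the popular ones as links."""

    def __init__(self, min_uses=3):
        # Hypothetical threshold: a path is hidden until it has this many uses.
        self.min_uses = min_uses
        # origin page -> Counter of destination pages
        self._counts = defaultdict(Counter)

    def record(self, origin, destination):
        """Register one move from `origin` to `destination`."""
        if origin != destination:
            self._counts[origin][destination] += 1

    def links_for(self, origin, limit=5):
        """Return up to `limit` (destination, uses) pairs, most used first."""
        counts = self._counts.get(origin, Counter())
        shown = [(page, uses) for page, uses in counts.most_common()
                 if uses >= self.min_uses]
        return shown[:limit]


if __name__ == "__main__":
    paths = LinkPaths(min_uses=3)
    for _ in range(4):
        paths.record("collection/a", "item/diary")
    for _ in range(3):
        paths.record("collection/a", "soldier/1")
    for _ in range(2):
        paths.record("collection/a", "item/map")  # stays below the threshold
    print("Researchers who viewed this page also viewed:")
    for page, uses in paths.links_for("collection/a"):
        print(" ", page, uses)
```

The threshold keeps a single visitor's wandering from appearing as a recommendation, and ranking by frequency puts the most traveled paths first.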

To describe this feature on our site to our users, we initially just titled the section "Link Paths" with a link to a help page with more information about the feature. However, user surveys and interviews in the spring of 2006 revealed that visitors were unclear about what the link paths were. To connect our users to something they might already be familiar with, we adapted the Amazon.com tag line for our site: "Researchers who viewed this page also viewed:" Our intent is that the link paths will provide alternate and unexpected interrelations between subjects and collections that will enable researchers to make unanticipated connections between records.

c. Bookmarking

The Polar Bear Expedition site also allows registered visitors to bookmark biographical entries, collections, or items to enable quick retrieval in the future. Researchers can then manage the bookmarks by deleting those no longer desired. The bookmark feature facilitates visitors' reuse of the site's materials and, in essence, enables them to create their own mini-collections or archives for their own personal use. Our most prolific bookmarker has marked 19 total pages while most visitors have only marked a single page of interest. At the moment, these bookmarks are personal and only accessible to the registered user who created them.

d. Visitor Awareness

The final mechanism that we have employed is visitor awareness. Registrants can see which other researchers are on the site at any given time. This permits some asynchronous communication and allows visitors to see the mix of people interested in the site. The site also lists new users (some with biographies) who have recently joined.

7. Conclusions

By using a variety of social navigation mechanisms we have tried to balance the public nature of the comments and the visitor awareness features of the Polar Bear Expedition Collections with the implicit collaborative filtering and private bookmarks. We are very cognizant of the social axis and user privacy as noted by Hammond et al. (2005). While users are aware that their biographies and comments are publicly viewable, link path statistics and bookmarks have been kept anonymous. We have created a virtual space that can be utilized for scholarly and personal reasons but do recognize that such a space has the potential to be abused. As a result, we do not require registration to use the site.

Our design decisions have attempted to balance the need for continued archival authority with a desire to incorporate some of the social aspects of Web 2.0 features. The relative merits of each of these solutions are open for debate. Social navigation, collaborative filtering, and shared authority among archival users and archivists will continue to be controversial topics; however, we intend to be part of this dialog. For the Next Generation Finding Aids research group, the Polar Bear Expedition Collections site is just the first in a series of experiments exploring how to make archival information more accessible in the virtual environment.

Acknowledgements

We wish to thank the staff of the Bentley Historical Library for their willingness to let us experiment with the Polar Bear Expedition Digital Collections. We also thank the following former members of our research team: Dharma Akmon, Andrew Bangert, Magia Krause, Ricah Marquez and Christie Peterson, and James Sweeney, our lead programmer.

Notes

[1] Polar Bear Expedition Digital Collections, <http://polarbears.si.umich.edu>.

[2] Everything2, <http://www.everything2.com>.

[3] Archivists' Toolkit, <http://www.archiviststoolkit.org>.

References

Anderson, I. (2004). Are you being Served? Historians and the Search for Primary Sources, Archivaria 58: 81-129.

Dourish, P. & Chalmers, M. (1994). Running out of Space: Models of Information Navigation. Proceedings of the Conference Human Computer Interaction '94.

Duff, W.M. & Harris, V. (2002). Stories and Names: Archival Description as Narrating Records and Constructing Meanings, Archival Science 2/3-4: 263-285.

Goldberg, D., Nichols, D., Oki, B.M. & Terry, D. (1992). Using collaborative filtering to weave an information Tapestry. Communications of the ACM 35/12: 61-70.

Hammond, T., Hannay, T., Lund, B. & Scott, J. (2005). Social Bookmarking Tools (1). D-Lib Magazine April. URL: <doi:10.1045/april2005-hammond>.

Katz, M.A. & Byrne, M.D. (2003). Effects of scent and breadth on use of site-specific search on e-commerce Web sites. ACM Transactions on Computer-Human Interaction (TOCHI) 10/3: 198-220.

Light, M. & Hyry, T. (2002). Colophons and Annotations: New Directions for the Finding Aid, American Archivist 65(2): 216-230.

Olston, C. & Chi, E. (2003). ScentTrails: Integrating browsing and searching on the Web. ACM Transactions on Computer-Human Interaction (TOCHI) 10/3: 177-197.

Tibbo, H.R. (2003). Primarily History in America: how U.S. historians search for primary materials at the dawn of the digital age. American Archivist 66(1): 9-50.

Yakel, E. (2003). Impact of Internet-based Discovery Tools on Use and Users of Archives, Proceedings of the XXXVI Roundtable on Archives (CITRA) Meeting, November 11-14, 2002, Marseilles, France. Comma: International Journal on Archives 2003(2-3).

Copyright © 2007 Elizabeth Yakel, Seth Shaw, and Polly Reynolds

doi:10.1045/may2007-yakel