


D-Lib Magazine
April 2006

Volume 12 Number 4

ISSN 1082-9873

The Development of a Local Thesaurus to Improve Access to the Anthropological Collections of the American Museum of Natural History


Kevin L. De Vorsey
Dr. Christina Elson
Nina P. Gregorev
John Hansen
American Museum of Natural History
{devorsey, celson, grigri, jshansen};




The anthropology collection of the Division of Anthropology at the American Museum of Natural History (AMNH) has been systematically acquired over the past 136 years by many of the founders of the discipline, including Franz Boas and Margaret Mead. It is currently represented by over 500,000 catalog entries, each created when the artifacts were processed and accessioned and originally recorded by hand in bound ledger books. While the introduction of a computerized collections database at the AMNH has dramatically improved access to cataloging information, the absence of controlled vocabularies to describe anthropological collections, the cultures that created them, and their places of origin has hindered the ability of both staff and scholars to conduct research. This absence can be traced to the traditional museum view that each object is "unique", in contrast to books, which are assumed to be identical within an edition and can therefore be described with standard bibliographic entries. This article describes a project begun in the spring of 2004 to address these problems by creating a searchable, hierarchical thesaurus of terms linked to the individual object records on the division's website at <>. The goal was not to replace the existing database search mechanisms but to provide an additional access point that allows users to browse the collections virtually. Browsing by thesaurus term removes the need for researchers to be familiar with the inconsistent and sometimes archaic terminology originally used to describe the artifacts, and we expect that the thesaurus will open new avenues of research to a much wider audience than was previously possible.

History and Organization of the Collection

The philosophy of collections management at the AMNH tightly combines preservation and access. The two terms are virtually synonymous when describing the projects undertaken during the last twenty-five years to re-house and, more recently, to digitally image a collection of approximately one million artifacts representing every period of human history and hundreds of distinct cultures (Table 1). Re-housing projects have better protected the artifacts from pests and environmental threats by moving them from small, dangerously overcrowded storerooms without temperature or humidity controls to modern compactor storage units in secure, maintained, climate-controlled facilities. At the same time, the division has developed a collections management database that links an object's core cataloging information with digital images, related archival documentation, and field photographs. This approach ensures the collection's safety and preservation while simultaneously increasing access for staff, researchers, and members of the public.

Table 1. Anthropology Collection Statistics

Collection                 Catalog records
Archaeology                329,000
Ethnology                  188,000
Physical Anthropology      29,000
Total                      540,000

Before a computerized database was introduced in the early 1980s, all access to the collection was through handwritten catalog entries. Each entry was written into one of approximately fifty ledger books of varying sizes, and an individual volume might include thousands of entries from many different accessions (groups of objects acquired at the same time). This severely limited the division's research capacity: only one researcher or staff member at a time could consult a given volume, so concurrent research in the same ledger was impossible. The physical limitations of the catalogs, combined with space and staffing constraints, limited researcher visits to about one a day, or approximately 300 per year.

The division's collections comprise three general types: archaeological (excavated material), ethnographic (material collected from living people), and biological (human remains). In most cases the archaeological and ethnographic collections are further subdivided by geographic region of origin. For example, the North American archaeological collection is curatorially distinct from the North American ethnographic collection: North American ethnographic material is entered into the '50' catalog, while North American archaeological material is entered into the '20' catalog. Individual objects, or in some cases groups of objects, are assigned sequential numbers (catalog numbers) in the order in which they were processed. Catalog number 80.0/5583, for instance, describes an ethnographic adz collected from a Maori in New Zealand. This number is permanently marked on the object and entered in the appropriate ledger along with core descriptive information (Illustration 1).

Illustration 1. Maori Adz and its Catalog Record

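The catalog-number scheme described above (a catalog designation, a slash, and a sequential number) can be illustrated with a small parser. This is a minimal sketch under our own assumptions: that every number has the form prefix/sequence, that the digits before any decimal point in the prefix identify the catalog, and that the lookup table CATALOG_NAMES is hypothetical. It covers only the '20' and '50' catalogs named in the text and is not part of the AMNH database.

```python
from dataclasses import dataclass

# Hypothetical lookup covering only the two catalogs named in the text.
CATALOG_NAMES = {
    "20": "North American archaeology",
    "50": "North American ethnology",
}


@dataclass(frozen=True)
class CatalogNumber:
    prefix: str    # catalog designation before the slash, e.g. "80.0"
    sequence: int  # number assigned in processing order

    @property
    def catalog(self) -> str:
        # Assumption: the digits before any decimal point name the catalog.
        return self.prefix.split(".", 1)[0]

    def describe(self) -> str:
        return CATALOG_NAMES.get(self.catalog, "unmapped catalog")


def parse_catalog_number(text: str) -> CatalogNumber:
    """Split a number such as '80.0/5583' into its prefix and sequence."""
    prefix, sep, seq = text.strip().partition("/")
    if not sep or not prefix or not seq.isdigit():
        raise ValueError(f"not a catalog number: {text!r}")
    return CatalogNumber(prefix=prefix, sequence=int(seq))
```

For example, parse_catalog_number("80.0/5583") yields the prefix "80.0" and the sequence 5583. Its catalog, "80", is reported as unmapped only because the sketch's table holds just the two catalogs the text names.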

Different staff members have made descriptive entries in the ledgers since the founding of the division in 1873, and there are no anthropological cataloging standards analogous to the Dewey Decimal System or the Library of Congress Subject Headings. Together these factors have produced a custom vernacular that users must know well in order to search the collection properly. As a result, expert searching has been limited to longtime staff and other specialists who are intimately familiar with the collection, its history, and the people responsible for its development. The advent of the Internet, and especially the World Wide Web, dramatically increased access to the collection databases: today some 10,000 people a year use them, a vast improvement over the 300 physical visits a year. Unfortunately, the lack of a controlled vocabulary limits searching and makes exhaustive database searches difficult, and previous efforts to associate keywords with catalog records proved only partially successful. To address this problem, in January 2004 the authors decided to construct a poly-hierarchical, mono-lingual local thesaurus based on the terminology used to catalog the collection, combined with commonly accepted terms from a variety of sources, to give users around the globe greater access.

Initial Development

Before developing the thesaurus, we researched whether any existing thesauri or vocabularies could adequately describe the division's collection. Although a number of thesauri describe art and even anthropological objects[1], they either cannot readily accommodate the broad scope of the AMNH collection or are no longer actively supported. We agreed that the Art & Architecture Thesaurus (AAT)[2] of the Getty Research Institute provided the best theoretical model, because it is oriented toward specific terms and can be used as a descriptive cataloging vocabulary; however, we felt it lacked the level of specificity required to describe a comprehensive anthropology collection accurately. The need for the AMNH thesaurus to handle terms with fluid or ambiguous meanings also forced us to look beyond the AAT as it currently exists. We therefore decided to create a local thesaurus based on the AAT but designed specifically for the archaeological and ethnographic collections of the AMNH. Because these collections are curated separately, according to their geographic provenience, by eight different curators, we determined that the best approach was to develop the thesaurus first for a relatively small collection and then use it as a template, expanding it over time to include all the other collections. Mesoamerican archaeology was selected because it is relatively small (approximately 30,000 catalog entries) and well organized. However, much of the terminology used to describe its artifacts is now outdated in spelling or meaning. We therefore sought input from curatorial and other staff to ensure that the terms selected for inclusion, and the links made between them, accurately reflected current archaeological terminology.
Our collaborative approach also helped to ensure that the resulting thesaurus would be a useful tool for a wide range of users, and not simply a theoretical exercise. The Curatorial Associate for the Mesoamerican archaeological collection acted as a subject specialist; the division's Collection Manager provided input based on his familiarity with the division's history and cataloging practices; and the Database Administrator integrated the thesaurus with the existing collections database, determined the appropriate functionality, and was responsible for publishing the resulting application.

Structure and Creation

Thesauri are generally constructed in one direction, either from the broadest categories to the most specific or vice versa. Because the broadest terms here were derived from the AAT and the narrowest from the existing AMNH keywords, this thesaurus was constructed from both ends simultaneously. The placement of keywords proved complicated, as many objects can arguably be placed in different facets of the AAT depending on whether emphasis is given to their ritual importance or to their utilitarian role. Ideally, the AMNH thesaurus will be poly-hierarchical, with each term appearing wherever it appropriately belongs.
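Poly-hierarchy means that one term may sit under more than one broader term, for example under both a ritual and a utilitarian branch. The sketch below shows a minimal version of that structure, listing every path from a term up to the top of the hierarchy. The class, its method names, and the sample terms are illustrative assumptions, not drawn from the AMNH thesaurus or the AAT, and the sketch assumes the broader-term links contain no cycles.

```python
from collections import defaultdict


class PolyHierarchy:
    """Broader-term links in which a term may have several parents."""

    def __init__(self) -> None:
        self.broader: dict[str, set[str]] = defaultdict(set)

    def add(self, narrower: str, broader: str) -> None:
        self.broader[narrower].add(broader)

    def paths_to_top(self, term: str) -> list[list[str]]:
        """Every chain from `term` up to a top term (one with no broader term).

        Assumes the links form no cycles; a cycle would recurse forever.
        """
        parents = sorted(self.broader.get(term, set()))
        if not parents:
            return [[term]]
        return [
            [term] + path
            for parent in parents
            for path in self.paths_to_top(parent)
        ]


# Illustrative terms only; not taken from the AMNH thesaurus or the AAT.
h = PolyHierarchy()
h.add("incense burner", "ritual objects")
h.add("incense burner", "containers")
h.add("ritual objects", "objects")
h.add("containers", "objects")
```

Here h.paths_to_top("incense burner") returns two chains, one through "containers" and one through "ritual objects", so a user browsing either branch reaches the same term.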

The AAT organizes information into seven facets: associated concepts, physical attributes, styles and periods, agents, activities, materials, and objects. Of these, the two most immediately pertinent to retrieval in the anthropology collection are the materials and objects facets, and they are the only two fully realized at this point. The terms inserted under these facets were extracted from the keyword list in the collections management database. The list was cleaned to remove extraneous terms and to reduce phrases to their core meanings. Index cards (3" x 5") were labeled with the facets and hierarchies of the AAT and with the keyword terms, so that staff could become visually familiar with the AAT structure and decide under which facets and in which hierarchies the AMNH terms logically belong. A freeware thesaurus construction program, TheW32[3] (Illustration 2), was used to enter terms, create scope notes, and establish relationships between terms based on the ANSI/NISO Z39.19 standard[4].

Illustration 2. TheW32 Thesaurus Construction Program.

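The first part of the cleaning step, stripping extraneous entries from the keyword list, can be sketched as a normalization pass. The article does not state the exact rules applied, so everything here is an assumption: whitespace is collapsed, case is folded to lower case, exact duplicates are dropped in first-seen order, and entries on a hypothetical EXTRANEOUS stop list are discarded. Reducing phrases to their core meanings is a separate step and is not attempted here.

```python
import re

# Hypothetical stop list; the actual criteria for "extraneous" AMNH
# keywords are not given in the article.
EXTRANEOUS = {"misc", "miscellaneous", "unknown", "n/a"}


def clean_keywords(raw: list[str]) -> list[str]:
    """Normalize keyword entries; drop empties, stop-listed terms, and duplicates."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for entry in raw:
        # Collapse runs of whitespace and fold case so variants that
        # differ only in spacing or capitalization compare equal.
        # (Case folding would also lower-case proper nouns; a real pass
        # would need an exception list.)
        term = re.sub(r"\s+", " ", entry).strip().lower()
        if not term or term in EXTRANEOUS or term in seen:
            continue
        seen.add(term)
        cleaned.append(term)
    return cleaned
```

Keeping first-seen order preserves the sequence of the source list, which makes it easier to compare the cleaned list against the original database export by eye.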

Terms in the AMNH anthropology thesaurus tend to be one or two words long; the only phrases are headings borrowed from the AAT, such as "Materials by Formation Process" and "Adhesive by Composition or Origin". Because the thesaurus is intended to provide access to three-dimensional objects and not just to concepts, the lowest-level terms are nouns largely stripped of adjectives and other modifiers, since those descriptions exist elsewhere in the thesaurus or in other database fields. For example, the phrase "painted effigy figurine" is reduced to "figurine" and placed in the appropriate hierarchy for statues under "Visual and Verbal Communication", while the descriptive terms "painted" and "effigy" are recorded in separate database fields.
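Reducing "painted effigy figurine" to "figurine" amounts to separating a noun phrase into its head noun, which becomes the thesaurus term, and its modifiers, which go to other fields. The naive sketch below assumes the head noun is the final word. That holds for simple English phrases like this one but not for constructions such as "bowl with lid", which would still need manual review; the function name is hypothetical.

```python
def reduce_phrase(phrase: str) -> tuple[str, list[str]]:
    """Split a simple noun phrase into (head noun, preceding modifiers).

    Naive rule: the last word is the head noun. This fails for phrases
    whose head is not final (e.g. "bowl with lid").
    """
    words = phrase.lower().split()
    if not words:
        raise ValueError("empty phrase")
    # Head noun -> thesaurus term; modifiers -> other descriptive fields.
    return words[-1], words[:-1]
```

Applied to the example in the text, reduce_phrase("painted effigy figurine") returns ("figurine", ["painted", "effigy"]).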

The relationships between terms include broader term/narrower term (BT/NT), related term (RT), use/used for (USE/UF), and scope notes (SN). Because the pilot collection originated in Spanish-speaking countries, it included many Spanish and native-derived terms such as "machete" and "metate". "Machete" is considered part of the current vernacular and was retained. Although "metate" is well known to specialists in the field, "grinding platform" was chosen as the preferred term to make discovery easier for non-specialists. Scope notes defining ambiguous terms further aid users who might otherwise confuse closely related terms such as "adz" and "ax". Parenthetical qualifiers are added to terms that appear in multiple branches of the hierarchy or that otherwise need clarification, for example "cotton (textile)" as opposed to "cotton (seed)". Authority for the selection of terms rests with the curator of each collection. Scope notes are usually definitions that differentiate between similar terms; they are taken from either the AAT or the Oxford English Dictionary (OED), with the source indicated parenthetically at the end of each note (Illustration 3).

Illustration 3. Results window with Scope Note.

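The relationship types listed above follow ANSI/NISO Z39.19, in which BT/NT and USE/UF are reciprocal pairs and RT is symmetric. The sketch below models that bookkeeping in memory. The class and method names are assumptions made for illustration, not the data model of TheW32 or the AMNH database. The sample entries reuse examples from the text ("metate" USE "grinding platform" and the two qualified "cotton" terms); the RT link between "adz" and "ax" and the broader term "textile materials" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """A preferred term and its Z39.19-style relationships."""
    label: str
    broader: set[str] = field(default_factory=set)   # BT
    narrower: set[str] = field(default_factory=set)  # NT
    related: set[str] = field(default_factory=set)   # RT
    use_for: set[str] = field(default_factory=set)   # UF: non-preferred equivalents
    scope_note: Optional[str] = None                 # SN, source cited at the end


class Thesaurus:
    """Hypothetical in-memory store; not the TheW32 or AMNH data model."""

    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}  # preferred terms only
        self.use: dict[str, str] = {}     # USE: non-preferred -> preferred

    def term(self, label: str) -> Term:
        """Fetch a preferred term, creating it on first use."""
        if label not in self.terms:
            self.terms[label] = Term(label)
        return self.terms[label]

    def add_broader(self, narrower: str, broader: str) -> None:
        # BT and NT are reciprocal, so both directions are recorded.
        self.term(narrower).broader.add(broader)
        self.term(broader).narrower.add(narrower)

    def add_related(self, a: str, b: str) -> None:
        # RT is symmetric.
        self.term(a).related.add(b)
        self.term(b).related.add(a)

    def add_use(self, non_preferred: str, preferred: str) -> None:
        # USE sends the entry term to the preferred term; UF is the reciprocal.
        self.use[non_preferred] = preferred
        self.term(preferred).use_for.add(non_preferred)

    def resolve(self, label: str) -> str:
        """Return the preferred term for `label` (itself if already preferred)."""
        return self.use.get(label, label)


t = Thesaurus()
t.add_use("metate", "grinding platform")                # example from the text
t.add_related("adz", "ax")                              # hypothetical RT link
t.add_broader("cotton (textile)", "textile materials")  # hypothetical BT
t.term("cotton (seed)")  # qualifier keeps it distinct from "cotton (textile)"
```

Storing non-preferred labels only in the USE map, rather than as full terms, keeps "metate" searchable while ensuring it never appears as a separate node in the hierarchy.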


The pilot project confirmed the viability and utility of a thesaurus for describing the collection, and the thesaurus was then expanded to include all of the collections in the Anthropology Division. The resulting thesaurus was exported from the construction tool as an XML file (Illustration 4) and integrated into the division's collections management database. Synonymous terms are linked programmatically, so that when a term in the thesaurus is selected, any equivalent terms are also displayed in the query results (Illustration 5). This reflects the assumption that users browsing the thesaurus are interested in obtaining broad, exhaustive result sets, unlike expert users who might search for specific terms. The thesaurus was first published on the division's website in June 2005. It remains under constant development, and the online version is updated as changes warrant.

Illustration 4. XML Export from TheW32.



Illustration 5. Query results with synonymous terms.

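Displaying equivalent terms together amounts to expanding the selected term to its whole equivalence set before looking up records. The minimal sketch below assumes that the USE/UF links have already been loaded from the exported thesaurus into plain dictionaries and that records are indexed by the keyword under which they were cataloged. The function names, the dictionary layout, and the placeholder record IDs are hypothetical, and the XML format exported by TheW32 is not modeled.

```python
def equivalents(selected: str, use: dict[str, str],
                use_for: dict[str, set[str]]) -> set[str]:
    """The preferred form of `selected` plus every term that USEs it."""
    preferred = use.get(selected, selected)
    return {preferred} | use_for.get(preferred, set())


def browse(selected: str, use: dict[str, str], use_for: dict[str, set[str]],
           index: dict[str, set[str]]) -> list[str]:
    """Records indexed under any term equivalent to `selected`, sorted."""
    hits: set[str] = set()
    for term in equivalents(selected, use, use_for):
        hits |= index.get(term, set())
    return sorted(hits)


# Toy data: record IDs are placeholders, not real AMNH catalog numbers.
USE = {"metate": "grinding platform"}
USE_FOR = {"grinding platform": {"metate"}}
INDEX = {"metate": {"record-1"}, "grinding platform": {"record-2"}}
```

Selecting either "metate" or "grinding platform" yields both placeholder records. This matches the broad, exhaustive result sets the browsing interface aims for, at the cost of precision for an expert who wanted only one spelling.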

The AMNH anthropology thesaurus currently includes approximately 10,000 terms. About 1,000 of these are upper-level terms from the AAT, and the remaining 9,000 are lower-level terms that link to individual object records. There are almost 900 scope notes; currently only terms in the bottom two levels have them, because upper-level terms are already defined in the AAT. Work is ongoing to expand their use.


Conclusion

The creation of data standards for anthropology collections has been an elusive goal for the museum community. Artifacts are, and are assumed to be, unique, and this assumption has hindered the establishment of data standards and controlled vocabularies and, in turn, the work of collections researchers. Building on previous work to enhance the preservation of and access to the AMNH collection, we have begun building a local thesaurus. We have described the basic steps we followed and have shown the utility of this tool for users unfamiliar with the terminology originally used to catalog the collection. We believe that one of the strengths of the AMNH anthropology thesaurus is that it combines previously established anthropology keywords with the Art & Architecture Thesaurus of the Getty Research Institute, resulting in a hierarchical structure that loosely follows the standard for the construction of mono-lingual thesauri. The thesaurus gives users a visual way to access the collection and, for the first time, allows searches across all of the division's collections rather than within a single geographic region. It is intended to augment existing search tools and to provide an additional access point to a large, diverse collection. We hope that it will increase access to the collection and encourage a far wider audience than before to pursue new avenues of research.


Acknowledgements

The authors wish to thank the following for their advice and support during this project:

Dr. Murtha Baca of the Getty Research Institute, Los Angeles, CA.
Dr. Timothy Craven of the University of Western Ontario, London, Ontario, Canada.
Professor Deirdre Donohue of the Pratt Institute and the International Center for Photography, New York, NY.
Professor Alan Thomas of the Pratt Institute, New York, NY.


Notes

1. The two most pertinent examples of anthropological thesauri are the British Museum's Object Names Thesaurus (, and Harvard University's Peabody Museum Classification System (, based on Robert Chenhall's Nomenclature for museum cataloging: a system for classifying man-made objects.

2. Getty Research Institute. Art & Architecture Thesaurus Online. (

3. See (Craven, 2005).

4. NISO (National Information Standards Organization) (U.S.), ANSI (American National Standards Institute).


References

Blackaby, J. R., Greeno, P., Chenhall, R. G., & Nomenclature Committee. (1995). The revised nomenclature for museum cataloging: a revised and expanded version of Robert G. Chenhall's system for classifying man-made objects. Walnut Creek, Calif.: AltaMira Press.

Chenhall, R. G. (1978). Nomenclature for museum cataloging: a system for classifying man-made objects. Nashville: American Association for State and Local History.

Craven, T. (2005). TheW32 [thesaurus construction software]. Faculty of Information and Media Studies, The University of Western Ontario. Available at: <>.

Getty Research Institute. Art & architecture thesaurus online. <>.

National Information Standards Organization (U.S.), & American National Standards Institute. (2005). Guidelines for the construction, format, and management of monolingual thesauri (ANSI/NISO Z39.19). Bethesda, Md.: NISO Press. <>.

Collections Data Management Section Working Party. (1999). The British Museum Object Names Thesaurus: thesaurus of object names used in the British Museum. Ed. Tanya Szrajber. London: Trustees of The British Museum. Accessed 11/24/2005. <>.

Szrajber, T. (Ed.). (1997). The British Museum Materials Thesaurus: thesaurus of materials used in the British Museum. London: Trustees of The British Museum. Accessed 11/24/2005. <>.

Copyright © 2006 Kevin L. De Vorsey, Christina Elson, Nina P. Gregorev, and John Hansen
