The National Gallery of the Spoken Word (NGSW) is a Digital Library Initiative-funded project based at Michigan State University with multiple internal and external partners. The NGSW is essentially a multicultural enterprise because of the variety of disciplines involved, each of which has a unique micro-culture and mutually-unintelligible specialized language. This article uses an ethnographic approach to describe three NGSW-based research projects: copyright, metadata, and digital preservation. Each of these projects shows some aspect of language-related infrastructure development.
Ethnographers are more and more like the Cree hunter who (the story goes) came to Montreal to testify in court concerning the fate of his hunting lands in the new James Bay hydroelectric scheme. He would describe his way of life. But when administered the oath he hesitated: "I'm not sure I can tell the truth ... I can only tell what I know." -- Clifford, 1986, p. 8
Like the Cree hunter, I cannot tell the truth about the National Gallery of the Spoken Word (NGSW); I can only tell what I know. In a forthcoming article for Library Trends (Seadle, 2000) I describe several of its founding myths, and offer some explanation of the anthropological theory which stands behind my approach to research in and on this grant. I also examine some of the goals of the many partners involved. In this article I look chiefly at the research issues, particularly those affecting libraries.
Some brief description of this large, multi-disciplinary grant is necessary. The NGSW's partners come from four different units on the Michigan State University campus: the College of Engineering, the College of Education, Matrix (a technology research center in the College of Arts and Letters), and the University Library. External partners include the University of Colorado (Boulder), Northwestern University, and the Chicago Historical Society. The original official letter-of-intent proposed five key points (DLI2, April 14, 1998):
1. Founding a National Gallery of the Spoken Word analogous to the National Portrait Gallery for publicly available materials.
My approach is ethnographic, and in good ethnographic tradition I occasionally and deliberately use the first person singular to remind readers of the source of these observations. Where possible I will cite documents in an ordinary scholarly manner. When dealing with oral sources or private email, I will generally not name people even to the extent that I have in oral histories. Doing so would be unfair, even unethical, since observation as an insider is more invasive than an oral history, and gives those observed no chance to protect their privacy or reputations.
I put particular emphasis on language, and on meaning in the sense in which Clifford Geertz uses it (Geertz, 1995, p. 114). The NGSW is essentially a multicultural enterprise, because of the variety of disciplines involved, each of which has a unique micro-culture and mutually-unintelligible specialized language or jargon. As is true with macro-cultures, the group identities and languages split along different boundaries. The disciplines include education, history, political science, engineering, and library science. The jargons include engineering (understood only by the engineers), education (understood by both education and Matrix), computing (understood by particular Matrix and library staff), and library science (used mainly by librarians but broadly comprehensible to Matrix people). I speak only the latter two jargons, and am conscious of having to rely on translations for the others.
Language can unify as well as divide. The struggle to achieve a shared meaning for key terms, concepts, or codes indicates a healthy attempt to build intellectual infrastructure. That struggle does not always begin across disciplinary cultures or jargons. Librarians may, for example, encounter a new concept (e.g., Encoded Archival Description, known as EAD), work to understand it among themselves, find that computer people face the same problem, and cooperate with them on shaping a shared meaning among team members. External dimensions matter too. Everyone within the project team may agree on a term (e.g., digital preservation), but encounter important outside interest groups whose negative connotations clash with their own positive inclinations. The result can unify a team, spark a public debate, even change minds.
In this article, I describe three NGSW-based research projects: copyright, metadata, and digital preservation. Each of these shows some aspect of language-related infrastructure development, which I see as a key link between internal dynamics and research results. Since the project has just finished its first year, most of the definitions, agreements, and shared understandings remain somewhat incomplete.
The first language question had to do with how the project team understood "research." One new project member with extensive experience on other grant-funded digital projects spent a week or so reading and re-reading the project proposal.  When asked why, she explained that she assumed the proposal was a kind of "contract" with the granting agency. The assumption was perfectly reasonable. In many cases it is. But the Digital Library Initiative (DLI) request for proposals (RFP) deliberately challenged authors not to limit themselves to existing capabilities. One example in the RFP suggested creating "high-risk, 'breakthrough' applications capable of providing new conceptual paradigms for information technologies and altering social and work practices on a grand scale."  That is not the kind of activity which lends itself to pre-developed blueprints. The NGSW proposal fit the model in offering a broad, bold, and inherently risky initiative to push the envelope of digital library research. The specifics about how to achieve this were few.
The proposal also underwent verbal revisions at a hearing where the review panel interviewed three of the four principal investigators and some allied staff. I recall two topics of particular concern: methods for searching the digital sound and copyright. Metadata development in conjunction with engineering algorithms helped answer the search question. References to specific exemptions in the law addressed copyright concerns. These answers modified any sense in which the proposal might have been a contract. Unfortunately no transcript of that meeting exists, and notes and memories vary. Since representatives of different disciplines had to answer quickly, without consultation or explanation, they emerged with no clear consensus on what precisely the team as a whole had promised, and with no text to consult.
Although the project team quickly agreed that the goal of the grant was "research," some team members felt that the main focus lay in scale -- in setting up a Web-site with as much digital spoken-word sound as possible. Others felt that it emphasized research into how to carry out the various elements necessary to build the gallery: the digitization, the metadata, the sound searching, the interface. Still others felt the users were key, and that the grant's chief commitment lay in producing something useful to a wide range of students and scholars. Understanding tended to split along disciplinary lines. The engineers thought in terms of mathematical models, small test datasets, and peer-reviewed publication. The educators thought in terms of interfaces and classroom feedback. The librarians thought in terms of collection building, content description, and quantity. The computing people thought about bandwidth, indexing, and making it all work. At the publication level, the differences seemed obvious. For example, articles appropriate for highly respectable library journals would be totally unacceptable to a peer-reviewed engineering journal, just as a brilliantly mathematical engineering article might well seem irrelevant to most library readers.
No solution to defining the nature of the NGSW research could ignore these facts of life in different academic disciplines. Only a loose, multi-faceted, federal, or even con-federal, philosophy had a chance of success. At this point no one knows whether such a compromise represents intellectual cowardice in the face of conflict, or a sensible means of channeling creative interactions. Like an international treaty, the NGSW's intellectual infrastructure recognizes the validity of multiple approaches, and at least has allowed specific research areas to move forward independently.
Having any significant amount of digital sound to share through the NGSW depends on copyright, and understanding copyright hinges on the meaning of a complex set of legal texts. The problem is that no large, safe mass of pre-twentieth-century, public domain sound exists to put with impunity on the Web. Most recordings date from periods in which some part of them, mainly the words, either could be protected (1923-78 for US materials) or definitely is protected (since 1978). This means that the vast majority of the contents of the NGSW have, or could have, copyright protection. Initially the research focused on examining copyright law to determine what exceptions could be used on the Web. One obvious exception was the oral equivalent of Federal government documents, which automatically fall into the public domain regardless of their creation date. But even these sources would not stretch to cover the range of topics envisioned in the grant. Only the engineers could view copyright with relative dispassion, since they needed only relatively small datasets for their work.
A debate began within the grant over how to understand both copyright in general and the right of "fair use" in particular. Both "copyright" and "fair use" have precise legal meanings. Title 17 of the United States Code defines copyright in general, and 17 USC 107 defines fair use in particular. The latter includes four tests. Two are generally easy for academic applications: one considers the purpose of the use, and another the nature of the work. Either teaching or research satisfies the first, and non-fiction generally satisfies the second. Two other tests are harder: one looks at the "amount and substantiality" of the portion copied, and another looks at the effect on the market. Failing any of these can (though not necessarily will) turn a court decision against fair use. Part of the team, mainly librarians, argued for the official follow-the-law policy of Michigan State University (MSU). They were willing to interpret the law as liberally as institutional policy allowed, but not to depart from explicit strictures.
Copyright and fair use also have popular meanings. Some in the NGSW, particularly teaching faculty, pushed for an implicit educational imperative that outweighs legal details. One person (over wine at a conference dinner) even argued for a readiness to infringe openly until stopped by a formal warning. Most simply wanted a flexible fair use policy that seemed "fair" to them and did not give away rights they felt they had long had in face-to-face classroom teaching. Lawyer-librarian Kenneth Crews echoed a less extreme version of these sentiments in criticizing the copyright policies of American universities:
University copyright standards are not shaped by the relationship between law and the use of copyrighted materials for teaching, research, and service. They are instead overwhelmingly influenced by "model policies" that offer quick answers and some promise of a "safe harbor" from liability. Excess concern for avoiding infringements outweighs the institution's academic mission. -- Crews, 1993, p. 115
Crews is not alone. In a post-session discussion at the June 2000 Coalition for Networked Information meeting in Stratford-upon-Avon, England, several computer science colleagues argued that, since the law tends to evolve from practice, online resource developers should not abandon the Internet's long-cherished openness.
In practical terms, the more cautious approach has won, if only because those who would ultimately have to deal with the legal consequences have supported it. The dissenters remain as unconvinced as ever, and some external partners may adopt different policies -- at least until their administrations take note. But fairly strict legal interpretations have become the common currency of most NGSW copyright discussions.
Research in this area currently includes the following elements:
First, what is actually protected? Spoken word recordings can have copyright protection both for the underlying text and (since 1972) for the sound recording itself. Earlier state protection for the sound recording was also possible. Under the terms of the 1909 Copyright law (in effect until 1978) a work had to be registered with the Copyright Office and had to carry a notice of copyright. Otherwise it fell immediately into the public domain. The 1963 Mr. Maestro decision and its 5 November 1999 reversal in circuit court show some of the complications in applying the 1909 law's legal requirements to recordings of live speeches, which inherently could not be registered before dissemination, because they did not exist in final form until then. It seems likely that many speakers did not bother, and that a portion of those who bothered would not have renewed their copyright after the expiration of the first 28-year term. It could be that only a small percentage of pre-1978 speeches actually have protection today. The evidence lies in paper registration and renewal records at the Library of Congress, and a project is being designed to sample these records to give some statistical basis for these speculations. Of course, if the research showed only a small probability that, for example, radio broadcasts of political stump speeches were ever registered, it would still not guarantee anything about a particular speech. A failed search for a specific copyright record only reduces liability through a good-faith effort. Paper records notoriously get out of order easily, and the rights holder might have kept the receipt. Certainty is rare in copyright.
Second, who are the rights holders? The problem of uncertainty applies to them too. The NGSW uses the services of the Digital Sources Center at the MSU Libraries to track down rights holders and to get Web-publication permissions. Multiple owners can be involved, including each speaker and whoever holds rights in the recording itself. Significant research can be necessary to identify them, to learn which rights they own, and to find out how to contact them. It is not unusual for people to grant rights which they may not actually have, or to insist that they do not have rights which other evidence suggests belong to them. Even corporate rights holders appear sometimes to guess about ownership. While this can be convenient if it results in a permission, it raises doubts about reliability. A written, signed permission does not represent certainty -- only good faith.
Third, what are acceptable kinds of permissions limitations? NGSW normally requests a free, unlimited, permanent, non-exclusive permission, but not all rights holders will go so far. When a permission is granted, it is especially important that both requestor and grantor understand the meaning of those permissions in terms of the computer implementation. The interface can, for example, easily limit access by Internet address range, which could mean several institutions, a single institution, or a sub-unit with its own address range (e.g., the library). But any member of the public could circumvent an intention to allow, for example, MSU-only rights by using certain computers in the Main Library. Limiting by Internet address range can also cause inadvertent exclusions: for example, students dialing in from a commercial Internet Service Provider, unless they first authenticated themselves through a special proxy server. Commercial database vendors understand these meanings reasonably well, but individual rights holders may not. Other reasonable conditions would require special programming: A maximum number of uses in a given period, for example. In Washington last year a lawyer-librarian gave me an off-hand, off-the-record opinion that such a mechanism might be interpreted as satisfying the clauses in 17 USC 108 dealing with a library's right to make a "limited" number of copies of news broadcasts.
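The two mechanisms described above, limiting access by Internet address range and capping the number of uses in a given period, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the NGSW's implementation: the address ranges are reserved documentation blocks standing in for a campus and a library sub-unit, and the names ALLOWED_NETWORKS, is_allowed, and UseLimiter are hypothetical.

```python
import ipaddress
from collections import deque

# Hypothetical ranges (reserved documentation blocks), not real campus addresses.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),     # stands in for a whole institution
    ipaddress.ip_network("198.51.100.0/25"),  # stands in for a sub-unit such as a library
]


def is_allowed(client_ip: str) -> bool:
    """Return True if the client address falls inside any permitted range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


class UseLimiter:
    """Allow at most max_uses within any sliding window of window_seconds."""

    def __init__(self, max_uses: int, window_seconds: float):
        self.max_uses = max_uses
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of the uses still inside the window

    def try_use(self, now: float) -> bool:
        # Forget uses that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_uses:
            return False
        self._timestamps.append(now)
        return True
```

The address check knows only where a request comes from, not who is making it. Anyone seated at a machine inside a permitted range passes, and a legitimate student connecting from an outside provider fails unless the request is first routed through an authenticated proxy whose address is inside the range.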
Fourth, what fits the statutory definition of "fair use" for an NGSW sound file? Many of the tests in 17 USC 107 are manageable. The purpose and character of the use is straightforwardly academic: both scholarly and educational. The nature of the material is mainly informative and almost exclusively non-fiction and non-music. And the NGSW could avoid undercutting the market for a work by excluding protected materials that are currently being sold. But the test that limits the amount and substantiality of the portion used is harder to evaluate. One approach is to set time limits as a percent of the whole file. The Music Library Association guidelines recommend limits of "up to 10% or 3 minutes" for "motion media," "10% or 1000 words" for text, and "up to 10%, but in no event more than 30 seconds" for music. (Music, 1996)
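Setting a time limit as a percent of the whole file reduces to a small calculation. The sketch below assumes that, where a guideline gives both a percentage and a fixed duration, the smaller of the two applies; that reading, the function name max_excerpt_seconds, and its parameters are assumptions for illustration rather than anything the NGSW has adopted.

```python
def max_excerpt_seconds(total_seconds: float,
                        percent: float = 10.0,
                        cap_seconds: float = 180.0) -> float:
    """Longest excerpt allowed under a 'percent of the whole or fixed cap' rule.

    Assumes the smaller of the two limits governs. The defaults mirror the
    "up to 10% or 3 minutes" figure quoted for motion media.
    """
    return min(total_seconds * percent / 100.0, cap_seconds)


# A ten-minute speech: 10% is one minute, well under the three-minute cap.
ten_minute_limit = max_excerpt_seconds(600)

# A one-hour recording: 10% would be six minutes, so the cap governs.
one_hour_limit = max_excerpt_seconds(3600)

# The stricter music figure ("in no event more than 30 seconds").
music_limit = max_excerpt_seconds(200, percent=10.0, cap_seconds=30.0)
```

A rule of this kind is easy to automate, but it addresses only the "amount" half of the test. Whether a short excerpt captures the substance of a speech remains a judgment that no percentage can make.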
Metadata is itself a form of language: a vehicle for communicating information that establishes syntax, vocabulary and meaning, in a way that both humans and (perhaps more importantly) computers can understand. NGSW's library and computing communities both care about its descriptive capacity, and its precise, standardized structures that facilitate sharing. Both care about using open, non-proprietary standards. Differences remain, of course. The librarians still care first about the verbal descriptive elements, while the computing professionals prefer encoded options. The engineers and educators think about metadata mainly as an information source. For the former this includes noise levels, the age and sex of speaker, the language used, and the number of speakers. For the latter, it includes speaker names, the date of the speech, and the topics discussed.
From the outset the NGSW faced two critical metadata decisions:
Choosing a format was an unplanned problem. The original proposal had assumed MARC (MAchine Readable Cataloging) would be the metadata for describing sound files. The library already had thousands of MARC records, and processes in place for adding more. But the assumption that every sound file, even a short thirty-second clip, would be treated as if it were a monograph, made increasingly little sense when all the costs were considered. As a result of discussions at the first DLI meeting in Ithaca, NY, Lisa Robinson argued for the advantages of Encoded Archival Description (EAD). First, it was an open standard designed to group and describe amorphous resources like voice recordings. Second, it could save time and cost by providing collection-level descriptions instead of item-by-item descriptions. And third, its principle of inheritance meant that speaker or copyright information need only be recorded once to apply to all lower levels (where it could be overridden if necessary). The discussions lasted for half a year, and an article about them is underway. In the end, however, EAD became an integral part of the NGSW intellectual infrastructure.
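The principle of inheritance can be made concrete with a small sketch. The fragment below is a simplified, hypothetical EAD-like record (the titles, dates, and restriction wording are invented for illustration, and a real EAD file carries far more structure). The function resolve_access, also hypothetical, walks the hierarchy and gives each level the nearest access statement above it unless the level supplies its own.

```python
import xml.etree.ElementTree as ET

# Simplified, invented EAD-like fragment: one collection, one series, two items.
SAMPLE = """
<archdesc level="collection">
  <did><unittitle>Hypothetical speech collection</unittitle></did>
  <accessrestrict><p>Public domain</p></accessrestrict>
  <dsc>
    <c01 level="series">
      <did><unittitle>Campaign speeches</unittitle></did>
      <c02 level="item">
        <did><unittitle>Speech, 1948-10-12</unittitle></did>
      </c02>
      <c02 level="item">
        <did><unittitle>Radio broadcast, 1952-03-01</unittitle></did>
        <accessrestrict><p>Permission required</p></accessrestrict>
      </c02>
    </c01>
  </dsc>
</archdesc>
"""

COMPONENT_TAGS = {"archdesc", "c01", "c02"}  # levels that carry a description


def resolve_access(element, inherited=None, results=None):
    """Map each component title to its effective access statement."""
    if results is None:
        results = {}
    # A statement recorded directly on this level overrides the inherited one.
    own = element.find("accessrestrict/p")
    effective = own.text.strip() if own is not None and own.text else inherited
    if element.tag in COMPONENT_TAGS:
        results[element.findtext("did/unittitle")] = effective
    for child in element:
        if child.tag in COMPONENT_TAGS or child.tag == "dsc":
            resolve_access(child, effective, results)
    return results


effective_access = resolve_access(ET.fromstring(SAMPLE))
```

In this sketch the collection-level statement is written once and reaches the series and the first item untouched, while the second item overrides it. The same pattern would apply to speaker information recorded at the collection level.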
Agreement on EAD provided a syntax, but neither vocabulary nor meaning. One important compromise was that collection level information would be loaded into the library's Innovative Interfaces OPAC. This meant that Library of Congress Subject Headings (LCSH) plus some form of author, title, and publication information needed inclusion. LCSH was straightforward, but the newly organized sound collections had no inherent title, and sometimes lacked an obvious main speaker to serve as author. Rules had to be developed about how to name collections, and whether individual sound files could be referred to by date alone, since most had no title. A decision to group sound files by main speaker whenever possible provided an author, but left questions about multi-speaker situations. For example, who was more important in an interview: the interviewer or the interviewee? A good knowledge of history turned out to be essential.
These decisions provided some of the contents, but EAD was also the means for encoding information that the engineers could use for speech models for their search engines. The types of information included age, sex, language, and regional accents. Acoustic conditions have also become important: did the recording take place in a large hall or a small room? Did the speakers have individual microphones, or were they sharing a single microphone? Was there background noise like clapping, planes landing, people talking, phones ringing? Codes and descriptive phrases need to be developed for these and other conditions. As with natural languages, the vocabulary seems always to grow.
Another form of content came from copyright decisions and permissions. EAD already had an "<accessrestrict>" tag where copyright information could logically go. Some of this vocabulary exists in prototype form: for example, "1" has become the code for "public domain," and "2" refers to the 17 USC 108 broadcast news exemption. But the current codes are ill adapted to external use. Many have purely internal processing implications, such as "6 -- status unknown, ask Michael." And none of the codes refer to kinds of permissions that rights holders might grant.
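One way to separate the internal codes from what an outside user sees is a small lookup layer. The sketch below includes only the three codes mentioned above; the names AccessCode and describe, and the outward-facing wording for code 6, are assumptions for illustration and not part of the NGSW's actual scheme.

```python
from enum import Enum


class AccessCode(Enum):
    """The three prototype codes named in the text; others are not shown."""
    PUBLIC_DOMAIN = "1"
    BROADCAST_NEWS_EXEMPTION = "2"  # 17 USC 108
    STATUS_UNKNOWN = "6"            # internally: "status unknown, ask Michael"


# Wording suitable for external display. The label for code 6 is an assumed
# neutral phrase that hides the internal processing note.
PUBLIC_LABELS = {
    AccessCode.PUBLIC_DOMAIN: "Public domain",
    AccessCode.BROADCAST_NEWS_EXEMPTION: "17 USC 108 broadcast news exemption",
    AccessCode.STATUS_UNKNOWN: "Rights status under review",
}


def describe(code_text: str) -> str:
    """Translate a stored code into a label safe to show outside the project."""
    try:
        code = AccessCode(code_text.strip())
    except ValueError:
        return "Unrecognized code"
    return PUBLIC_LABELS[code]
```

Keeping the stored code and the displayed label apart means the internal vocabulary can keep growing, for example to cover kinds of permissions that rights holders grant, without rewriting what external users read.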
Although building this kind of metadata vocabulary is vital, it is also a hindrance because none of the EAD records can be considered finished until these contents are in place. The NGSW already has over 1500 EAD records, all of which will need some revision for copyright codes, for acoustic conditions, for accents, and for file locations. A permanent online location and naming scheme is waiting on better definition of the storage solution. Once that is done, there will be strong pressure from the educators to update each record with the URLs. If the other codes are not ready, a second round of updates might be necessary. Or even a third. And each update is expensive. This slow vocabulary-building process is one of the hazards of a multi-part research project. Developments do not always come in coordinated quanta, and communication has comprehension lags. For example, I failed to appreciate the significance of what the engineers were saying about acoustic information until recently. In retrospect, I probably conflated it with linguistic and accent issues, because they made more sense to my non-engineering training and background.
Understanding Digital Preservation
Digital preservation was the one area where the NGSW team had an early, easy, and (for all practical purposes) complete agreement about meaning and direction. Converting analog sound to digital formats was integral to everything else that had been planned. The intent was to establish early agreement on the types of equipment, the sampling rates, the file formats, and the metadata in headers to have reliable specifications to hand out to future NGSW partners, and to save everyone else from re-inventing the standard digital sound file. The proposal's language about developing "speech digitization standards" had a modest evangelical flavor, and converting the pro-analog pagans was one of my tasks. But I soon discovered how much controversy the phrase "digital preservation" aroused:
Though digitization is sometimes loosely referred to as preservation, it is clear that, so far, digital resources are at their best when facilitating access to information and weakest when assigned the traditional library responsibility of preservation. -- Smith, 1999
Initially I merely disagreed. After conversations with Smith and others, I came to recognize that preservation meant very different things to different people.
My own background in library preservation involved the Cornell CLASS project, which worked to establish standards for the preservation of print materials in digital image form. My interest in the original artifact has always been modest, and the fact that Cornell's digitization process disbound books did not trouble me. The intellectual content was what mattered, and the same was true for the engineers, linguists, and computing people on the NGSW team.
But to many in the library world, the original artifact is the real point of preservation. They have seen the power of books and (acid free) paper to survive the centuries, and they have observed the rapid mutation of computing standards that renders word-processing files from half a decade ago unreadable. Talk about media migration and refreshing formats has a magical, unreliable, and distinctly expensive ring. The fact that a digital copy of a digital work suffers no loss of quality seems like poor compensation for what they see as the risks involved. Such doubts represent something more sophisticated than neo-Luddism. Digital reformatting discards more than it saves. The artifact's smell is gone. Its feel is gone. The chemicals and molds that slowly undermine the original are gone. Each of those offers some information to the right sort of scholar. Worst of all, the conventional means of establishing authenticity based on the physical original is gone. And even those ready to forego feel and smell agree that authenticity matters.
Last winter an authenticity case occurred in the NGSW, when a user's question about a digital sound sample led to the discovery that a notable recording was a fake. The sound clip purported to be a copy of an Edison wax cylinder of the ringing of the bells of Big Ben in London on New Year's Eve in 1900, but it turned out that the donor had re-mastered an older (16 July 1890) Big Ben recording, with a voice-over announcing the false time and date. The detective work was easy once suspicion had been aroused, and the fake was promptly withdrawn. Although the digital version triggered the alarm, the authenticity problem long predated any digital versions. The fake had been made in the 1950s or 60s on analog tape, using analog tape input from the earlier recording. No one had ever checked for the true original. The extreme fragility of Edison wax cylinders had led everyone to assume that it had long since fallen apart.
After this case, and after reading Authenticity in a Digital Environment (CLIR May 2000), my own understanding of digital preservation has expanded beyond the confines of reformatting to include issues of authenticity -- both before and after the creation of the digital version. Happily, watermarking the digital sound had long been an engineering project within the NGSW. For non-engineers the original interest in watermarking lay chiefly in being able to identify stolen copies for concerned copyright holders. Increasingly now team members are considering its potential in distinguishing originals from fakes.
The NGSW seems unlikely to lose its commitment to digital reformatting as the basis of its preservation efforts. The fact is that sound files on analog tape degrade slowly but steadily from the moment of recording. Analog to analog copies also lose quality. And tape is a far less robust medium than paper. Some types of tape from the 1970s suffer from "sticky shed," and can be rescued only with careful baking. Digital versions, for all their problems, at least have the potential for surviving unscathed.
Nonetheless, this encounter with outside opinion has affected the course of NGSW research by disrupting a comfortable but simplistic idea about digital preservation. And clearly, watermarking represents only part of the solution. Authenticity issues will undoubtedly become part of the discussions on metadata.
Those who have worked on similar multi-disciplinary projects might recognize the coffeehouse cacophony of intellectual discourse. Team members often seem to sit at different tables, shouting about different issues, using phrases that might as well be in Turkish or Chinese. The NGSW would probably have less energy if its true nature resembled the starched harmony we occasionally put on for public forums.
One benefit of these forced encounters with foreign jargons and external ideas is the growth of an intellectual infrastructure: a common language with common meanings for particular terms. Before the NGSW began, few team members had strong ideas about ordinary words like "copyright" or "preservation," and the term "metadata" seemed positively obscure. Each of these has acquired a substantial and ever-growing meaning which, despite personal and discipline-based differences of interpretation, has allowed internal discourse. That matters. As Nobel Prize winning economist Herbert Simon wrote:
To make interesting scientific discoveries, you should acquire as many good friends as possible, who are as energetic, intelligent, and knowledgeable as they can be. Form partnerships with them whenever you can. Then sit back and relax. You will find that all the programs you need are stored in your friends, and will execute productively and creatively as long as you don't interfere too much. -- Simon, 1991, p. 387
As I see it, these human programs offer the best chances for real success. Building a National Gallery of the Spoken Word depends in more than one sense on people and the words they use.
 Available (October 2000):
Authenticity in a Digital Environment (May 2000), Washington, DC, Council on Library and Information Resources. Available (November 2000):
Clifford, James (1986), "Introduction: Partial Truths," in Writing Culture: The Poetics and Politics of Ethnography, ed. by James Clifford and George E. Marcus, Berkeley, University of California Press.
Crews, Kenneth D. (1993), Copyright, Fair Use, and the Challenge for Universities: Promoting the Progress of Higher Education, Chicago, University of Chicago Press.
Email to [email protected] (April 14, 1998 3:35 PM), "Letter of Intent for DLI2."
Geertz, Clifford (1995), After the Fact : Two Countries, Four Decades, One Anthropologist, Cambridge, MA., Harvard University Press.
Music Library Association (July 17, 1996), "Fair Use Guidelines for Educational Multimedia." Available (October 2000): <http://www.musiclibraryassoc.org/Copyright/guidemed.htm>.
Seadle, Michael (2000), "Project Ethnography: An Anthropological Approach to Assessing Digital Library Services," Library Trends, v. 49, no. 2 (forthcoming).
Simon, Herbert A. (1991), Models of my Life, N.Y., Basic Books.
Smith, Abby (February 1999), Why Digitize, Washington, DC, Council on Library and Information Resources. Available (November 2000):
Copyright © 2000 Michael Seadle