D-Lib Magazine
June 2006

Volume 12 Number 6

ISSN 1082-9873

Distributed Preservation in a National Context

NDIIPP at Mid-point

 

Abby Smith
<asmith@abbysmith.net>

The Library of Congress (LC), under direction from the U. S. Congress, is leading the development of a distributed digital preservation network that is to include public- and private-sector content creators, distributors, stewardship organizations, and technology enterprises. This article will summarize the progress of that initiative – the National Digital Information Infrastructure and Preservation Program (NDIIPP) – five and a half years after its inception. Specific actions that address the development of LC's internal preservation infrastructure, as well as LC's participation in international efforts, will not be addressed. This article will focus instead on NDIIPP's strategic approach; first-round investments, achievements, and lessons learned; plans for second-round investments; and a look ahead at remaining challenges.

Background

In December 2000, recognizing that born-digital content of value to the nation is at risk of being lost to current and future generations, Congress created the National Digital Information Infrastructure and Preservation Program – NDIIPP. The Library of Congress was charged to create a plan that "should set forth a strategy for LC, in collaboration with other Federal and non-Federal entities, to identify a national network of libraries and other organizations with responsibilities for collecting digital materials that will provide access to and maintain those materials." In addition, the "program is a major undertaking to develop standards and a nationwide collection strategy to build a national repository of digital materials."

In the law and accompanying conference report, Congress made clear not only what to capture – materials of value and materials that are "at risk" – but also how to do so: in a way that is sustainable and legal. "In addition to developing this strategy, the plan shall set forth, in concert with the Copyright Office, the policies, protocols, and strategies for the long-term preservation of such materials, including the technological infrastructure required at the Library of Congress." Congress named specific government agencies and private-sector nonprofit groups with which LC should work. It also indicated the need to find partners in the commercial and technical communities. "The information and technology industry that has created this new medium should be a contributing partner in addressing digital access and preservation issues inherent in the new digital information environment" [1].

The charge is breathtaking in scope. How much was this going to cost? Congressional authorization for NDIIPP included funds to plan ($5M outright) and implement the strategy ($20M after approval of the plan, an additional $75M available if matched with nonfederal dollars) over several years (the original time frame was 5 years, and has been extended to 10). From its inception, NDIIPP envisioned a strategy based on public-private partnerships, supported by both sectors, and functioning through a distributed network and technical architecture.

Such a strategy seemed a far-distant reality, if not a dream, at the beginning of the century. In 2001-02, during the planning phase that produced the NDIIPP strategic plan, "Preserving Our Digital Heritage" [2], the information landscape was characterized by a lack of preservation awareness in key sectors of digital production and distribution; little consensus on problems and how to solve them; few forums for resolution of these issues; and a formidable set of technical problems to solve before large-scale preservation of authentic and reliable digital content could be assured [3]. It was clear that the technical problems were matched in complexity by the policy and economic issues, including rights and restrictions, economic models to support preservation, and lack of clarity about what is important to collect and preserve. The scale of information production alone has made traditional library and archival models of selection unscalable. Not only libraries and archives, but also media production companies, research institutions, publishers, and journalists have struggled to find their way to make a profit or achieve their mission in this volatile environment. Given all these factors, it became clear that any implementation strategy must be iterative, flexible, accessible, and transparent to all stakeholders. A key outcome would be to catalyze a public policy and funding environment that supports rather than stymies the long-term goals of preservation now and over time.

The NDIIPP plan proposes a national network of preservation partners who would collectively ensure long-term access to a rich body of digital content. LC's role is to facilitate the building of an infrastructure comprising two key components: the preservation network, that is, the institutions that would collect, preserve, and make available digital content of value to the nation; and the technical architecture, the technical components that enable preservation. LC would also play a central role in constructing an enabling infrastructure that would facilitate preservation, rather than generate obstacles to it. This level of infrastructure includes the policies, standards, protocols, and other operating rules that allow organizations to participate – equally, fairly, safely – in the preservation framework. The enabling infrastructure ensures transparency among peers and allows the network to learn as it goes.

First-round investments in technical architecture and the network of partners

The NDIIPP investment strategy targets funds in three key areas: preservation science; development of the technical architecture; and modeling and testing preservation strategies. Through these funded activities, the network and architecture would grow together, incrementally, each informing the development of the other.

The first round of investments in technical architecture included the DIGARCH program, a partnership with the National Science Foundation (NSF) to award short-term grants for preservation science research [4]; and the Archive Ingest and Handling Test (AIHT), a "stress test" of the bulk data transfer function that relied on four institutions with differing repository environments [5]. There is also ongoing work with the Los Alamos National Laboratory (LANL) to prototype standards-based ingest and transfer of digital content, including an assessment of the value of MPEG-21 DIDL for complex digital objects [6]. And LC is collaborating with the San Diego Supercomputer Center (SDSC) to test remote data storage services using digital images and harvested Web content.

The first round of funding in the network of preservation partners was intended to identify and recruit organizations with technical capacity and developed preservation strategies. Through a competitive selection process, eight consortial partners were funded to test their capacities and strategies, to encourage learning by doing, and to build trust among partners through shared problem-solving in technical infrastructure, content rights and restrictions, economic issues, and collection building. The eight consortia (comprising thirty-six institutions altogether) are collecting at-risk digital content in formats that include digital broadcast video, business records, social science data sets, Web sites, and geospatial data, among others. Well into the second year of the collaborations, the partners have been able to refine workflow and ingest procedures, deploy Web crawlers, work on format descriptions, draw up deposit agreements and sort through copyright and privacy issues, and further iterate collection development strategies [4]. Based on an up-front match, NDIIPP is also partnering with Portico, an archive for electronic journals, that is doing a "real-world" test of a tightly specified technology and business model [7].

"Learning by doing" is often just another way of saying "learning the hard way," and there have been some hard lessons. Among the most interesting are those of the network dynamics themselves. The partnerships are often forged between several organizations, including LC, with strong but distinctive business models, professional identities, and bureaucratic cultures. Finding common ground and shared vocabularies can be difficult initially, and that difficulty will likely be a feature of every new community of practice that comes into the network over time. But the efforts expended doing this are necessary. And a key strategy of NDIIPP's iterative approach is that all efforts, successful or not, can be valuable: knowledge is the desired outcome, not just bits preserved.

NDIIPP has engaged another crucially important and very distinctive content community – state libraries and archives. In a series of meetings attended by representatives from all 50 states, NDIIPP asked the states to identify their critical preservation issues, articulate the nature and urgency of those issues, and describe what strategies they use to address them [8].

Second-round investments in technical architecture and the network of partners

As a result of ongoing discussions with partners and digital preservation experts, NDIIPP has identified four critical areas of investment in the technical architecture:

  1. building a distributed storage platform to help preserving institutions attain redundant and geographically dispersed storage of digital materials at low cost;
  2. establishing protocols for preservation-quality data transfer;
  3. developing and testing tools and services for ingest, storage, metadata, and formats; and
  4. developing practices and standards for assessing the quality of preservation systems.

These needs form the basis of a call for further investment in the existing NDIIPP partners – that is, all those who have received NDIIPP awards in the preservation partnership, technical architecture, and preservation science research areas. This next round of NDIIPP investments will also include a second round of preservation research; and targeted awards to state and regional organizations that collect and preserve "at-risk" content of importance to state and local governments.

Enabling infrastructure and the dynamics of a distributed preservation network

The past five years have shown that the "real challenges" in digital preservation are not primarily technical or procedural: they are the policies, the politics, and the economic drivers of digital preservation that serve to divide stakeholders as often as they unite them in a common cause. It is no longer true, as it may have been in 2001, that content producers, distributors, and consumers do not understand the risk of data loss. Many of the key stakeholders, from archivists to publishers, film studios to software engineers, scientists to city water engineers, record company executives to real estate developers, are very worried about how loss of data could adversely affect them. But their interests in preservation at best overlap. Just as often they are in conflict, or appear to be in conflict, because they do not share common understandings of the value of that information – for whom, for how long, for what purpose.

Copyright

NDIIPP managers have focused from the beginning on strategies that would lead to cooperation and partnering among those otherwise inclined to avoid, distrust, and at times vituperate each other. Public and private sectors alike know they must adapt quickly to the changes that digital technology has imposed on current ownership and distribution models. With the Copyright Office, NDIIPP has established a working group comprising equal parts librarians and content producers to look at changing the laws governing copyright. The Section 108 Working Group is investigating the failures of that section of federal code addressing exceptions for libraries and archives with respect to preservation and access of digital content. Their recommendations are due to go forward to the Register of Copyrights early next year [9].

In addition, the next round of funding for NDIIPP will include special funding to encourage partnerships with commercial content and technology companies. A meeting of potential collaborators, held in April, indicated that creators and distributors in the moving image, gaming, still image and graphic arts, broadcast radio and television, and music industries all view digital data loss as potentially disastrous to their enterprises and are looking for ways that engagement with NDIIPP could help them. The integration of private-sector preservation efforts will no doubt create new stresses for the network. The socialization of all partners in the network, from specialized research laboratories to commercial companies, state governmental agencies, and the Library of Congress itself, remains a largely uncharted territory in the digital landscape. Developing a framework for common action, with defined roles and responsibilities among all participants, is a key goal for NDIIPP.

Economics of archiving

NDIIPP is investigating the economic driving forces behind decisions about whether to archive or not; what to archive; and how to do so. This means researching the various business models used by creators, distributors, and archivers to cover their costs. "Real-world" economic scenarios are difficult to identify at these early stages of development. In most industries we see adaptations of analog-world economic models, and they are struggling to succeed. Portico does have a real-world business model that NDIIPP will track as it seeks to find its niche in the marketplace. Portico's market, though, is narrow – publishers and consumers of scholarly journals. It is unclear if any lessons learned will have applicability to more porous and commercially volatile markets, such as the entertainment world, or those characterized by near monopolies by software firms, as is the case in text documents, word processing and office applications, geospatial data, and architectural design, to name a few. Economic modeling is intimately entwined with content types and collecting patterns.

Collection strategies

The national collection will be distributed, built by a large number of stewardship organizations equipped to identify, collect, preserve, and make accessible content of value to current and future users. But most digital publishers do not conveniently produce such readily "archivable" objects as books, maps, manuscript pages, or contact sheets. Concepts of selection, curation, and preservation interventions in the digital realm all require complete rethinks. Among strategies being tested by NDIIPP partners are "bag and tag" – capturing digital creatures found in the wild without taking time to fully domesticate them to our current systems; and "mull and cull" – selecting and collecting by using or modifying the carefully articulated collection development strategies for print and commercial analog products such as feature films. Which materials are suitable for which approach? Can we keep costs down so that money is not the critical decision factor about what to collect? While current thinking holds that influencing activities upstream from the collecting point will lower the costs of preservation, maximize our chances of ensuring self-description and authenticity, and create value, no one has a formula for that. Approaches will no doubt be developed differently among various content communities. NDIIPP will document strategies that prove successful.

Remaining challenges

We have learned that simple operations can be hard – AIHT proved that with bulk data transfers – and complex negotiations among partners even harder – witness the sessions of the Section 108 Working Group. We also know that, despite this, most people who signed up for this work think it is worthwhile to pursue the elusive goal of secure, authentic digital preservation. It is elusive because we will not be around in the future to see if our long-term goals have been met. So, we conclude, there must be identifiable, short-term, immediate, and locally felt benefits to keep organizations in the game. The organizations will identify those benefits for themselves. For the time being, this may be as close as we can come to "enlightened self-interest." But we have very much more to learn about the dynamics of recruiting organizations into the network and keeping them there over time. Some rewards may be financial; others will be prestige and reputation; still others will be survival-driven.

Staying the course

If the preservation network is to grow and strengthen, it will need to make longer-term investments than are presently possible. There is a tension between short-term funding constraints and long-term goals. We think we know that in digital preservation, we should plan for "the hand-off" from one responsible entity to another, not base our plans on preserving things for an arbitrary period of time (say, 100 years or 500 years). We may indeed gain traction by talking about the hand-off between generations, yet we are still struggling with irreconcilable time horizons: the long-term thinking required by network building, and the short-term horizons of data creators, software and hardware developers, and funders. Most importantly, we in the digital library community need to make a case for digital preservation, both directly to Congress and directly to those whom they serve – the general public.

Making the case

Making the case for public investment in preservation is, in the end, the Grand Challenge for us as professionals, and that challenge is political, social, and moral. Preservation must survive in an intensely political and commercial culture. It cannot do so, as some have advocated, by defining all of its activities and services in terms of business solutions; or even, as many have done in the wake of hurricanes Katrina and Rita, by making the case to policy makers that preservation is a form of business continuity. It is not, though preservation is usually key to reestablishing business after natural and man-made catastrophes. To declare preservation to be a political, copyright, or economic problem is not to denigrate or belittle it. It is the beginning of our work, as preservationists and digital librarians, to "proof" digital content with long-term value against a culture that has a short time horizon and a lamentable record of husbanding its resources. For digital librarians, "culture proofing" digital content with long-term value also means knowing when and how to make trade-offs in the face of multiple risks and rewards – on standards, on selection, on rights. Others who are working hard to insert a long-term view into the policy frame include climatologists, medical researchers, environmentalists, and energy engineers, to name a few. We have much to learn from them.

Providing leadership

LC is uniquely positioned to frame the debate surrounding the public interest, and that role is clearly what Congress had in mind when it spoke of the need for LC to engage all sectors of the information world in securing long-term access to our digital heritage. The need for that role has become ever more urgent as we look back over the last five years and see increased constraints on public-interest spending and the emerging dominance of commercial information enterprises on the Web. What are we to make of this? The highly divergent views within the digital library world of Google's announcement about plans to digitize and make available the contents of some research libraries exemplify the difficulties we have in sorting out the meaning of this trend.

NDIIPP will succeed in large measure to the extent that individual and local activities of each participating institution add up to a whole that is greater than the sum of its parts. It is the special role of the Library of Congress to facilitate constant scoping, measuring, assessing and reassessing, encouraging experimentation for the sake of learning and holding all accountable for what they know. The Library will grow as one node on the network, with distinctive collections and services. But it will also be called upon to represent the public interest in ensuring long-term access to our digital heritage, and thus has a singularly important role in building and maintaining trust among preservation institutions that choose to cooperate and, over time, become interdependent in small ways and large.

NDIIPP is currently authorized through 2010. The goals for 2010 include the articulation of a national collection and preservation strategy; agreed protocols and standards for the technical architecture; an expanding network of partners committed to long-term participation; several well-documented models for public-private partnerships; a suite of recommendations to Congress to amend the copyright law; and models for governance of the National Digital Preservation Network going forward. But there is no expectation that the activities of NDIIPP will cease then, or in the future. We can safely predict that the program will be at a juncture in 2010 as unpredictable today as the present moment was unpredictable in December 2000.

Acknowledgements

I wish to thank those who reviewed this article at various stages of drafting and editing: Martha Anderson, Laura Campbell, Guy Lamolinara, William Lefurgy, and Martha Rasenberger. They have provided information and kept me from committing many errors, large and small. Those that remain are mine and mine alone. So, too, are the opinions and interpretations herein, which do not necessarily reflect those of the Library of Congress or the federal government.

Notes

[1] See PL 106-554 and Conference Report 106th Congress (H. Report 106-1033), H. R. 4577, Chapter 9. For a full description of the planning process, see Amy Friedlander's article in D-Lib Magazine's April 2002 issue, "National Digital Information Infrastructure Preservation Program: Expectations, Realities, Choices and Progress to Date," [sic] at: <doi:10.1045/april2002-friedlander>.

[2] About the Digital Preservation Program. Available at: <http://www.digitalpreservation.gov/about/planning.html>.

[3] "Planning for the National Digital Information Infrastructure and Preservation Program," draft, January 2002, pp. 12-13.

[4] Library of Congress Digital Preservation Partnerships. Available at: <http://www.digitalpreservation.gov/partners/research.html>.

[5] Library of Congress Digital Preservation Technical Infrastructure, <http://www.digitalpreservation.gov/technical/aiht.html>; see also the reports by several participants in the December 2005 issue of D-Lib Magazine at <doi:10.1045/december2005-contents>.

[6] MPEG-21 Part 2: Digital Item Declaration Language (DIDL). Available at: <http://xml.coverpages.org/mpeg21-didl.html>.

[7] About Portico, <http://www.portico.org/about/>; and <http://www.digitalpreservation.gov/partners/ejournals.html>.

[8] Library of Congress Digital Preservation Partnerships - States, <http://www.digitalpreservation.gov/partners/states.html>.

[9] For details, see <http://www.loc.gov/section108/>.

Copyright © 2006 Abby Smith

doi:10.1045/june2006-smith