D-Lib Magazine
January 2002

Volume 8 Number 1

ISSN 1082-9873

Safekeeping

A Cooperative Approach to Building a Digital Preservation Resource

 

Hilary Berthon, Susan Thomas, Colin Webb
National Library of Australia
Canberra ACT 2600

Abstract

In May 2001 a project commenced that aimed to build a distributed and permanent collection of digital resources from the field of digital preservation. All resources incorporated in this 'safekeeping' project have been selected from the PADI (Preserving Access to Digital Information) subject gateway database. This article describes the first phase of the safekeeping project that is being undertaken by the National Library of Australia, with funding from CLIR (Council on Library and Information Resources). This project aims to identify significant resources in digital preservation early in their lifecycle. It also aims to facilitate the cooperative development of a distributed network of 'safekept' material with resource owners, or parties nominated by them, providing long-term access to their material. We anticipate that a diversity of technical and organizational solutions will be employed within this project that relies on cooperation within the digital preservation community, rather than on formal agreements, to realize an asset of communal value. This article discusses some early findings and outcomes of the safekeeping project; however, a full assessment of this approach must await evaluation over an extended period.

Introduction

The PADI (Preserving Access to Digital Information) initiative aims to provide mechanisms that will help to ensure that information in digital form is managed with appropriate consideration for preservation and future access. Its website (http://www.nla.gov.au/padi) is a subject gateway to resources about digital preservation. Now that digital preservation has emerged, if not into maturity, at least into a kind of toddlerhood of trial and discovery, it has become platitudinous to assert that unless consideration is given to preservation, much of our heritage in digital form will be lost. But what of the papers, projects, policies, discussions and other 'documents', accessible through the PADI subject gateway, that record our evolving understanding of the challenges and solutions of digital preservation? Will tomorrow's digital heritage include any account of progress made towards keeping digital information accessible?

It was with the aim of facilitating the preservation of this record that the National Library of Australia's safekeeping project originated. In its unique role of selecting, describing and 'bringing together' digital preservation resources, PADI appeared to be well-positioned to provide a basis for such an undertaking.

Cooperation has characterized the development of the PADI subject gateway. This has been seen as one of the initiative's strengths—according to one user, "It is extremely valuable in sharing knowledge around the world in a rapidly changing field. It helps prevent individual programs from being isolated and falling behind on new developments." PADI receives advice and guidance from an Advisory Group, comprising experts in digital preservation from a number of countries. Following the launch of PadiUpdate (http://www.nla.gov.au/padiupdate/) in mid-2001, resources may now be entered onto the PADI database by registered users from all over the world.

National Library of Australia staff, Gerard Clifton and Susan Thomas,
inputting data on the PADI database (NLA photo).

The safekeeping project—to build a library of digital preservation resources that will be accessible in the long term—extends this model of collaboration, relying on the application of many safekeeping strategies to form a distributed network of safekept material. It is founded on the understandings made between PADI and a range of resource owners (or their providers) with the latter indicating their intention to preserve access to their own networked resources.

Selection

A significant element of this project is the selection of resources for the PADI subject gateway, which provides an initial step in the selection of resources for safekeeping. However, one of the most challenging and resource-intensive parts of the safekeeping project has been deciding which material, of all that is contributed to the PADI database, will be of long-term interest or value.

Our 'highest significance' category includes documents that we consider to be seminal or which record a 'turning point' in thinking about digital preservation. This category includes resources such as the final report and recommendations of the US Task Force on Archiving of Digital Information published in 1996 and available through the RLG website, which provides a foundation for much subsequent work. We have also included resources that define or describe an important issue, approach, project or study; or which summarize or raise important issues in digital preservation. Finally, we have selected material that, while it will probably not be considered important in 10 - 20 years' time, we believe will have some ongoing interest for reference purposes as examples of approaches or opinions from a particular time. Material selected for safekeeping may therefore be less current than other material on PADI and may not reflect current practice in digital preservation.

PADI links to many types of resources. These include papers and articles; policies, strategies and guidelines; websites describing relevant projects, or organizations with an interest in digital preservation, and links to information about conferences, workshops and seminars. The PADI database also incorporates resources such as bibliographies, glossaries, discussion lists, journals and newsletters. An 'archive' of PADI's discussion list, padiforum-l, is accessible through the website. Some of these resources are dynamic; others are quite 'static'. Some rely heavily on external links, while others contain few or no links at all.

With 118 items, the broad category of 'articles' comprised the largest component of selected materials. Resources assigned to this category include handbooks, reviews, reports and conference papers, as well as journal articles pertaining to a variety of digital preservation issues. In addition to these materials, our selection incorporated a further 32 resources of the type 'policy, strategy or guidelines', 15 items documenting 'projects' or 'case studies', and five 'websites'. We even chose a small number of items that have printed versions, not because we believed that the information was in immediate danger of being lost, but because preserving access to digital resources is a key focus of our project.

Digital preservation topics included on PADI range from emulation and legal deposit to topics such as intellectual property rights management, persistent identification and preservation metadata. Only those resources that very closely relate to digital preservation have been selected for safekeeping, with those providing contextual information only being excluded.

PADI's home page.

The development of selection criteria and the selection of material for safekeeping have been carried out by National Library of Australia staff for the initial phase of this project. However, it is intended that future selections will be distributed, with participation from overseas partners. We see immense value in this kind of cooperative collection development and anticipate that 'peer' appraisal of resources will provide an important underpinning to commitments from safekeepers.

A decision to select material for preservation relies on both knowledge of the value of an item to a whole collection and an understanding of the technical considerations and costs associated with preserving access [1]. It is in the former aspect of selection that our project is able to make a contribution. Flagging significant documents early in their life, when they are identified through selection for the PADI database, is a key element supporting their long-term accessibility.

Pilot study

Many of the issues that needed to be considered before our project started emerged during the course of a pilot study that commenced in September 2000. This study aimed to search for a feasible model for building a distributed digital preservation archive. Over the course of several months, a number of discussions were held with the owners of eight resources published outside Australia. (The National Library of Australia's Electronic Unit had agreed, in principle, to archive within PANDORA [2] Australian resources selected for safekeeping.) We are very grateful to those whose generous sharing of information about how safekeeping might be applied within their own organizational and technical environment proved immensely valuable in shaping the project and provided us with much encouragement to proceed. Our discussions with these pilot participants as well as the helpful guidance of PADI's Advisory Group helped us identify some of the issues that would need to be addressed in developing safekeeping understandings: aspects such as defining roles and responsibilities, developing a common understanding of the strategies involved in safekeeping and a model for commitment to safekeeping.

Roles and Responsibilities

Over the past years, the issue of who has the responsibility for preserving access to digital material has received considerable attention. Our project has adopted an open stance about which party should ideally assume this responsibility. We have tried to encourage owners to consider making arrangements for long-term access, whether by adopting strategies themselves, or entrusting the archiving of their resources to a third party. Our discussions with owners of resources have, in many instances, indicated that they have a strong interest in ensuring their material remains accessible.

In the course of our project, we have been encouraged to learn that raising the issue of long-term access with resource owners has been a catalyst for defining preservation responsibility. While in some cases this has involved negotiation within an institution—for example, a library and a research department—in others, it has led to discussions between institutions. This has occurred, for example, in cases of multiple ownership or where owners are unable to make their own long-term access provisions. We have also learned that many organizations already have well-developed digital preservation strategies and that relationships and mechanisms that will facilitate the capture and preservation of digital resources have already been formed.

Model for Cooperation

Another set of issues, raised by early comments on our work, focussed on levels of commitment, trust and reliability. What kinds of agreements and arrangements would be necessary to support long-term access? Related to this was the role that the National Library of Australia would play. Our pilot study indicated that, while most participants intended to provide long-term access to their material, some would be unwilling to make formal written agreements, and another respondent indicated that such agreements would be unnecessary. The model for cooperation adopted in this project is based on goodwill and a shared understanding of the types of strategies needed to ensure long-term access, rather than on formal written agreements. The NLA's role, in this model, is to encourage owners/publishers to make arrangements to preserve their resources and to describe this networked 'library' of resources.

We chose to use the term 'safekeeping' in preference to 'archiving' to avoid confusion over the type of function that we were proposing to provide. Our discussions prior to the project indicated that many believe that a certified archive must incorporate a fail-safe mechanism for ensuring preservation if information providers fail to maintain access to their resources [3]. Certified archiving agreements, in this view, must also specify when such fail-safe mechanisms would come into play. In this model, formal legally valid contractual agreements underpinned by a common understanding of archiving policy and practices would be required in order for the archive to be considered 'reliable'. At this stage, PADI does not propose to assume the role of fail-safe archive, to monitor compliance with safekeeping elements or to prescribe the strategies that need to be adopted by safekeepers.

While we follow with interest the progress towards implementations of such a 'reliable' archiving model, we are keen to test an alternative model. This model, built on a clear understanding of requirements for safekeeping, is, like the rest of PADI, based on cooperation—collaborating because of the mutual benefits of doing so. Safekeeping understandings made as part of our project are not intended as a replacement for 'secure' archiving arrangements, but our experience with PADI, and with other cooperative activities, has led us to believe that the safekeeping model shows sufficient promise to pursue. We were keen to explore this in the absence of more reliable arrangements, believing it could encourage publishers/owners to take at least an initial responsibility that could support more reliable archiving arrangements later.

The safekeeping model relies on the digital preservation community's recognition of the benefits of this cooperative endeavor, and this is perhaps assisted by the strong identification between the owners/publishers and the user community. Of the 170 resources selected for safekeeping in the first phase of our project, over half are published by libraries or library organizations. The next largest group of owners/publishers is the higher education institutions (16 percent). The remainder are published by government, e-journal publishers, private organizations, research organizations and independent scholars. Overlap between some of these groups made assignment to these categories difficult in some cases. Interestingly, responsibility for the safekeeping of 14 resources that were not originally produced within libraries has in fact been assumed by two such institutions. In some ways the owner/publisher group in our project represents a unique community, one that not only has a strong interest in contributing to the development of a library of safekept material but also, in many instances, has safekeeping functions closely allied with institutional strategic goals and often a strong appreciation of the nature of the challenge.

As noted above, the safekeeping project is underpinned by a growing body of knowledge, both theoretical and practical, gained through studies and projects that have been conducted over the past couple of years [4]. We anticipate that, under the umbrella of the safekeeping project, a range of solutions, both technical and organizational, will be employed and that this diversity will be advantageous to long-term access. The notion of a distributed network of archives is a widely accepted one [5] and is fundamental to a number of recent novel initiatives such as the LOCKSS project [6] and the Open Archives Initiative [7].

We believe that the issues surrounding gaining commitment are among the trickiest to negotiate, and we will not know how feasible our model is until some years have elapsed. But we believe that such attempts are worth pursuing, not just because of the value of the resource that we could build together, but also because they go to the heart of what PADI aims to be—a global cooperation in digital preservation.

In addition to coordinating the safekeeping of digital preservation resources on PADI, the National Library of Australia (NLA) has offered to support safekeepers' preservation strategies by providing back-up storage. In no way is this intended to replace the long-term access arrangements being made by safekeepers—rather, it is intended to further assist resource owners by supporting 'distributed redundancy' as an element in their long-term preservation plans.

Common understanding of long-term access requirements

A shared understanding of what is to be achieved and, broadly, how it will be done is an important element of this initiative. While we did not wish to be prescriptive about strategies, our safekeeping understandings need to rest on a common view of the types of strategies involved. As a way of encouraging discussion about preservation methods, we shared information about the types of strategies that the NLA was using, or intended to use, with each of the pilot study participants. We invited the participants to comment on how their institutions' strategies conformed to or differed from these, and on any other methods they considered to be helpful.

Arising from these discussions and also from work undertaken within the NLA on these issues [8], we used the following elements to define our understanding of 'providing long-term access':

  1. Make regular back-up copies of the material, with copied data validated and copies stored on more than one medium and in more than one 'safe' location. Regularly refresh data.
  2. Ensure that metadata supporting discovery, use and management of the material is created, stored and maintained. The metadata for management should include information about resource types, file formats, software required for operation, and information about changes and processes that might have been applied to the resource and their effects.
  3. Ensure that persistent access to the resources is maintained even if their location changes.
  4. Ensure that procedures to overcome technological obsolescence are followed; that unacceptable changes following these procedures have not been generated; and that these procedures are recorded.
  5. Ensure that dynamic resources are copied according to a specified collection schedule and that 'saved' versions are distinguishable.
  6. Ensure that software required to access resources is maintained until long-term strategies (4) have been applied.
  7. Ensure that the permission of copyright owners has been obtained prior to copying.
  8. Ensure that, should the [responsible organization] be unable to continue to provide long-term access to the resource, it hands over responsibility to some other body and notifies PADI of this action.
  9. Agree to the National Library of Australia recording and displaying information on its PADI website that long-term access arrangements for the resources have been made and point to the location of any 'archive'. In the case of websites, indicate which areas are to be safekept (if the website is not to be safekept in its entirety).

While in some instances we have encouraged safekeepers to consider preserving a record of significant changes to dynamic resources, we have left decisions such as frequency of capture to the safekeeper. Similarly, assessment of the significant properties of digital resources is left to the owner/publisher in this model.
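As an illustration of the first element in the list above (validated back-up copies held in more than one location), the following is a minimal sketch only. The project leaves the choice of strategy to each safekeeper, so the use of SHA-256 checksums, the function names and the directory layout are all assumptions made for this example, not a description of any safekeeper's practice.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_validate(source: Path, destinations: List[Path]) -> Dict[str, bool]:
    """Copy `source` into each destination directory and validate every copy.

    Hypothetical helper: each destination stands in for a separate storage
    medium or 'safe' location. The result maps each copy's path to True if
    its checksum matches the original, otherwise False.
    """
    expected = sha256_of(source)
    results: Dict[str, bool] = {}
    for directory in destinations:
        directory.mkdir(parents=True, exist_ok=True)
        copy_path = directory / source.name
        shutil.copy2(source, copy_path)  # copies content and file metadata
        results[str(copy_path)] = sha256_of(copy_path) == expected
    return results
```

Re-running the checksum comparison on a schedule, without the copy step, would be one way to detect silent corruption between refreshes; any copy reported as `False` would need to be replaced from a validated one.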

Recording Information about Safekeeping

PadiUpdate is a shared database for inputting resources onto the PADI database. Safekeeping information is recorded in a number of fields in the PadiUpdate database that are currently available only to the system administrators at the National Library of Australia. This information includes the safekeeping status of the resource and the 'safekeeper'; the URL or other identifier that points to a safekept version of the resource; and a field for any additional information about the safekeeping of the resource, including the date of any safekeeping understanding or notes provided by safekeepers on their ability to fulfil the elements of the provision of long-term access outlined above. Resources for which safekeeping strategies have been put in place are displayed with a 'Safekept' symbol on the PADI website.
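The safekeeping fields described above could be modelled roughly as follows. This is a sketch for illustration only: the actual PadiUpdate schema is not given in this article, so the class name, the field names and the status values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SafekeepingRecord:
    """Hypothetical model of the safekeeping fields held for one resource."""

    resource_title: str
    # Assumed status values, e.g. "safekept", "in negotiation", "not arranged".
    status: str
    # Organization that has taken on the safekeeper role, if any.
    safekeeper: Optional[str] = None
    # URL or other identifier pointing to the safekept version.
    safekept_identifier: Optional[str] = None
    # Date of the safekeeping understanding, if one has been reached.
    understanding_date: Optional[date] = None
    # Free-text notes supplied by the safekeeper.
    notes: str = ""

    def shows_safekept_symbol(self) -> bool:
        """Assumed display rule: show the symbol once a safekeeper is in place."""
        return self.status == "safekept" and self.safekeeper is not None
```

Keeping the status and the safekeeper as separate fields lets a record show that negotiations are under way before any safekeeper has been confirmed.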

Outcomes

In some ways, attempting at this early stage to draw conclusions from a project that is focussed on long-term benefits might seem a fruitless exercise. However, there are some outcomes that we believe are worth mentioning.

As noted above, we have received indications that our project has already served as a catalyst for some to consider providing long-term access to their materials. In some instances, our communications with resource owners have precipitated negotiations with a third party to take on the safekeeper role; in others, it appears that they have initiated the process of defining responsibilities. We have also found institutions that already have well-developed strategies for preserving their digital resources. The safekeeping project has provided a vehicle for sharing information about preservation strategies. Many of the owners with whom we have communicated have offered information about further activities and resources related to digital preservation that, in turn, have been disseminated through PADI.

The safekeeping project has provided a practical test of some of the ideas and assumptions about digital preservation. This point, expressed in a number of our communications with resource owners, was articulated by one respondent who said, 'Although I have been active in the field of digital preservation for several years, the issues it raises have never struck so close to me.'

Having selected a set of 'highly significant' material in digital preservation and set out to record details of owners' ability to preserve access, we should, theoretically, be in a position to assess how much of our record is 'at risk'. In fact, we have found that the process of developing safekeeping understandings is a slow one, and we are still far from concluding our first round of communications. In some instances, this is the result of the significant number of parties involved or the difficulty of identifying who might assume safekeeping responsibility. Often institutions need to do a considerable amount of work before they can undertake a safekeeping role. At the time of writing, only four months have elapsed since our initial contact with resource owners.

As of mid-December 2001, of a total of 170 resources belonging to 70 owners, safekeeping arrangements have been made for 77 items. Plans have also been confirmed for the Australian resources to be archived on PANDORA, or to be safekept as part of the NLA's overall strategy for the long-term preservation of its website. The owners of 21 of the 77 safekept documents have also taken up our offer to store an extra copy of each on the NLA's Digital Object Storage System.

With the remaining component of materials, safekeeping negotiations are in progress with 20 resource owners, whilst alternative safekeepers are being sought for a handful of items. Four resource owners have replied so far that they lack the appropriate infrastructures and funding to enable them to safekeep the selected materials, at least for the foreseeable future.

Further assessment of our model for building a distributed and safekept digital preservation 'library' will need to await evaluation over a much longer period.

Future Work

As an endeavor based on cooperation and goodwill, our project depends on maintaining relationships with the community of safekeepers, which will also be important in our ongoing monitoring and evaluation of the project. We are also keen to further explore, in our model of collaboration, possible 'natural' paths for transferring responsibility for preserving access to material whose owners cannot, or choose not to, assume a safekeeping role.

As PADI progresses into a second round of selection of material for safekeeping, we would like to explore ways of moving towards a more broad-based peer selection process with participation from partners around the world. We are also interested in examining ways of integrating selection of resources for safekeeping with the selection or review of material for PADI. So, for example, when a contributor adds a resource to the PADI database, it may be flagged as potentially having long-term significance, or a resource may be identified as having high significance when it is listed by the PADI service that highlights significant new additions to the PADI site. Finally, we will be interested in observing how our safekeeping model interacts with other models and in discovering the primary driving forces behind successful maintenance of long-term access.

Acknowledgements

The authors would like to acknowledge the financial support this project has received from the Council on Library and Information Resources (CLIR).

We would also like to acknowledge the valuable information and comments provided by participants in our pilot study and by members of PADI's International Advisory Group. We are particularly indebted to Neil Beagrie, Peter Hirtle and Don Waters for their comments provided prior to the commencement of this project; however, responsibility for interpretation and use of their assistance rests entirely with the authors.

Notes and References

[1] Russell, Kelly and Weinberger, Ellis. Cost Elements of Digital Preservation. CURL, 2000. Available at <http://www.leeds.ac.uk/cedars/colman/CIW01r.html>.

[2] PANDORA Archive: Preserving and Accessing Networked Documentary Resources of Australia. NLA, 2001. Available at <http://pandora.nla.gov.au/index.html>.

[3] Task Force on Archiving of Digital Information. Preserving Digital Information : Final Report and Recommendations. RLG, 1996. Available at <http://www.rlg.org/ArchTF>.

[4] E.g., Cedars: Curl Exemplars in Digital Archives Project. CURL, 2001. Available at <http://www.leeds.ac.uk/cedars/>; NEDLIB: Networked European Deposit Library. National Library of The Netherlands, 2000. Available at <http://www.kb.nl/coop/nedlib/>; Reference Model for an Open Archival Information System (OAIS) CCSDS 650.0-R-2, Red Book. CCSDS, 2001. Available at <http://www.ccsds.org/documents/pdf/CCSDS-650.0-R-2.pdf>.

[5] Preserving Digital Information: Final Report and Recommendations.

[6] LOCKSS. Stanford University Libraries, 2000. Available at <http://lockss.stanford.edu>.

[7] The Open Archives Initiative. OAI, 2001. Available at <http://www.openarchives.org>.

[8] Safeguarding Australia's Web Resources: Guidelines for Creators and Publishers. National Library of Australia, 2000. Available at <http://www.nla.gov.au/guidelines/2000/webresources.html>; Managing Web Resources for Persistent Access. NLA, 2001. Available at <http://www.nla.gov.au/guidelines/2000/persistence.html>.

(Corrected coding for link to PADI update on 8/31/05.)

Copyright 2002 National Library of Australia

DOI: 10.1045/january2002-berthon