D-Lib Magazine
July/August 2007

Volume 13 Number 7/8

ISSN 1082-9873

In Brief


Award-Winning TPAP Digital Preservation Prototype Keeps Growing

Contributed by:
Paul Tooby
Senior Science Writer
San Diego Supercomputer Center (SDSC)
University of California, San Diego
La Jolla, California, USA
<ptooby@sdsc.edu>

The Transcontinental Persistent Archives Prototype (TPAP) project, with six sites nationwide linked by data preservation technology developed at the San Diego Supercomputer Center at UC San Diego, is addressing key challenges in safeguarding, preserving, and providing access to authentic electronic records as the nation's information becomes increasingly digital.

A testbed for preserving electronic records collections from the National Archives and Records Administration (NARA) that must be maintained for "the life of the Republic," the TPAP project has added a sixth partner site at the U.S. Navy's Allegany Ballistics Laboratory near Keyser, West Virginia.

Along with SDSC in California and the new West Virginia site, the project includes two NARA sites in or near the nation's capital, the University of Maryland, and Georgia Tech.

A key aspect of the Transcontinental Persistent Archives Prototype is the collaborative nature of the research. "Extending this prototype to the Allegany Ballistics Laboratory in West Virginia applies advanced SDSC data preservation technology in association with the first deployment of the high performance, low latency, networking capabilities of the Department of Defense's Defense Research and Engineering Network ("DREN") in the state of West Virginia," said Robert Chadduck, principal technologist for NARA's Electronic Records Archives Program. "This materially advances the nation's window onto the electronic records archives of the future where shared knowledge can be managed and distributed across multiple institutions and platforms spanning the country...The capabilities being demonstrated in this extended testbed are essential to ensuring continuing access to electronic records that document our nation's history, our democratic processes, the rights of American citizens and our national experience."

The TPAP project, built on the SDSC Storage Resource Broker (SRB) data grid system, received an Internet2 Driving Exemplary Applications (IDEA) Award in 2006 for enabling transformational progress in digital preservation research.

The project's results are expected to be a major contribution to the ability to sustain a "memory" in digital form. With digital data growing exponentially across all sectors of society, the powerful freedoms it offers are accompanied by an array of threats, from the creeping incompatibility of obsolete hardware and software to data corruption, viruses, hard drive crashes, and a lack of tools able to organize, manage, and access this avalanche of data.

Today's high-end data collections are reaching petabyte size (one petabyte is one million gigabytes, the equivalent of 500 billion pages of printed text) with tens of millions of files, and these collections are expected to keep growing rapidly.
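As a quick check on these figures, the short sketch below works out what "500 billion pages per petabyte" implies per page. It assumes decimal units (one petabyte = 10^15 bytes, one gigabyte = 10^9 bytes); the constant and function names are illustrative only and do not come from the project.

```python
# Decimal (SI) storage units, assumed here for the arithmetic.
GIGABYTE = 10**9
PETABYTE = 10**15


def gigabytes_in(total_bytes: int) -> int:
    """Return how many whole gigabytes fit in the given number of bytes."""
    return total_bytes // GIGABYTE


def bytes_per_page(total_bytes: int, pages: int) -> float:
    """Return the average number of bytes available per printed page."""
    return total_bytes / pages


if __name__ == "__main__":
    print(gigabytes_in(PETABYTE))                 # gigabytes per petabyte
    print(bytes_per_page(PETABYTE, 500 * 10**9))  # bytes per page
```

Under these assumptions a petabyte is one million gigabytes, and each of the 500 billion pages corresponds to about 2,000 bytes, or roughly 2,000 characters of plain text.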

"The testbed uses SDSC's Storage Resource Broker data grid software. To minimize the labor needed to maintain the preservation environment, we're working on an upgrade of the system to the new open-source Integrated Rule-Oriented Data System (iRODS)," said Reagan Moore, Distinguished Scientist and director of SDSC's Data Intensive Computing Environments (DICE) Division. "This will allow more complex and automated data management procedures which are required as the size and diversity of digital data collections continue their rapid growth."

The TPAP testbed, which already holds almost four terabytes (a terabyte is equivalent to 30,000 Encyclopedia Britannicas) of federal government records in more than five million files, gains its archiving power from the "data virtualization" supported by the SDSC Storage Resource Broker technology. This data grid manages the properties of shared electronic records collections distributed across multiple storage systems. The SRB also supports federation of the six independently administered sites, enabling the unification of the records so that they appear to users as a single virtual repository.

This unified virtual environment enables archival staff to easily and flexibly add, manage, access, and replicate data from one site to another, ensuring flexible sharing and reliable access even if data is lost at one or more sites.

The system also allows archivists to verify the authenticity and integrity of replicated data, which is essential for reliable long-term archiving. In another key demonstration, the prototype has been used to manage the evolution of storage technologies by migrating digital data to new hardware and software.
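The kind of integrity check described above can be illustrated with a minimal sketch. It assumes replicas are compared by SHA-256 digest against a reference copy; the article does not say which checksum or procedure the SRB uses, and the function and site names here are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a record's bytes."""
    return hashlib.sha256(data).hexdigest()


def find_corrupt_replicas(reference: bytes, replicas: dict[str, bytes]) -> list[str]:
    """Return the (sorted) names of sites whose copy differs from the reference."""
    expected = digest(reference)
    return sorted(
        site for site, data in replicas.items() if digest(data) != expected
    )


if __name__ == "__main__":
    original = b"electronic record"
    # Hypothetical sites; site_b holds a copy with one altered byte.
    copies = {"site_a": b"electronic record", "site_b": b"electronic recorD"}
    print(find_corrupt_replicas(original, copies))
```

In practice the digests would be stored as metadata when a record is ingested, so that each site can be checked without transferring the full record.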

The Transcontinental Persistent Archives Prototype is the product of an eight-year research effort that includes the contributions of NARA's Electronic Records Archives Program, the National Science Foundation's Office of Cyberinfrastructure, SDSC, the University of Maryland, and Georgia Tech.

Related links:

TPAP Internet2 Award <http://www.internet2.edu/idea/2006/transcontinental_persistent_archives_prototypes.html>.

TPAP information <http://www.slac.stanford.edu/history/projects.shtml>.

National Archives and Records Administration (NARA) <http://www.archives.gov/>.

University of Maryland Institute for Advanced Computer Studies (UMIACS) <http://www.umiacs.umd.edu/>.

Georgia Tech <http://www.gatech.edu/>.

West Virginia University <http://www.wvu.edu/>.

San Diego Supercomputer Center (SDSC) <http://www.sdsc.edu/>.

SDSC Storage Resource Broker (SRB) <http://www.sdsc.edu/srb/index.php>.


OAI-ORE Tackles Problem of Compound Information Objects on the Web

Contributed by:
Michael L. Nelson
Department of Computer Science
Old Dominion University
Norfolk, Virginia, USA
<mln@cs.odu.edu>

The Open Archives Initiative Object Reuse and Exchange (OAI-ORE) project is the latest interoperability project of the Open Archives Initiative. Whereas the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) focused on a simple description and interchange format for repositories, OAI-ORE plans to do the same for compound objects on the web. Examples of compound information objects could include a scholarly eprint with multiple formats, versions, and data types, or a blog entry with comments, or an uploaded video with a corresponding description page.

The primary goal of ORE is to facilitate use and reuse of compound information objects (and their component parts) by enriching the web graph with boundary information. As humans, we intuitively recognize compound information objects on the web when we see them, but this distinction is not readily available to web crawlers and other automated applications. ORE will provide an unambiguous, extensible method for enumerating, and describing the relationships between, the web resources that comprise a compound object.

The OAI-ORE technical committee recently released a white paper, "Compound Information Objects: The OAI-ORE Perspective," on their website. This document provides a detailed discussion of the problems that ORE is addressing and introduces possible solution paths, the specifics of which are still under discussion. We welcome feedback about the problem statement, proposed solutions, or other relevant work in this area.

For more information, including the white paper, please see <http://www.openarchives.org/ore/>.


In the News

Excerpts from Recent Press Releases and Announcements

Digital Preservation Coalition (DPC) appoints new Executive Director

July 12, 2007 - "The DPC is pleased to announce that Ms Frances Boyle has been appointed to the post of DPC Executive Director. Frances will take up the post on the 3rd of September 2007."

"Frances is an information professional with many years experience in the delivery and support of library and information services across a wide range of academic, research and commercial environments. She has worked for the Oxford University Library Services since 2001 where she is currently the IT Development Manager & Strategy Co-ordinator. Frances has been involved in numerous digital preservation initiatives and projects including; SHERPA, Preserv, & Preserv2 and the Oxford Mass Digitisation project. Frances holds a BSc (Hons) in Chemistry and a Masters in Information Studies from the University of London."

For more information, please see <http://www.dpconline.org/graphics/>.


ALA sends three resolutions to Congress in support of GPO, NLS, NDIIPP

July 11, 2007 - "Today, the American Library Association (ALA) reaffirmed its support of three vital government services to the United States Congress. In letters sent to all Members of the U.S. Senate and House of Representatives, ALA included resolutions in support of:

  • The Government Printing Office (GPO), a vital source for government information;
  • The National Library Service (NLS), which provides Talking Books services to the visually impaired; and
  • The National Digital Information Infrastructure and Preservation Program (NDIIPP), the goal of which is to develop a national strategy to collect, archive, catalog and preserve the rapidly increasing amount of digital content for current and future generations."

"The resolutions were passed during ALA's Annual Conference in June by its governing body."

For more information, please see <http://www.ala.org/Template.cfm?Section=news&template=/ContentManagement/ContentDisplay.cfm&ContentID=161515>.


OCLC to work with Zepheira to redesign OCLC's PURL service

July 11, 2007 - "OCLC Online Computer Library Center, Inc. and Zepheira, LLC announced today that they will work together to rearchitect OCLC's Persistent URL (PURL) service to more effectively support the management of a 'Web of data.'"

"The software developed will be released under an Open Source Software license allowing PURLs and the PURL infrastructure to be used in various applications for public or proprietary use. OCLC and Zepheira are collaborating to extend the open and inclusive community of PURL users."

"...Zepheira will redesign and build the new PURL service during 2007 to support greater flexibility, new features and the scalability to face an increased demand for PURLs. The new service, which upgrades the existing services at purl.org, will also be hosted by OCLC."

For more information, please see the full press release at <http://www.oclc.org/news/releases/200669.htm>.


Mellon Foundation Awards CLIR Grant to Examine Scholarly Utility of Large-Scale Digitization Projects

July 10, 2007 - "The Andrew W. Mellon Foundation has awarded The Council on Library and Information Resources (CLIR) a grant to assess the utility to scholars of several large-scale digitization projects. CLIR will conduct the project in partnership with Georgetown University."

"Large-scale book-scanning projects, such as Google's and Microsoft's, are making vast collections of works easily accessible in a form that can be queried, interpreted, and reconstituted as new knowledge. These resources are a potential boon to scholars, enabling research that was previously not possible. But are these databases being organized and built to best support the methodologies and intellectual strategies of contemporary scholarship?"

"CLIR's project will focus on Google Book Search, Microsoft's Live Book Search, Project Gutenberg, Perseus, the ACLS Humanities E-Book project, and, possibly, the Open Content Alliance as the main sources for analysis of digitized content. CLIR will ask scholars from historical and literary areas of study to summarize key methodological considerations in conducting research in their disciplines. Scholars will then assess each mass digitization project under scrutiny, and each will submit a report. The reports will be synthesized and recommendations drawn from them. The summary will serve as the basis of a larger meeting of scholars in November 2007 to discuss the findings and recommendations and to determine next steps. Chief among these will be a strategy for working with individual and corporate database developers to improve the utility of these databases to scholars. CLIR will issue a public report early in 2008."

For more information, please see the full press release at <http://www.clir.org/news/pressrelease/07mellon2pr.html>.


NISO Launches Content and Collection Management Committee

July 2, 2007 - "As part of a strategic redesign of its standards-development process, the National Information Standards Organization (NISO) has inaugurated a Content and Collection Management Topic Committee to address issues regarding developing, describing, providing access to, and maintaining content items and collections. Specific areas of coverage include Dublin Core, library binding, storage area networks, and radio frequency identification (RFID) technology."

"'The CCM committee is responsible for some of the core standards, protocols, and identifiers in our industry – ranging from ISSNs to Dublin Core to the strength of steel shelving,' said Ted Koppel, Committee Chair and Verde Product Manager for Ex Libris. 'The library world looks to standards as the basis for bibliographic control and exchange, as well in many other functional areas. Our role is to analyze the present and help guide the future by ensuring the relevance and currency of appropriate standards.'"

"In addition to tracking national and international standards development related to its topic area, the Topic Committee will identify additional work to fill gaps in the content and collection development standards landscape. As part of this effort, the Topic Committee will convene Thought Leader meetings on specific areas for development, the initial round of which is supported by a grant from The Andrew W. Mellon Foundation."

For more information, please see the full press release at <http://www.niso.org/news/releases/pr-CMM-7-07.html>.


NISO's New Discovery to Delivery Topic Committee Aims for Interoperable User Environments

July 2, 2007 - "As part of a strategic redesign of its standards-development process, the National Information Standards Organization (NISO) has inaugurated a Discovery to Delivery Topic Committee to address issues regarding the finding and distribution of information by and to users, including OpenURL, Metasearch, interface design, and web services."

"'The Discovery to Delivery Topic Committee will deal with the outward face of the work done within the information community,' said Mike Teets, Committee Chair and Vice President, OCLC Global Product Architecture. 'With the growing demands of a broadly information-literate world, the focus of standards shifts from more simple vertical standards to broadly interoperable environments. Our standards must address the needs of interoperating within, but possibly more importantly outside of, our traditional communities. The committee's primary role will be analyzing needs and setting a path toward a standards-supported, interoperable user environment.'"

"In addition to tracking standards development nationally and internationally that is related to its topic areas, the new Discovery to Delivery Topic Committee will identify what additional work would fill gaps in the standards landscape. As part of this effort, the Committee will convene Thought Leader meetings, the initial round of which are supported by a grant from The Andrew W. Mellon Foundation."

The press release is online at <http://www.niso.org/news/releases/pr-D2D-7-07.html>.


NISO Issues Fast-Tracked SERU Draft Document on Shared E-Resource Understanding in Trial Use Through December 20

July 2, 2007 - "Only nine months after the Shared E-Resource Understanding (SERU) Working Group was first formed, the National Information Standards Organization (NISO) has issued a Draft for Trial Use of 'SERU: A Shared Electronic Resource Understanding' (SERU version 0.9). The SERU trial period runs from June 20, 2007 through December 20, 2007; the draft is available from: <http://www.niso.org/committees/seru/>."

"Where applicable for libraries and publishers SERU could save the time required by formal license agreements that can be burdensome and costly. The document consists of a framework and set of statements that express frequently adopted expectations among academic and other non-profit libraries and scholarly publishers. Libraries and publishers using SERU should reference or link to these common understandings."

"To facilitate trial uses of the statement, SERU 0.9 includes guidelines for implementation and the Working Group's website includes new accompanying FAQs to assist users of the statements. A registry of libraries, publishers, and other content providers who wish to announce their interest in using SERU for transactions during the six-month pilot is also available. To join the registry or see the list of current trial participants visit: <http://www.niso.org/committees/seru/registry.html>."

For more information, please see the full press release at <http://www.niso.org/news/releases/pr-SERU-6-07.html>.


Loriene Roy inaugurated 2007 ALA president

June 27, 2007 - "Loriene Roy, professor at the University of Texas at Austin's School of Information, begins her term as 2007-2008 president of the American Library Association (ALA) on June 28, 2007."

"As ALA president, Roy will be the chief elected officer of the oldest and largest library organization in the world. Established in 1876, the American Library Association has more than 64,000 members. Its mission is to provide leadership for the development, promotion, and improvement of library and information services and the profession of librarianship in order to enhance learning and ensure access to information for all."

For more information, please see <http://www.ala.org/Template.cfm?Section=News&template=/ContentManagement/ContentDisplay.cfm&ContentID=160503>.


Global Science Gateway Now Open

WorldWideScience.org opens public access to more than 200 million pages of international research information

June 22, 2007 - "The U.S. Department of Energy (DOE) and the British Library, along with eight other participating countries, today opened an online global gateway to science information from 15 national portals. The gateway, WorldWideScience.org, gives citizens, researchers and anyone interested in science the capability to search science portals not easily accessible through popular search technology such as that deployed by Google, Yahoo! and many other commercial search engines."

"...Relying on a novel technology called federated search, WorldWideScience.org gives science information consumers a single entry point for searching far-reaching science portals in parallel, with only one query, saving time and effort. As WorldWideScience.org grows, it will capitalize on existing technology to search vast collections of science information distributed across the globe, enabling much-needed access to smaller, less well-known sources of highly valuable science. Following the model of Science.gov, the U.S. interagency science portal that relies on content published by each participating U.S. agency, WorldWideScience.org will rely on scientific resources published by each participating nation."

"The U.S. contribution to WorldWideScience.org is Science.gov, the U.S. government's one-stop searchable portal to major science databases of federal science agencies. In addition to the U.S. and the U.K., the inaugural WorldWideScience.org portal provides access to research information in English from Australia, Brazil, Canada, Denmark, France, Germany, Japan and the Netherlands. The intent is for WorldWideScience.org to become a world-class Web facility that lets any scientist, any citizen, anywhere, easily find the research results of any nation in any language."

For more information, please see <http://www.doe.gov/news/5153.htm>.
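Federated search, as described in the release above, sends a single query to several sources in parallel and merges what comes back. The sketch below shows the general pattern only; it is not WorldWideScience.org's implementation, and the in-memory "portals" are hypothetical stand-ins for real national search services.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A portal is modelled as a function from a query string to a list of hits.
Portal = Callable[[str], list[str]]


def make_portal(name: str, documents: list[str]) -> Portal:
    """Build a toy portal that returns its documents containing the query term."""
    def search(query: str) -> list[str]:
        return [f"{name}: {doc}" for doc in documents if query.lower() in doc.lower()]
    return search


def federated_search(query: str, portals: list[Portal]) -> list[str]:
    """Send one query to every portal in parallel and merge the hits in portal order."""
    with ThreadPoolExecutor(max_workers=max(1, len(portals))) as pool:
        result_lists = pool.map(lambda portal: portal(query), portals)
        return [hit for hits in result_lists for hit in hits]


if __name__ == "__main__":
    portals = [
        make_portal("Portal A", ["Solar energy storage", "Wind turbines"]),
        make_portal("Portal B", ["Organic solar cells"]),
    ]
    print(federated_search("solar", portals))
```

A production service would also need timeouts, handling for portals that fail to respond, and some ranking of the merged results; those are left out here to keep the pattern visible.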


The 2007 Conservation Awards - Digital Preservation Award Short List

Announced June 15, 2007, by Carole Jackson, Digital Preservation Coalition - "The Digital Preservation Award of £5,000 is sponsored by the Digital Preservation Coalition. This prestigious Award recognises achievement and encourages innovation in the new and challenging field of digital preservation – simply put, preserving things whose very existence depends on computers. Short-listed for the Digital Preservation Award are:"

1. LIFE: The British Library.
"LIFE (Lifecycle Information for E-Literature) has made a major contribution to understanding the long-term costs of digital preservation, an essential step in helping institutions plan for the future. Its methodology models the digital lifecycle and calculates the costs of preserving digital information for the next 5, 10 or 100 years. Organisations can apply this process to understand costs and focus resources on those items or collections most in need of them."

2. Web Curator Tool software development project: National Library of New Zealand & The British Library.
"The web is a huge and interconnected digital asset with which we are all familiar, and one in which material changes and disappears with frightening regularity. Conscious of this problem, the National Library of New Zealand and The British Library worked together in an international collaboration to build this tool, which supports selective and thematic web-harvesting by collaborating users in a library environment. Swift development over just 10 months enabled it to be released as free software for the benefit of the international web-archiving community in September 2006, from <http://webcurator.sourceforge.net/>."

3. Active Preservation at The National Archives - PRONOM Technical Registry and DROID file format identification tool: The National Archives of the UK.
"One of the fundamental challenges of digital preservation is to understand the technologies required to access digital information, and plan the actions we will need to take to ensure continued access in the future in the face of constant technological change. Is the software needed to read this document still supported by the supplier, and is the format of this digital movie still readable by most computers? PRONOM is a unique and innovative online service which helps to answer questions like these and includes a knowledge base of technical information about over 600 file formats and 250 software tools, which has been developed by The National Archives to answer these challenges."

4. PARADIGM (The Personal Archives Accessible in Digital Media): Bodleian Library, University of Oxford, & John Rylands University Library, University of Manchester.
"Personal archives are important components of cultural memory, but inexperience in curating their modern counterparts – e-mail, digital photographs, online calendars, blogs and many more – puts the survival of today's personal histories at risk. The diversity and volatility of digital technology far exceeds that of any medium that creators, archivists and researchers have previously worked with. The Paradigm project has worked with politicians, archivists and researchers to investigate these challenges in an exemplar project so that the archives of significant contemporaries can continue to enrich our history."

5. Digital Repository Audit and Certification: CRL, RLG-OCLC, NARA, the DCC, DPE and Nestor.
"As the number of organisations, both public and private, preserving digital information increases, it becomes important to be able to assess how well they are doing and how well-prepared they are for the unknown challenges of the future. The Trustworthy Repositories Audit and Certification (TRAC) Criteria and Checklist (maintained by the US Center for Research Libraries), the nestor project's Criteria Catalogue and the Digital Repository Audit Method Based on Risk Assessment (DRAMBORA) published by the Digital Curation Centre and DigitalPreservationEurope present complementary methods for the self assessment, audit and certification of digital repository infrastructures."

"All the short-listed projects will give a presentation to the Digital Preservation Award judges on 19 June. The winners of the Conservation Awards 2007 will be announced at the British Museum on 27 September."


Supporting research in the Arts and Humanities: JISC reviews its services

June 13, 2007 - "Following the decision by the AHRC (Arts and Humanities Research Council) to cease funding the AHDS (Arts and Humanities Data Service) from March 31st 2008, JISC has decided that it is unable to fund the service alone and that therefore its own funding of the service will, in its current form, cease on the same date."

"In its 11 years of existence the AHDS has established itself as a centre of expertise and excellence in the creation, curation and preservation of digital resources and has been responsible for a considerable engagement of the Arts and Humanities community with ICT and a significant increase in that community's knowledge and use of digital resources. Its contribution to the development of technical standards, its outreach to sectors beyond higher education, such as cultural heritage, arts, museum and archive organisations and its support for the development of a national e-infrastructure and repository system have been among its many significant achievements."

"In the light of these achievements and the consequent risks to the continued development of the Arts and Humanities community's engagement with ICT, JISC is exploring with the AHDS, partner organisations and the wider community alternative approaches to maintaining its strong support for that community beyond March 2008."

For more information, please see the full press release at <http://www.jisc.ac.uk/news/stories/2007/06/news_ahds.aspx>.


Laura Campbell Recognized as Laureate by Computerworld Honors Program

EMC Information Leadership Award Honors Campbell's Role in Leading Library of Congress's Digital Programs

June 6, 2007 - "Laura E. Campbell, associate librarian for Strategic Initiatives and chief information officer for the Library of Congress, today received the prestigious 2007 EMC Information Leadership Award from the Computerworld Honors Program. The award was presented during the 19th Annual Laureates Medal Ceremony & Gala Awards Evening at the Andrew W. Mellon Auditorium in Washington. For almost two decades, Computerworld Honors has acknowledged those individuals and organizations that have used information technology to benefit society."

"'I am honored to receive the EMC Information Leadership Award, and I feel privileged to lead such innovative and important work for the largest and most accessible library in the world,' said Campbell. 'Hundreds of Library staff as well as our external partners have collaborated on our digital programs, and they have played a key role in their success.'"

"Campbell, who is also the Library's chief information officer, leads the Library's National Digital Information Infrastructure and Preservation Program (http://www.digitalpreservation.gov), which is leading the nation in the collection and preservation of important born-digital content that would otherwise be lost. She also leads the National Digital Library Program, which offers more than 22 million items on the Library's various award-winning Web sites at <http://www.loc.gov>."

For more information, please see the full press release at <http://www.loc.gov/today/pr/2007/07-127.html>.


US Department of Energy Gray Literature Now Registered with CrossRef

June 4, 2007 - "The U.S. Department of Energy and CrossRef are very pleased to announce that the Office of Scientific and Technical Information (OSTI) has completed DOI registrations for more than 86,000 DOE technical reports with CrossRef, the earliest report dating from 1933."

"According to Ed Pentz, CrossRef's Executive Director, 'CrossRef's discussions with OSTI about depositing technical reports began several years ago, and we are really thrilled that the mission is now accomplished. It marks a milestone in CrossRef's endeavor to establish a truly robust citation network, one that spans decades of research, different content types, and all disciplines.'"

"Dr. Walter Warnick, Director of OSTI, commented, 'Coupling the vast resources available on OSTI's Information Bridge with the capabilities of CrossRef speeds access to DOE scientific and technical literature, placing it on the same footing as journal articles in terms of reference linking.'"

For more information, please see <http://www.crossref.org/01company/09press_releases.html>.


Copyright 2007 © Corporation for National Research Initiatives


doi:10.1045/july2007-inbrief