Clips & Pointers


D-Lib Magazine
September/October 2009

Volume 15 Number 9/10

ISSN 1082-9873

In Brief


File Information Tool Set (FITS): A New Tool for Digital Preservation Repositories

Contributed by:
Spencer McEwen, Digital Library Software Engineer, Chief Developer of FITS
Andrea Goethals, Digital Preservation and Repository Services Manager
Harvard University Library
Cambridge, Massachusetts, USA
<{spencer_mcewen, andrea_goethals}>

Today's digital preservation repositories increasingly have to accept many different file formats. After Harvard University Library (HUL) began a web archiving service in 2008, 335 new file formats were introduced to its repository. How can we accurately identify all these different formats and create the technical metadata needed to preserve these files? Manual solutions are not scalable – we need automated tools that can work with a broad range of file formats and genres.

Over the last year HUL created an open-source tool that combines the abilities of many different open-source file identification, validation, and metadata extraction tools. The File Information Tool Set (FITS) acts as a wrapper around these tools, invoking, normalizing, and combining their output.

The first version of FITS wraps the following tools:
  • ExifTool
  • National Library of New Zealand Metadata Extractor
  • FFIdent
  • File Utility

Each of these tools has different strengths that, when used together, support many different formats. As new tools become available they will be evaluated for potential integration into FITS. Candidate tools of particular interest include:

  • Apache Tika
  • JHOVE2
  • Aduna Aperture
  • MediaInfo
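The wrapper approach described above (invoke several tools, normalize their output, then combine it) can be illustrated with a minimal Python sketch. This is not the FITS API or its implementation: the adapter functions, the key mappings, and the shape of the combined result are all hypothetical, and the "tools" return canned values instead of running a real utility.

```python
from collections import defaultdict
from typing import Callable, Dict

# Hypothetical adapters: each stands in for one wrapped tool and returns
# that tool's raw, tool-specific output. A real adapter would run the tool
# on the file at `path`; these return canned values for illustration only.
def fake_exiftool(path: str) -> Dict[str, str]:
    return {"FileType": "JPEG", "MIMEType": "image/jpeg", "ImageWidth": "640"}


def fake_file_utility(path: str) -> Dict[str, str]:
    return {"description": "JPEG image data", "mime": "image/jpeg"}


# Assumed per-tool mapping from raw keys to a shared vocabulary.
KEY_MAPS: Dict[str, Dict[str, str]] = {
    "exiftool": {"FileType": "format", "MIMEType": "mimetype", "ImageWidth": "width"},
    "file": {"mime": "mimetype"},
}


def normalize(tool: str, raw: Dict[str, str]) -> Dict[str, str]:
    """Rename a tool's raw keys to the shared vocabulary; drop unmapped keys."""
    mapping = KEY_MAPS[tool]
    return {mapping[key]: value for key, value in raw.items() if key in mapping}


def combine(reports: Dict[str, Dict[str, str]]) -> Dict[str, dict]:
    """Merge normalized reports, recording which tools reported each value."""
    seen: Dict[str, Dict[str, list]] = defaultdict(dict)
    for tool, report in reports.items():
        for field, value in report.items():
            seen[field].setdefault(value, []).append(tool)
    # A field is in conflict when the tools reported more than one value.
    return {
        field: {"values": by_value, "conflict": len(by_value) > 1}
        for field, by_value in seen.items()
    }


def identify(path: str, tools: Dict[str, Callable[[str], Dict[str, str]]]) -> Dict[str, dict]:
    """Invoke every tool, normalize each output, and combine the results."""
    reports = {name: normalize(name, run(path)) for name, run in tools.items()}
    return combine(reports)


if __name__ == "__main__":
    tools = {"exiftool": fake_exiftool, "file": fake_file_utility}
    for field, info in identify("example.jpg", tools).items():
        print(field, info)
```

Recording which tool reported each value, instead of picking one winner, keeps disagreements between tools visible so that a repository can decide how to handle them.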

HUL plans to use FITS in production in its repository ingest service in 2010. An early release of FITS is now available to the community under the LGPL license on the FITS website, for testing and for consideration for future integration into production repository workflows. We invite you to download and try FITS. Any problems can be reported on the Issues tab of the FITS website. For more information, please see the documentation on the FITS website, or email us directly.

Meet RODA, an Open-source Repository for Digital Preservation

Contributed by:
Luis Faria & Miguel Ferreira
Portuguese National Archives
Lisbon, Portugal

In mid-2006, the Portuguese National Archives [1] launched a project called RODA – Repository of Authentic Digital Objects – aiming to identify and bring together all the necessary technology, human resources and political support to carry out long-term preservation of digital materials.

One of the original goals of the RODA project was the development of a digital repository capable of ingesting, managing and providing continuous access to various types of digital objects produced by national public institutions.

The end result was a powerful repository able to ingest and preserve text documents, images, video, audio and relational databases in many different formats. RODA automatically normalizes ingested data to preservation formats; implements a powerful and extensible preservation event scheduling mechanism; includes various data viewers for all supported representations; offers advanced user management and control; provides statistics; and much more.

Additionally, the repository is supported by open-source technologies (Fedora Commons, JBoss, Web services, etc.) and is based on existing standards such as OAIS, EAD, METS and PREMIS. To experience RODA, please visit the demo site, where you can:

  • Log in as consumer to browse and search documents;
  • Log in as producer to send new documents to be ingested;
  • Log in as archivist to check document conformity, accept/reject submitted documents, create and manage collections, change object permissions and much more;
  • Log in as administrator to gain access to user management, task management, global statistics and reports, search logs, etc.

Furthermore, you may access the development site, where you can:

  • Participate in discussion forums and report issues
  • Read and download technical documentation, articles and presentations
  • Download sources and installation packages

Note: Registration on the development site is mandatory before performing any of the outlined tasks.

1. Directorate-General of the Portuguese Archives

The New Zealand Public Sector Digital Continuity Action Plan

Contributed by:
Stephen Clarke
Senior Advisor, Digital Continuity
Archives New Zealand
Wellington, New Zealand

New Zealand's Digital Continuity Action Plan is a world-first initiative to prevent important public records from being lost and to ensure that today's information is available tomorrow. Globally, so far as we know, this is the first nation-wide, whole-of-public-sector approach to the issues of ongoing trusted access to digital information assets for business continuity and service delivery.

Most public sector information is created digitally, but the availability of information over time has become a real concern with software, hardware and media obsolescence issues accumulating. In the New Zealand public sector, 74% of state sector organisations report that they are already experiencing problems with records held in formats that mean they can no longer access them (see Government Recordkeeping Survey 2009). This includes electronic documents saved without appropriate titles or other metadata (58% of agencies), records that require software or hardware that is no longer available (25%), and records on obsolete storage media (24%). These figures are increasing each year. To address this concern the plan has been developed as a collaborative programme to assist and support agencies to overcome issues in storing, accessing, using and reusing the digital information they produce.

Archives New Zealand will lead the implementation of the plan, with our key strategic allies (including the Government Technology Services, State Services Commission, National Library and others), to provide support, advice and leadership on digital continuity issues. The plan is designed to support our legislative mandate to improve public sector information management. It provides a platform for public sector agencies to act in a coordinated way to manage their digital information efficiently. The aim of the plan is to ensure that public sector resources are used more efficiently by minimising duplication and sharing ideas, expertise, and systems.

The objective of the action plan is to ensure that public sector digital information is trusted and accessible when it is needed now and in the future. The key messages are:

  • There when you need it. Public sector digital information will be maintained so that it can be accessed when it is needed. Some information may be required for many decades.
  • Authentic and reliable. Public sector digital information is trustworthy, tamper-proof and free of technological digital rights restrictions.
  • Trusted access. Citizens can be confident that their information is accessible for their needs but protected from unauthorised access.
  • Do nothing, lose everything. If no action is taken, public sector digital information will be lost. A proactive approach is necessary.

The plan commits Archives New Zealand to undertake a series of measurable and achievable projects to achieve the six high-level goals within the plan. This is a three-year programme and, although there are dependencies, the goals and actions will be undertaken concurrently. The goals are:

  1. Understanding: Those responsible for public sector digital continuity communicate effectively with each other and have a common understanding of the issues.
  2. Well-managed from Day One: All public sector digital information is well-managed from the point of creation onwards.
  3. Infrastructure: Robust cross-agency infrastructure exists to support the interoperability of systems.
  4. High-value information kept: High-value information is identified, so that business critical information is not lost in the digital landfill.
  5. Trusted access: Citizens are able to access digital information now and in the future, and information is protected from unauthorised access and use.
  6. Establish good governance: Information management across the public sector is characterised by good governance, leadership and accountability.

In the News

Excerpts from Recent Press Releases and Announcements

A Compact for Open-Access Publication

Cornell, Dartmouth, Harvard, Massachusetts Institute of Technology, and the University of California, Berkeley Announce Joint Support for Open-Access Publication; Additional Research Institutions Invited to Join the Five-Member Compact

September 14, 2009 - "Five of the nation's premier institutions of higher learning – Cornell, Dartmouth, Harvard, the Massachusetts Institute of Technology, and the University of California, Berkeley – today announced their joint commitment to a compact for open-access publication."

"Open-access scholarly journals have arisen as an alternative to traditional publications that are founded on subscription and/or licensing fees. Open-access journals make their articles available freely to anyone, while providing the same services common to all scholarly journals, such as management of the peer-review process, filtering, production, and distribution."

"The economic downturn underscores the significance of open-access publications. With library resources strained by budget cuts, subscription and licensing fees for journals have come under increasing scrutiny, and alternative means for providing access to vital intellectual content are identified. Open-access journals provide a natural alternative."

For more information, please see the full press release at <>.

CrossRef Collaborates with SAGE, OUP, CLOCKSS and Portico to Light Up Archive for Discontinued Journal Articles

September 11, 2009 - "CrossRef has collaborated with archiving organizations and publishers to ensure that several journals that have ceased publication remain linkable with the CrossRef DOIs (Digital Object Identifiers) originally assigned to the articles. The titles include Auto/Biography and Graft from SAGE and Brief Treatment and Crisis Intervention from Oxford University Press (OUP). All three titles are now available through both CLOCKSS and Portico."

"An archive 'trigger event' occurs when a published journal or other content is no longer available from the publisher. Trigger events can occur for a variety of reasons. Both SAGE and OUP have had agreements in place with archive organizations for several years, but the discontinuation of these titles marked the first time those arrangements had been implemented with real-world cases."

"'Two important tenets of CrossRef's mission are persistence and cooperation,' said Ed Pentz, Executive Director of CrossRef. 'Making sure that the CrossRef DOIs that have been assigned to content that has moved from a publisher journal platform to an archive still resolve to the articles is an important part of that persistence. Persistence is not only achieved through technology but by cooperation: CrossRef, publishers, journal hosting services, and the archiving organizations have all worked together to ensure continued access to the scholarly record. These journals are particularly strong examples of the system in action as there are multiple archives available to guarantee ongoing access.'"

For more information, please see the full press release at <>.

IMLS Grant Will Help Libraries Help the Unemployed

September 10, 2009 - "Job seekers have packed libraries around the country during recent months, searching online job sites, building resumes, taking interview classes, and making use of a wide range of other employment services and resources. More help is on the way. Through a grant from the Institute of Museum and Library Services (IMLS), WebJunction, the online learning community for library staff created by OCLC, a nonprofit library service and research organization; and the State Library of North Carolina (SLNC) have launched a one-year initiative to gather and share best practices for providing library-based employment services and programs to the unemployed."

"'We know that libraries are making important contributions to the nation's economic recovery, and IMLS is committed to helping those libraries help their communities get back to work,' said Anne-Imelda M. Radice, IMLS Director. 'We admire this grant because of the educational opportunities it will provide and the relationships between libraries and economic and workforce development agencies that it will foster.'"

For more information, please see the full press release at <>.

IMLS Awards Over $2.7 Million to Native American and Native Hawaiian Libraries for Enhancements to Library Services

September 8, 2009 - "The Institute of Museum and Library Services (IMLS) announced today the 17 Native American tribal communities and Alaska Native villages that are this year's recipients of $2,219,312 in Native American Library Services Enhancement grants. The Institute is also pleased to report that Alu Like, Inc. is the recipient of the Native Hawaiian Library Services grant totaling $531,000. Click here to see a full list of Enhancement grant recipients."

"This year, grantees will tackle a wide range of projects, including:

  • The 'We Are All Family' project by the Makah Cultural and Research Center (MCRC), on behalf of the Makah Indian Tribe, will enhance access to family genealogy and ancestral history information for Makah community members.
  • The Every Child Ready to Read (ECRR) program at the Pueblo of Pojoaque Public Library will focus on literacy skills for young children and their parents.
  • The enhancement of the NHL digital libraries to meet the needs and interests of Native Hawaiians, and the continuation of the Motheread workshops, which focus on language and literacy development with parents and their children at Alu Like, Inc. To learn more about Alu Like, Inc.'s proposed activities, click here."

For more information, please see the full press release at <>.

Report Shows Decline in Federal Science and Engineering Funding at Minority-Serving Institutions (NSF Press Release 09-169)

September 8, 2009 - "In fiscal year 2007, federal agencies gave less science and engineering (S&E) funding to academic institutions that primarily serve minority students, says a new National Science Foundation report released today...."

"...On the whole, federal agencies gave less money to all academic institutions in fiscal year 2007 with the overall, inflation-adjusted total decreasing 0.4 percent from fiscal year 2006 levels. Federal research and development obligations to all universities and colleges totaled $25.3 billion in fiscal 2007."

For more information, please see the full press release at <>.

Webcast: From Data Deluge to Useful Knowledge: Data Sharing and Preservation with iRODS for Multi-Agency NITRD Committee

September 8, 2009 announcement from Paul Tooby, Community Development Coordinator, Data Intensive Cyber Environments (DICE) Center: "With the size of digital data collections expected to double in just five years, how will it be possible to organize, share, and extract useful knowledge from this deluge of data? How will it be possible to preserve this digital information for future generations, when it can disappear with the crash of a hard drive, obsolete software applications, or proliferating proprietary formats?"

"Interest in meeting these challenges is high, and a large crowd recently gathered at the National Science Foundation for a 'Technical Demonstration of an Integrated Preservation Infrastructure Prototype,' at the invitation of NITRD, the multi-agency National Science and Technology Council subcommittee on Networking and Information Technology Research and Development."

"The webcast and slides can be viewed at <>."

For more information, please contact Paul Tooby at <>.

'It is time for Europe to turn over a new e-leaf on digital books and copyright'. Joint Statement of EU Commissioners Reding and McCreevy on the occasion of this week's Google Books meetings in Brussels

September 7, 2009 - "Viviane Reding, Commissioner for Information Society and Media, and Charlie McCreevy, Commissioner for the Internal Market and Services, today made a joint statement setting out the important cultural and economic stakes of book digitisation in Europe. To face the daunting task of digitising Europe's books, of which there are tens of millions in Europe's national libraries alone, the two Commissioners stressed the need for fully respecting copyright rules to ensure fair remuneration for authors, but also welcomed public-private partnerships as a means to boost digitisation of books. They highlighted the need to adapt Europe's still very fragmented copyright legislation to the digital age, in particular with regard to orphan and out-of-print works."

"'Europe is facing a very important cultural and economic challenge: Only some 1% of the books in Europe's national libraries have been digitised so far, leaving an enormous task ahead of us, but also opening up new cultural and market opportunities. A better understanding of the interests involved will help the Commission to define a truly European solution in the interest of European consumers. We believe that such a European solution should breathe fresh life into this issue and could give every citizen with an internet connection access to millions of books that today lie hidden on dusty shelves. Our aim is to blow away stale stereotypes that hindered debate in the past and focus on finding the best approach that today's technology will allow us to take in the future, while giving a new boost to cultural creation in the digital age.'"

"'Digitisation of books is a task of Herculean proportions which the public sector needs to guide, but where it also needs private-sector support. It is therefore time to recognise that partnerships between public and private bodies can combine the potential of new technologies and private investments with the rich collections of public institutions built up over the centuries. If we are too slow to go digital, Europe's culture could suffer in the future.'"

For more information, please see the press release at <>.

Public Consultation - National Information Society Policy: A template

(Comments Due by September 20, 2009)

September 6, 2009 - "The Information for All Programme of UNESCO (IFAP) was established by UNESCO to provide a framework for international co-operation and partnerships in 'building an information society for all'. IFAP's focus is on ensuring that all people have access to information they can use to enhance their lives. UNESCO has assumed the task of assisting Member States in the formulation of national information policy frameworks, in particular within the framework of the Information for All Programme (IFAP)."

"The 'Tunis Agenda for the Information Society,' adopted during the second phase of WSIS in Tunis in 2005 included the following paragraph: 'Taking into consideration the leading role of governments in partnership with other stakeholders in implementing the WSIS outcomes, including the Geneva Plan of Action, at the national level, we encourage those governments that have not yet done so to elaborate, as appropriate, comprehensive, forward-looking and sustainable national e-strategies, including ICT strategies and sectoral e-strategies as appropriate, as an integral part of national development plans and poverty reduction strategies, as soon as possible and before 2010.' The Template is intended to assist countries and governments in the process of developing, or extending and updating, such policies and strategies."

"All interested parties are invited to send in comments on this Template, before it is submitted to the Intergovernmental Council of IFAP for approval and then distributed to UNESCO Member States."

Please send your comments before September 20, 2009, to:

Karol Jakubowicz, Chair, IFAP Intergovernmental Council, email: <> or to Mr. Boyan Radoykov, UNESCO Information Society Division, email: <>

For more information, please see <>.

Library associations submit supplemental filing, call for increased oversight of Google agreement

September 2, 2009 - "The American Library Association (ALA), the Association of College and Research Libraries (ACRL) and the Association of Research Libraries (ARL) today submitted a supplemental filing with the U.S. District Court for the Southern District of New York overseeing the proposed Google Book Search settlement to address developments that have occurred since the groups submitted their filing on May 4."

"While the library associations' position has not changed since their initial filing, the groups believe that recent activity, such as an amended agreement reached between Google and the University of Michigan, the University of Texas-Austin and the University of Wisconsin-Madison, Google's recent public statement regarding privacy, and the library associations' communication with the Antitrust Division of the U.S. Department of Justice (DOJ) should be brought to the court's attention. In their supplemental filing, the library associations call upon the court to address concerns with pricing review, to direct Google to provide more detail on privacy issues, and to broaden representation on the Books Rights Registry."

For more information, please see <>.

Harvard's DASH for Open Access

September 1, 2009 - "Harvard's leadership in open access to scholarship took a significant step forward this week with the public launch of DASH – or Digital Access to Scholarship at Harvard – a University-wide, open-access repository. More than 350 members of the Harvard research community, including over a third of the Faculty of Arts and Sciences, have jointly deposited hundreds of scholarly works in DASH."

"'DASH is meant to promote openness in general,' stated Robert Darnton, Carl H. Pforzheimer University Professor and Director of the University Library. 'It will make the current scholarship of Harvard's faculty freely available everywhere in the world, just as the digitization of the books in Harvard's library will make learning accumulated since 1638 accessible worldwide. Taken together, these and other projects represent a commitment by Harvard to share its intellectual wealth.'"

"Visitors to DASH can locate, read, and use some of the most up-to-the-minute scholarship that Harvard has to offer."

"...Still a beta, DASH is a joint project of the OSC and the Office for Information Systems (OIS), both of which are strategic programs of the Harvard University Library. DASH is based on the open-source DSpace repository platform. Software customizations will continue throughout the coming academic year."

For more information, please see <>.

New ACRL Image Resources Interest Group

August 31, 2009 announcement from Denise Hattwig, University of Washington Libraries: "We're pleased to announce formation of the new Image Resources Interest Group in the Association of College and Research Libraries."

"The Image Resources Interest Group (IRIG) will provide a forum for ongoing discussion of the unique issues presented by the development and support of interdisciplinary image resources in academic libraries. Discussion topics will include the following:

  • Selecting and using subscription image databases
  • Choosing digital asset management and presentation tools
  • Working with images across systems and platforms
  • Supporting faculty research and teaching with images
  • Developing interdisciplinary image collections
  • Collaborating with academic departments and across library units to support image resource development and use
  • Image cataloging and metadata
  • Effective access to image resources through library web sites
  • Visual literacy
  • Image copyright
  • Digital capture"

"More information about the Image Resources Interest Group is now available through ALA Connect, at <>."

Europe's Digital Library doubles in size but also shows EU's lack of common web copyright solution

August 28, 2009 - "4.6 million digitised books, maps, photographs, film clips and newspapers can now be accessed by internet users on Europeana, Europe's multilingual digital library. The collection of Europeana has more than doubled since it was launched in November 2008. Today the European Commission, in a policy document, declared as its target to bring the number of digitised objects to 10 million by 2010. The Commission also opened a public debate on the future challenges for book digitisation in Europe: the potential of the public and private sector to team up and the need to reform Europe's too fragmented copyright framework."

"Today a user can find 4.6 million digitised objects on Europeana, compared to 2 million nine months ago....However, the substantial progress made with Europeana also brings to the surface the challenges and problems linked to the digitisation process. At the moment, Europeana includes mainly digitised books which are in the public domain and are thus no longer protected by copyright law (which extends to 70 years after the death of the author). For the moment, Europeana includes, for legal reasons, neither out-of-print works (some 90% of the books in Europe's national libraries), nor orphan works (estimated at 10 - 20% of in-copyright collections) which are still in copyright but where the author cannot be identified."

"To address all these issues, the Commission launched today a public consultation on the future of Europeana and the digitisation of books that will run until 15 November 2009. Questions the Commission asks include: How can it be ensured that digitised material can be made available to consumers EU-wide? Should there be better cooperation with publishers with regard to in-copyright material? Would it be a good idea to create European registries for orphan and out-of-print works? How should Europeana be financed in the long term?"

For more information, please see the full press release at <>.

Peer Reviewers Needed for Broadband Technology Opportunities Program

August 27, 2009 - "The Institute of Museum and Library Services, in coordination with the U.S. Department of Commerce's National Telecommunications and Information Administration (NTIA), encourages interested library and museum professionals to review grant applications for the Broadband Technology Opportunities Program (BTOP). This $4.5 billion broadband grant program is funded by the American Recovery and Reinvestment Act (ARRA), which seeks to bring universal broadband access to all Americans while creating jobs and stimulating the economy."

"Reviewers will participate remotely and will not be required to attend any in-person meetings. A 90-minute training session/webinar will be provided for reviewers and repeated several times to allow for scheduling flexibility. Participants will review no more than 10 applications, each requiring 60 to 90 minutes, and participate in a 2-hour wrap-up conference call."

For more information, please see the full press release at <>.

ACRL announces fall 2009 e-Learning schedule

August 18, 2009 - "The Association of College and Research Libraries (ACRL) is offering a wide variety of online learning opportunities in fall 2009 to meet the demands of your schedule and budget. Full details and registration information is available on the ACRL Web site at <>."

"Registration for all online seminars and Webcasts qualifies for the new Frequent Learner Program. Register for three ACRL e-Learning events and receive one free registration. Visit <> for more information on the Frequent Learner Program."

"ACRL online seminars are asynchronous, multi-week courses delivered through Moodle."

For more information, please see <>.

Talis launches angel fund for Open Education

August 10, 2009 - "The Talis Education Division has announced an angel fund to help Open Education projects – the Talis Incubator for Open Education."

"The Talis Incubator for Open Education provides funding of up to £15,000 to help individuals or small groups who have big ideas about furthering the cause of Open Education. All Talis asks in return is that the project deliverables are 'open sourced' and the intellectual property returned back to the community, allowing it to be used freely. Talis won't, and never will, exert any rights to the intellectual property or ideas that are funded."

"The scheme runs for 12 months, and awards will be made to successful applicants in two rounds during the year. Talis are currently pulling together a review board of people from the Open Education community to review submitted project proposals, ensuring the community has a significant voice in choosing successful applicants."

For more information, please see the full press release at <>.

NISO Two-Part September Webinar: E-Resources Licensing: The Good, The Bad, The Ugly

August 10, 2009 announcement from Cynthia Hodgson, NISO: "Not many librarians are also lawyers, but they often need to have an understanding of legal issues to succeed in their jobs. Licensing, contract, and copyright law all have significant impacts on our community. NISO's September two-part webinar is your solution to the dilemma." [Editor's note: see below for an alternative to attending in person, as the dates of the webinar have passed.]

"Part I of the webinar will provide an introduction to the basics of a license agreement as a legal contract....Part II of the webinar will review key terms in an agreement as highlighted in a sample license."

"Can't make it on the webinar date? Register and gain access to the recorded archive for one year. For more information and to register, visit the event webpages."

New Open Access Series from Open Humanities Press and U-Mich Library's Scholarly Publishing Office

August 7, 2009 - "Open Humanities Press (OHP), in conjunction with the University of Michigan Library's Scholarly Publishing Office (SPO), is pleased to announce the following forthcoming open access series in critical and cultural theory: New Metaphysics (ed. Graham Harman and Bruno Latour), Critical Climate Change (ed. Tom Cohen and Claire Colebrook), Global Conversations (ed. Ngugi wa Thiong'o), Unidentified Theoretical Objects (ed. Wlad Godzich), and Liquid Books (ed. Clare Birchall and Gary Hall)."

"In a unique collaboration, the scholars of the Open Humanities Press are partnering with the University of Michigan Library's Scholarly Publishing Office to launch five new OA book series, edited by senior members of OHP's editorial board. All of the books will be freely available in full-text, digital editions and as reasonably-priced paperbacks...."

"...After the vetting and peer review process, manuscripts will be handed on to SPO for conversion to structured XML for electronic and print on demand publication, metadata creation and cataloging, and archiving in the University of Michigan Library for long-term preservation. The books will be available electronically through the OHP and SPO websites, and in paperback through the usual online distributors...."

"...Authors will retain the copyrights for their works and have a choice of Creative Commons licenses. They will also have the option of making their manuscripts available online in various pre- and post-publication versions for reader commenting and annotation if they so wish."

For more information, please contact Sigi Jöttkandt or Shana Kimball.

CrossRef Hits the Books - Deposits Grow, Guidelines Released

July 29, 2009 - "For the second year in a row CrossRef deposits for books are growing faster than any other content type in the reference linking system. As of July 2009, more than 1.8 million CrossRef Digital Object Identifiers (DOIs) have been assigned for books. Each CrossRef DOI represents a citable book title, chapter, or reference entry that can be used to link references from scholarly content. Book deposits range from monographs with a single CrossRef DOI to massive reference works with tens of thousands of individual entries."

"To encourage publishers to ramp up reference linking for scholarly books, and to explain how CrossRef DOIs for books work, CrossRef has published two documents. The first, Best Practices for Books, was created by CrossRef's Book Working Group. The second is a Frequently Asked Questions (FAQ) document explaining the relationship between CrossRef DOIs and other DOI applications, such as the ISBN-A."

"Almost 60 publishers have deposited CrossRef DOIs for nearly 84,000 book titles since CrossRef began accepting book deposits."

For more information, please see <>.

Copyright 2009 © Corporation for National Research Initiatives
