D-Lib Magazine
The Magazine of Digital Library Research

I N   B R I E F

March/April 2014


Effective and Efficient Multilingual Information Access to Digital Collections

Contributed by:
Jiangping Chen
Associate Professor
University of North Texas
Denton, Texas, USA
Jiangping.chen [at] unt.edu

The University of North Texas is collaborating with Shenzhen Library in China and the Autonomous University of the State of Mexico on a research project funded by a National Leadership Grant from the Institute of Museum and Library Services (IMLS). This project, called MLIA4DC for short, explores effective and efficient application of Machine Translation (MT) technologies for providing multilingual information access (MLIA) services to digital collections. It is a continuation of the IMLS-funded MRT Project, which evaluated the performance of MT on metadata records.

The MLIA4DC Project addresses the needs of three main audiences: (1) non-English speakers who wish to query English language digital collections; (2) digital collection developers interested in providing multilingual information access services for their digital collections; and (3) researchers in MT and Cross-language Information Retrieval (CLIR). In the increasingly global knowledge society, libraries and museums desire to design and implement effective and efficient MLIA in order to serve broader user groups and to share information with global societies. However, current multilingual digital collections employ mainly human translation to convert metadata records from one language to another, which is expensive and time-consuming.

The advancement of MT technologies provides an alternative for libraries and museums to implement MLIA for their digital collections. The MLIA4DC Project will apply MT to translate a large number of English metadata records into Chinese and Spanish, and then conduct MLIA experiments and evaluation. The goal is to determine the effectiveness and efficiency of applying MT for MLIA to digital collections. Specifically, the project will:

  • Develop an MLIA service model based on MT for digital collections;
  • Develop an open-source multi-engine MT (MEMT) strategy that can be used to translate other metadata records;
  • Develop a multilingual (English, Chinese, and Spanish) corpus of metadata records for MT and CLIR research;
  • Evaluate the effectiveness and efficiency of the MLIA service model to determine which MT strategy achieves the best MLIA performance.

The following research question will be answered: What is the effectiveness and efficiency of MLIA services applying machine translation to metadata records?

To achieve the research goal and answer the research question, the research team will apply a document-translation-based CLIR strategy: metadata records, treated as documents, will be translated into the target languages by MT systems prior to search. One million metadata records from the Library of Congress Catalog and the UNT Library Catalog will be combined to form a digital collection. Important elements, including Title, Creator, Subject, Description, Publisher, and Coverage, will then be translated into Chinese and Spanish by MT systems based on different MT strategies. The translated metadata records will then be indexed by an information retrieval system, enabling native speakers of Chinese or Spanish to search the collection in their native languages. The retrieval results will be evaluated and analyzed to identify effective and efficient solutions to MLIA for the digital collection.
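The document-translation approach described above can be illustrated with a minimal sketch. It is not the project's implementation: the toy glossary stands in for a real MT system, the sample records are invented for illustration, and the function names (toy_translate, translate_record, build_index, search) are hypothetical. Only the list of translated elements comes from the project description.

```python
from collections import defaultdict

# Elements named in the project description as the ones to translate.
FIELDS = ["Title", "Creator", "Subject", "Description", "Publisher", "Coverage"]

# Hypothetical stand-in for an MT system: a tiny English-to-Spanish glossary.
TOY_GLOSSARY = {
    "history": "historia",
    "of": "de",
    "maps": "mapas",
    "library": "biblioteca",
}


def toy_translate(text):
    """Word-by-word lookup; a real MT engine would replace this function."""
    return " ".join(TOY_GLOSSARY.get(word, word) for word in text.lower().split())


def translate_record(record, translate=toy_translate):
    """Translate only the selected elements, leaving other fields untouched."""
    return {k: (translate(v) if k in FIELDS else v) for k, v in record.items()}


def build_index(records):
    """Build an inverted index (term -> set of record ids) over the selected fields."""
    index = defaultdict(set)
    for rec in records:
        for field in FIELDS:
            for term in rec.get(field, "").lower().split():
                index[term].add(rec["id"])
    return index


def search(index, query):
    """Return the ids of records that contain every query term (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Invented sample records; translation happens before indexing, not at query time.
records = [
    {"id": "r1", "Title": "History of Texas", "Subject": "Maps"},
    {"id": "r2", "Title": "Library maps", "Subject": "History"},
]
spanish_index = build_index(translate_record(r) for r in records)
```

With this index, a Spanish query such as search(spanish_index, "historia texas") is matched directly against the translated records, so no query translation is needed at search time. Swapping toy_translate for different MT engines, while keeping the index and search steps fixed, is one way the retrieval effect of each MT strategy could be compared.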

This 36-month project started on November 1, 2013 and will end on October 31, 2016. Dr. Jiangping Chen from the Department of Library and Information Sciences, College of Information, is the Principal Investigator. Currently the research team is constructing the digital collection, testing different information retrieval systems, and obtaining MT results from online MT services. To learn more about this project, please visit its homepage at http://txcdk-iia.unt.edu/MLIA/ or email Dr. Chen at Jiangping.chen@unt.edu.


Preservation and Access Framework for Digital Art Objects

Contributed by:
Madeleine Casad
Associate Curator, the Rose Goldsen Archive of New Media Art
Curator for Digital Scholarship
Division of Digital Scholarship and Preservation Services
Cornell University Library
Ithaca, New York, USA

Cornell University Library's Division of Digital Scholarship and Preservation Services and the Rose Goldsen Archive of New Media Art are collaborating on an NEH-funded project to develop archival Preservation and Access Frameworks for Digital Art Objects.

Using the Goldsen Archive's test bed of over 300 born-digital interactive artworks, the project aims to create preservation practices and preservation metadata frameworks that may be valuable for other archiving institutions and transferrable to other kinds of complex born-digital collections. The Cornell-based project team is working in consultation with New York collaborators AudioVideo Preservation Solutions (AVPS) and an international Advisory Board.

The Rose Goldsen Archive of New Media Art is a research archive located in the Rare and Manuscript Collections division of Cornell University Library. Founded in 2002, the Goldsen maintains a strong curatorial focus on interactive born-digital artworks created for CD-ROM, DVD-ROM, and web delivery. Such artworks reflect early aesthetic experimentation with the experiential and communicative possibilities of interactive digital media. The Goldsen Archive's mission is to preserve this cultural history and make it readily available to a broad range of researchers.

Complex born-digital artifacts constitute a growing part of our cultural heritage as a digital society, yet these works present serious preservation challenges and obsolescence risks. More than 70% of the digital artworks in the Goldsen Archive can only be accessed with legacy computer terminals that run obsolete operating systems and software. Only recently have archiving institutions begun to develop scalable best practices for preserving materials like these.

Preservation and Access Frameworks for Digital Art Objects is one such initiative. This project is now beginning its second year. Some milestones and questions addressed so far by project staff and partners include:
  • Creating a workflow for basic disk imaging
  • Establishing baseline preservation practices that encompass physical media, digital content, and accompanying print or visual materials
  • Drafting an archival repository architecture commensurable to the complexity of the digital materials
  • Identifying "classes" across the test collection, based on criteria such as software dependencies, operating systems, types of rendering or imaging errors, "significant properties" essential to a user's experience of an artwork, or emulation viability
  • Canvassing a community of media art researchers and archivists in order to learn more about their needs, preferences, and practices

The project aims to create a preservation framework that will account for rapidly evolving access strategies, such as emulation. We also aim to develop an informed model for understanding what ideal and "best feasible" access versions might best address archive users' current and future research needs. Toward that end, we created a questionnaire to elicit the input of media art researchers, a disparate community that represents a wide range of professions, disciplines, and potential use cases. The project team is currently analyzing a robust and informative first round of responses, and will soon open the questionnaire to a second round.

Members of the project team will give presentations at the New England Archivists' Spring 2014 conference, and at the American Institute for Conservation of Historic and Artistic Works 2014 annual meeting.

For more information, please see:

The original press release: http://news.cornell.edu/stories/2013/02/humanities-grant-helps-library-preserve-digital-art

The project website: https://confluence.cornell.edu/display/pafdao/Home

The Rose Goldsen Archive of New Media Art website: http://goldsen.library.cornell.edu/

Or contact the project manager: mir9@cornell.edu


Staff Learning in Times of Rapid Change

Contributed by:
Betha Gutsche
Programs Manager
Seattle, Washington, USA
gutscheb [at] oclc.org

As public libraries evolve to meet the changing needs of their communities, library staff are challenged to continually update and expand their skills and knowledge. Library-focused continuing education (CE) providers are equally challenged to find effective and sustainable models for delivering training opportunities that meet ever-changing demands for continual learning. Funded by a grant from the Institute of Museum and Library Services (IMLS), the Strengthening Continuing Education Content for Libraries project seeks solutions and strategies to keep staff skills sharp so that they can help sustain thriving libraries.

Surveys of CE coordinators and trainers have indicated that, although they still value face-to-face training highly, they see an increasing reliance on elearning for its cost-saving and flexible access. As partners in the grant, WebJunction and Infopeople both have a solid history of providing elearning opportunities to staff in public libraries of all sizes, through online resources, webinars, and facilitated online courses. This project will focus on piloting models for the rapid creation of library-focused, self-paced elearning, designed to be broadly applicable and readily shareable across platforms.

Project activities include:

  • Updating the Competency Index for the Library Field, compiled by WebJunction. Originally published in 2009, the Index has been revisited and revised, informed by review of recent competency sets from various library organizations and by input from subject matter experts across the field. Three elements in particular have been emphasized throughout: 21st century skills, accountability, and community engagement.
  • Offering a Library CE Training Institute. Twelve CE providers have been selected to receive training on best practices for designing and delivering online learning. Participants will learn together and work in teams to create new elearning modules on high priority topics for building library staff skills. The elearning modules will be shared openly throughout the library field by WebJunction.

The grant period began in October 2013 and runs through September 2014. Watch the project page for updates and ongoing learning, or sign up for the Crossroads monthly newsletter to receive news about this and other WebJunction programs.


Wellcome Library's Digital Asset Player

Contributed by:
Robert Kiley, Head of Digital Services
Wellcome Library
Wellcome Trust
London, United Kingdom
r.kiley [at] wellcome.ac.uk

The Wellcome Library's innovative digital player, the piece of software that is used to 'play' digitised content, is now freely available for anyone to download and re-use.

Developed by Digirati, a design, integration and engineering consultancy, the player can be used to display all types of digital content, including cover-to-cover books, archives, works of art, videos and audio files. And, for works that have been OCR'd, the player also supports "search within" functionality.

The player responds intelligently to the type of item being viewed. If a digital book is opened, the user can navigate by a thumbnail image of each page, or select a chapter or section of the book, or sometimes multiple volumes. If a video is opened, the option to see thumbnails is replaced by the functionality to pause and scroll through the film.

One of the real strengths of the player is the ability to zoom in on images. This applies to all image content, but works especially well on paintings, posters and other pieces of art. The player also allows users to download items, bookmark images for later and embed the player on their own websites. The software has been optimised for mobile devices.

In developing this application we were conscious of the need to fully integrate it into the existing Wellcome Library infrastructure. As such, the player fetches images from our existing digital asset management system (Tessella's Safety Deposit Box) and, for content which requires some level of authentication, the player uses the authentication system provided by the Sierra Library Management System. Crucially, all digitised content is exposed through the Library's single discovery platform Encore, which in turn provides a link to enable the user to view the digital content in the player.

The player provides a consistent, easy-to-use and enjoyable user experience and by encouraging others to build upon it we hope that it will become the de facto open-source tool for delivering digital content across the cultural heritage sector and beyond.

The player software, released under the MIT Open Source licence, can be downloaded from the Wellcome Library GitHub account. From this site users can also download the source code for the Library's timeline application.

To see the player in action, visit the Wellcome Library site, and read our recent blog post. For further information on the data model used by the player see the step-by-step technical guide developed by Digirati.


I N   T H E   N E W S

IMLS Congressional Justification Now Available

March 7, 2014 — "The Fiscal Year 2015 Appropriations Request, or Congressional Justification, is now available at http://www.imls.gov/assets/1/AssetManager/FY15_CJ.pdf on the IMLS website."

"Additional materials include the news release announcing the president's budget request of $226,448,000, which provides highlights of the agency's funding priorities, and a table showing IMLS appropriations history from 2008-2015."

For more information please see the full press release.


Kuali OLE System Partners receive $882,000 grant from Andrew W. Mellon Foundation

March 5, 2014 — "Indiana University, on behalf of Kuali OLE System Partners, has received an $882,000 grant from the Andrew W. Mellon Foundation for the Kuali Open Library Environment, an open-source, community-based library software system created by a partnership of some of the nation's leading university libraries."

"Kuali OLE is a library management system designed by and for research and academic libraries to oversee and provide access to their growing collections. These include vast amounts of print and licensed digital content, as well as an ever-increasing amount of local, 'born digital' content."

"The OLE project was launched in 2009 with a $2.3 million grant from the Mellon Foundation and matching funds from the Kuali OLE founding partners. With this latest grant, IU has now received more than $4 million from Mellon for the OLE project."

For more information please see the full press release.


President's FY 2015 Budget Request Includes $226,448,000 for the Institute of Museum and Library Services

March 4, 2014 — "Today the President released his FY 2015 Budget Request to the U.S. Congress. The budget includes $226,448,000 for the Institute of Museum and Library Services (IMLS). With these funds, IMLS will provide leadership for the nation's 123,000 libraries and 17,500 museums in all fifty states and U.S. territories through grant making, policy development, and research."

"'I am proud of the contributions that IMLS makes to the American people,' said IMLS Director Susan H. Hildreth. 'I firmly believe that there is a federal responsibility to ensure all Americans have access to the best programming and services that our libraries and museums can provide. During fiscal year 2015, IMLS will help advance a range of museum and library services, with a special focus on STEM learning, early learning, and expanding access to federal information through libraries.'"

"IMLS grant programs support library services in every state and territory through a population-based formula grant. The agency also administers competitive grant programs for libraries and museums that engage hundreds of library, museum, education, and technology professionals in a rigorous peer review process to identify well-designed projects. IMLS supports projects that strengthen library and museum services for Native Americans and Native Hawaiians, as well as projects that strengthen African American museums."

For more information please see the full press release.


Digital Preservation Pioneer: Bill LeFurgy

February 26, 2014 blog posting from Mike Ashenfelder, Library of Congress — "Bill LeFurgy makes gentle stretching motions with his hands as he demonstrates how he exercises his cat's legs every night. Clarence, his cat, just had hip surgery and LeFurgy has to serve as his physical therapist for the next two months. This routine, this new responsibility, comes at a time when he is about to let go of an enormous amount of responsibility elsewhere: he is retiring from the Library of Congress next week, capping 37 years of public service as an archivist, librarian and manager...."

"...in 2002, he joined the Library to work with the National Digital Information and Infrastructure Preservation Program. He expected that it might be another sweeping project like the Department of Energy radiation projects. 'I guess I'm like a moth to the flame when it comes to big national programs to preserve information and make it available,' LeFurgy said. And NDIIPP was a bright flame indeed with its mission to collect, preserve and make available significant digital content, especially information that is created in digital form only, for current and future generations."

"As a digital initiatives manager, LeFurgy helped launch the preserving state government digital information initiative, which ended up working with 35 states to advance records preservation and access. He also worked on a variety of other projects, including one of the first efforts to help the public preserve their own personal digital material. He traveled the world to share insights gained through the NDIIPP program and to help bring learning back to the Library. LeFurgy also played a key role in launching the NDIIPP social media platforms and this blog. It is to his credit that the steady flow of digital preservation information from the Library of Congress is so widely circulated."

"Now, at the end of his tour of duty at the Library, he looks back with some evident pride at what NDIIPP accomplished. 'In 2002, when I started with NDIIPP, people had just a vague awareness of digital preservation and little in the way of experience of doing it,' said LeFurgy. 'The NDIIPP partners built out an amazing range of tools, standards and practices. All this is a solid foundation that will help everyone continue to move ahead. I won't say that we got everything right. The field was brand new and the Library itself was just starting to form a network of preservation partners. But we made a big difference for the good.'"

For more information please see the full blog posting.


New NDSA Report: The Benefits and Risks of the PDF/A-3 File Format for Archival Institutions

February 20, 2014 blog posting from Butch Lazorchak, Library of Congress — "The PDF/A family of international standards defines a file format based on the Portable Document Format which provides a mechanism for representing electronic documents in a manner that preserves their static visual appearance over time, independent of the tools and systems used for creating, storing or rendering the files. 'Static visual appearance' ultimately means that conforming PDF/A files are complete in themselves and use no external references or non-PDF data."

"The first version of the PDF/A specification (PDF/A-1) was published in September 2005 and has been updated at regular intervals since. The A-3 version of the specification was received with some concern in the stewardship community as it adds a single and highly significant feature to its predecessors. The PDF/A-2 specification permitted the embedding of other files as long as the embedded files were valid PDF/A files. A-3 permits the embedding of files of any format."

"While a PDF/A-3 file's primary document is still intended to be robust against preservation risks over the very long term, PDF/A-3 does not require that the embedded files be considered archival content, creating a series of potential technical and policy challenges for preserving institutions."

"The National Digital Stewardship Alliance Standards and Practices Working Group clearly recognized these challenges and felt the community would benefit from an examination of the format and what it means for collecting institutions."

"Which leads to today's release of the NDSA report on 'The Benefits and Risks of the PDF/A-3 File Format for Archival Institutions' (pdf)."

"The report takes a measured look at the costs and benefits of the widespread use of the PDF/A-3 format, especially as it affects content arriving in collecting institutions. It provides background on the technical development of the specification, identifies specific scenarios under which the format might be used and suggests policy prescriptions for collecting institutions to consider...."

"...Additionally, the report notes that the complexity of the PDF format and the wide variance in PDF rendering implementations and creating applications suggests that PDF/A-3 may be appropriate for use in controlled workflows, but may not be an appropriate choice as a general-purpose bundling format."

For more information please see the full blog posting.


Frontiers launches new Review Forum

February 13, 2014 — "Frontiers, a community-driven open-access publisher and research network, part of the Nature Publishing Group family, today announces the launch of a new version of its Interactive Review Forum with upgraded software and new features that enhance and ease the process of collaborative dialogue between authors and reviewers."

"The redesigned Review Forum harnesses the power of the latest Information Technologies to guide authors, reviewers and editors in a virtual environment with tools for fair, efficient, constructive and rigorous peer review."

"Introduced in 2007, Frontiers' interactive peer review enables a collaborative dialogue online in real time between authors and reviewers, with an associate editor as moderator, to deliver high-quality papers. The final decision is based on consensus between reviewers and editors about objective issues. They are named on the final publication to acknowledge their valuable contributions and ensure transparency."

For more information please see the full press release.


Call for Participation: NISO U.S. Profile Standard of ISO 3166 Country Codes

February 13, 2014 — "NISO Voting Members have approved a new work item to develop a U.S. profile of ISO 3166, Codes for the Representation of Names of Countries and their Subdivisions and a working group is being formed for the project. This proposed standard will transition the Geopolitical Entities, Names, and Codes (GENC) Standard, developed by the National Geospatial-Intelligence Agency in 2012, from a government standard to a U.S. National Standard. The GENC standard replaced FIPS Publication 10-4, Standard for Countries, Dependencies, Areas of Special Sovereignty, and Their Principal Administrative Divisions, which was withdrawn by the National Institute of Standards and Technology (NIST) in 2008."

"'The current GENC standard is itself a 'profile' of the ISO 3166-1 standard,' explains Trent Palmer, Geographer with the National Geospatial-Intelligence Agency who submitted the proposal to NISO. 'It incorporates some needs specific to the United States, such as national sovereignty recognition policy restrictions; the requirement to use names of geopolitical entities that have been approved by the U.S. Board on Geographic Names (U.S. Public Law 80-242), but which may not be recognized by the body that manages ISO 3166; and the need to identify and recognize geopolitical entities not identified in ISO 3166.'"

"'Because the GENC is a government standard, its current consensus body does not include any non-governmental voting members,' states Nettie Lagace, NISO's Associate Director for Programs. 'By moving this standard to NISO and making it an American National Standard, its approval consensus body and ongoing development and maintenance can include a wider base of stakeholders (industry, libraries beyond the Library of Congress, academia, and system vendors), many of whom are impacted by the standard.'..."

For more information please see the full press release.


The First World War Centenary: the site that brings all sides together launches in Berlin

January 29, 2014 — "The Deputy Federal Government Commissioner for Culture and Media, Günter Winands, today launched Europeana 1914-1918, an online resource that opens up hidden stories of the First World War and shows the tragedy that shaped Europe from different sides of the conflict."

"Europeana 1914-1918 is the most important pan-European collection of original First World War source material. It is the result of three years of work by 20 European countries..."

"...Europeana 1914-1918 is full of original source material — digitised photographs, maps, diaries, newspapers, letters, drawings and other content that can be used by teachers, historians, journalists, students and interest groups to create new resources. Already, the site is providing content for a new exhibition called 'The First World War: Places of transition (http://e2.ma/click/3alrd/v8prwb/vw0job)' and a new multilingual educational site (http://e2.ma/click/3alrd/v8prwb/bp1job) developed by the British Library in London."

For more information please see the full press release.


CENDI Honors Two with 2013 Meritorious Service Awards

January 20, 2014 — "Jerry Sheehan, Assistant Director for Policy Development, National Library of Medicine, and William (Bill) Adams, Associate Deputy General Counsel (Acquisition) in the Office of the General Counsel, Headquarters, Department of the Army, have been honored by CENDI, the federal Scientific and Technical Information Managers Group. The 2013 CENDI Meritorious Service Award was presented at the CENDI meeting on January 9, 2014, at the National Technical Information Service in Alexandria, Virginia. CENDI's Meritorious Service Award recognizes an individual(s) or team that makes 'a noteworthy contribution to CENDI and to federal interagency cooperation through its events, publications, administration, or outreach.'"

"CENDI recognized Deputy Chair, Jerry Sheehan, for his significant contributions to CENDI information policy programs and discussions over many years. His service includes being a CENDI Alternate and then Principal, Deputy Chair of CENDI, and Chair of the CENDI Policy Working Group. He was a key contributor to development of white papers presented to the Obama Administration on scientific and technical information (STI) issues, and to the CENDI Grand Challenge document, iScience to Jobs, in 2012. Sheehan's support of CENDI has significantly increased the CENDI focus on data and promoted closer CENDI interactions with the President's Office of Science and Technology Policy on open/public access issues. At a time when CENDI agencies are grappling with big data and open science, it is highly appropriate to recognize Sheehan for contributions that have helped to expand CENDI's attention to a range of scientific information policy issues."

"Bill Adams, a significant contributor to the work of CENDI's Copyright and Intellectual Property Working Group, is being honored as an expert on federal acquisitions, patent law, trademark law, copyright law, technical data rights, and technology transfer. He contributed substantively to the Copyright Working Group's major publications, including Frequently Asked Questions about Copyright, Permissions - Government-produced and Non-government Produced Works; Copyright Issues in Mass Digitization (in process), and Frequently Asked Questions about Copyright and Computer Software: Issues affecting the U.S. Government with Special Emphasis on Open Source Software. Adams has demonstrated his commitment to CENDI and advancing interagency cooperation by sharing his expertise in the legal challenges involved in information management and delivery within the federal government."

"CENDI is an interagency consortium of senior STI managers from 14 U.S. federal agencies that represent over 97% of the federal research and development budget. The CENDI Secretariat is headquartered in Oak Ridge, Tennessee, and managed by Information International Associates, Inc., under the direction of Bonnie C. Carroll. More information about CENDI can be found at http://www.cendi.gov."

For more information, contact Kathryn Simon at ksimon@iaweb.com.


Results Released: Assessment of the Laura Bush 21st Century Librarian Program

January 17, 2014 — "The Institute of Museum and Library Services (IMLS) today released the results of an independent study of its Laura Bush 21st Century Librarian (LB21) Program. IMLS launched the LB21 grant program in 2003 to support projects that recruit and educate a new generation of librarians, faculty, and library leaders, and to support research about the library services field. Since the program’s inception, IMLS has awarded 369 LB21 grants totaling $198,999,539 for library and information science (LIS) education, professional development of library staff, research, and institutional capacity building."

"The research project was conducted by ICF International using a qualitative comparative case study approach. The project compared over 109 grants awarded from 2003 to 2009 across six LB21 funding categories: master's level programs, doctoral programs, early faculty career development, continuing education ventures, institutional support endeavors, and research on the LIS field."

"The goal of the evaluation was to monitor effects on program participants, grantee institutions and organizations, and to identify project characteristics correlated with sustainability."

For more information please see the full press release.


$5.1 Million Awarded for Delving into Big Humanities and Social Science Data

January 15, 2014 — "The Institute of Museum and Library Services and nine international research funders from four countries today announced the winners of the third Digging into Data Challenge http://www.idevmail.net/link.aspx?l=6&d=73&mid=333844&m=1869. The international competition is designed to develop new insights, tools, and skills in innovative humanities and social science research using large-scale data analysis."

"Fourteen international teams representing Canada, the Netherlands, the United Kingdom, and the United States will receive grants totaling approximately $5.1 million to investigate how computational techniques can be applied to “big data.” Their work will address the evolving nature of humanities and social sciences research, which often relies on massive multisource datasets. Each team represents collaborations among scholars, scientists, and information professionals from leading universities and libraries in Europe and North America...."

"...Descriptions of all fourteen funded projects and additional information about the competition can be found at http://www.idevmail.net/link.aspx?l=7&d=73&mid=333844&m=1869."

For more information please see the full press release.
