Clips & Pointers


D-Lib Magazine
June 2005

Volume 11 Number 6

ISSN 1082-9873

In Brief


Examining Present Practices to Inform Future Metadata Use: The MARC Content Designation and Utilization Project

Contributed by:
Dr. William E. Moen, Interim Director
Dr. Shawne D. Miksa, Fellow
Corrie Marsh
Texas Center for Digital Knowledge
University of North Texas
Denton, Texas, USA

MARC Content Designation Utilization: Inquiry and Analysis is a two-year project that is investigating the extent of catalogers' use of MARC 21, the mark-up language used worldwide to create electronic catalog records. As Machine-Readable Cataloging (MARC) records are used in most electronic library catalogs today, the results of this study will impact libraries of all kinds around the world. Empirical evidence about the use of MARC content designation in current library information retrieval systems will contribute to the dialogue about the future development of MARC and its role in the rapidly evolving networked information environment.

The project's principal investigators, Drs. William E. Moen and Shawne D. Miksa, Fellows of the Texas Center for Digital Knowledge (TxCDK) at the University of North Texas, expect their findings to provide greater insight into practices affecting bibliographic control and information access. The study will provide empirical evidence to document MARC 21 content designation use as well as explore the evolution of MARC content designation for patterns of availability and adoption. Ultimately, the project will deliver a set of methods, procedures, and open source software tools to conduct reliable and valid analyses of MARC 21 content designation that can be used by individual libraries and adapted by other metadata communities.

An important goal of the project is to create tools for the future study of catalog records. Miksa describes how this project will provide research strategies to examine MARC records as artifacts of the cataloging process. She emphasizes that resulting data will greatly inform cataloging education and curricula, which is critical to the continued development and improvement of information retrieval systems in libraries worldwide.

Current MARC 21 specifications define nearly 2000 fields and subfields available to library catalogers working to create catalog records. While working on a 2003 IMLS-funded project to establish a Z39.50 interoperability testbed, Moen found that very few of these fields are actually being used. In fact, Moen discovered that only 36 of the available MARC fields accounted for 80% of all utilization. These preliminary findings have important implications for library catalogers and other library and information science professionals and form the basis for the current study.
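
The kind of concentration described above can be measured by counting field occurrences across a set of records and finding the smallest group of most-used fields whose combined share reaches a threshold. The following is a minimal sketch in Python, not the project's software: the function name, the representation of a record as a plain list of field tags, and the sample data are all assumptions made for illustration.

```python
from collections import Counter

def fields_covering(records, threshold=0.80):
    """Return the most-used field tags whose combined share of all
    field occurrences first reaches `threshold`."""
    # Count every occurrence of every tag across all records.
    counts = Counter(tag for record in records for tag in record)
    total = sum(counts.values())
    covered, selected = 0, []
    # Walk the tags from most to least frequent, stopping once the
    # tags already selected account for the requested share.
    for tag, n in counts.most_common():
        if total and covered / total >= threshold:
            break
        selected.append(tag)
        covered += n
    return selected

# Hypothetical sample: each record is the list of MARC tags it contains.
sample = [
    ["245", "100", "260", "650"],
    ["245", "100", "260"],
    ["245", "260", "300"],
    ["245", "100"],
]
print(fields_covering(sample))
```

In this made-up sample, three of the five distinct tags are enough to pass the 80% mark. A real analysis would also need to parse MARC records and decide whether to count subfields separately, which this sketch leaves out.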

The Institute of Museum and Library Services (IMLS), an independent United States Federal grant-making agency dedicated to creating and sustaining a nation of learners by helping libraries and museums serve their communities, is funding the project with a National Leadership Grant of $233,115 to the University of North Texas-TxCDK. The Online Computer Library Center (OCLC), a nonprofit computer library service and research organization, is providing the researchers with approximately 56 million library catalog records from its WorldCat database for use in the study. Further support is provided by the School of Library and Information Sciences at the University of North Texas.

For further details about the project, including documents related to its research, please visit the project Web site at <>.

Report on the NSF Workshop on Scientific Markup Language

Contributed by:
Laura M. Bartolo, Kent State University, <>
Tim W. Cole, University of Illinois at Urbana-Champaign, <>
Sarah Giersch, Association of Research Libraries, <>
Mike Wright, UCAR - DLESE Program Center, <>

The Report of the National Science Foundation/National Science, Technology, Engineering and Mathematics Digital Library Workshop on Scientific Markup Languages is available at <>. The Workshop was held at NSF on June 14-15, 2004, bringing together forty-three representatives in higher education, publishing, software, and government from the disciplines of biology, chemistry, earth sciences, mathematics, materials sciences, and physics. The main goals of the workshop were to assess development of scientific markup languages and to articulate a vision for implementing markup languages in support of research and education. Presentations were given on the current state of markup languages in four specific scientific domains: 1) Chemistry and CML; 2) Earth Sciences and ArcXML, ESML, GML, NcML; 3) Materials Sciences and MatML; 4) Mathematics and MathML. The speakers stressed the view that adoption and development of markup languages must occur simultaneously among authors, end-users, publishers, and vendors. Workshop breakout discussions on Education, Markup Languages, Publishers/Professional Societies, and Database/Tool Developers identified several cross-cutting themes.

Theme A: Vision. The motivation for developing XML-based markup languages is to provide a means to exchange information or data in a structured form so that colleagues across scientific domains can read, understand, and use scientific research to further the development of new knowledge.

Theme B: Demonstrating the value of markup languages. Despite the potential benefit to science and research applications, markup languages' value in those contexts remains unproven. Their broadest implementation to date occurs in processes that are virtually invisible to most users.

Theme C: Creating & disseminating the pre-requisite tools. Better tools, both technically and in the form of broader, more robust ontologies, would facilitate and speed the adoption of scientific markup languages.

Theme D: Mediation of markup languages. "Mediation" encompasses tools and services that provide a translation between representations in different markup languages or that provide access to information on a single markup language to a wide variety of users.

Theme E: Identifying challenges to maturation of markup languages. There are cultural and market-related challenges to sustaining a consensus-building process around scientific markup languages. Scientific markup languages have reached a critical juncture in their development where broader input, more thorough testing (including software implementations), and development of consensus involving publishers, educators, and end-users are required to ensure proper maturation.

All workshop documents can be found at <> and sign-up for the Scientific Markup Languages list is available at <>.

JSTOR: Adapting Lucene for New Search Engine and Interface

Contributed by:
Sherry Aschenbrenner
Director of User Services
Ann Arbor, Michigan, USA

In the spring of 2004, JSTOR <> launched a project to update its search engine. Since its inception in 1995, JSTOR had been using a search engine known as "Full Text Lexicographer," or FTL. This proprietary software, developed at the University of Michigan, served JSTOR users for nearly a decade. However, as JSTOR grew, both in amount of content and in the number of participants, inherent limitations in the software made enhancements and scalability difficult.

The effort began with a survey of users' search preferences. Improving the speed of a search and introducing the ability to search all disciplines at once were identified as top goals. Undergraduate and international users also said the existing search interface was difficult to understand and use effectively, so introducing a simple search screen, in addition to maintaining advanced and expert search interfaces, became another major objective. After six months of intensive development work, JSTOR replaced its search engine with one named Lucene. The new search engine was previewed to librarians and publishers in early December 2004, and went live to JSTOR users in January, 2005.

Lucene is open source software written entirely in Java. Because it is open source, the exact source code is available to JSTOR (and to others) at no cost. JSTOR is thus able to modify the underlying source code easily if needed for future development and, if appropriate, may contribute any code changes to the community for use by others. Lucene already includes many of the options available in FTL, such as Boolean operators, fields, proximity searching, and phrase searching; and offers the possibility of a number of new capabilities, such as fuzzy searching, wildcard options, and more flexible nesting of search terms. JSTOR has implemented only a limited number of these features to date but expects to release new search options in the future.

In JSTOR's experience, Lucene has greatly enhanced system performance. The search engine previously used by JSTOR sometimes required 15 seconds or more of processing time for one search, mainly due to the way the indexes underlying each JSTOR search were organized. Each journal was given its own index; these indexes were then searched consecutively to produce results. Lucene uses a more scalable architecture in which all content is indexed in one large file, and now most searches finish in under a second. This results in a significant lightening of the load on JSTOR servers and faster response times for everyone using JSTOR. In addition, JSTOR is now able to completely re-index the entire JSTOR corpus in about 48 hours.
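
The difference between the two index layouts described above can be illustrated with a toy inverted index. The sketch below is written in Python purely for illustration; it does not use Lucene's API and does not reflect how FTL or JSTOR's Lucene deployment is actually implemented. All function names and the sample documents are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search_per_journal(journal_indexes, term):
    """Older layout: consult each journal's index one after another."""
    hits = set()
    for index in journal_indexes.values():
        hits |= index.get(term, set())
    return hits

def search_combined(index, term):
    """Newer layout: a single lookup in one index over all content."""
    return set(index.get(term, set()))

# Hypothetical content: two journals with a few short documents each.
journals = {
    "journal_a": {"a1": "markup languages in chemistry", "a2": "library catalogs"},
    "journal_b": {"b1": "digital library preservation"},
}
per_journal = {name: build_index(docs) for name, docs in journals.items()}
combined = build_index(
    {doc_id: text for docs in journals.values() for doc_id, text in docs.items()}
)

# Both layouts return the same hits; they differ in how many indexes are visited.
assert search_per_journal(per_journal, "library") == search_combined(combined, "library")
print(sorted(search_combined(combined, "library")))
```

In the per-journal layout the number of index lookups grows with the number of journals, while the combined layout needs one lookup per term however many journals are added. That is one plausible reading of why consecutive searching scaled poorly as the collection grew.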

The initial release of Lucene introduced a much faster and more feature-rich set of search capabilities for JSTOR. This is only the first of a series of search engine enhancements, however, as JSTOR is committed to continual improvement of its searching functionality.

In the News

Excerpts from Recent Press Releases and Announcements

Digital Knowledge for all, but what about for ever?

June 16, 2005 - "A report published today by the Museums, Libraries and Archives Council (MLA) and Digital Preservation Coalition (DPC) throws out a challenge about the future access to digital museum, library and archive collections."

"Digitisation is making more museum, library and archive collections accessible across the internet. MLA and DPC are working with a range of national partners to ensure that the knowledge held in those institutions can be accessed wherever and whenever it is needed. Digitisation means that objects and information in different places can be brought together to create virtual collections, matched to the particular needs of the searcher. But a new survey shows that these digitised collections may be at significant risk of being lost to future generations if the issue of digital preservation is not addressed."

"The survey, which will inform the development of a national digitisation strategy, looked at non-national museums, libraries and archives in two English regions - the North East and West Midlands - to discover how well prepared they are to deal with the problems of keeping digital material in the long term. The results show that there is a significant commitment to digitisation, with over 80 digitisation projects currently in place. However, the survey highlighted a major concern that 90 per cent of the projects were externally funded and therefore took no account of the need to provide the long-term, sustainable support needed to preserve and protect public access to the digital collections."

For more information, please see the full press release at <>.

The European Library seeks user opinion

June 16, 2005 - "The European Library wants to find out what works for the user of the new portal and what doesn't. Together with IRN Research and in co-operation with the national libraries of Britain, Finland, France, Germany, Italy, Netherlands, Portugal, Slovenia and Switzerland an online survey has been created and is available in:

"The survey is mainly about the search and retrieval capability of the site and the developers would be very grateful to any researcher able to give 10-15 minutes worth of time to answering the 8 questions. The European Library will donate 2 Euros to Book Aid International for every completed questionnaire it receives. The answers given will help The European Library to decide the best way to develop the portal for the benefit of its users."

"The European Library is a portal for accessing the digital collections of 9 of the National Libraries of Europe. The European Library is owned by the Conference of European National Librarians (CENL) and aims to access digital collections from all 43 member libraries within the next 5 years. It was launched as a Beta site in March of 2005."

"For further information please contact:
Sally Chambers - The European Library Office
Email: <> "

Office of the Future: 2020 - Research Identifies Future Workplace Trends and Skills Necessary for Success

June 15, 2005 - "The future office will be increasingly mobile, with technology enabling employees to perform their jobs from virtually anywhere, according to Office of the Future: 2020, a research study recently released by OfficeTeam. But greater control over where and how people work won't necessarily translate into more free time. Forty-two percent of executives polled said they believe employees will be working more hours in the next 10 to 15 years."

"OfficeTeam, a leading staffing service specializing in highly skilled administrative professionals, created Office of the Future: 2020 as a follow-up to its previous research project, Office of the Future: 2005, released in 1999. Trends identified then are a reality today, including the use of multifunctional, wireless technology to conduct business from various locales. Administrative professionals also are now playing a greater role in activities such as Internet research, desktop publishing, computer training and support, and website maintenance."

"With Office of the Future: 2020, OfficeTeam examines trends that may impact the workplace in the next 10 to 15 years. In addition to interviews with workplace and technology experts, futurists, and trend watchers, OfficeTeam surveyed workers and executives at the nation's 1,000 largest companies."

For more information about the findings, please see the full press release at <>.

RFID Institute at UNT

June 14, 2005, announcement from the University of North Texas - "The Texas Center for Digital Knowledge will be co-sponsoring an RFID Institute with the National Information Standards Organization. The Institute will be held at the University of North Texas (Denton, Texas) on October 25-26, 2005."

"The RFID Institute will examine current issues in ROI, global RFID, interoperability standards, as well as implementations in industry, manufacturing, distribution, and information centers/libraries. The large number of suppliers and RFID solutions companies in the DFW area is contributing to this program. Participants will also include NISO members from information centers, information technology, publishers, libraries, and booksellers."

"For more information please contact: Ms. Corrie Marsh, TxCDK Associate Director at <> or (940) 565-4552."

National Digital Newspaper Program (NDNP) Announces New Program Web site

Announced June 2, 2005, by Laura Gottesman, Library of Congress - "The Library of Congress is pleased to announce a new Web site, <>, providing an overview and technical specifications for the development phase of the National Digital Newspaper Program (NDNP). This program, a partnership between the National Endowment for the Humanities (NEH) and the Library of Congress (LC), is a long-term effort to develop an Internet-based, searchable database of all U.S. newspapers with descriptive information and digitization of select historic pages. Supported by NEH, this rich digital resource will be developed and permanently maintained at the Library of Congress. An NEH grant program will fund the contribution of content from, eventually, all U.S. states and territories."

"An initial development phase will run through 2007, and will include content from 6 NEH state awardees (University of California, Riverside; University of Florida Libraries, Gainesville; University of Kentucky Libraries, Lexington; New York Public Library, New York City; University of Utah, Salt Lake City; and Library of Virginia, Richmond) providing 100,000 pages each of historic material published between 1900-1910. In addition, the Library of Congress will contribute 100,000 pages from its own historic collections, representing the District of Columbia."

"Program information for the National Digital Newspaper Program is available from the Library of Congress's Preservation Web site: <>. Please direct any questions regarding this Web site to the LC NDNP program contacts at: <>."

New Online Publishers Association Study Identifies Key Experiences That Drive Web Usage

June 1, 2005 - "The Online Publishers Association (OPA) today unveiled the results of its latest research project, the 'Online User Experience Study.' Conducted in partnership with the Media Management Center at Northwestern University, the study identified 22 experiences that describe and define how people interact with and relate to digital media, and determined how each of those specific experiences impacts site usage."

"'Experience is a critical concept to understand, particularly in a crowded environment where media constantly compete for consumers' attention,' said Michael Zimbalist, president of the Online Publishers Association. 'It goes beyond providing content that gets good user satisfaction ratings, to involving and engaging users' minds and emotions. Properly implemented, it can elevate a product from something that satisfies a basic need to something that compels repeat usage and loyalty.'"

"The research involved a combination of personal interviews and surveys. In-person interviews were conducted with 65 Internet users from across the country. The statements these Web users used to describe those experiences were then incorporated into an online questionnaire. Analysis of the survey results identified 22 distinct user experiences. The survey also measured site usage for each respondent and its relation to how that user rated each of the experiences identified through the qualitative interviews. From there, the relationship between usage and experience was derived, and the experiences were ranked according to their relative impact on site usage."

For more information, please see the full press release at <>.

New IMLS Publication: Charting the Landscape/Mapping New Paths: Museums, Libraries, and K-12 Education

June 1, 2005 - "The Institute of Museum and Library Services (IMLS) has released a report on how museums and libraries bolster K-12 education and lifelong learning in communities across the Nation. 'Charting the Landscape, Mapping New Paths: Museums, Libraries, and K-12 Learning,' is based on a workshop the Institute hosted August 30-31, 2004, at which more than seventy educators, researchers, policymakers, and museum and library professionals examined K-12 collaborations among their organizations."

"As the report notes, workshop participants agreed that in the 21st century, a competitive and successful society will require people who never stop learning. It is essential, therefore, to build a foundation for lifelong learning during the elementary and secondary school years. The responsibility for building that foundation and nurturing lifelong learning does not rest with schools alone but cuts across institutional boundaries to include museums, libraries, and other community organizations."

"...The report highlights projects and partnerships and can be used as a tool to lay a foundation for understanding the power of museum, library, school, and community collaborations in cultivating lifelong learning societies. It includes an appendix of selected resources, most of which are available online and a useful glossary of terms used throughout the workshop. To obtain free copies of the report, email the Institute of Museum and Library Services at <>, or access it electronically from the agency's Web site at <>."

For more information, please see the full press release at <>.

Measuring the impact of the People's Network over time: MLA launches LONGITUDE toolkit

May 26, 2005 - "The Museums, Libraries and Archives Council (MLA) has launched LONGITUDE: a toolkit of resources for public library staff to evaluate the long-term impact of IT-based services on users."

"...New ICT services are having a marked impact on library use. It is important to be able to measure these benefits over a period of time, especially as users become more and more familiar with ICT and all it has to offer. The LONGITUDE toolkit will enable library staff to do just that."

"The toolkit has been developed by the Centre for Research in Library & Information Management (CERLIM) at the Manchester Metropolitan University, in partnership with Birmingham Public Libraries and Cheshire County Libraries, where the toolkit was piloted."

For more information, please see the full press release at <>.

The Scholarly Publishing Office of the University of Michigan University Library is pleased to announce the return of The Journal of Electronic Publishing

May 26, 2005 - "Since its first issue in 1995, JEP (The Journal Of Electronic Publishing) has been a source of innovative ideas, of best practices, and leading-edge thinking about all aspects of publishing, authorship, and readership in the electronic environment. Returning after a three-year hiatus, JEP will continue to document the changes in publishing with the growth of the Internet, and to stimulate and shape the direction of those changes."

"The University of Michigan University Library's Scholarly Publishing Office will re-launch JEP in January, 2006, ensuring its future and keeping it at its original home at the university. The journal fits the SPO's commitment to library-based scholarly publishing and will offer a fresh look at the state of electronic and print publishing worldwide. JEP will continue to look back over the past 20 or 30 years to see how we've come to this point in the history of publishing, and look forward to where publishing may be heading. It will look inward at key players and practices of publishing, and also look outward at fringe movements that are challenging traditional publishing interests, and at readers worldwide affected by the interplay of technological and economic forces that have revolutionized social communication."

"JEP's editor, Judith Axler Turner, will remain at the helm, with editorial input and publishing support from Maria Bonn and Mark Sandler of the Scholarly Publishing Office. A new editorial board will be constituted, and JEP will solicit articles that present wide-ranging and diverse viewpoints on contemporary publishing practices, and encourage dialogue and understanding between key decision-makers in publishing and those who are affected by the decisions being made."

If you would like to discuss JEP's future or discuss a possible article, please contact the editorial team at <>.

Library use soars

May 25, 2005 - "A new report published by the Chartered Institute of Public Finance and Accountancy shows a record rise in public library usage across the UK."

"In 2003/04, visits to public libraries increased by nearly 14 million, over 250,000 extra visits a week. This is the second consecutive annual rise and builds on an additional 5 million visits made in 2002/03 – the first upturn in usage since the early 1990s."

"The sea change in popularity coincides with the introduction of computers and internet access into all 4200 of the UK's public libraries. Thanks to the lottery-funded People's Network project there are 32,000 computer terminals offering broadband internet access in public libraries, and all library staff have been trained to provide help and advice for users."

For more information, please see the full press release at <>.

DLIST Advisory Board and DL-Harvest Established

Announced May 25, 2005 by Anita Coleman, University of Arizona - "We are pleased to announce the establishment of an Advisory Board for DLIST, Digital Library for Information Science and Technology, and delighted to invite the community to explore DL-Harvest."

"The members of the first DLIST Advisory Board are:

  • Subiah Arunachalam, Distinguished Fellow, M S Swaminathan Research Foundation, Chennai, India
  • Julia Blixrud, Assistant Executive Director, Association of Research Libraries, Washington, DC
  • Rachel Bower, Co-Director, The Internet Scout Project, Madison, Wisconsin
  • Michael Gorman, Dean of Library Services, California State University, Fresno, California
  • Birger Hjorland, Research Professor, Royal School of Library and Information Science Copenhagen, Denmark
  • Christopher Khoo, Programme Director and Associate Professor, School of Communication & Information, Nanyang Technological University, Singapore
  • Scott Nicholson, Assistant Professor, School of Information Studies, Syracuse University, Syracuse, New York"

"Biographical sketches of the DLIST Advisory Board members are available at <>. The Advisory Board is helping DLIST build a dynamic public domain for the library and information professions and disciplines."

"DL-Harvest is a federated archive, an open aggregator service of DLIST. It brings together full-text, scholarly materials in the Information Sciences from many different OAI-PMH compliant repositories. DL-Harvest is using PKP Harvester with software improvements for flow control, sets, and advanced searching, that have been developed by DLIST Graduate Research Assistant, Joseph Roback. Besides DLIST, the current list of 11 archives harvested includes selective harvesting from ArXiV. DL-Harvest is available at <>."

"DLIST, <>, is a service of the School of Information Resources and Library Science and Learning Technologies Center, University of Arizona. Seed monies for DLIST are from the University of Arizona College of Social and Behavioral Sciences and the Proposition 301 funded initiative of the Internet Technology Commerce and Design Institute, now the Arizona Center for Information Science and Technology. DLIST is running on EPrints2 archive-creating software, which generates eprints archives that are compliant with the Open Archives Protocol for Metadata Harvesting OAI v2.0."

For more information, please contact Anita Coleman, <>.
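
The OAI-PMH harvesting mentioned in the DL-Harvest announcement above works by sending HTTP requests that name a protocol verb plus arguments. As a rough sketch, and not a description of PKP Harvester or of DL-Harvest's own code, the Python below builds a `ListRecords` request URL. The verb and the `metadataPrefix`, `set`, and `resumptionToken` arguments come from the OAI-PMH 2.0 specification; the base URL, the set name, and the function name are hypothetical.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc",
                     set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH 2.0 a resumptionToken is an exclusive argument: when
    continuing a partial list, it is sent without the other arguments.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            # Selective harvesting: restrict the request to one set.
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"

# Hypothetical repository endpoint and set name, for illustration only.
BASE = "https://repository.example.org/oai"
print(list_records_url(BASE, set_spec="lis"))
print(list_records_url(BASE, resumption_token="abc123"))
```

A harvester would fetch the first URL, parse the XML response, and keep requesting with each returned resumption token until the repository stops supplying one. That token exchange is the protocol's flow-control mechanism.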

JISC signs Berlin Declaration on open access

May 20, 2005 - "At an international conference in the Netherlands last week, the Chair of JISC's Integrated Information Environment Committee, Reg Carr, signed the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities on behalf of JISC. The declaration commits signatories to the 'new possibilities of knowledge dissemination not only through the classical form but also and increasingly through the open access paradigm via the Internet'."

"Through its work to support the development of institutional repositories and such initiatives as its open access programme, JISC has been exploring alternative models of publishing and dissemination of research outputs, including open access publishing. At the conference, a number of Dutch universities and the Netherlands Organisation for Scientific Research (NWO) also signed the Declaration, whose signatories include the Pasteur Institute, the Max Planck Society, the National Science Foundation of China, SURF, JISC's counterpart in the Netherlands, and dozens of universities and national bodies."

"The conference, held in Amsterdam, also saw the launch of a new open access website 'Cream of Science'. The website contains over 41,000 publications by some 200 leading Dutch scientists, with nearly two-thirds of the publications available full text. Within a day of its launch the site received half a million hits. A delighted Martin Feijen of the SURF Foundation said: 'We had predicted about 50,000 searches between May and November 2005, but reached the target in just a single day.'"

For more information, please see <>.

Copyright 2005 © Corporation for National Research Initiatives
