Clips & Pointers


D-Lib Magazine
July/August 2004

Volume 10 Number 7/8

ISSN 1082-9873

In Brief


ALVIS - Superpeer Semantic Search Engine

Contributed by:
Anders Ardö
Associate Professor
KnowLib, Knowledge Discovery and Digital Library Research Group
Department of Information Technology, Lund University
Lund, Sweden

ALVIS conducts research in the design, use and interoperability of topic-specific search engines, with the goal of developing an Open Source prototype of a distributed, semantic-based search engine.

The vast quantity of information sets new challenges for even the best commercial search engines. Building next-generation search engines is not just a question of scaling existing techniques. What is needed is a departure from the existing keyword search that has made current search technology cumbersome even for the skilled. Qualitatively better ways are needed to allow more meaningful, semantically processed queries. New delivery modes are needed to make peer-to-peer searching another common resource in the spirit of the Web itself.

Our approach is a semantic-based search engine that builds on content through automated analysis. Linguistic processing is at the heart of the search engine, which uses a probabilistic document model for relevance evaluation and information retrieval. Natural language processing and statistical processing of the corpus extract semantic information from both queries and documents, thus providing disambiguation, semantic clarification and topical content. The automatically extracted information can enrich indexed documents and can be used to estimate the relevance of documents to queries.
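As an illustration of what a probabilistic document model for relevance evaluation can look like, here is a minimal sketch. It assumes a unigram query-likelihood model with linear (Jelinek-Mercer) smoothing, which is one common textbook choice and not necessarily the model ALVIS uses; the function names, the smoothing weight `lam` and the whitespace tokenizer are all illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Very rough tokenizer (an assumption): lowercase and split on whitespace."""
    return text.lower().split()


def score(query, doc_tokens, collection_counts, collection_len, lam=0.5):
    """Log-probability of the query under a smoothed unigram model of one document."""
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    total = 0.0
    for term in tokenize(query):
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = collection_counts[term] / collection_len if collection_len else 0.0
        # Mix the document estimate with the collection-wide estimate.
        p = lam * p_doc + (1.0 - lam) * p_coll
        if p == 0.0:
            # The term occurs nowhere in the collection.
            return float("-inf")
        total += math.log(p)
    return total


def rank(query, docs):
    """Return document names ordered from most to least likely to match the query."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    collection_counts = Counter(t for toks in tokenized.values() for t in toks)
    collection_len = sum(collection_counts.values())
    scored = [
        (score(query, toks, collection_counts, collection_len), name)
        for name, toks in tokenized.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]
```

In a semantic-based engine the terms fed to such a model would presumably be enriched by the linguistic processing described above (for example disambiguated terms) rather than raw whitespace tokens.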

To demonstrate the scalability of a distributed peer-to-peer search system, new basic algorithms and protocols for distributed search need to be developed. This includes efficient query distribution and result merging using implicit and automatically generated semantics.
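Result merging can be sketched as follows. This is a hypothetical baseline, not the ALVIS protocol: it assumes each peer returns a list of (document id, score) pairs on its own score scale, rescales each list to the range 0 to 1 (min-max normalization), and keeps the best normalized score per document. The function names and the data layout are assumptions made for illustration.

```python
def normalize(results):
    """Rescale one peer's scores to the range [0, 1] (min-max normalization)."""
    if not results:
        return []
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    # If all scores are equal, treat every result as equally good.
    return [(doc, (s - lo) / span if span else 1.0) for doc, s in results]


def merge_results(peer_results, k=10):
    """Merge per-peer result lists into one ranked list of at most k entries."""
    best = {}
    for results in peer_results.values():
        for doc, s in normalize(results):
            # A document returned by several peers keeps its best score.
            if s > best.get(doc, -1.0):
                best[doc] = s
    # Sort by descending score, breaking ties by document id for determinism.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]
```

Min-max normalization ignores how good each peer's collection is for the query; a merging scheme that uses semantics, as the text proposes, would presumably weight peers by topical fit as well.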

Focused Web-crawling will be an integral component of ALVIS, capable of generating topic-specific databases of Web-pages by crawling the Web and only saving relevant (i.e., topic-specific) pages. The crawler uses techniques for automated subject classification developed within the Knowledge Discovery and Digital Library Research Group (KnowLib). Topic-specific customization is supported by the semantic search system, for instance by using extracted terminology.
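The idea of a focused crawl can be sketched on a small in-memory link graph. The sketch below is an assumption-laden illustration, not the KnowLib crawler: relevance is a crude term-overlap score standing in for automated subject classification, links are followed only from pages judged relevant (a design choice the text does not specify), and `pages`, `threshold` and `max_pages` are hypothetical names.

```python
from collections import deque


def relevance(text, topic_terms):
    """Fraction of topic terms that occur in the page text (a crude classifier)."""
    if not topic_terms:
        return 0.0
    words = set(text.lower().split())
    return len(words & topic_terms) / len(topic_terms)


def focused_crawl(pages, seeds, topic_terms, threshold=0.5, max_pages=100):
    """Breadth-first crawl that saves, and follows links from, relevant pages only.

    `pages` maps a URL to a (text, outgoing_links) pair and stands in for the Web.
    """
    topic_terms = {t.lower() for t in topic_terms}
    queue = deque(seeds)
    seen = set(seeds)
    saved = []
    while queue and len(saved) < max_pages:
        url = queue.popleft()
        page = pages.get(url)
        if page is None:
            continue  # unreachable page
        text, links = page
        if relevance(text, topic_terms) < threshold:
            continue  # off-topic: neither saved nor expanded
        saved.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return saved
```

Real focused crawlers often still follow a limited number of links from off-topic pages, since relevant pages are sometimes reachable only through irrelevant ones.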

The combination of design, distributed operation and Open Source development has been chosen to support incremental growth, third-party involvement, and a low barrier to entry, while incurring only a small degradation in the quality of results compared with an equivalent monolithic system.

ALVIS is an EU Sixth Framework Programme (FP6), Information Society Technologies project, which started on 1 January 2004 and will last for 3 years. Partners come from Finland (Helsinki Institute for Information Technology), France (Institut National de la Recherche Agronomique, Unite Mathematique, Informatique et Genome; Exalead SA; Université Paris 13, Laboratoire d'Informatique de Paris-Nord), Switzerland (EPFL, Ecole Polytechnique Federale de Lausanne, Distributed Information Systems Lab), Sweden (Lund University, Dept. of Information Technology, KnowLib), Denmark (Technical University of Denmark, Center of Knowledge Technology; Index Data Aps), Spain (ALMA Bioinformatica, S.L.), Slovenia (Jozef Stefan Institute, Dept. of Intelligent Systems and Dept. of Knowledge Technologies), and China (Tsinghua University, Dept. of Computer Science and Technology).

Invitation to Participate in the Validation Test for the Xtensible Past Project

Contributed by:
Rutger Kramer
and Annelies van Nispen
The Netherlands Institute for Scientific Information Services
Amsterdam, The Netherlands

The aim of the Xtensible Past pilot project is twofold. On one hand, the project explores the possibilities of XML (eXtensible Markup Language) and OAI (Open Archives Initiative) for providing better access to and sharing of digital data collections by researchers, and on the other hand, the project investigates XML as a new strategy for the long-term preservation of research data.
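To make the OAI side concrete, the sketch below builds a harvesting request, assuming the project exposes its metadata through the OAI Protocol for Metadata Harvesting (OAI-PMH), which the text does not state explicitly. The `ListRecords` verb and the `metadataPrefix`, `set` and `from` arguments are part of OAI-PMH; the base URL in the usage note and the function name are hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for a repository.

    `base_url` is the repository's OAI-PMH endpoint (hypothetical here).
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # restrict harvesting to one set
    if from_date:
        params["from"] = from_date  # only records changed since this date
    return base_url + "?" + urlencode(params)
```

For example, `list_records_url("http://example.org/oai", from_date="2004-01-01")` asks a placeholder repository for all unqualified Dublin Core records changed since the start of 2004.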

The Netherlands Historical Data Archive (NHDA) preserves and gives access to a heterogeneous collection of historical data created by historians for various research projects. This project aims to improve the services of the NHDA to the research community. X-Past is now engaged in testing the developed prototype.

System Architecture (in a nutshell)

The information interchange taking place in the X-Past system is developed according to the following diagram.

Figure 1

In this diagram, two distinctive paths can be seen. The upper path guides the repository metadata to the user interface, the lower guides the actual dataset tables. Both paths contain a conversion to XML, the format in which the data is eventually stored and queried.
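The conversion of dataset tables to XML on the lower path might look roughly like the sketch below. The element names (`dataset`, `record`, `field`) and both function names are invented for illustration and are not taken from the X-Past schema; the query is a simple exact-match lookup rather than whatever query facility the prototype offers.

```python
import xml.etree.ElementTree as ET


def table_to_xml(name, columns, rows):
    """Convert a flat table into an XML tree with one <record> per row."""
    root = ET.Element("dataset", {"name": name})
    for row in rows:
        record = ET.SubElement(root, "record")
        for column, value in zip(columns, row):
            field = ET.SubElement(record, "field", {"name": column})
            field.text = str(value)
    return root


def find_records(root, column, value):
    """Return the records whose field `column` has exactly the given value."""
    matches = []
    for record in root.findall("record"):
        for field in record.findall("field"):
            if field.get("name") == column and field.text == str(value):
                matches.append(record)
                break
    return matches
```

Storing every value as text in a named `field` element keeps the conversion independent of the table's column set, which suits a heterogeneous collection, at the cost of losing column types unless they are recorded in the metadata.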

Validation Test

We would like to invite all readers to take a look at our project and provide us with feedback. The goal of the Validation Testing Phase is to assess to what extent the current prototype conforms to the requirements of the users.

The X-Past system has initially been built with the aid of a small number of expert users who have given input to the development team. This input has been transformed into a working prototype which makes it possible to:

  • Search the metadata of the datasets, as well as the datasets themselves
  • Download search results as XML for re-use
  • Download the original dataset files

The feedback from the Validation Testing Phase will be used to tweak the X-Past system; additional requirements and wishes will be implemented after the Validation Testing Phase has ended. Our aim is to complete the final prototype in August.


The Testing Phase is aimed at the searching, navigating and downloading functionality of the system, as well as the layout of the webpages. Everyone can join the Validation Testing Phase as a tester. In order for people to get acquainted with the system, we've drawn up a user manual that introduces the various functions of the system. We've also written a walkthrough which will guide new users through all of the features of the system and contains questions whose answers will provide us with feedback.

Validation Test

X-past project page

For questions and/or feedback mail to:

The Electronic Records Management Training Package

Contributed by:
Steve Bailey
Electronic Records Manager
Joint Information Systems Committee (JISC)

JISC has recently launched an online training package designed to help staff working in the UK Further and Higher Education (FE and HE) sectors to better manage their electronic records. The Electronic Records Management Training Package, created on JISC's behalf by the University of Northumbria, was funded as part of JISC's Supporting Institutional Records Management funding programme, which was designed to help promote and develop records management as a means of ensuring legal and regulatory compliance and preserving institutional assets.

Electronic records management (ERM) requires staff from all levels/areas of an institution to appreciate their role/responsibilities as creators and users of electronic records. This training package provides a step-by-step assessment tutorial approach as well as the ability to search, pick and choose from a wide range of individual topics including "organising records", "managing email", "retention and disposal", and "preservation".

The training package is suitable for use by all levels of staff and in all areas of FE/HE institutions either on an individual basis or as part of a group training programme facilitated by those responsible for records management within individual institutions. This is particularly important, as JISC is aware that there is currently a wide divergence of practice across the sector, with some institutions having their own Records Manager and well-established programmes, whilst others lack access to professional staff and are only now taking their first steps towards developing a records management programme.

This training package is just one of several online sources of advice relating to records management due to be released by JISC in the near future. The Records Management InfoKit, which will be available shortly, is aimed at staff in Further and Higher Education who are new to the discipline of records management and who need sensible, practical advice about the issues they are facing as well as techniques for helping to solve them. Also due for release in July is a toolkit, the Electronic Document & Records Management (EDRM) System Implementation Toolkit, which will provide records managers and other information professionals with a "one stop shop" for impartial, detailed and practical advice of use during all stages of a proposed or actual EDRM system implementation.

In the News

Excerpts from Recent Press Releases and Announcements

ARL Endorses Digitization as an Acceptable Preservation Reformatting Option

July 20, 2004 - "ARL has endorsed digitization as an accepted preservation reformatting option for a range of materials. It encourages its members and others already engaged in digital reformatting and those interested in initiating these activities to make organizational and economic commitments to adhere to accepted standards and best practices in digital reformatting and to establish institutional policies to maintain digital products for the long term. At the same time, ARL recognizes that the choice to use digitization, or any reformatting option, for preservation is not prescriptive—it remains a local decision. Many approaches are possible and digital reformatting should now be considered a valid choice among the various methods for preserving paper-based materials."

"This endorsement comes from the work of the ARL Preservation Committee, which concluded that the emerging consensus around best practices for the creation and long-term maintenance of digital files, coupled with the overwhelming advantages of digitization for access, argue for support by the library community of digitization as a viable preservation reformatting strategy. William A. Gosling, University Librarian at the University of Michigan and Chair of the ARL Preservation Committee, said, 'Students, and more and more faculty, expect libraries to provide information in electronic formats. It is now time to move forward, time to recognize and adopt digitization as an acceptable preservation option for reformatting brittle and hard-to-access materials. ARL is prepared to serve as a catalyst for this movement.' ARL looks forward to the support and leadership of the preservation community and allied organizations in this endeavor."

"As a first step in building community support and facilitating the development and implementation of policies, standards, guidelines, and best practices where they do not currently exist, ARL has released 'Recognizing Digitization as a Preservation Reformatting Method.' <>. The paper, prepared by staff at the University of Chicago and the University of Michigan, benefited from the comments of a number of additional preservation staff and funding agencies staff as well as from many ARL directors."

For more information, contact:

Carla Montori, Head
Preservation Division
University of Michigan

Judith Matz
Communications Officer, ARL

IMLS Seeks Comments on Impact of Museum and Library Services Analysis

July 15, 2004 - "WASHINGTON, DC - The reauthorization of the Museum and Library Services Act creates new authority for IMLS to carry out and publish analyses of the impact of museum and library services. The Act stipulates that these analyses should be conducted in ongoing consultation with stakeholders, including 'State Library Administrative Agencies; state, regional, and national library and museum organizations; and other relevant agencies.'"

"The Act further states that these analyses shall 'identify national needs for, and trends in, and impact of museum and library services provided with IMLS support, report on the impact and effectiveness of programs conducted with funds made available by the Institute in addressing such needs, and identify and disseminate information on the best practices of such program.'"

"IMLS is developing a plan to address the requirements of the statute. As a first step, IMLS is requesting public comment to identify national needs for, trends in, and impacts of museum and library service. These comments will be used to identify areas in which analyses would be useful. The following questions are intended to assist stakeholders in identifying high-priority areas for IMLS to explore through further research and study. Following collection of public comments, IMLS will contact up to 50 key members of stakeholder groups for structured interviews regarding the list of possible topics for analysis. Both the public comments and results of the structured interviews will provide the foundation for IMLS to use in fulfilling this new requirement."

For more information on how to send your comments on this matter to IMLS, please see the full press release at <>.

IMLS Awards over $19 Million to Nation's Museums and Libraries in Second Grant Round for FY2004

July 15, 2004 - "WASHINGTON, DC - The Institute of Museum and Library Services (IMLS), the federal agency that supports the nation's museums and libraries, awarded $19,114,725 to 311 museums and libraries across the country yesterday. IMLS received 404 applications requesting over $52 million. Many recipients will match the grants for an additional $15,745,011 to America's libraries and museums. For grants made in your state, please see <>."

For more information, please see <>.

IMLS Awards Over $1 Million in Federal Grants for Continuing Education and Training Programs for Librarians

July 13, 2004 - "WASHINGTON, DC - The Institute of Museum and Library Services (IMLS), the federal agency that supports the nation's museums and libraries, has awarded $1,018,830 in grants to provide librarians and their staff continuing education and training in the latest research methods and technologies to help sustain our nation of learners. The awards are being funded under the Institute's prestigious National Leadership Grants program."

"The institutions selected for funding today are the Illinois State Library (Springfield), Cornell University Library (Ithaca), and the Online Computer Library Center (Dublin, Ohio). They will match the awards with an additional $1,087,362. For a list of the funded institutions organized by state with descriptions of their winning grant projects, please see <>."

For more information, please see the full press release at <>.

Over $14.7 Million to Recruit New Librarians for 21st Century

July 13, 2004 - "WASHINGTON, DC - The Institute of Museum and Library Services (IMLS), the federal agency that supports the nation's museums and libraries, has awarded $14,790,543 to library schools and library service organizations to recruit and educate new librarians to help offset a looming national shortage. For a contact list of the organizations funded with descriptions of their winning grant projects, please see the attached list."

For more information, please see the full press release at <>.

DTIC Established as a DoD Field Activity

July 8, 2004 - Ft. Belvoir, VA - More than 300 civilian employees of the Defense Technical Information Center (DTIC) recently greeted Dr. Ronald Sega, Director, Defense Research & Engineering (DDR&E) as he marked the establishment of DTIC as a DoD Field Activity. DTIC will be under the Under Secretary of Defense for Acquisition, Technology and Logistics (AT&L) and will report to Dr. Sega.

In his remarks on July 7, 2004, Dr. Sega said, "Technology is vital to DoD's transformation. Efficient and dynamic information is key to technology development and transition and DTIC is vital to DDR&E and DoD." Dr. Sega's meeting with DTIC staff followed the signing of a Decision Memorandum on June 4, 2004, by Mr. Paul Wolfowitz, Deputy Secretary of Defense, which elevated DTIC to Field Activity status.

Mr. Kurt Molholm, DTIC Administrator, stated that "all DTIC staff look forward to being a part of DDR&E and working toward its primary goal of ensuring that technology performers are closely aligned with scientific and technical (S&T) information providers."

DTIC's products and services are used by its customers to maximize research knowledge in performing the multibillion dollar DoD research efforts authorized and funded annually by Congress. DTIC provides DoD with information on research activities of other DoD agencies and their contractors. This prevents unnecessary or duplicate research at the taxpayers' expense.

DTIC is a major player in the DoD E-Gov initiative to create a centralized view of DoD research and development (R&D) data sources and relevant information. DTIC's early adoption of Web technology has resulted in the organization hosting more than 100 of DoD's Web sites including the popular DefenseLINK.

The gateway to DTIC's products and services is its Web site, <>. DTIC's flagship Scientific and Technical Information Network (STINET) service is one of DoD's largest scientific and technical online information resources. This service is available in two versions, Public STINET for the general public, and Private STINET for DTIC registered users.

Well known as the DoD central facility for defense information for almost 60 years, DTIC provides a "one-stop" access point to DoD S&T information. DTIC resources are available to DoD, the military services, other U.S. Government agencies, contractors to DoD and other government agencies, potential contractors, and universities with federal research grants. Registration is required for access to many DTIC products and services. For more information about how to obtain DTIC products and services contact <> or call 703-767-8244.

APA Announces August Launch and Extended Trials of PsycBOOKS™

July 7, 2004 - "The American Psychological Association (APA) has announced the launch of a new full-text database, PsycBOOKS, which will give researchers instant access to a wealth of scholarly resources. The initial release will include over 600 books and more than 10,000 individual chapters published by APA and over 1,500 entries from the Encyclopedia of Psychology, co-published by APA and Oxford University Press, as well as a selection of archival books from other publishers. Extended free trials will be available from August 15, 2004, through September 30, 2004, on APA PsycNET®. The file will also be released at that time for loading by vendors who have contracted with APA to provide access to PsycBOOKS."

For more information, please see <> or contact William C. Hayward <>.

NCLIS Agenda: Assessment, Relevance, and Research

National Commission Selects Strategic Goals

July 6, 2004 - "Washington, DC USA - The U.S. National Commission on Libraries and Information Science (NCLIS) has announced three strategic goals to guide its work in the immediate future:

  • Appraising and assessing library and information services provided for the American people
  • Strengthening the relevance of libraries and information science in the lives of the American people
  • Promoting research and development for extending and improving library and information services for the American people"

"Beth Fitzsimmons, NCLIS Chair, noted that in the legislation that created the Commission, Congress included a Statement of Policy that affirmed that 'library and information services adequate to meet the needs of the people of the United States are essential to achieve national goals' and that 'the Federal Government will cooperate with State and local governments and public and private agencies in assuring optimum provision of such services.'"

For more information, please see the full press release at <>.

Archive of medical journals to go online

An archive of medical journals, some dating back more than 125 years, will be made freely available on the Internet.

June 28, 2004 - "The Wellcome Trust, in partnership with the Joint Information Systems Committee (JISC), and the U.S. National Library of Medicine (NLM) are joining forces to digitise the complete backfiles of a number of important and historically significant medical journals. The digitised content will be made freely available on the Internet—via PubMed Central and augment the content already available there."

"With funding of £1.25 million [£750,000 from the Trust, £500,000 from the JISC] the project plans to digitise around 1.7 million pages of text. The NLM will manage the project, host the archive and ensure that the digital files are preserved in perpetuity."

"In addition to creating a digital copy of every page in the backfiles, the digitisation process will also create a PDF file for every discrete item (article, editorial, letter, advertisements etc.) in the archive, and use optical character recognition (OCR) technology to generate searchable text."

"Although the project focuses on digitising backfiles, publishers will also include new issues of the selected journals on an ongoing basis subject to an embargo period, as defined by each participating publisher."

For more information, please see the full press release at <>.

Digital Preservation Coalition and Pilgrim Trust announce winner of first Digital Preservation Award

June 23, 2004 - "The first Digital Preservation Award worth £5,000 in recognition of leadership and achievement in the developing field of digital preservation, was presented tonight by Loyd Grossman to The National Archives for its Digital Archive at a ceremony held at the British Library."

"The National Archives beat off competition from around the world with the first all-purpose digital archive, designed to store Government records in many different formats. As the Modernising Government Agenda aims to have all new records stored and retrieved electronically, it is crucially important that digital records will be preserved as effectively as paper ones. The Digital Archive will store important Government records, from public enquiries such as the Hutton Inquiry, to e-mails, web pages and databases."

"...The judging was very close and for this first Award, the judges decided that there should also be a special commendation for one of the other short listed projects, the Specially Commended certificate was presented to The CAMiLEON project for their work on testing technical strategies."

For more information, please see <>.

Wonders of the web captured forever...

Launch of UK Web Archiving Consortium will dramatically boost lifespan of key web materials

June 21, 2004 - "Launched today, the UK Web Archiving Consortium (UKWAC) aims to expand the lifespan of website materials from around 44 days (the same life expectancy as a housefly) to a century or more. Comprising six leading UK institutions, the UKWAC will work, with the permission of rights holders, on an experimental system for archiving selected key UK websites—ensuring that invaluable scholarly, cultural and scientific resources remain available for future generations."

"The UKWAC—comprising The British Library, Joint Information Systems Committee of the Higher and Further Education Councils (JISC), The National Archives, The National Library of Wales, the National Library of Scotland and the Wellcome Trust—will run for an initial period of two years, during which approximately 6,000 websites will be collected and archived."

"Consortium members will obtain the permission of website owners to archive selected sites whilst working collaboratively to explore how to develop compatible selection policies and to investigate the complex technical challenges involved in collecting and archiving web material."

For more information, please see the project web site at <>.

Copyright 2004 © Corporation for National Research Initiatives
