Report on the EU-NSF Working Group on Multilingual Information Access

Judith L. Klavans
Center for Research on Information Access
Columbia University

Peter Schäuble
ETH Zurich

The first meeting of a new working group on Multilingual Information Access for Digital Libraries took place November 16-18, 1997 at the Columbia University Arden Homestead in New York under the joint sponsorship of the U.S. National Science Foundation (NSF) and the European Union (EU). This working group is part of a larger international collaboration between the NSF and the EU that brings together American and European scientists engaged in digital library research to plan common research agendas, share research results, and explore national, technical, and social expectations about digital libraries. The focus of the Multilingual Information Access working group is on the problems faced by the international community with storage, access and presentation of information in any of the world's languages. Other groups are addressing interoperability, metadata, intellectual property and commerce mechanisms, and resource indexing and discovery in a globally distributed digital library. Each working group includes approximately ten members.

The Multilingual Information Access working group includes researchers from information retrieval and natural language processing, two fields that are increasingly converging in the area of information access. Two primary areas of research were addressed during the first meeting: first, the problem of encoding, manipulating and displaying information in any language; and second, methods for querying, retrieving and presenting that information. The discussion concentrated on understanding user needs, identifying existing tools and resources, and prioritizing research issues for near-, medium-, and long-term planning. For example, there is a growing need to support access to documents in many languages with retrieval systems that can accept queries in the native or preferred language of the user. Furthermore, an accurate profile of the user's linguistic capabilities would allow retrieved information to be presented without translation when possible, but translated or summarized in another language when needed. The group hopes that a representative from the Pacific Rim will attend the second working group meeting, scheduled for March 1998 in Zurich, to add further perspectives. At that meeting the group is expected to complete the user needs assessment, refine the set of research issues, and prioritize the research agenda.

The long-term aim is to help the multilingual information access community identify the research directions on which future efforts can most usefully be concentrated. The working group plans to produce a white paper containing findings and recommendations in the summer of 1998. Since the Multilingual Information Access working group seeks to serve the larger Digital Library community, comments and recommendations from researchers interested in these issues will be invited.

Additional information about the working group and contact information for the participants are available on the group's web site at <http://www.cs.columbia.edu/~klavans/Activities/MIA/home.html> or from the working group co-chairs: Judith Klavans (USA) <klavans@cs.columbia.edu> and Peter Schäuble (EU) <schauble@inf.ethz.ch>. The working group is funded by the National Science Foundation through the University of Michigan (Grant Number 9605202) <http://www.si.umich.edu/UMDL/EU_Grant/home.htm> and by the European Union through the European Research Consortium for Informatics and Mathematics (ERCIM) <http://www-ercim.inria.fr>.

New D-Lib Working Group on Digital Library Metrics Formed

Barry Leiner
CNRI (Consultant)
Reston, Virginia

Much of digital library research is experimental or exploratory. Research projects lead to demonstrations, pilot systems, and eventually to deployment in production systems. Currently, despite a long tradition of user and usability studies in fields as diverse as library science and engineering, there are few ways to evaluate the effectiveness of research or to measure progress towards long-term goals. A notable exception is information retrieval, which has been greatly enhanced by the existence of the well-established measures of precision and recall. These metrics, in conjunction with standard corpora that can be used for testing and evaluation, have helped advance the state of the art by allowing researchers to compare and evaluate systems on a common basis.
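For readers unfamiliar with the two measures, the following is a minimal sketch of how precision and recall are conventionally computed for a single query. It is an illustration only and not a method proposed by the Working Group; the function name and the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: identifiers of the documents the system returned
    relevant:  identifiers of the documents judged relevant in the test corpus
    Both arguments are hypothetical inputs chosen for this illustration.
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved

    # Precision: fraction of retrieved documents that are relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant documents that were retrieved.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Four documents returned, three judged relevant, two in common.
    p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
    print(f"precision={p:.2f} recall={r:.2f}")
```

In this example two of the four retrieved documents are relevant, and two of the three relevant documents were found. Both measures presuppose a fixed collection with known relevance judgements, which is one reason they fit "batch oriented" evaluation better than interactive or distributed settings.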

While these measures have been very useful in evaluating and comparing "single site", "batch oriented" search and retrieval mechanisms, the richness of the digital library environment demands a much broader set of metrics. Metrics are required to deal with issues such as the distributed nature of the digital library, the importance of user interfaces to the system, and the need for systems approaches to deal with heterogeneity among the various components of the digital library.

To address this issue, a new Working Group on Digital Library Metrics has been formed under the auspices of the D-Lib Program. This Working Group is to develop a consensus on an appropriate set of metrics to evaluate and compare the effectiveness of digital libraries and component technologies in a distributed environment. Initial emphasis will be on (a) information discovery with a human in the loop, and (b) retrieval in a heterogeneous world.

More information on the Working Group may be found on the D-Lib home page: <www.dlib.org>. Although most of the work of the group is planned to be conducted via email and the net, a kickoff meeting is scheduled for 7-8 January 1998 at Stanford University.

