D-Lib Magazine
March 1997

ISSN 1082-9873

Clips and Pointers

DELOS Cross-Language Workshop: Summary

Paraic Sheridan, ETH-Zurich

The third workshop of the DELOS working group, on the topic of "Cross-Language Information Retrieval", was hosted by ETH Zurich, 5-7 March 1997. DELOS is a working group funded by the IT Long Term Research programme of the European Commission to study and investigate existing and emerging technologies and issues relevant to digital libraries.

The DELOS Working Group is just one of a series of ERCIM-sponsored initiatives aimed at promoting research and operational activities in the Digital Library field. The DELOS consortium consists mainly of members of ERCIM institutes (ERCIM: European Research Consortium for Informatics and Mathematics).

As was borne out by many of the workshop presentations, many research projects addressing issues of digital information repositories in Europe must deal with information in several languages, even when multi-lingual or cross-language information retrieval is not a central theme of the project. We distinguish "multi-lingual" information retrieval as involving several languages, though a user's search query is always evaluated against only those documents in the query language, and "cross-language" information retrieval as the case where a user's query may retrieve documents in languages other than the language of the query.
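The distinction can be illustrated with a minimal sketch. It is not drawn from any system presented at the workshop: the toy documents, the tiny bilingual dictionary, and the function names are all assumptions made for illustration, and the dictionary lookup stands in for only one of several possible translation approaches.

```python
# Toy collection of (doc_id, language, text). All data is invented for illustration.
DOCS = [
    ("d1", "en", "digital library retrieval"),
    ("d2", "de", "digitale bibliothek suche"),
    ("d3", "it", "biblioteca digitale"),
]

# Hypothetical bilingual dictionary mapping English terms to German and Italian.
DICTIONARY = {
    "library": {"de": ["bibliothek"], "it": ["biblioteca"]},
}


def matches(terms, text):
    """Return True if any of the terms occurs as a whole word in the text."""
    words = set(text.split())
    return any(term in words for term in terms)


def multilingual_search(query, query_lang):
    """Evaluate the query only against documents in the query language."""
    terms = query.lower().split()
    return [doc_id for doc_id, lang, text in DOCS
            if lang == query_lang and matches(terms, text)]


def cross_language_search(query, query_lang):
    """Evaluate the query against all documents, translating terms as needed."""
    terms = query.lower().split()
    hits = []
    for doc_id, lang, text in DOCS:
        if lang == query_lang:
            candidates = terms
        else:
            # Look up each query term in the (assumed) dictionary for this language.
            candidates = [translation
                          for term in terms
                          for translation in DICTIONARY.get(term, {}).get(lang, [])]
        if matches(candidates, text):
            hits.append(doc_id)
    return hits


print(multilingual_search("library", "en"))    # ['d1']
print(cross_language_search("library", "en"))  # ['d1', 'd2', 'd3']
```

In this sketch the collection is the same in both cases; only the set of documents a single query can reach differs.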

A total of 27 participants attended the workshop, representing 9 different European countries, as well as invited speakers from the United States and Korea, who helped to broaden the discussions beyond the European perspective. Apart from the geographical diversity of the participants, their backgrounds in Information Retrieval, Computational Linguistics, Lexicography, Controlled Vocabulary Thesauri, and Internet Technology also helped to bring many different perspectives to the discussions of the work presented.

To set the scene for the workshop, Doug Oard of the University of Maryland gave a comprehensive overview of Cross-Language Information Retrieval in the USA, including a useful schematic breakdown of the various approaches: corpus-based (parallel, comparable or unaligned corpora) versus knowledge-based (dictionaries or ontologies). He presented a substantial amount of US-based research on cross-language retrieval, and showed that current approaches have demonstrated performance in the range of 50% to 75% of the performance of the comparable monolingual retrieval task. This presentation was followed by Sung-Huyn Myaeng of the National University Taejon, Korea, who gave an in-depth presentation of the particular problems of working with Asian languages, including the use of different scripts, the problem of word segmentation and the similar problem of compound noun analysis. This was appropriately followed by Martin Duerst, University of Zurich, who, in recognition of the increasing role of the World Wide Web in this area of research, detailed the emerging HTTP and HTML standards for supporting multi-script and multi-language information on the World Wide Web.

Other presentations from European researchers focussed on the approaches being adopted for cross-language and multi-language retrieval in various projects such as Twenty-One, MULINEX, Acquarelle, ILIAD and MedExplore, some of which are funded by the European Commission. A common sentiment expressed was that, even in cases where multilinguality was not a core concern of the project consortia, it was a topic that had to be addressed given the European dimension. We therefore saw some novel approaches to cross-language retrieval being taken by these researchers. An important parallel theme was also the identification, conflation and use of multi-word terms for cross-language retrieval, given the observation that these can serve to greatly reduce translation ambiguities.

From the Information Retrieval point of view, David Hull of Rank Xerox Research Centre, Grenoble, presented a model for weighted Boolean retrieval for cross-language retrieval, and Paraic Sheridan of ETH Zurich presented a method of using a retrieval model for building information structures called "similarity thesauri" for cross-language retrieval. The presentation of similarity thesauri showed how this approach has also been implemented for cross-language retrieval of speech documents, and a demonstration of the EuroSpider retrieval system was given. Approaches from the Computational Linguistics perspective were presented by Carol Peters of CNR Italy, who showed how the use of comparable corpora together with lexical resources could bring to light useful translation equivalences for cross-language retrieval, and by Piek Vossen of the University of Amsterdam, who presented the EuroWordNet project, which is augmenting the Princeton WordNet of English with wordnets in Dutch, Italian and Spanish. The workshop concluded with a discussion of the important issue of evaluating different approaches to cross-language information retrieval. Participants highlighted as highly significant the fact that this year's Text Retrieval Conference (TREC 6) will include a track evaluating cross-language retrieval.

Further information on this workshop, including a list of participants and abstracts of presentations, can be found at:

http://www-ir.inf.ethz.ch/DELOS/

The next DELOS workshop will address "Multi-Media Indexing and Retrieval", and will take place in Pisa, Italy, August 29th and 30th, in conjunction with the First European Conference on Research and Advanced Technology for Digital Libraries.

For additional information, please contact:

The DELOS Working Group Coordinator
Constantino Thanos
Istituto di Elaborazione della Informazione
Consiglio Nazionale delle Ricerche
Tel +39 50 593429
Email: [email protected]

GABRIEL Launched as Official Service, 1 January 1997

During the September 1996 meeting of the Conference of European National Librarians (CENL), the members decided that Gabriel, Gateway to Europe's National Libraries, a popular pilot service established in 1995 jointly by the British Library and the national libraries of Finland and The Netherlands, should be launched as an official service on behalf of Europe's national libraries. The service intends to achieve comprehensive coverage of European national libraries and has the following major objectives:

to provide information on the World Wide Web about national libraries in a uniform way in several languages;

to provide convenient online links to sources of information about their services and collections;

to give access to all their online services where appropriate;

to be a bulletin board with news items about the national libraries;

to provide access to all the World Wide Web (WWW) servers of the national libraries through a single search service; and

to build collaborative links between European national libraries in the networking field.

The service is supervised and maintained by a board and a team, representing eight national libraries. Overall maintenance is the responsibility of the Netherlands' national library, the Koninklijke Bibliotheek. In 1997, new functionality will be added to the service: new services of the individual libraries will be added and the results of collaborative projects will be made available on the web. Gabriel can be accessed at four sites:

http://www.konbib.nl/gabriel/
http://portico.bl.uk/gabriel/
http://renki.helsinki.fi/gabriel/
http://www.ddb.de/gabriel/

In Print

Oregon State System of Higher Education Historical and Cultural Atlas Resource is a joint project of the University of Oregon History Department, Geography Department Infographics Lab, and New Media Center, which uses Macromedia Shockwave to create an interactive atlas of culture and history. The project was conceived in support of campus-wide research and educational activities but is presently accessible via the web.

Evaluating Web Sites for Educational Uses: Bibliography Checklist, Carolyn M. Kotlas, comp., updated February 13, 1997. In addition to identifying a number of on-line and print resources, this page provides a useful checklist to enable users to evaluate sites for themselves.

Goings On

The "Digital Documents in Context: Organization and Creation" track of the Thirty-First Annual Hawaii International Conference on Systems Sciences (HICSS) will address issues associated with creating and using digital documents in business and scholarship. The call for papers closes March 15, 1997. The conference will be held at The Orchid at Mauna Lani, Kohala Coast, Hawaii, January 6-9, 1998.

Workshop on Research Directions for the Next Generation Internet, Vienna, Virginia, May 13-14, 1997. The Computing Research Association (CRA) has organized a two-day conference to discuss the research agenda needed to accomplish the goals of the Next Generation Internet (NGI), a three-year, $300 million initiative announced by President Clinton and Vice President Gore on October 10, 1996. The call for white papers, which will provide background for the working sessions and will become part of the formal record, closes on March 27, 1997. Attendance will be by invitation only. The NGI Workshop web site is currently under construction.
The workshop is sponsored by the Federal Large Scale Networking Working Group (LSN) of the National Science and Technology Council's Committee on Computing, Information, and Communications R&D Subcommittee. LSN members include the National Institutes of Health, National Security Agency, Department of Energy, National Aeronautics and Space Administration, Department of Defense, DARPA, National Coordinating Office, National Oceanic and Atmospheric Administration, White House Office of Science and Technology Policy, Federal Networking Council, and National Science Foundation.

The call for papers for the Twenty-fifth Annual Telecommunications Policy Research Conference will close on March 28, 1997. Subject areas include but are not limited to telecommunications economics, convergence and competition, international issues, telecommunications and society, intellectual property, security, and policy. Further information is available at <http://www.si.umich.edu/~prie/tprc>.

Networking '97: Exploring the Continued Evolution of Internet Technology for Research and Education, Washington, DC, April 9-10, 1997. This conference is sponsored by the Coalition for Networked Information (CNI), Computing Research Association (CRA), Educom, and FARNET, and will address a number of current issues including but not limited to updates on Internet II, Federal regulatory policy, and Next Generation Initiatives.

The International Association for Social Science Information Service and Technology (IASSIST)/International Federation of Data Organizations (IFDO) annual conference will be held in Odense, Denmark, May 6-9, 1997. The theme of the conference is "Data Frontiers in the Infoscape"; papers have been solicited in the areas of metadata, imaging, policy, archives, Internet search engines, and thesauri and subject access. The session schedules have not yet been posted, but preliminary information on registration and related matters is available at the conference's web site.

IEEE ADL '97: International Conference on Advances in Digital Libraries, Washington, DC, May 7-9, 1997. Advance program and registration information are now available for this three-day conference, to be held at the Library of Congress. Sessions will address images, biomedicine, intellectual property, roles and contributions of the professional societies, and future directions.

The 8th Joint European Networking Conference (JENC8), Edinburgh, Scotland, May 12-15, 1997, will address "Diversity and Integration: The New European Networking Landscape". Given widespread interest in the conference, an additional call for short papers was issued, which closes March 16, 1997. The conference is organized by TERENA, the Trans-European Research and Education Networking Association, and hosted by TERENA, with local assistance from the University of Edinburgh and Concorde Services, Limited. The full program and registration information are available at the conference's web site.

Human-Computer Interaction Laboratory, University of Maryland Institute of Advanced Computer Studies, 14th Annual Symposium and Open House, College Park, May 30, 1997. The theme of this year's symposium and open house is universal access, with sessions on user interface design, digital visual libraries, and learning tools, followed by demonstrations. Registration by May 20, 1997 is requested.

American Society for Information Science (ASIS) 1997 Mid-Year Meeting: Information Privacy, Security, and Data Integrity, Scottsdale, Arizona, May 30 - June 3, 1997. The current program is now available. Pre-meeting courses begin on Saturday, May 31; formal sessions open Monday, June 2. Up-to-date information on session descriptions, schedule, registration, and related issues will be posted to the conference's web site, which offers on-line registration.

The second International Summer School on the Digital Library will be held at Tilburg University, the Netherlands, August 10-22, 1997. The summer school is sponsored by the Tilburg Innovation Centre for Electronic Resources ("Ticer") in cooperation with Tilburg University and Elsevier Science. Information may also be requested via electronic mail: [email protected].

Pointers in This Column:

8th Joint European Networking Conference (JENC8), Edinburgh, Scotland, May 12-15, 1997 http://www.terena.nl/conf/JENC8.html
American Society for Information Science (ASIS) 1997 Mid-Year Meeting: Information Privacy, Security, and Data Integrity, Scottsdale, Arizona, May 30 - June 3, 1997 http://www.asis.org/midyear97/program.html
"Digital Documents in Context: Organization and Creation", Thirty-First Annual Hawaii International Conference on Systems Sciences (HICSS) http://www.cba.hawaii.edu/hicss
Evaluating Web Sites for Educational Uses: Bibliography Checklist, Carolyn M. Kotlas, February 13, 1997 http://www.iat.unc.edu/guides/irg-49.html
Gabriel, Gateway to Europe's National Libraries http://www.konbib.nl/gabriel/
http://portico.bl.uk/gabriel/
http://renki.helsinki.fi/gabriel/
http://www.ddb.de/gabriel/
Human-Computer Interaction Laboratory, University of Maryland Institute of Advanced Computer Studies, 14th Annual Symposium and Open House, College Park, May 30, 1997 http://www.cs.umd.edu/projects/hcil/
IEEE ADL '97: International Conference on Advances in Digital Libraries, Washington, DC, May 7-9, 1997 http://cesdis.gsfc.nasa.gov/admin/adl97/adlcall.html
International Association for Social Science Information Service and Technology (IASSIST)/International Federation of Data Organizations (IFDO) Annual Conference, Odense, Denmark, May 6-9, 1997 http://www.sa.dk/dda/conf97
International Summer School on the Digital Library, Tilburg University, the Netherlands, August 10-22, 1997 http://cwis.kub.nl/~ticer/
Networking '97: Exploring the Continued Evolution of Internet Technology for Research and Education, Washington, DC, April 9-10, 1997 http://www.educom.edu/web/nttf/net97.html
Oregon State System of Higher Education Historical and Cultural Atlas Resource http://darkwing.uoregon.edu/~atlas/
Twenty-fifth Annual Telecommunications Policy Research Conference http://www.si.umich.edu/~prie/tprc

hdl:cnri.dlib/march97-clips

EuroSpider	http://www.eurospider.ch/
EuroWordNet	http://www.let.uva.nl/~ewn/
MULINEX	http://www2.echo.lu/langeng/en/le3/mulinex/mulinex.html
Twenty-One	http://www2.echo.lu/ie/en/twentyone.html

