D-Lib Magazine
March 1997

ISSN 1082-9873

Clips and Pointers

DELOS Cross-Language Workshop: Summary

Paraic Sheridan, ETH-Zurich

The third workshop of the DELOS working group, on the topic of "Cross-Language Information Retrieval", was hosted by ETH Zurich, 5-7 March 1997. DELOS is a working group funded by the IT Long Term Research programme of the European Commission to study and investigate existing and emerging technologies and issues relevant to digital libraries.

The DELOS Working Group is just one of a series of ERCIM-sponsored initiatives aimed at promoting research and operational activities in the Digital Library field. The DELOS consortium consists mainly of members of ERCIM institutes (ERCIM: European Research Consortium for Informatics and Mathematics).

As many of the workshop presentations bore out, research projects addressing digital information repositories in Europe must often deal with information in several languages, even when multi-lingual or cross-language information retrieval is not a central theme of the project. We distinguish "multi-lingual" information retrieval, in which a collection spans several languages but a user's search query is always evaluated only against documents in the query language, from "cross-language" information retrieval, in which a user's query may retrieve documents in languages other than the language of the query.
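The distinction can be illustrated with a small sketch. It is only a toy under stated assumptions: the three-document collection, the English-to-German dictionary, and both function names are hypothetical, and simple dictionary lookup stands in for whichever translation technique a real system would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    language: str        # language code such as "en" or "de"
    terms: frozenset     # index terms, already lower-cased


# Hypothetical toy collection and bilingual dictionary (illustration only).
COLLECTION = [
    Document("d1", "en", frozenset({"library", "digital"})),
    Document("d2", "de", frozenset({"bibliothek", "digital"})),
    Document("d3", "de", frozenset({"sprache"})),
]
EN_TO_DE = {"library": {"bibliothek"}, "language": {"sprache"}}


def multilingual_search(query_terms, query_language, collection):
    """Multi-lingual case: match only documents written in the query language."""
    terms = set(query_terms)
    return [d.doc_id for d in collection
            if d.language == query_language and terms & d.terms]


def cross_language_search(query_terms, dictionary, target_language, collection):
    """Cross-language case: translate the query terms, then match documents
    written in the target language."""
    translated = set()
    for term in query_terms:
        # Terms missing from the dictionary are simply dropped in this sketch.
        translated |= dictionary.get(term, set())
    return [d.doc_id for d in collection
            if d.language == target_language and translated & d.terms]
```

With this toy data, an English query for "library" reaches only the English document in the multi-lingual case, while the cross-language function reaches the German document indexed under "bibliothek".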

A total of 27 participants attended the workshop, representing 9 different European countries, as well as invited speakers from the United States and Korea, who helped to broaden the discussions beyond the European perspective. Apart from the geographical diversity of the participants, their backgrounds in Information Retrieval, Computational Linguistics, Lexicography, Controlled Vocabulary Thesauri, and Internet Technology also helped to bring many different perspectives to the discussions of the work presented.

To set the scene for the workshop, Doug Oard of the University of Maryland gave a comprehensive overview of Cross-Language Information Retrieval in the USA, including a useful schematic breakdown of the various approaches: corpus-based (parallel, comparable or unaligned corpora) versus knowledge-based (dictionaries or ontologies). He presented a substantial amount of US-based research on cross-language retrieval, and showed that current approaches have demonstrated performance in the range of 50% to 75% of the performance of the comparable monolingual retrieval task. This presentation was followed by Sung-Huyn Myaeng of the National University Taejon, Korea, who gave an in-depth presentation of the particular problems of working with Asian languages, including the use of different scripts, the problem of word segmentation and the similar problem of compound noun analysis. This was appropriately followed by Martin Duerst, University of Zurich, who, in recognition of the increasing role of the World Wide Web in this area of research, detailed the emerging HTTP and HTML standards for supporting multi-script and multi-language information on the World Wide Web.

Other presentations from European researchers focussed on the approaches being adopted for cross-language and multi-language retrieval in various projects such as Twenty-One, MULINEX, Acquarelle, ILIAD and MedExplore, some of which are funded by the European Commission. A common sentiment expressed was that, even in cases where multilinguality was not a core concern of the project consortia, it was a topic that had to be addressed given the European dimension. We therefore saw some novel approaches to cross-language retrieval being taken by these researchers. An important parallel theme was also the identification, conflation and use of multi-word terms for cross-language retrieval, given the observation that these can serve to greatly reduce translation ambiguities.

From the Information Retrieval point of view, David Hull of the Rank Xerox Research Centre, Grenoble, presented a weighted Boolean model for cross-language retrieval, and Paraic Sheridan of ETH Zurich presented a method of using a retrieval model to build information structures called "similarity thesauri" for cross-language retrieval. The presentation of similarity thesauri showed how this approach has also been implemented for cross-language retrieval of speech documents, and a demonstration of the EuroSpider retrieval system was given. Approaches from the Computational Linguistics perspective were presented by Carol Peters of CNR Italy, who showed how the use of comparable corpora together with lexical resources could bring to light useful translation equivalences for cross-language retrieval, and by Piek Vossen of the University of Amsterdam, who presented the EuroWordNet project, which is augmenting the Princeton WordNet of English with wordnets in Dutch, Italian and Spanish. The workshop concluded with a discussion of the important issue of evaluating different approaches to cross-language information retrieval. The fact that this year's Text Retrieval Conference (TREC 6) will include a track evaluating cross-language retrieval was highlighted as highly significant.

Further information on this workshop, including a list of participants and abstracts of presentations, can be found at:

http://www-ir.inf.ethz.ch/DELOS/

The next DELOS workshop will address "Multi-Media Indexing and Retrieval", and will take place in Pisa, Italy, on August 29th and 30th, in conjunction with the First European Conference on Research and Advanced Technology for Digital Libraries.

For additional information, please contact:

The DELOS Working Group Coordinator
Constantino Thanos
Istituto di Elaborazione della Informazione
Consiglio Nazionale delle Ricerche
Tel +39 50 593429
Email: [email protected]

Related URLs:
EuroSpider http://www.eurospider.ch/
EuroWordNet http://www.let.uva.nl/~ewn/
MULINEX http://www2.echo.lu/langeng/en/le3/mulinex/mulinex.html
Twenty-One http://www2.echo.lu/ie/en/twentyone.html


GABRIEL Launched as Official Service, 1 January 1997

During the September 1996 meeting of the Conference of European National Librarians (CENL), the members decided that Gabriel, Gateway to Europe's National Libraries, a popular pilot service established in 1995 jointly by the British Library and the national libraries of Finland and The Netherlands, should be launched as an official service on behalf of Europe's national libraries. The service intends to achieve comprehensive coverage of European national libraries and has the following major objectives:
  • to provide information on the World Wide Web about national libraries in a uniform way in several languages;
  • to provide convenient online links to sources of information about their services and collections;
  • to give access to all their online services where appropriate;
  • to be a bulletin board with news items about the national libraries;
  • to provide access to all the World Wide Web (WWW) servers of the national libraries through a single search service; and
  • to build collaborative links between European national libraries in the networking field.
The service is supervised and maintained by a board and a team representing eight national libraries. Overall maintenance is the responsibility of the Netherlands' national library, the Koninklijke Bibliotheek. In 1997, new functionality will be added to the service: new services of the individual libraries will be added and the results of collaborative projects will be made available on the web. Gabriel can be accessed at four sites:
    http://www.konbib.nl/gabriel/
    http://portico.bl.uk/gabriel/
    http://renki.helsinki.fi/gabriel/
    http://www.ddb.de/gabriel/


    Pointers in This Column:

    8th Joint European Networking Conference (JENC8), Edinburgh, Scotland, May 12-15, 1997 http://www.terena.nl/conf/JENC8.html
    American Society for Information Science (ASIS) 1997 Mid-Year Meeting: Information Privacy, Security, and Data Integrity, Scottsdale, Arizona, May 30 - June 3, 1997 http://www.asis.org/midyear97/program.html
    "Digital Documents in Context: Organization and Creation", Thirty-First Annual Hawaii International Conference on Systems Sciences (HICSS) http://www.cba.hawaii.edu/hicss
    Evaluating Web Sites for Educational Uses: Bibliography Checklist, Carolyn M. Kotlas, February 13, 1997 http://www.iat.unc.edu/guides/irg-49.html
    Gabriel, Gateway to Europe's National Libraries http://www.konbib.nl/gabriel/
    http://portico.bl.uk/gabriel/
    http://renki.helsinki.fi/gabriel/
    http://www.ddb.de/gabriel/
    Human-Computer Interaction Laboratory, University of Maryland Institute of Advanced Computer Studies,
    14th Annual Symposium and Open House
    College Park
    May 30, 1997
    http://www.cs.umd.edu/projects/hcil/
    IEEE ADL '97: International Conference on Advances in Digital Libraries
    Washington, DC, May 7-9, 1997
    http://cesdis.gsfc.nasa.gov/admin/adl97/adlcall.html
    International Association for Social Science Information Service and Technology (IASSIST)/International Federation of Data Organizations (IFDO) Annual Conference, Odense, Denmark, May 6-9, 1997 http://www.sa.dk/dda/conf97
    International Summer School on the Digital Library, Tilburg University, the Netherlands, August 10-22, 1997 http://cwis.kub.nl/~ticer/
    Networking '97: Exploring the Continued Evolution of Internet Technology for Research and Education, Washington, DC, April 9-10, 1997 http://www.educom.edu/web/nttf/net97.html
    Oregon State System of Higher Education Historical and Cultural Atlas Resource http://darkwing.uoregon.edu/~atlas/
    Twenty-fifth Annual Telecommunications Policy Research Conference http://www.si.umich.edu/~prie/tprc

    Copyright © 1997 Corporation for National Research Initiatives


    hdl:cnri.dlib/march97-clips