Conference Report


D-Lib Magazine
September 2003

Volume 9 Number 9

ISSN 1082-9873

Report on the 7th European Conference on Digital Libraries, ECDL 2003

August 17-22, Trondheim, Norway


Andreas Rauber
Vienna University of Technology

The 7th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2003) took place in Trondheim, Norway, from August 17 to 22, 2003.

ECDL 2003 attracted 360 participants from 35 countries, who followed a high-quality program at the wonderful conference location in the Hotel Britannia. The program was organized in two parallel sessions during the three main conference days. These were preceded by a day of tutorials on Sunday, with a total of 9 tutorials covering the wide and interdisciplinary scope of digital libraries (DL), addressing topics such as usability evaluation, geo-referencing, indexing and retrieval of audio and video content, thesauri and ontologies, and the CIDOC conceptual reference model.

The competition for the conference program was highly selective, with only 47 papers being accepted for presentation out of a total of 161 submissions (an acceptance rate of only 29%). All papers were reviewed by at least 3 members of the program committee, whose members came from 26 countries on 5 continents. Furthermore, ECDL 2003 offered 36 posters, as well as 16 demonstrations, selected from a total of 72 submitted.

The opening keynote speech was presented by John Lervik, CEO of Fast Search & Transfer (FAST), which developed the Fast Internet search engine. Lervik analyzed in detail the various components of a modern third-generation Internet search engine. While previous generations of search engines focused solely on the analysis of unstructured text and queries, current systems incorporate sophisticated linguistic and structural analysis, such as the detection of noun phrases, product names, acronyms, place names, compound detection, document types, and others. While these methods increase the relevance of pages retrieved, recall is improved by incorporating orthographic, morphologic, syntactic and semantic analysis, in order to differentiate between different types of queries, which, in turn, are run against different aspects of documents, such as content, anchor text, or document type. Lervik also presented the CASFQ framework for ranking documents, improving precision by analyzing retrieved pages along the orthogonal dimensions of completeness with regard to context and meta-tags, authority based on link cardinality, statistical similarity, quality type of the document, and freshness. On top of these, several modules for result visualization and information discovery support multidimensional navigation through the retrieved data.

The second keynote, presented by Clifford Lynch, Director of the Coalition for Networked Information (CNI), addressed another core topic in the DL domain, namely stewardship in the digital age, i.e., caring for information and cultural heritage, honoring our relationship to history, and preserving cultural heritage for the benefit of future generations. Lynch detailed the various aspects of stewardship at three levels: the institutional or organizational level, the level of nations and peoples, and the personal level of each individual. In doing this, he emphasized the need to guarantee preservation of digital information, pointing out that preservation efforts need not immediately guarantee access to digital information for hundreds of years into the future, but should focus now on guaranteeing its preservation for the immediate present and near future. Apart from keeping digital objects accessible, Lynch emphasized the importance of ensuring their integrity, as silent data corruption might prove more harmful than potential loss.

Regarding stewardship at the institutional level, Lynch argued strongly for active replication of data, stressing the importance of the large-scale distributed replication under autonomous control that is already in place in current library settings. At the national level, the importance of preservation needs to be recognized, addressing the duplication, swapping, back-up and restoration of cultural collections across national boundaries. Yet, in the face of enormous amounts of data being collected about virtually all individuals, he also called for some "forgetting", touching on the issue of when the private may become public. At the individual and personal level, the ubiquity of tools such as digital cameras, video, e-mail, and others results in the creation of large private digital libraries that require appropriate care. On the basis of the proliferation of personal digital information, Lynch predicted a huge future market for appropriate trusted services for preservation and access provision.

The final keynote of the conference was given by Karen Sparck Jones from Cambridge University, who reviewed the evolution of one of the core technical bases of digital library systems, i.e., information retrieval. Starting from the first works in the 1960s on the analysis of word frequencies for performing retrieval in small-scale text collections, Sparck Jones showed the dramatic progress in this field over the decades, highlighting both the sound theoretical models created for the retrieval processes and the increasingly extensive experimental evidence supporting them. While acknowledging the gains provided by more sophisticated natural language processing techniques for new challenges in this domain, such as question answering systems, automatic text summarization, and others, she especially emphasized the importance and high-level performance offered by the rather simple statistical modeling of text for retrieval purposes. Sparck Jones urged the library domain to more readily embrace the possibilities offered by research in the information retrieval domain, while advising information retrieval researchers to communicate their results in a way that can be more readily appreciated.

Highly controversial topics resulted in intensive discussion during the three scheduled panel sessions. The first panel tried to predict the future of academic publishing, contrasting the classical model, in which readers pay for access through subscriptions or licenses, with the open access model, in which authors pay for the publication of their articles.

The second panel session, entitled "Digital? Libraries?", put the very topic of the conference up for discussion, questioning the appropriateness of the term "digital libraries" and the expectations such a term carries, and speculating about the need to replace it with a more appropriate term.

Finally, the third panel debated the extent to which metadata is a crucial issue for preservation and which steps need to be taken in order to arrive, if at all possible, at appropriate metadata standards.

The conference presentations were organized in 13 sessions, which ran in two parallel tracks, covering the wide topical range of the interdisciplinary field of digital libraries by addressing topics such as: user interaction; indexing, classification and retrieval; knowledge organization; architectures and systems; Web technologies and subject gateways; collection-building and management; metadata issues; and digital preservation.

There was a strong focus on user studies, involving questionnaires, traffic logs, and interviews employed to study the behavior of users in digital library systems. Houssem Assadi et al. presented such a study based on the Gallica digital library at the French National Library, identifying different user groups and their search and usage behavior. Along the same lines, Jela Steinerova reported on a questionnaire-based study of users of academic and research libraries in Slovakia, analyzing expectations, search behavior and sources used, while Madle et al. studied the change in knowledge gained by users of a medical information website.

Another focus of the conference was the different aspects of information retrieval and exploration. A paper by Bel et al. reported on a study of the performance of cross-lingual text categorization and compared effects observed on word distribution and translation with those observed in the more traditional cross-language information retrieval. A recommender system based on log-file analysis was presented by Geyer-Schulz et al., while annotations made by law students reading case law documents were used by Shipman et al. to identify relevant passages, studying the different types of free-form annotations made on the paper documents.

Retrieval from XML sources was another point of focus at ECDL 2003. Christopher York et al. studied different use cases and the usefulness of structure-aware queries in different settings. Some of the problems raised in the York et al. presentation were directly addressed by Henrik Nottelmann et al., who provided a methodology for uncertain schema mapping using the standard ontology language DAML+OIL in connection with probabilistic Datalog in order to make up for the lack of rules in the standard. Amato et al., on the other hand, borrowed a technique from information retrieval by using a rotated lexicon to support the efficient processing of queries against an XML database, where the queries contain wildcards for both structure and content elements.
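
For readers unfamiliar with the rotated lexicon mentioned above, the general information retrieval technique (also known as a permuterm index) can be sketched as follows. This is a minimal illustration on a plain list of terms, not the implementation of Amato et al.; the function names, the end marker, and the sample terms are assumptions made for this example, and the sketch handles only patterns with a single wildcard.

```python
from bisect import bisect_left

END = "$"  # end-of-term marker; assumed not to occur in any term


def build_rotated_lexicon(terms):
    """Return a sorted list of (rotation, term) pairs for every term.

    Each term gets the end marker appended, and every rotation of the
    marked string is stored alongside the original term.
    """
    entries = []
    for term in set(terms):
        marked = term + END
        for i in range(len(marked)):
            entries.append((marked[i:] + marked[:i], term))
    entries.sort()
    return entries


def wildcard_lookup(lexicon, pattern):
    """Find all terms matching a pattern that contains exactly one '*'."""
    if pattern.count("*") != 1:
        raise ValueError("this sketch handles exactly one '*' per pattern")
    head, tail = pattern.split("*")
    # Rotate the query so the wildcard falls at the end; the wildcard
    # search then becomes a prefix search over the sorted rotations.
    prefix = tail + END + head
    start = bisect_left(lexicon, (prefix,))
    matches = set()
    for rotation, term in lexicon[start:]:
        if not rotation.startswith(prefix):
            break
        matches.add(term)
    return sorted(matches)


# Hypothetical element names and content terms, for illustration only.
lexicon = build_rotated_lexicon(["title", "table", "author", "abstract", "tile"])
print(wildcard_lookup(lexicon, "t*le"))    # ['table', 'tile', 'title']
print(wildcard_lookup(lexicon, "*tract"))  # ['abstract']
```

The design point is that a wildcard anywhere in the pattern is turned into a prefix lookup, which a sorted list (or a B-tree) answers efficiently, at the cost of storing one entry per rotation of each term.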

Systems for accessing and interacting with music as well as geo-referenced digital libraries were presented in talks by George Tzanetakis and Matthew Weaver et al., respectively.

In a session devoted to digital preservation, Michael Day presented an overview of the various web preservation initiatives around the globe and the three main strategies taken by those initiatives, i.e., automatic harvesting, selection, and deposit. A paper on the DSpace system was presented by Robert Tansley et al., comparing its architecture with the OAIS reference model. Jane Hunter and Sharmin Choudhury reported on preservation strategies for mixed-media objects using three new media artworks as a case study. While the recommended migration strategy for a VHS-based video artwork posed few problems, only about two-thirds of the content of two multimedia CD-ROMs for a Macintosh system could be recovered using an emulation approach. For the installation-type work, a multi-layered approach using complex metadata information for the individual elements of the installation was defined.

ECDL 2003 was rounded off by two days of workshops that drew a large number of participants and complemented the main conference program. Four one-day workshops covered in detail the issues of digital library evaluation, networked knowledge organization systems, digital libraries in healthcare, and Web archiving. The fifth workshop was the two-day CLEF 2003 Workshop of the Cross-Language Evaluation Forum. (See reports on all five workshops in the "In Brief" column of this issue of D-Lib Magazine.)

The ECDL 2003 proceedings were published by Springer-Verlag as Lecture Notes in Computer Science, LNCS 2769.

Next year's European Conference on Digital Libraries (ECDL 2004) will be held from September 12 to 17 in Bath, United Kingdom. (An announcement will appear in a future issue of D-Lib Magazine when the ECDL 2004 web site has become available for viewing.)


Copyright © Andreas Rauber

DOI: 10.1045/september2003-rauber