The first Digital Curation in the Human Sciences workshop took place in the context of the European Conference on Digital Libraries (ECDL2009) in Corfu, on 30 September and 1 October 2009. Through an intensive sequence of paper presentations and group discussion, it provided the ground for an exciting debate between CESSDA, CLARIN and DARIAH, the leading European projects for digital infrastructure in the human sciences, and cutting-edge perspectives in digital curation, information behaviour and digital humanities research.
Objective and themes
Research in the human sciences is predominantly information-intensive, often idiographic and hermeneutical, and dependent on complex associations of phenomena, differentiated disciplinary languages, and divergent theoretical and methodological perspectives. Large-scale cultural heritage digitisation, together with the explosion of born-digital information on contemporary societies, poses significant new challenges of resource discovery and interoperability, producing a need for interdisciplinary, collaborative research agendas and action plans to tackle issues of long-term digital preservation and adequate knowledge representation of information in a number of scholarly domains.
Digital curation aims to address exactly this pressing need to ensure the future epistemic adequacy of the information objects on which knowledge in the human sciences depends. It encompasses a set of activities aiming at the production of high-quality, dependable digital assets; their organisation, archiving and long-term preservation; and the generation of added value from digital assets by means of resource-based knowledge elicitation. These activities correspond to the research processes, or "scholarly primitives", that digital infrastructures for the human sciences purport to support.
The workshop provided a focus for discussion on issues related to the conceptualisation, design, development and functioning of the planned digital research infrastructures for the human sciences, in Europe and beyond, from a digital curation perspective, facilitating the exchange of ideas, best practices and the convergence of future directions between preparatory digital infrastructure projects.
The first session of the workshop, chaired by Costis Dallas (Faculty of Information, University of Toronto, Canada; Panteion University & Digital Curation Unit-IMIS, Athena Research Centre, Greece), focused on a presentation of three leading European e-infrastructure projects for the human sciences. Hilary Beedham (UK Data Archive, University of Essex, United Kingdom) presented "The CESSDA research infrastructure: formalising 30 years' experience"; she noted the need to tackle challenges of comparability of data and availability through a centralised approach, and suggested, among other things, the need to support enhanced access to data, to establish standard access agreements, to allow single sign-on for users, and to ensure long-term digital preservation through a Seal of Approval. Martin Wynne (Oxford eResearch Centre, Oxford University, United Kingdom) followed suit with "Preserving Babel: CLARIN and the preservation of language resources", addressing vexing problems of heterogeneity and multiple use of language resources in a fragmented, competitive, fast-changing environment; he proposed a streamlined scenario of the digital humanities process, supporting an architecture of federated archives, and introducing key themes for action including persistent identifiers, component metadata, concept registries, and support for virtual collections. The session was rounded off by Peter Doorn (Data Archiving and Networked Services, The Netherlands) who, in addition to presenting an overview of current work in DARIAH on preparing the digital infrastructure for the arts and humanities, introduced some broader issues regarding the diversity of researchers' profiles and needs, the distinction between resources/tools and computational analysis methods for scholarship, and the need not only to support tackling old research questions in new ways, but also to allow new questions to be asked.
The second session, chaired by Rene van Horik (Data Archiving and Networked Services, The Netherlands), focused on different theoretical and methodological perspectives on "Digital curation and supporting research in the human sciences through digital technologies". Seamus Ross (Faculty of Information, University of Toronto, Canada) provided a fascinating overview of new ideas for digital preservation, drawing attention to bias in human science data constitution, preserved software as "cultural artefact" informed by differentiated context and mechanisms of use, a digital continuum approach to conceptualising and specifying infrastructures, and harnessing standards to the minimum required for interoperability so that they do not hamper intellectual progress. Panos Constantopoulos (Digital Curation Unit-IMIS, Athena Research Centre & Department of Informatics, Athens University of Economics and Business, Greece) argued that e-scholarship and the requisite digital infrastructures are best conceptualised by means of a digital curation model allowing for a broader set of lifecycle processes including appraisal, knowledge enhancement and user experience, and complementary processes for context management, such as goal and usage management, and domain modelling; he also presented a complementary scholarly activity model, which links such processes with research goals, concepts and propositions, resources, methods, tools and services, and is used for requirements analysis in DARIAH. Peter Buneman (University of Edinburgh & Digital Curation Centre, United Kingdom), drawing from surprising commonalities between human and biomedical sciences, re-affirmed the importance of provenance and context in ensuring information fitness-for-purpose.
In the third session, chaired by Panos Constantopoulos and focusing on "Digital curation, registries, research repositories and digital libraries", Tobias Blanke (Centre for e-Research, King's College, University of London, United Kingdom) took us on a journey "From a collection of tools and services towards a research infrastructure for the arts and humanities" bridging strategic and technical issues faced by infrastructure designers; he identified human factors, apart from platforms and resources, as a component of digital infrastructures, stressed the particular difficulty of defining a humanities infrastructure due to the complexity, inconsistency and incompleteness of the data, and the isolated nature of collections and research communities; he then presented the role of DARIAH and its interoperability layer as a "trusted intermediary", based on an architecture of mediated federation, possibly making use of distributed mappings and user-defined access tools. Rene van Horik then introduced the work of the "Migration to Intermediate XML for Electronic Data (MIXED)" project, based on a simple and practical approach to preservation and interoperability of formatted research data through Standard Data Formats for Preservation (SDFP), encapsulating both data and schema information, and amenable to integration within a research repository workflow.
The fourth and final session, chaired by Seamus Ross, evolved into a full-blown round table involving most participants of the workshop and doing justice to the "Requirements for digital infrastructures in the human sciences: views from the field" topic originally envisaged. Elaine Toms (Centre for Management Informatics, Dalhousie University, Canada), based on her experience in information behaviour research, introduced an information environment model that encompasses researchers, their expectations, data and systems, and argued that scholarly information needs and expectations should be closely coupled with digital infrastructure design. Eric Peter Haswell (Institute of Nordic Research, University of Copenhagen, Denmark) advised caution on the ability of current technology to fully replicate scholarly research practice and argued for an open-ended, accommodative approach. Costis Dallas argued in favour of the relevance of interdisciplinary and qualitative approaches to understand actual scholarly practice in the humanities, and also for prospective foresight in order to develop specifications for emergent requirements; he suggested that, apart from interoperability for information discovery and search, digital infrastructures should allow for rich representation, annotation and associative linking of resources, i.e., curation processes, and should be able to deal equally with primary and secondary resources and scholarly objects.
In a series of fast-paced, one-minute statements, further workshop participants, including Christos Papatheodorou (Digital Curation Unit-IMIS, Athena Research Centre & Department of Library and Archival Science, Ionian University, Greece), Polina Proutskova (Goldsmiths College, University of London, UK), Laurent Romary (INRIA, France & Humboldt University Berlin), Marc van den Berg (University of Amsterdam, The Netherlands), and Rudi Schmiede (Institute of Sociology, Darmstadt Technical University, Germany), engaged in a fascinating debate on several additional issues, including the need to formally bridge information behaviour research with systems specification; to accommodate the needs of musicology research in planned digital infrastructures; to establish collaboration, adding requisite mechanisms for epistemic trust and authority, with generic cultural heritage digital libraries such as Europeana; to establish commonly agreed definitions of much-needed standards for interoperability, and scholarly primitives; and to heed the importance of the methodological heterogeneity amongst scholarly "communities of practice" on digital infrastructure requirements.
The workshop was an opportunity to bring together representatives of major digital infrastructure preparatory projects with computer scientists, information researchers and digital humanists. The papers presented, and the debate amongst participants, highlighted problems, some of which go back through centuries of scholarship, while others (to quote Seamus Ross) "do not exist yet". There was consensus on the need to tackle both immediate and medium-term challenges, further bridging the gap between scholarship and systems. The dialogue between stakeholders will surely go on, as digital infrastructure initiatives move from the preparatory to the implementation stage.
Digital infrastructure project links
CESSDA - Council of European Social Science Data Archives: http://www.cessda.org
CLARIN - Common Language Resources and Technology Infrastructure: http://www.clarin.eu
DARIAH - Digital Research Infrastructure for the Arts and Humanities: http://www.dariah.eu
Copyright © 2009 Costis Dallas and Peter Doorn