The Arts and Humanities Data Service
Three Years On
Director, Arts and Humanities Data Service
King's College London, Library
Strand, London WC2R 2LS
As the Arts and Humanities Data Service (AHDS) reaches the end of its first funding cycle, it is taking stock of its aims, organisation, and achievements. It is also evaluating possible trajectories by which the service may develop from a testbed project into a sustainable service. As part of that process, the AHDS has commissioned a formal evaluation, now partially complete (1). The following article is based on the evaluation’s interim report. It provides a brief description of the AHDS’s organisation and aims, and looks in more depth at two areas in which the AHDS has been particularly active: collections development and resource discovery. It concludes with an outline of the principal challenges which confront the Service in the next 12 to 18 months.
The AHDS was established in 1995 in response to several consultation exercises which touched in whole or in part on the digital information and Information Technology requirements of arts and humanities scholars in the UK (2). Funded by the Joint Information Systems Committee of the UK’s Higher Education Funding Council, the AHDS collects, catalogues, preserves, and encourages re-use of digital resources which result from or support research and teaching in the arts and humanities (3). To achieve these aims, it has adopted a distributed organisation comprising a managing Executive and five Service Providers offering archival and other functions to the disciplines of archaeology, history, performing arts, textual studies, and the visual arts. (See Figure 1 below.)
Figure 1. The top-level web page of the AHDS web site.
The AHDS Executive is located at King’s College London and combines managerial and administrative functions with brokering and applied research roles. As a broker, it assists the Service Providers in their collections development, and advises extensively within the Higher Education, library, museum, and other sectors on the development and accessibility of other high-quality content of interest to the arts and humanities. In applied research, it brings expertise in digital collections development and management to bear on large-scale initiatives and on policy decisions which affect the arts and humanities scholarly community.
The AHDS Service Providers perform archival and other functions for particular arts and humanities communities and take primary responsibility for promoting standards and best practice in the creation, description, preservation, and use of particular kinds of data. Each of the Services benefits from advisory and/or management groups which comprise representatives from appropriate scholarly, library, heritage, archive, funding, and specialist organisations. Through these groups, the AHDS is able to disseminate information about its holdings, services, publications, and activities to the widest possible community. It also reaches and mobilises interdependent but separable communities, bringing their attention to bear on research and strategic issues of common concern. The strength of the advisory/management board structure is apparent in the Service's collections development activities, in its outreach, and in its research and consultancy work. Indeed, it underpins the several successes that the AHDS has had in its short life. A summary of specific Service aims and their collections development progress is provided below.
The AHDS Service Providers
The Archaeology Data Service (ADS) is a consortial initiative located at the University of York (4). The Service's orientation reflects archaeologists' extensive and pervasive reliance upon computer techniques; the existence of numerous local, regional, and national agencies which share an interest in developing and maintaining Britain's archaeological record; and the fact that where that record is developed through excavation, the primary evidence upon which archaeological research is based is destroyed. The ADS accordingly works in extensive collaboration with existing archaeological agencies both inside and outside higher education to ensure that the computer-based British archaeological record is at once accessible for secondary use and preserved over the longer term. It focuses its collections effort on digital resources created during archaeological research, and works closely with national and local archaeological agencies, and with those research councils involved in funding archaeological research, to negotiate the deposit of project data. For those classes of archaeological data where archival bodies already exist, the ADS collaborates with them to promote integrated access to, and greater use of, their services.
The ADS's significant progress in developing its collections testifies to its having filled a crucial need within the archaeological community. It provides integrated online access to nationally significant and distributed archaeological data resources including: a large proportion of the Scottish National Monument and Site Record; the Royal Commission on the Historical Monuments of England's (RCHME) Excavation Index for England; the RCHME's Microfilm Index; the Council for British Archaeology's Carbon-14 database; and the Society of Antiquaries' library catalogue. It also accessions and manages high-quality archaeological data resources which fall outside the collections remit of the other heritage organisations that share responsibility for Britain’s archaeological record. Deposits already number over 300 data resources; at any one time a further 40 or 50 are scheduled for or in preparation for accession, and a backlog of nearly 800 more awaits accession.
The History Data Service (HDS) is located within the Data Archive at Essex University, and its establishment as a distinctive department of the Archive antedates that of the AHDS by some five years (5). It delivers a professional and user-friendly service through focused acquisition; by cataloguing and indexing all datasets; by validating and documenting all datasets to a high standard; by providing long-term storage; by disseminating data materials in a range of formats and on a range of media; and by developing value-added services where appropriate. It works with the wider community to set standards and to ensure that it remains at the forefront of new methods and developments. The HDS also plays an important instructional role within the community, advising on standards and best practice and promoting awareness of the scholarly advantages that may be derived from using digital resources in research and teaching.
The HDS holds nearly 500 data resources, to which it adds some 30 to 40 new ones each year, and has an accessioning backlog of nearly 100 data resources (6). The most significant new accessions include the 1881 Census for Great Britain (containing some 30 million person records) and the Irish Historical Statistics database. Data are distributed by arrangement on a variety of portable media or by ftp. In addition, in the past two years the HDS has developed a number of major online collections, including the Great Britain Historical Database, which integrates selected nineteenth-century census and other holdings, and a related GIS which is available through UKborders in Edinburgh. Through services such as these, and where funding permits, the HDS makes selected high-demand holdings available online.
The Oxford Text Archive (OTA) is located within the Oxford University Computing Service, where it has resided for some 20 years. It has traditionally operated with a strictly limited budget and a very broad collections policy. It has archived electronic texts of interest not only to literary textual scholars but also to those working in linguistics, history, law, and modern and ancient languages, indeed in almost any humanities discipline which relies upon a close reading of texts. Its relatively open acquisitions policy has also produced a collection which varies substantially in the quality, integrity, and re-usability of individual items. Under the auspices of the AHDS, the OTA has focused its collections development on the needs of those working in the literary and linguistic disciplines, whilst continuing to take materials from any literary genre, period, or language. With works by the so-called canonical authors (e.g., Shakespeare, Milton, Chaucer) now widely available both in the public domain and from commercial publishers (e.g., Chadwyck-Healey's CD-ROM Editions and Adaptations of Shakespeare), the OTA has further refined its collections effort, turning its attention towards the collection and preservation of electronic texts that lie slightly outside the traditional mainstream of literary and linguistic studies but which are vital to the work of current and future scholars. The OTA also places considerable emphasis on providing user-friendly online access to the numerous unrestricted items in its collection and on training scholars in the standards and best practices appropriate to the creation and scholarly use of electronic texts.
The OTA’s collections extend to some 3,500 electronic texts and linguistic corpora, including electronic versions of literary works by many major authors in Greek, Latin, English, and a dozen other languages; collections and corpora of unpublished materials prepared by field workers in linguistics; electronic versions of some standard reference works; and copies of texts and corpora prepared by individual scholars and major research projects from around the world (7). The OTA has taken advantage of relatively secure AHDS funding to catalogue its existing holdings professionally and, where necessary, to reformat them to ensure their proper management and wider accessibility. It has also developed world-class facilities enabling users to conduct innovative and analytical online searches across those texts which are available without copyright or other restriction. In addition, the OTA has been a major supplier to a number of large-scale digital libraries, electronic text archives, and commercial data resources, including the Electronic Text Center at the University of Virginia, the Humanities Text Initiative at the University of Michigan, and the Master Index of Chadwyck-Healey's Literature OnLine database.
The Performing Arts Data Service (PADS) is located at the University of Glasgow and represents a collaboration among two academic departments (Music, and Theatre, Film and Television Studies), an arts computing unit, and a university computing centre. It focuses on collecting and promoting the use of digital resources to support research and teaching across the broad field of the performing arts: music, film, broadcast arts, theatre, and dance. Given the relative immaturity of these disciplines with regard to the creation and scholarly use of digital materials, the technical sophistication of the computer applications appropriate to them, and the copyright restrictions which impede access to so many important collections, the PADS has focused its efforts on assessing users’ needs as they pertain to digital resources and information technologies; raising awareness about the scholarly benefits that may be derived from creating and using digital resources; researching and promoting standards and best practice in the creation and use of performing arts data; and developing online data resources in two areas (music and film studies) which demonstrate both the scholarly benefits of such resources and the importance of standards and best practices.
The PADS collections reflect its intention to act as a catalyst to innovative research and teaching within the performing arts. Holdings extend to a pilot online service which delivers some 30 hours of film and video sourced from the holdings of the British Film Institute; an online demonstrator catalogue of 1,000 items available from the Scottish Music Information Centre (all with multimedia elements in the form of sound clips and images of scores); and a database of 300 high-quality Internet resources of interest to the performing arts. The PADS is also working with the Scottish National Film and Video Archive to provide networked access to the Archive's catalogue, and with the nine UK Music Conservatoires to integrate access to their online collections catalogues.
The Visual Arts Data Service (VADS) is another consortial organisation, headed by the Surrey Institute of Art & Design (8). Like the PADS, it works within a community whose exploitation of digital resources and computer technologies has been slow to take off owing to the expense and technical sophistication of appropriate applications and the barriers thrown up by copyright restrictions. Accordingly, the VADS also focuses its efforts on identifying users' information requirements; on raising awareness about the scholarly advantages that may accrue from creating and using digital resources; and on documenting and promoting appropriate standards and best practices.
One crucial difference is that more high-quality digital collections exist within the visual arts, many of them arising from the heritage sector, than is the case for the performing arts. Accordingly, the VADS has a collections policy which emphasises both accessioning data resources which have no other archival home and integrating online access to extant data resources which are served by other agencies. To achieve these complementary collections aims, the VADS works closely with the visual arts, museums, and cultural heritage communities, providing an extensive advisory service on the development of distributed visual arts collections which can be integrated with one another.
The VADS's collections reflect the relative immaturity of the Service, the breadth of academic disciplines it encompasses, and the contexts in which it works, and they show significant promise in three areas. Firstly, the VADS is accessioning a small number of major visual arts resources which result from digitisation work within the heritage and scholarly sectors (9). Secondly, it is developing contacts amongst arts, design, and architectural colleges and encouraging deposit of computer-based work which features in the degree shows that are required of fine arts students. This aspect of the VADS's work has only just begun but shows promise and may well develop into an historically significant collection which documents early artistic innovation with computer-based media. Finally, its advisory work will result in integrated access to significant digital image collections which are managed across a range of sites.
Collections Development at the AHDS
Collections development has been a significant challenge for the AHDS and has been met in five ways. Firstly, the development of an integrated collections policy which ensures quality, consistency, and interoperability across the AHDS’s extensively distributed holdings has been crucial (10). The collections policy reflects our understanding that the prospects for, and the costs of, maintaining access to digital resources over the longer term depend heavily upon decisions taken about those resources at different stages of their life cycle. Decisions taken in the design and creation of a digital resource, and those taken when it is accessioned into a collection, are particularly influential (11). The policy’s very structure parallels the digital resource’s life cycle from the time of its accession, through its long-term management and dissemination, to, indeed, its destruction (or de-accession). It emphasises a standards-based approach to data management and dissemination, and documents how that approach is applied to the many different classes of data resources available in the AHDS’s collections. This emphasis on standards and best practices commits the AHDS to an aggressively evangelical role amongst would-be data depositors, and underpins two publication series, offering guidance respectively on best practice in the creation and use of digital resources and on the development and management of digital collections. The policy's development and application in a real service environment has also secured for the AHDS a role in the development of data policies for UK policy-making and funding bodies which will in future exercise significant influence over the scholarly information landscape.
Secondly, the development of a comprehensive rights management framework has been as important as a collections policy (12). The AHDS’s collections rely almost entirely on voluntary offers of deposit and, accordingly, on the AHDS’s willingness and ability to protect and abide by any intellectual property and other rights which may be vested in such resources. A uniform rights management framework, comprising a series of depositors’ and users’ licences, enables the AHDS to offer this guarantee to would-be depositors. Copyright in data deposited with the AHDS is retained by the copyright holder(s), and terms of access to the deposited data are agreed with the depositor and copyright holder(s). As a condition of acquisition, the AHDS requires a signed licence form for the resource to accompany the data at deposit. Although the AHDS encourages broad terms of access to facilitate scholarship and learning, data resources with severe restrictions on access may be considered for deposit in exceptional circumstances.
Thirdly, strategic partnerships with funding and other agencies in the UK which invest in the creation of scholarly data resources have helped the AHDS to develop a consistent source of high-quality deposits. Presently, the Carnegie Trust for the Universities of Scotland, the Humanities Research Board of the British Academy (now the Arts and Humanities Research Board), the Leverhulme Trust, the Natural Environment Research Council (which invests in science-based archaeology), and the Wellcome Trust for the History of Medicine either recommend or require grantholders to offer data for deposit with an AHDS Service Provider. The Economic and Social Research Council also requires grantholders to deposit data with the Data Archive, through which the AHDS Service Providers, notably the HDS, receive depositable materials. In addition, the ADS works with the Association of Local Government Archaeological Officers, English Heritage, the Society of Antiquaries, and the Society of Antiquaries of Scotland. These agencies recognise that the immense scholarly and financial investment involved in developing a high-quality data resource can only begin to pay dividends if the resource is maintained over the longer term and made accessible to other users.
Fourthly, the development of a network of arts and humanities data archives and data services has also lent substance to the collections accessible via the AHDS Service Providers. The ADS, through its archaeology gateway, ArchSearch, integrates access to distributed archaeological resources managed or supplied by the Council for British Archaeology, the Royal Commissions on the historical monuments of England and Scotland, the Scottish Cultural Resources Access Network, the Society of Antiquaries, and other agencies. Through the Data Archive, the HDS has data exchange and data access agreements which open onto the substantial collections maintained by European and North American social science and history data archives. The OTA has good working relations with other major text archives in North America, notably at the Universities of Michigan and Virginia. The PADS provides access to a pilot digital film collection being developed by the British Film Institute (in collaboration with the British Universities Film and Video Council and the JISC) and to catalogue records of the Scottish Music Information Centre. The PADS also plays a key advisory role in the eLib-funded Music Online programme, which will provide integrated network access to the online catalogues maintained by the UK's nine music conservatoires. The network’s growth reflects our recognition that the AHDS is not, and never will be, the sole source of high-quality data.
Finally, the AHDS has worked to encourage greater and more innovative use of Information Technology and digital resources in research and teaching. Its extensive awareness-raising and training programmes have been developed largely with this single aim in view, and target, where possible, the professional societies, funding agencies, and research councils which exercise some influence over the professional reward structure (13). Two key obstacles identified through systematic user needs analyses are a professional reward structure which does not encourage computer-based research and teaching, and the absence of agreed criteria for assessing the scholarly merit of computer-based research and teaching outputs (14). In an effort to overcome these obstacles, the AHDS is currently undertaking systematic research into assessment criteria and hopes to publish recommendations shortly.
Resource Discovery (15)
Here, the challenge confronting the AHDS is a microcosm of that confronting any who manage access to digital information in an extensively networked age: to integrate users’ online access to distributed and heterogeneous information resources. Simply put, users interested in the Crimean War or in divergent trends in European and American Romanticism wish to discover relevant information objects irrespective of whether they are located in libraries, archives, museums, or data services, and of whether they are catalogued according to MARC or any other standard.
The AHDS’s collections are, as we have seen, both distributed and deeply heterogeneous. They are distributed amongst five Service Providers, each of which presents information about its holdings in its own online catalogue (16). They are further distributed because several Service Providers have data exchange and interoperability agreements with third parties. The collections are heterogeneous insofar as they comprise the widest possible variety of data types, including electronic texts, databases, digital images, geospatial information systems, and time-based film data. Together with their interdisciplinarity, the collections’ diverse composition of data types requires the Services to adopt very different resource description and cataloguing practices (17). This is essential. A text corpus needs to be described and documented differently from, say, an archaeological GIS or an image bank. Archaeological databases need to be documented differently from historical ones, or from those which comprise images of, and descriptive material about, art historical objects. Further, no single cataloguing standard is sufficiently flexible to be applied across the Services (18).
Confronting the challenge, the AHDS undertook work on resource discovery metadata and on appropriate systems architectures. Work on metadata is well documented elsewhere (19). It began from two simple observations. Firstly, work integrating access to distributed information systems was constrained by curatorial domain. Libraries had some success with MARC-conformant catalogues; archives were successful with Encoded Archival Description (EAD)-conformant finding aids; and museums were beginning to show success with an evolving profile from the Consortium for the Computer Interchange of Museum Information (CIMI). Secondly, the Dublin Core was beginning to emerge as a potential interchange format which would enable cross-domain resource discovery. Evaluation of the Dublin Core was, however, limited to a small number of domains, and there was little guidance on its implementation in any single domain. The AHDS's work on resource discovery metadata therefore focused on a formal, cross-domain evaluation of the Dublin Core, conducted in conjunction with the UK Office for Library and Information Networking (UKOLN). The net result was Discovering Online Resources Across the Humanities: A Practical Application of the Dublin Core and, of course, a road map to guide the AHDS’s development work in this area (20).
Research into information architectures and tools was also assisted by UKOLN, and both drew upon and informed the development of the MODELS Information Architecture (21). Briefly, the architecture envisages a number of broker services which mediate between the user (situated at a network-accessible machine) and a range of underlying information resources or targets (in the AHDS's case, the Service Providers' online catalogues). The brokering services, accessible from a client via the World Wide Web, may include:
- resource discovery services which enable users to select and simultaneously query a range of information systems, processing queries using access points which are meaningful across the underlying services, and retrieving results in a unified format;
- registration services which enable users to become known not only to the broker but also to the underlying information services, which may, for example, permit queries from registered users only, or require some form of registration from users who wish to obtain access to selected resources once discovered;
- resource ordering services which enable users to obtain access to information objects which they discover by exercising the broker's search and retrieval services;
- authentication services which determine the bona fides of users who request access to a resource discovered via the broker;
- user profiling services which allow users to configure the broker in a manner which suits their own needs (for example, to determine which information resources are included by default in any search, or how results are displayed and/or sorted).
The AHDS adopted Z39.50 as its network application protocol standard of choice and procured the development of a brokering web client and of Z39.50 capability for the Service Providers' own catalogues (22).
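To make the broker model more concrete, the following Python sketch shows how a broker might combine the resource discovery, registration, and user profiling services listed above. It is a minimal illustration under stated assumptions, not the AHDS Gateway: the real broker spoke Z39.50 to the Service Providers' catalogues, whereas here each target is a hypothetical in-process search function, and all class, function, and catalogue names and records are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Result:
    """Unified result record returned to the user, whichever target answered."""
    source: str
    title: str
    identifier: str


@dataclass
class Target:
    """A hypothetical underlying information service (e.g. one catalogue).

    In the AHDS this role was played by Z39.50-enabled catalogues; here a
    plain function stands in for the network protocol.
    """
    name: str
    search: Callable[[str], list]          # native search returning native records
    to_result: Callable[[dict], Result]    # maps a native record to the unified format
    registered_only: bool = False          # accepts queries from registered users only


@dataclass
class UserProfile:
    """User profiling service: which targets are searched by default."""
    user_id: str
    default_targets: list = field(default_factory=list)


class Broker:
    def __init__(self, targets):
        self.targets = {t.name: t for t in targets}
        self.registered = set()

    def register(self, user_id):
        # Registration service: in this sketch, registering with the broker
        # also counts as registering with every target that requires it.
        self.registered.add(user_id)

    def discover(self, profile, query):
        # Resource discovery service: query each selected target in turn,
        # convert native records to Result, and return one merged list.
        results = []
        for name in profile.default_targets:
            target = self.targets[name]
            if target.registered_only and profile.user_id not in self.registered:
                continue  # skip targets this user may not yet query
            results.extend(target.to_result(r) for r in target.search(query))
        return sorted(results, key=lambda r: (r.title, r.source))


def keyword_search(records, field_name):
    """Build a toy case-insensitive substring search over one native field."""
    return lambda q: [r for r in records if q.lower() in r[field_name].lower()]


# Two toy catalogues with different native field names (hypothetical data).
texts = [{"ti": "Crimean War letters", "id": "ota-1"}]
sites = [{"name": "Crimean War memorial survey", "ref": "ads-7"}]

broker = Broker([
    Target("texts", keyword_search(texts, "ti"),
           lambda r: Result("texts", r["ti"], r["id"])),
    Target("sites", keyword_search(sites, "name"),
           lambda r: Result("sites", r["name"], r["ref"]), registered_only=True),
])
profile = UserProfile("user-1", ["texts", "sites"])

print(len(broker.discover(profile, "crimean")))  # 1: "sites" requires registration
broker.register("user-1")
print(len(broker.discover(profile, "crimean")))  # 2
```

Results are sorted by title so that records from different catalogues are interleaved rather than grouped by source; a fuller broker might let the user profile choose the sort order instead, as the last service in the list above suggests.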
Results of our work on resource discovery are better reviewed by visiting existing online services than in prose. Readers are advised first to visit the AHDS Service Provider sites, where they can use native online catalogues to search specific collections. At the Oxford Text Archive (23) and the Archaeology Data Service (24), for example, readers may note very different capabilities tailored to the information needs of two very different scholarly communities and to the resource description requirements of two very different digital collections. Thereafter, users may wish to visit the AHDS Gateway (25), which presents the collections catalogues as a virtual uniform catalogue and bases search and retrieval capabilities on an unqualified Dublin Core record. The Gateway also permits users to register with the AHDS, and registered users may browse or otherwise acquire access to those AHDS holdings which are available for educational use. The Gateway is also minimally configurable: users may, for example, save queries between sessions, or have a list of the AHDS catalogues suited to their own resource discovery requirements presented to them by default. It is also extensible. Through the advanced search form, it is possible to include additional (non-AHDS) online information resources in any query. This function reflects the AHDS’s view that high-quality scholarly data resources will be extensively distributed and that users will want access to information about them irrespective of where they reside or who manages them.
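The idea of a virtual uniform catalogue built on unqualified Dublin Core can be illustrated with a short Python sketch. It assumes that each Service Provider's native records are mapped (crosswalked) into the fifteen unqualified Dublin Core elements before searching. The native field names, crosswalk tables, and identifiers below are hypothetical; the Gateway's actual mappings are not described here.

```python
# The fifteen elements of unqualified Dublin Core.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Hypothetical crosswalks from two native catalogue schemas to Dublin Core.
TEXT_CROSSWALK = {"work_title": "title", "author": "creator",
                  "lang": "language", "text_id": "identifier"}
SITE_CROSSWALK = {"project": "title", "director": "creator",
                  "period": "coverage", "site_code": "identifier"}


def to_dublin_core(native, crosswalk):
    """Map a native record to unqualified Dublin Core.

    Every element is optional and repeatable, so each maps to a list.
    Native fields with no entry in the crosswalk are dropped.
    """
    dc = {element: [] for element in DC_ELEMENTS}
    for native_field, value in native.items():
        element = crosswalk.get(native_field)
        if element is not None:
            dc[element].append(value)
    return dc


def search(catalogue, element, term):
    """Case-insensitive substring search on one Dublin Core element."""
    term = term.lower()
    return [rec for rec in catalogue
            if any(term in value.lower() for value in rec[element])]


# A toy "virtual uniform catalogue" drawn from two differently structured sources.
catalogue = [
    to_dublin_core({"work_title": "Paradise Lost", "author": "Milton, John",
                    "lang": "eng", "text_id": "ota-42"}, TEXT_CROSSWALK),
    to_dublin_core({"project": "Roman fort excavation", "director": "Smith, J.",
                    "period": "Roman", "site_code": "ads-9"}, SITE_CROSSWALK),
]

print([rec["identifier"][0] for rec in search(catalogue, "title", "roman")])    # ['ads-9']
print([rec["identifier"][0] for rec in search(catalogue, "creator", "milton")])  # ['ota-42']
```

The crosswalk is deliberately lossy: fields with no Dublin Core equivalent are dropped. That is the price of a record simple enough to be shared across very different collections, and it is why the richer native catalogues remain the place for collection-specific searching.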
Challenges to be Confronted
Arguably, in three years we have identified as many problems as we have solved. Six particularly pressing ones are briefly outlined here.
Service Providers' success raising awareness about the importance of digital preservation and secondary analysis has produced a large backlog of accessions running to more than 1,000 data resources. The backlog problem is particularly acute at the Archaeology Data Service, the History Data Service, and the Oxford Text Archive but will materialise elsewhere.
Appropriate data archiving strategies are also somewhat elusive. Although the AHDS’s distributed model is appropriate to the development, cataloguing, and redistribution of high-quality data resources and to the development of related services, it is not necessarily appropriate to data preservation. Proper long-term data storage requires substantial infrastructural investment. Strategic partnership with selected data warehouses may offer a more economical approach than repeating the necessary infrastructural investment at each of the AHDS Service Providers. Further partnerships, preferably with national organisations, are required to secure the commitment to long-term preservation that is needed if the AHDS, or indeed any similar organisation, is to look after nationally significant collections in the longer term. Incidentally, such partnerships will also noticeably enhance the AHDS's accessioning efforts, which are occasionally inhibited by its present inability to offer better guarantees of long-term data management.
A similar approach to the development of at least some online data services may also be appropriate. Evidence from user needs studies and from the interim evaluation demonstrates the need for high-quality data resources which are available to browse or analyse online. Although most AHDS Services are capable of developing and maintaining those services, they are not all appropriately resourced to do so. Nor is it entirely clear that all AHDS Services should develop the technical and support infrastructure necessary to maintain their own online data services, especially where opportunities exist to provide such services through additional strategic partnership.
A further problem is that the AHDS does not extend its activities into all arts and humanities disciplines, and particular services, notably the VADS and the PADS, are over-stretched trying to cater to broad and diverse areas. One result is that some scholarly data remain at risk. Another is that the AHDS's several successes have stimulated unsolicited requests for extension into new areas. The situation needs to be reviewed in the coming months.
There are also numerous second-order research issues which arise from our work on resource discovery and which have yet to be fully investigated:
- Although Service Providers agree about the use of the Dublin Core, no such agreement is ever likely to exist about their use of controlled vocabularies. Whereas the Anglo-American Cataloguing Rules are appropriate to and used by the OTA, for example, the Art and Architecture Thesaurus is used by the VADS. Even greater variation is apparent across the Service in the use of the date and coverage elements. Assisting users as they search across catalogues which are populated according to domain-specific controlled vocabularies is a challenge for the future.
- Where Z39.50 is concerned, experience so far reflects the standard's relative immaturity. There are, as yet, few guidelines pertaining to its use. Accordingly, it remains possible (indeed likely) that independently supplied Z39.50-aware applications will conform to the standard yet remain incompatible with one another or, at least, interact in ways which are neither meaningful nor helpful to the user. Within a bounded service environment such as the AHDS, interoperability can be achieved through communication and consensus amongst application suppliers. In a far wider and inevitably impersonal networked environment, other means will need to be developed.
- With user registration, authentication, and resource ordering, the AHDS benefits from its circumscribed service environment. As it integrates third-party systems into its Gateway it will confront a significant problem where such services use independent registration, authentication, and resource ordering services. Although some harmonisation may be possible on an inter-organisational level, a more automated approach will be required to support scholarly and heritage users who wish to locate, scrutinise, and acquire access to information objects of interest irrespective of their location, format, and management.
- Finally, we have so far operated with numerous assumptions about users’ resource discovery preferences in a distributed network environment. Those assumptions have shaped the development of the AHDS Gateway and associated systems. How users actually exploit the Gateway, particularly in relation to their use of underlying Service Provider catalogues, will provide useful feedback for the systems’ further development, but also for applied research into resource discovery systems more generally.
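One modest step towards the date and coverage problem noted in the first item above is to normalise free-text date values into numeric year ranges, so that a query such as "1850-1899" can be compared with records dated "1881" or "19th century". The Python sketch below is a hypothetical illustration, not an AHDS procedure: it recognises only three simple forms, and the ten-year margin applied to approximate dates is an arbitrary assumption.

```python
import re
from typing import Optional, Tuple

# Hypothetical: years either side of an approximate date such as "c. 1600".
APPROX_MARGIN = 10

YEAR = re.compile(r"(c\.\s*)?(\d{3,4})")                     # "1881", "c. 1600"
SPAN = re.compile(r"(\d{3,4})\s*[-–]\s*(\d{3,4})")           # "1850-1899"
CENTURY = re.compile(r"(\d{1,2})(?:st|nd|rd|th)\s+century")  # "19th century"


def to_year_range(value: str) -> Optional[Tuple[int, int]]:
    """Normalise a free-text date value to an inclusive (start, end) year range."""
    v = value.strip().lower()
    m = YEAR.fullmatch(v)
    if m:
        year = int(m.group(2))
        margin = APPROX_MARGIN if m.group(1) else 0
        return (year - margin, year + margin)
    m = SPAN.fullmatch(v)
    if m:
        start, end = sorted((int(m.group(1)), int(m.group(2))))
        return (start, end)
    m = CENTURY.fullmatch(v)
    if m:
        century = int(m.group(1))
        return ((century - 1) * 100 + 1, century * 100)
    return None  # e.g. period names such as "Victorian" need a domain-specific lookup


def overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if two inclusive year ranges share at least one year."""
    return a[0] <= b[1] and b[0] <= a[1]


query = to_year_range("1850-1899")
for value in ["1881", "19th century", "c. 1600", "Victorian"]:
    rng = to_year_range(value)
    print(value, rng, rng is not None and overlaps(query, rng))
```

Values the sketch does not recognise, such as period names, return None; handling them would require a domain-specific lookup table, which is exactly where the controlled-vocabulary differences described above resurface.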
Developing and implementing a sustainable funding model for the Service is probably the most significant challenge confronting the AHDS and is taken up at length in the report of the interim evaluation (26). Presently in the UK there are numerous public-sector initiatives bringing significant investment and a high level of political support to bear on what are ultimately short-term efforts to encourage better and more effective use of information technologies in universities, schools, libraries, museums, and heritage organisations. Each of these initiatives will succeed to a greater or lesser extent, and all will undoubtedly increase expectations about appropriate uses of Information Technology in scholarly, educational, and heritage organisations. The AHDS is in some respects a product of this public-sector largesse; it is supported entirely with funding which would otherwise be distributed to UK universities and colleges. Whether public funding of any sort can sustain the several related initiatives, let alone meet users’ rising expectations, is an open question and one which needs seriously to be addressed. At the AHDS, we are hoping to use our location at the nexus of the educational, heritage, and library communities to develop several public revenue streams. By definition, such success could only be temporary. Greater security relies upon revenues generated by a move, at least in some quarters, to services offered on a more commercial basis or, perhaps, to a membership community. Here, as elsewhere, we look to developments in the US and other countries.
Notes
Links to specific notes are located throughout the text of this article. The complete listing of notes for the article is at http://www.dlib.org/dlib/december98/greenstein/notes.html.
Copyright © 1998 Daniel Greenstein
Correction made to html code, The Editor, December 15, 1998 1:27 PM.