This article reports on a current project of the Bundesarchiv (National Archives of Germany), funded by the Deutsche Forschungsgemeinschaft (German Research Foundation). The project brings together in a collaborative portal the finding aids for archival records of the SED (Socialist Unity Party of Germany) and the FDGB (Free German Trade Union Federation) from five eastern German state archives. To accomplish this, the locally available, heterogeneous data formats must be transformed into a common profile of the international standard format for inventories: Encoded Archival Description (EAD). In addition to EAD, the Encoded Archival Context (EAC) is applied for the presentation of the provenance of the archival materials and the Encoded Archival Guide (EAG) is used for the information on the archives themselves. However, in spite of the use of EAD, EAC and EAG, there are still many aspects that need to be considered regarding interoperability with content from other cultural heritage organizations such as libraries and museums. When archival inventories are combined with catalogue data from libraries, for example, the archives' content remains heterogeneous relative to the libraries' content. One reason for this, among others, is the diversity of the treated objects themselves. While publications from libraries are treated as single objects, finding aids from archives resemble collections. They subsume descriptions of single objects previously defined by archivists. A particular aspect of this heterogeneity is that the single objects connected in inventories cannot exist without their context. In addition, the different requirements and traditions of libraries and archives create a continuing heterogeneity. In spite of this heterogeneity, in the future it would be beneficial if users could employ a common search entry point that takes into consideration the differences of the various knowledge institutions.
1. The Bundesarchiv project
In September 2007 the Bundesarchiv (National Archives of Germany) began a project funded by the Deutsche Forschungsgemeinschaft (German Research Foundation). The project task was the "development of the portal 'network SED-/FDGB-fonds' to a reference application for an archival information portal of Germany". (SED stands for the Socialist Unity Party of Germany; FDGB stands for the Free German Trade Union Federation.) The two-year project is being carried out in co-operation with the state archives that are located in Berlin, Brandenburg, Mecklenburg-Western Pomerania, Saxony, Saxony-Anhalt and Thuringia. The project aims to merge the current heterogeneous information of the archives into a common profile using an international standard format on which the presentation in the information portal, as well as a notional exchange with other information providers, is based. This article presents the project and discusses what factors influence interoperability of archival materials with other cultural and knowledge institutions. (More detailed reports on the project may be found on the Bundesarchiv project website at <http://www.archivgut-online.de/>.) After the project period has expired, the Deutsche Forschungsgemeinschaft will determine whether this path towards extending the portal into the archival information portal of Germany will continue to be supported or whether another project with this focus will be funded.
1.1 The work of archives
For a better understanding of the project, first we will briefly describe the core business of archives in general. Roughly speaking, archival work consists of reviewing incoming archival records, and eliminating duplicate records and elements deemed unimportant. With the records and elements that remain after this process, a Findbuch (inventory) is developed in which every separate file unit of each archival record is registered in a more or less detailed way. Such a fonds is normally arranged according to the principle of provenance, i.e., the archival resources are classified according to the arrangement they were given in their office of origin. The inventories record (in the respective fonds arrangement) the classification titles, and under the titles, the units of description, purposes of genesis, materials and time frame. In addition to the title, specifications or notes about events, occurrences, materials or persons can also be registered as "scope and content". Thus, an inventory offers a description of the archives' records; therefore, it is the fundamental point of entry for users who wish to search the archives. At this point in time, printed inventories are offered on site in the archives' reading rooms, and many archives also publish a large share of their inventories online. In some federal states of Germany, union catalogues have already been developed for database searches (for example, [ARIADNE], [archives in NRW], etc.). The Bundesarchiv project is bringing together in a central archival union catalogue the inventories for the archival records from the SED and the FDGB, which to date have been held in different archives, in a wide spectrum of subjects.
1.2 The union catalogue
The union catalogue uses Lucene, which is open source technology from the Apache Software Foundation project. Lucene is renowned for its high performance and scalability, and it is already a component of numerous software applications [Lucene]. The Lucene technology is able to index content in numerous formats. For the union catalogue being created by our project, eXtensible Markup Language (XML) documents are processed.
For the archival records of the SED and FDGB network, the standardised description format Encoded Archival Description (EAD) is used, which is coded in the XML syntax. We use the latest version, EAD 2002, which is an international standard developed in the USA and now established in Europe as well [EAD]. EAD 2002 aims to be more compatible with the rules of the General International Standard Archival Description (ISAD(G)), which were adopted in 2000 by the International Council on Archives (ICA) [ISAD(G), 2000]. However, EAD 2002 is substantially more comprehensive in describing elements than ISAD(G).
The Bundesarchiv project applies Encoded Archival Context (EAC) [Cover, 2007] for the description of provenance. (Provenance can refer to persons, corporations and families.) EAC is not yet an agreed standard. It exists as a 2004 working draft and conforms to the ICA standard for authority records in archives, the International Standard Archival Authority Record for Corporate Bodies, Persons, and Families [ISAAR (CPF), 2004]. The EAC Working Group was supported by the LEAF (Linking and Exploring Authority Files) project, co-ordinated by the Berlin State Library, in which, among others, numerous European libraries and archives cooperate [LEAF, 2004]. Currently, the EAC Working Group of the Society of American Archivists [SAA, 2003] is in charge of the further development of EAC.
What is still lacking in the previous enumeration of archival standards is the information about archives illustrated in the union catalogue with the Encoded Archival Guide (EAG). The EAG format exists as a suggestion in the 2002 alpha version 0.2, which was developed by a working group of the General for State Archives in Spain [EAG, 2002]. It has been implemented successfully in the Spanish language in a world-wide union of approximately 40,000 archives in the archival portal Censo Guía de Archivos Españoles e Iberoamericanos [Censo Guía, 2008]. In its final draft, EAG will be oriented on ISDIAH (International Standard for Describing Institutions with Archival Holdings), the first edition for the representation of archival information submitted by the ICA in May 2008. EAG was discussed during the 16th ICA Congress in July 2008 in Kuala Lumpur [ICA, 2008].
In addition to EAD, EAC and EAG, the archival union catalogue is using the Metadata Encoding and Transmission Standard (METS) to embed digitised records into the inventories [METS, 2008]. The METS document forms a quasi container in which pictures are joined and structured with administrative and describing metadata, so the coherence of all pictures, regarding contents, can be represented. Afterwards, the respective METS files are linked within the EAD inventory.
At the end of the current Bundesarchiv project, the archival union catalogue will present the consolidated, electronic inventories, as well as information concerning the archives that are providing the inventories, the provenance of the archival records and, in the ideal case, the archival records themselves in digitised form. What is important for the union catalogue search and presentation is that aside from the display of search hits after a full text search, the coherence of the results in the finding aid always remains recognisable, so that on one hand, a user can jump from hit to hit, but on the other hand, the hit can also be navigated within context. Thus, the Bundesarchiv project differs from the way other archival union catalogues present their search results, which indicate only the respective hits and not their context [Archives Hub].
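The idea of showing a hit together with its position in the finding aid can be illustrated with a minimal Python sketch. It is not the project's Lucene-based implementation; the sample fragment, its titles and the function names are hypothetical, and only the EAD 2002 element names (archdesc, dsc, c01 to c12, did, unittitle) are taken from the standard. The sketch walks the component hierarchy and returns, for every matching unit title, the chain of titles above it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EAD 2002 fragment (not project data).
SAMPLE_EAD = """
<ead>
  <archdesc level="fonds">
    <did><unittitle>District leadership records</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Secretariat</unittitle></did>
        <c02 level="file">
          <did><unittitle>Minutes of meetings 1961</unittitle></did>
        </c02>
        <c02 level="file">
          <did><unittitle>Correspondence 1962</unittitle></did>
        </c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Agitation department</unittitle></did>
        <c02 level="file">
          <did><unittitle>Minutes of meetings 1963</unittitle></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def title_of(component):
    """Return the unit title of an archdesc or component element, or ''."""
    return (component.findtext("did/unittitle") or "").strip()


def is_component(element):
    """True for the EAD component elements c and c01 ... c12."""
    tag = element.tag
    return tag == "c" or (len(tag) == 3 and tag[0] == "c" and tag[1:].isdigit())


def search_with_context(ead_xml, term):
    """Return, for each matching unit title, the path of titles leading to it."""
    root = ET.fromstring(ead_xml)
    archdesc = root.find("archdesc")
    hits = []

    def walk(element, path):
        for child in element:
            if is_component(child):
                child_path = path + [title_of(child)]
                if term.lower() in title_of(child).lower():
                    hits.append(child_path)
                walk(child, child_path)  # descend into nested components
            elif child.tag == "dsc":
                walk(child, path)  # dsc only groups components, it adds no level

    walk(archdesc, [title_of(archdesc)])
    return hits


for hit in search_with_context(SAMPLE_EAD, "minutes"):
    print(" > ".join(hit))
```

Each hit is returned as a list of titles from the fonds down to the matching unit, so a presentation layer can offer both a jump from hit to hit and navigation within the surrounding classification.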
1.3 Tools used for the collaborative project
EAD finding aids can be constructed easily with the aid of the software program MEX (MidosaEditor for XML Standards), including the creation of EAC data and the integration of digitised records via METS. This new tool is the result of the <daofind+> project, which the Bundesarchiv initiated in 2007 with support of the Andrew W. Mellon Foundation, New York [Daofind, 2007]. The MEX software, released with an Open Source licence, is available in English and German for the Windows XP and Vista operating systems, as well as for Mac OS X, on the platform sourceforge.net [Sourceforge.net, 2008]. An EAD finding aid handled in MEX can be exported directly as an HTML presentation with full text search within the record group. Examples of single online inventories with digitised file units include the fonds NVR, National Defence Council of GDR [BArch, DVW 1/39458-39539, 2007] and NS 8, National Socialism, Office Rosenberg [BArch, NS8, 2007]. Three other examples can be found on the Daofind project website [Daofind, 2007].
To prepare for the integration of all the inventories of the SED and FDGB fonds, extensive one-on-one conversations with responsible persons from all partner archives were conducted. This was done in order to determine each archive's respective data structure and approach and then to transfer the local data format to the EAD format via a mutually compiled concordance list. Eventually, all concordance lists will go into a data converter that will be programmed especially for the Bundesarchiv project. This data converter will include not only the format mappings but will also offer the editing functions within the EAD structure as they are already known from MEX. This is so that the inventories, if necessary, can be supplemented with several data fields and can be enriched with digitised archival records via METS as well. However, the editing function still needs to be extended to cover the EAG format.
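How a concordance list can drive such a conversion may be sketched as follows. This is only an illustration under stated assumptions, not the project's converter: the local field names, the mapping and the function name are hypothetical, while the target element names (c, did, unittitle, unitid, unitdate, scopecontent, p) come from EAD 2002. Fields without a mapping are reported rather than silently dropped, which mirrors the need to agree on the concordance with each archive.

```python
import xml.etree.ElementTree as ET

# Hypothetical concordance list for one partner archive:
# local field name -> path of EAD 2002 elements inside a component.
CONCORDANCE_ARCHIVE_A = {
    "title": ("did", "unittitle"),
    "shelfmark": ("did", "unitid"),
    "date_range": ("did", "unitdate"),
    "contains": ("scopecontent", "p"),
}


def to_ead_component(record, concordance, level="file"):
    """Build a <c> element from a local record using a concordance list.

    Returns the element and the list of local fields that had no mapping.
    """
    component = ET.Element("c", {"level": level})
    unmapped = []
    for field, value in record.items():
        path = concordance.get(field)
        if path is None:
            unmapped.append(field)
            continue
        parent = component
        # Create or reuse the wrapper elements (e.g. a single shared <did>).
        for tag in path[:-1]:
            existing = parent.find(tag)
            parent = existing if existing is not None else ET.SubElement(parent, tag)
        ET.SubElement(parent, path[-1]).text = value
    return component, unmapped


# Hypothetical local record; "editor" has no agreed EAD target yet.
record = {
    "title": "Minutes of meetings 1961",
    "shelfmark": "A 12",
    "date_range": "1961",
    "editor": "N.N.",
}
component, unmapped = to_ead_component(record, CONCORDANCE_ARCHIVE_A)
print(ET.tostring(component, encoding="unicode"))
print("Fields without mapping:", unmapped)
```

Keeping the mapping as data rather than code means one converter can serve all partner archives, with a separate concordance list per local format.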
In the end online inventories will be available in the German archival union catalogue, which with EAD, EAC and EAG will allow standardised and international record formats to be shown, and thus, for the first time, an international data exchange can be considered.
2. Aspects of interoperability between archival data and data from libraries
At this point the question is about the interoperability of these integrated inventories from Germany with other national and international applications. Can the information illustrated in EAD be transferred expediently to other information portals? The answer to this question is difficult as it depends on the purpose of the application. Because libraries have a great deal of experience in the construction of comprehensive information portals, this article will discuss aspects of interoperability of content from archives and libraries. For successful co-operation between libraries and archives, several aspects need to be taken into account. These aspects are discussed below in Sections 2.1 through 2.8.
2.1 Use of a standard format in the archives management
The use of a structured standard format in XML syntax is only the first fundamental aspect to consider for successful co-operation between different content providers. A homogeneous and machine-readable format that is supported by the group of information providers is a basic requirement for creating a co-operative information portal. The Bundesarchiv fulfils this requirement by transferring the diverse data structures from the various archives into the EAD format.
2.2 Single object vs. collection
A second aspect involves the need to pay attention to the different object types that the information portals of different cultural and knowledge institutions will provide. Objects like publications and descriptions of original sources have different traits, which are expressed in the different description formats. An inventory is not a single object like a publication, but is a collection of instructions to units of description. The respective single descriptions are important with regard to content, and the respective description units have grown successively during numerous workflows in a place of origin. Finally, at the end of a term in the archives they have been divided up for archiving in an underlying order as the smallest entity of the fonds. The users search in these units of description for thematically relevant material.
On the other hand, all separate units of description of a fonds are connected to one another. If they were considered separately in an information portal, they would be torn from their context. Ultimately, an inventory is a catalogue that refers to single objects closely connected to each other. Because of this, an inventory is considered to be neither a single object nor a collection in which single objects can stand for themselves. For an integrated information portal in which single objects as well as fonds that look like collections are presented together, a suitable way must still be found for archival inventories that are a mixture of both.
2.3 Completed vs. sequential publication
Furthermore, a trait of online inventories is that they describe archival fonds that can be exposed to more or less continual changes, such as adding partial fonds later. It is important to note that the issue date of inventories always expresses only the moment of the latest change. This is different than content one finds in libraries where, for example, a revised article is treated fundamentally as a new, independent publication and the previous version survives. Archives on the other hand overwrite and replace the previous inventory entirely and apply a new date. In the process, changes are normally not documented. Before bringing together library catalogues and online inventories, this difference between completed and sequential publication needs to be kept in mind.
2.4 Use of persistent identifiers
Another aspect to consider is that many libraries use persistent identifiers for content and archives do not use them. Reference numbers will be given to publications in libraries, which apply only to the respective library, but additional unambiguous identifiers are used as well. This allows the identification of every object as unique worldwide. For electronic publications in Germany, for example, the Deutsche Nationalbibliothek (German National Library) co-ordinates the assignment of Uniform Resource Names (URN), so that any title can be found at any time and is quotable. In addition, changes of the access point for each title are documented continuously. If changes are made to a published text, a new URN will be assigned to the altered text that provides a reference to the previous one [DNB, 2008]. However, archives do not provide a Persistent Identifier (PI) in addition to the locally assigned reference numbers for the fonds. A system of URN assignment cannot be established yet for archives, because the Deutsche Nationalbibliothek is focused only on single objects. It is not designed for relatively dynamic and changeable collective objects. Because changes in inventories appear more frequently than in publications, the administrative effort connected with maintaining a URN system for archival content would not be accepted by the archives.
2.5 Flat structure vs. hierarchical structure
The fact that finding aids are about grown fonds and not single objects results in a difference in the complexity of representation. The EAD format offers up to twelve classification points nested each into another to illustrate the hierarchical structure of fonds. In contrast publications are covered on a single description level. The data model of a publication and that of a finding aid have only a few data fields that are congruent to each other. If the structured, extensive and deep descriptive information of the archival inventories is transferred to the 15 Core elements of Dublin Core Metadata Initiative [DCMI, 1998], only the highest level of the respective inventories can be illustrated as a whole publication. The bulk of inventory information, namely the classification scheme, which is specified in rich detail, will be suppressed when using Dublin Core.
Even if one considers the highest level of the finding aid in Dublin Core, it would still be presented inadequately. Two of the difficulties, concerning the date and the identifier, were already mentioned above. The technical data, like format, type and language, are relatively simple to generate. However, a textual description limits itself to title, publisher and time frame of the archival records. But German archives normally do not cover the subject, a description, an author or supporter in bibliographic information.
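The loss described in this section can be made concrete with a small Python sketch. It is an illustration only: the sample fragment is hypothetical, and the choice of which EAD element feeds which Dublin Core element (in particular repository as publisher) is an assumption made here, not an agreed crosswalk. Only the top-level description is mapped, and the function counts how many nested components have no place in the flat record.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EAD 2002 fragment (not project data).
SAMPLE_EAD = """
<ead>
  <archdesc level="fonds">
    <did>
      <unitid>X 1</unitid>
      <unittitle>District leadership records</unittitle>
      <unitdate>1952-1989</unitdate>
      <repository>Example State Archive</repository>
    </did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Secretariat</unittitle></did>
        <c02 level="file"><did><unittitle>Minutes of meetings 1961</unittitle></did></c02>
        <c02 level="file"><did><unittitle>Correspondence 1962</unittitle></did></c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def is_component(element):
    """True for the EAD component elements c and c01 ... c12."""
    tag = element.tag
    return tag == "c" or (len(tag) == 3 and tag[0] == "c" and tag[1:].isdigit())


def to_dublin_core(ead_xml):
    """Map only the highest level of a finding aid to a flat Dublin Core record.

    Returns the record and the number of nested components left out.
    """
    root = ET.fromstring(ead_xml)
    did = root.find("archdesc/did")
    record = {
        "title": did.findtext("unittitle"),
        "identifier": did.findtext("unitid"),  # local reference number only
        "date": did.findtext("unitdate"),
        "publisher": did.findtext("repository"),  # assumed mapping
        "type": "Collection",
    }
    omitted = sum(1 for element in root.iter() if is_component(element))
    return record, omitted


record, omitted = to_dublin_core(SAMPLE_EAD)
print(record)
print("Components not represented:", omitted)
```

In this toy example one series and two file units disappear; in a real finding aid with thousands of units of description, the flat record would retain little more than the title page of the inventory.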
2.6 Differences between the traditions of libraries and archives
There are obvious differences between the way libraries and archives have traditionally worked, and there are even differences in the traditions of various archives. For both libraries and archives there are also national as well as international differences. The result of this is that, certainly, a standardised, international format can be applied for the description of archival records as a frame that offers appropriate elements for much possible information. But it is uncertain to what extent this information is needed or desired at the regional level.
Libraries have been working longer than archives to standardize indexing information and have developed standards for description formats, like [MARC21 2006, MAB2 2006], etc. With these standards in use, data can be exchanged among libraries or can be used to create a union catalogue for use across regions. In contrast, because of the uniqueness of the material in archives, standardisation was not mandatory until now. Published description directives [OVG, 1964; Papritz, 1983] and the common central education made it appear that archival management from one archive to another was quite comparable; however, altogether archival development differs from one archive to another according to the material to be held and its structural forms, according to the local rules of each archive, as well as to the persons handling the archival materials. Software manufacturers to this day use proprietary formats for their archival systems. Only recently has the EAD standard become established in Germany. In addition to the EAD format, the SAFT-XML format developed in Germany is still used as an exchange format for the retro-conversion of old printed finding aids [Schieber, 2007]. Thus, in archives management, experience with standard formats and standardisation is still recent and is not yet as well established as it is in libraries. An important factor motivating archives to use standards now is the construction of information portals and union catalogues that gather together the information from many separate archives.
The implication of this is that not all expedient data fields are supplied by every inventory. Sometimes this involves very important data, which are essential for combining different search tools. For example, for a long time libraries have used standards for classification [LCC, 2008; DDC, 2008; RVK, 2007], keywords [LCSH; SWD, 2008], and the names of corporate bodies and persons [LCNAF; GKD, 2007; PND, 2008]. In contrast in inventories no classification is given that would qualify the fonds thematically. Classification reveals the respective arrangement structure, quasi as the table of contents of the fonds whose naming arises from the titles of the separate arrangement points. In no way is classification of a finding aid comparable to the classification used by libraries. Archival classification shows rather an image of the immanent structure of the material.
Libraries and archives handle keyword indexing differently too. Some, but not all, archives extract index words from the whole fonds and from separate units of description. This happens sporadically, and does not occur for every unit of description or for every fonds. Besides, the indexing always refers to the inventory, not to the archived documents described in the inventory. Those archives that do provide index terms most often use the archives' own notation for persons, places and things without employing standards. Values from the PND, GKD and SWD are usually not adopted either. So far, German archive management argues against the adoption of standard data. This is because the development of standard data is considered to be an additional expenditure, and in view of the rising costs archives face, as well as the existence of huge, unrecorded and therefore inaccessible fonds, standardisation is not cost-effective from the archives' viewpoint. In this, German archives management differs from American archives management, where authority files [LCNAF] and Library of Congress Subject Headings [LCSH, LoC] are assigned to the separate units of description.
2.7 Inventory and source material language information
Recently, as archival material has been made available internationally as well as regionally, it has become vital that the language of the archival resources is known. Before this, German archival materials were expected to be in German, even though sometimes the resources held might be in another language, such as French, English, Swedish, or Russian, etc. In order to use archival resources, knowing in which language the finding aid is written is highly relevant. Knowing the language of the source material is important as well. Yet language is not registered everywhere on a continuous basis. Only in a European information portal for archives does it become apparent how important the criterion of language is for users conducting research. Because of this, the Bundesarchiv project adds the language of the inventory and the material automatically, in addition to other data, to put the online inventories into an international context.
2.8 Legal aspects affecting archives
The final aspect we will consider in this article is that of legal restrictions affecting archives. Archives have to deal with certain legal conditions with regard to their archival records that libraries do not. For example, archives must adhere to a legally prescribed 30-year retention period after creation of the file, during which the archival records may not be customised for a reference service. In the meantime, no inventory of the material is published. In addition, personal information about persons related to the archival material in some way is released only 30 years after the death of the relevant persons. The metadata relating to rights of use are maintained partly in the archival systems. Record groups or units of description that are concerned with restricted or closed material will carry a lock flag. Therefore, some of the archives involved in the creation of the union catalogue will ask that, during the conversion to the EAD format, notes and index terms of closed units of description be removed and that only the title of the resource be mentioned. The removed information is replaced with announcements that indicate when the restrictions can be lifted. In this way it is not necessary to close the entire inventory. Users can find the unit of description and can at least file an application for the lock flag to be lifted.
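The handling of a unit of description that carries a lock flag might look roughly like the following Python sketch. It rests on several assumptions that are not taken from the project: the sample component, the list of elements to strip, the wording of the notice and the function name are hypothetical, and the use of the EAD 2002 element accessrestrict for the notice is one plausible choice. How the lock flag itself is stored differs between archival systems, so the sketch simply receives the release year as a parameter.

```python
import xml.etree.ElementTree as ET

# Hypothetical closed unit of description (not project data).
SAMPLE_COMPONENT = """
<c level="file">
  <did><unitid>A 12</unitid><unittitle>Personnel matters</unittitle></did>
  <scopecontent><p>Contains: disciplinary proceedings.</p></scopecontent>
  <controlaccess><persname>Example, Person</persname></controlaccess>
</c>
"""

# Assumed set of direct child elements holding notes and index terms.
REMOVED_WHEN_LOCKED = ("scopecontent", "controlaccess", "note", "odd")


def redact_locked(component, release_year):
    """Strip notes and index terms from a closed unit and add a notice.

    The title and reference number inside <did> are kept, so that users
    can still find the unit and apply for access.
    """
    for tag in REMOVED_WHEN_LOCKED:
        # findall returns a list, so removing while looping is safe here.
        for child in component.findall(tag):
            component.remove(child)
    notice = ET.SubElement(component, "accessrestrict")
    ET.SubElement(notice, "p").text = (
        f"Closed until {release_year}; an application for access may be filed."
    )
    return component


redacted = redact_locked(ET.fromstring(SAMPLE_COMPONENT), 2020)
print(ET.tostring(redacted, encoding="unicode"))
```

Only direct children of the component are examined in this sketch; a real converter would also need to decide how to treat nested components and notes placed inside did.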
Before an archival inventory can be presented in a national or international context, a selection of the legally restricted or closed record groups must be made. Every information portal, whether regional, national or international, that holds information about these materials must guarantee that the data sovereignty regarding the inventories remains at all times with the respective archives. Consequently, an online-inventory can be removed at any time, and at short notice, should changes regarding restrictions to the materials be made.
Even if a structured, international standard format is used like EAD for the production of online archival inventories, the archival contents remain heterogeneous to the contents of libraries. Reasons for this are the differences between library and archive materials, and the different needs and traditions of libraries and archives. If, in an information portal, a single object with a flat description structure meets a fonds with hierarchically arranged description data, this has to be accounted for conceptually for effective user search and navigation, and to cope equally with the differing qualities of publications and inventories.
As discussed in this article, materials to be integrated into a European archival information portal should be prepared using EAD, EAC and EAG. But even if those standards are used, agreements regarding the several aspects outlined above will be needed. However, important congruent data structures for the integration of all the catalogue data from libraries, archives, and museums into a European digital library [EDLproject] are still lacking.
A common search interface is needed that could be used in addition to full text search, one that accesses both archival inventories as well as single objects like publications, and one that allows navigation by faceted browsing or drill down adapted from keywords to limit search results. However, this is difficult to implement effectively without the existence of standardised keywords [Imhof, 2006]. Libraries have long had standards for this purpose; archives have not. However, a solution to this problem can arise only from the archives themselves and cannot be submitted externally.
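Why faceted browsing depends on standardised keywords can be shown with a short Python sketch. The hits, the index terms and the small table of preferred headings are all hypothetical; the table merely stands in for an authority file such as those libraries maintain. Without normalisation, variant spellings split one subject across several facet values; with it, the same hits collapse into a single facet that can be used for drill down.

```python
from collections import Counter

# Hypothetical search hits with locally assigned index terms.
HITS = [
    {"title": "Minutes 1961", "keywords": ["Trade union", "Berlin"]},
    {"title": "Reports 1962", "keywords": ["trade unions", "Berlin"]},
    {"title": "Circulars 1963", "keywords": ["Trade Union", "Leipzig"]},
]

# Hypothetical stand-in for an authority file: variant form -> preferred heading.
PREFERRED = {"trade union": "Trade unions", "trade unions": "Trade unions"}


def normalise(keyword):
    """Return the preferred heading for a keyword, or the keyword unchanged."""
    return PREFERRED.get(keyword.lower(), keyword)


def facet_counts(hits, use_authority=False):
    """Count how many hits carry each keyword, optionally after normalisation."""
    counts = Counter()
    for hit in hits:
        for keyword in hit["keywords"]:
            counts[normalise(keyword) if use_authority else keyword] += 1
    return counts


def drill_down(hits, heading):
    """Limit the hit list to entries indexed under the given preferred heading."""
    return [h for h in hits if heading in {normalise(k) for k in h["keywords"]}]


print(facet_counts(HITS))
print(facet_counts(HITS, use_authority=True))
print([h["title"] for h in drill_down(HITS, "Trade unions")])
```

In the raw counts the subject appears under three different spellings with one hit each, whereas the normalised counts show one heading with three hits, which is the behaviour a user of a common search interface would expect.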
Application of international standards cannot guarantee interoperability between libraries and archives. It is more appropriate to accept the heterogeneity of these institutions and to plan and execute a common presentation accordingly, taking into consideration the aspects discussed in this article. After conducting a full text search, users will still need an orientation in the respective context of resources. Only then will the results of the search be understood fully.
[BArch, DVW 1/39458-39539]. (2007). Bestand DVW 1/39458-39539. Nationaler Verteidigungsrat der DDR: Sitzungsprotokolle 1960-1989. Retrieved August 11, 2008, from <http://www.bundesarchiv.de/fb_daofind/Zdaofind_DVW1_NVR/>.
[BArch, NS8]. (2007). Bestand NS8. Kanzlei Rosenberg: 1918-1945. Retrieved August 11, 2008, from <http://www.bundesarchiv.de/fb_daofind/Zdaofind_NS8/>.
[EAG]. (2002). Encoded Archival Guide Document Type Definition. v. alpha 0.2. Retrieved August 11, 2008, from <http://aer.mcu.es/sgae/jsp/censo_guia/Documentos/EAG.DTD.txt>.
[GKD]. (2007). Gemeinsame Körperschaftsdatei. Official Site. Retrieved August 11, 2008, from <http://staatsbibliothek-berlin.de/deutsch/abteilungen/ueberregionale_bibliographische_dienste/gkd/>.
[Imhof, Andres]. (2006). RSWK/SWD und Faceted Browsing: neue Möglichkeiten einer inhaltlich-intuitiven Navigation. Bibliotheksdienst 40. Jg. (2006) 8/9, p. 1015 - 1025. Retrieved August 11, 2008, from <http://www.zlb.de/aktivitaeten/bd_neu/heftinhalte2006/Erschliessung01080906.pdf>.
[LCNAF]. Library of Congress Name Authority Files. Official Site. Retrieved August 11, 2008, from <http://authorities.loc.gov/cgi-bin/Pwebrecon.cgi?DB=local&PAGE=First>.
[MAB2]. (2006). Maschinelles Austauschformat für Bibliotheken. Official Site. Retrieved August 11, 2008, from <http://www.d-nb.de/standardisierung/formate/mab.htm>.
[OVG]. (1964). Ordnungs- und Verzeichnungsgrundsätze für die staatlichen Archive der Deutschen Demokratischen Republik. Herausgegeben von der Staatlichen Archivverwaltung im Ministerium des Innern der Deutschen Demokratischen Republik 1964. Digitised 2005. Retrieved August 11, 2008, from <http://www.staatsarchive.de/publikationen/OVG.pdf>.
[Papritz, Johannes]. (1983). Archivwissenschaft. Vol. 1-4. 2nd revised edition. Marburg.
[PND]. (2008). Personennormdatei. Official Site. Retrieved August 11, 2008, from <http://www.d-nb.de/standardisierung/normdateien/pnd.htm>.
[RVK]. (2007). Regensburger Verbundklassifikation. Official Site. Retrieved August 11, 2008, from <http://www.bibliothek.uni-regensburg.de/Systematik/systemat.html>.
[Sourceforge.net]. (2008). Sourceforge.net. Retrieved August 11, 2008, from <http://sourceforge.net/project/showfiles.php?group_id=212526>.
[SWD]. (2008). Schlagwortnormdatei. Official Site. Retrieved August 11, 2008, from <http://www.d-nb.de/standardisierung/normdateien/swd.htm>.
Copyright © 2008 Andres Imhof