D-Lib Magazine
January 2002

Volume 8 Number 1

ISSN 1082-9873

Beyond the scriptorium

The Role of the Library in Text Encoding


Suzana Sukovic
Rare Book and Special Collections Library
University of Sydney, Australia

(This Opinion piece, which is based on the presentation the author gave at the conference "ECAI 2001: Towards an Electronic Cultural Atlas", Sydney, 12-13 June 2001, presents the opinions of the author. It does not necessarily reflect the views of D-Lib Magazine, its publisher, the Corporation for National Research Initiatives, or its sponsor.)



The appearance of electronic text centers within libraries during the last several years has been a significant development for both the library and research communities. At the same time, electronic texts have become a great challenge to the traditional roles in the library, research and publishing communities. Development of electronic textual resources means dealing with documents in new ways and on different levels, often involving work on a document's content through text encoding. This development challenges the library's assumed position in the research process.

Stronger involvement of libraries in text development will enhance the functionality of electronic texts and improve information retrieval. Traditional library skills and tools used for cataloguing and indexing can be applied to text encoding to identify geographical and personal names, dates, events, artifacts, etc. and to provide standardized access to this information. Libraries have always provided this type of service, but some may see the application of the same skills to text encoding as crossing professional boundaries.

1. Changing role of the library

Libraries have always dealt with value-loaded documents, yet have been able to maintain a value-neutral position. Libraries provide access to information created by others, but they are not creators themselves -- that is the common understanding, or it was until recently. With the development of information technology, libraries have taken on various new tasks, some requiring new ways of interacting with documents.

A new wave of changes accompanies electronic text initiatives, positioning the library across at least three specialized fields: computer support, publishing and academic research. Blurry though they are, boundaries do exist between these fields, and text encoding projects are testing and establishing these boundaries, mostly in practice.

1.1 Libraries and computer support

Electronic texts need considerable technological support, and the distinctions between the librarian's and computer specialist's tasks are often not clear. Librarians and computer support staff, however, have had long experience in dealing with technology together. Electronic texts do not essentially change the existing division between the two professional areas.

1.2 Libraries and publishing

The issues surrounding electronic publishing, on the other hand, are so controversial that hardly any aspect of it commands common agreement, not even the definition of what constitutes electronic publishing. Some authors distinguish between digitization and electronic publishing (Mercieca 1999), but the National Library of Australia (NLA) states that the NLA "is operating on the basis that anything that is publicly available on the Internet is published" (National Library of Australia 1999, point 3.3). In the new environment, traditional publishing is changing, and although much remains to be done in establishing standards for electronic publications, the library can consider making information "publicly available" to be a legitimate task. What is not completely clear is whether this legitimacy extends to providing whole content in addition to traditional bibliographic information. Presentation of the whole document rather than merely its description, whether called publishing or not, is not currently viewed as a library task.

Since the invention of printing, librarians have not produced documents by copying them. It used to be different, however. A look back in history shows a monk in a monastery library hand-copying a script, but also ornamenting it and accompanying it with translations. (The resemblance to this old practice is the reason why so many electronic text projects call themselves "electronic scriptoria".) As the monk copies the document by hand, he is not writing a tract about the text but "just" reproducing it. Nevertheless, during the process he might unconsciously insert some elements of his own speech. Perhaps he even decides to correct the document or to omit what he considers an unsuitable part. The monk is not -- cannot be -- objective, but objectivity is not at issue. Thanks to the monk's effort, we have copies of valuable manuscripts, and some of those containing his alterations are significant cultural contributions in their own right. If we can glean any lessons from history, they might include thinking about the enhancements we want to make to documents, the interventions we want to prevent, and the value of copying texts.

1.3 Libraries and academic research

The most sensitive questions about text encoding by libraries concern the new role that libraries may fill with regard to academic research. Text encoding deals with content in a very direct way that appears to threaten the library's assumed neutrality in the research process. The traditional role of the library in the research process is based on the assumption that libraries deal with information in a value-neutral way, and part of that neutrality lies in the fact that librarians do not deal with a document's content. Librarians provide information, but someone else interprets it and considers its value within systems of meaning.

There are also philosophical and practical issues that cause resistance, in both the library and academic communities, to changing roles in academic research and to the new digital document technologies. One of these issues is libraries' limited resources. Nevertheless, thanks to their ability to reposition themselves when required, libraries have already evolved from document repositories to information centers. Might they not also evolve to take a greater role in the text encoding of documents?

The role of any research library includes provision of access to information and support for research and learning. Information access, research and learning take place in a cultural context that includes libraries, and culture has never been neutral. Libraries have been assumed to play a neutral role in academic research; however, classification systems and indexing, collection development and acquisitions policies, user services and policy decisions are all colored by cultural values and knowledge. The fact that librarians established procedures, codes and practices did not mean their work became neutral. By the same token, the library profession may need to develop ways of dealing with new value-based roles rather than decide that these roles are incompatible with the library's assumed position as a neutral information provider.

2. Library's contribution to electronic textual resources

Why should libraries be involved in text encoding? The briefest answer is that text encoding affects information access and preservation, which are traditional library tasks. Libraries possess strengths and skills in information organization, access and dissemination, as well as a proven ability to collaborate across disciplines.

2.1 Managing information

The academic community sees the library as the most reliable place where electronic texts can be guarded (Modern Language Association, Guidelines E1 1997). The library has built a good reputation for dealing with information technology, and users want to go to the same place for new developments. The library has already gained considerable experience with computing in the humanities field. McGann describes the close association between computerization in the humanities and in libraries, and he says that this association occurred for one "simple and obvious reason: material demands have driven libraries to study and exploit computerized tools, which allow these research facilities to gain a measure of control over the massive amounts of data they are called upon to manage" (McGann 1996, "The return of the library", 1st par.). Yet, when it comes to electronic textual resources, libraries have not yet contributed their fundamental skills and tools for managing data.

2.2 Serving the research community

One part of the library's contribution to the research community is in providing good collections and access to the information held in those collections. Another contribution is in developing a skill base, first within the library and then in the university community. Universities with well-established electronic text centers are already developing significant experience in supporting researchers' encoding projects, either by providing instruction or by participating in these scholarly and teaching projects.

2.3 Serving one instead of many?

There is an underlying assumption that electronic resources are valuable to researchers and that libraries want to find new and better ways to support research. However, it is important to acknowledge that digital collections, electronic texts especially, require significant resources while serving only part of the academic community. Electronic text centers are primarily engaged in developing resources for the humanities. The literature, well supported by anecdotal evidence, suggests a split between the researchers in the humanities along technological lines (Olsen 1992; DeLoughry 1993; Katz 1999; Sukovic 2000). The decision to make a big investment that serves only a part of the research community is being questioned. While we cannot go into the whole argument here, it is important to acknowledge this doubt and address it briefly. Research methods develop over time, with or against technological developments. Research libraries cannot ignore significant advancements in research methods used by the academic area the library is supporting. When determining what their contribution to research should be -- based on how electronic texts are used -- libraries need to take cost/benefit factors into consideration.

3. Approach to encoding

Once a library has decided to become involved in text encoding, the crucial question arises regarding which method of encoding is best. There is little doubt that using SGML or XML according to Text Encoding Initiative (TEI) Guidelines is the library's best choice for developing scholarly electronic texts.

3.1 Existing practice

Over the last decade, electronic text centers have established a number of standard approaches to encoding. Accepted levels of encoding for each center were adopted based on many factors, some general -- like the enormous number of texts to be converted -- and some local -- like the organization's commitment to electronic texts. Various accounts of how electronic text centers were established provide evidence of procedures and practices being established in response to numerous considerations and demands.

3.2 Draft Guidelines

The first Draft Guidelines for TEI encoding in libraries were written in 1999 (Friedland et al. 1999). The Guidelines are significant for several reasons. Firstly, they give librarian-encoders a sense of community. Secondly, standards and codes of practice are necessary to avoid big mistakes and wasted time. Thirdly, the Draft Guidelines provide researchers with a reference point to shape expectations and plan their research projects.

The Draft Guidelines recognize five levels of encoding. Levels 1-4 require no expert knowledge of content, but Level 5 requires scholarly analysis. Level 1 consists of fully automated conversion and encoding. The complexity of encoding increases at each level up to Level 4, which includes basic content analysis. Level 5 is reserved for scholarly encoding projects and requires subject knowledge: semantic, linguistic, prosodic and other elements beyond the structural level are encoded. The Draft Guidelines are brief and do not give detailed specifications for different levels of semantic encoding. They do, however, provide a framework and important orientation points.

3.3 Traditional library skills

Among the traditional library skills and tools that librarians bring to text encoding, one of the most important is the library's well-established practice of naming documents and their content. Libraries have traditionally dealt with recognizing and naming various references to people, places, organizations, objects, events, etc. Semantic interpretation has been a regular library practice in assigning subject headings, choosing regularized forms of names, identifying the languages used in a publication, and so on. A huge apparatus of codes and rules, thesauri, authority files and labelling systems has been developed to support the tasks of recognizing important information in a document and putting it in an accessible, standardized form. The scholarly community depends on the library's interpretation of the authorship and content of whole documents, even corpora. Should it not be acceptable for the library to continue such interpretation through text encoding at the word or phrase level? Information professionals already produce good indexes describing content down to the paragraph level, and some researchers have expressed a desire for an even greater number of detailed abstracting services, bibliographies and catalogues.

This certainly does not mean that librarians should venture into scholarly interpretation; it only means that they should apply their best tools and skills to this new area of endeavor. Another consideration is whether libraries can devote staff and resources to such fine-grained encoding, but that is a matter for grant administrators and for university and library managers. In the long run, it might prove cheaper and more efficient to have librarians mark occurrences of personal and geographic names in texts while academics continue with their scholarly research, than to have academics perform both tasks. There is no reason why librarians should leave behind their experience with large authority files when they start encoding document content. Moreover, research is the academics' main interest, and it is a waste of their time to learn what librarians already know about indexing and authority files merely to complete its groundwork. Instead, it would benefit everyone if librarians from electronic text centers got together with their colleagues, cataloguers and indexers, to see how best to bring together the skills across their professional spectrum.
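To make the point about authority files concrete, the following Python sketch wraps a name in a TEI-style `<name>` element and adds a regularized form only when a lookup yields exactly one authorized heading. The `AUTHORITY` table, its entries and the `encode_name` helper are hypothetical; the `type` and `reg` attributes are assumed to follow older (P4-era) TEI conventions, and a real project would consult an established authority file rather than a hard-coded dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical authority file: each variant form found in a text maps to
# one or more authorized headings. All entries are invented for illustration.
AUTHORITY = {
    "Mr Smith": ["Smith, John"],
    "Professor Smith": ["Smith, John"],
    "Mrs Brown": ["Brown, Mary", "Brown, Margaret"],  # ambiguous variant
}


def encode_name(surface, name_type="person"):
    """Wrap a name in a TEI-style <name> element.

    A reg attribute (regularized form) is added only when the authority
    file yields exactly one heading; ambiguous or unknown names are still
    marked by type, but left unregularized.
    """
    elem = ET.Element("name", {"type": name_type})
    elem.text = surface
    headings = AUTHORITY.get(surface, [])
    if len(headings) == 1:
        elem.set("reg", headings[0])
    return ET.tostring(elem, encoding="unicode")


for variant in ["Mr Smith", "Mrs Brown", "the Captain"]:
    print(encode_name(variant))
```

Leaving ambiguous names unregularized, rather than guessing, mirrors the cataloguer's habit of not asserting an identity that the evidence does not support.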

4. Encoding example

The text encoding of Diary of a trip to Australia, 1897, by Evelyn Louise Nicholson (Nicholson 1999), serves as an example for the points raised herein.

The diary is interesting mostly as a historical document. Although it might be possible to consider the literary, socio-psychological or other characteristics of the text, they are, firstly, well within the scholarly domain. Secondly, a historical perspective is likely to form the background for other perspectives. Although I could not provide a historical analysis, I wanted to make the text searchable as a document on Australian people and places of the time.

Important parts of any historical document are references to dates, people, places, events and objects of historical significance. The need to encode the dates in the diary was obvious. The text is not in a strict chronological order; therefore employing a mechanism to trace dates was useful. I marked up references to "today", days in the week and similar words when they marked the beginning of a new day or a set of events, and when their interpretation in the context was unambiguous.
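A mechanism for tracing dates in a non-chronological text can be sketched in Python. This is a minimal illustration, not the markup actually used for the diary: it assumes `<date>` elements carrying a `value` attribute with a normalized ISO 8601 date (as in TEI P4), and the entries are invented. Because the normalized value is recorded only where a word such as "today" is unambiguous, sorting on it recovers the chronological order of events even when the text presents them out of sequence.

```python
import xml.etree.ElementTree as ET

# Invented TEI-style fragment: entries appear out of chronological order,
# and relative words ("Today", weekday names) carry a normalized value.
FRAGMENT = """
<body>
  <div type="entry"><p><date value="1897-06-03">Thursday</date> we sailed.</p></div>
  <div type="entry"><p>Back on <date value="1897-06-01">Tuesday</date> it rained.</p></div>
  <div type="entry"><p><date value="1897-06-02">Today</date> we visited the town.</p></div>
</body>
"""


def chronological_index(xml_text):
    """Return (ISO date, surface word) pairs sorted by the normalized date."""
    root = ET.fromstring(xml_text.strip())
    # Only dates with a normalized value are indexed; ambiguous ones are skipped.
    pairs = [(d.get("value"), d.text) for d in root.iter("date") if d.get("value")]
    # ISO 8601 strings (YYYY-MM-DD) sort correctly as plain strings.
    return sorted(pairs)


for iso, word in chronological_index(FRAGMENT):
    print(iso, word)
```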

References to people, places, organizations and boats were also marked. The diary has a number of references to the University of Sydney, and since these references are of particular interest for the collection, I decided they should be distinguished from other types of references. Regularized forms of personal names and of University of Sydney departments and buildings were provided when possible. Personal names were regularized only when the identity of a person was unambiguous. In all other cases, references to people were marked with the attribute "person", thus making it possible to build a list of all references to people. References to the same person are brought together, even if their name was not regularized.
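One possible way to build such a list is sketched below in Python. The markup is an assumption (P4-style `<name type="person">` with an optional `reg` attribute), and the fragment is invented rather than taken from the diary. References are grouped under their regularized form when one exists; otherwise the surface text itself serves as the grouping key, so repeated unregularized references still end up together.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented TEI-style fragment: person names use type="person", and reg holds
# the regularized form only where the identity is unambiguous.
FRAGMENT = """
<body>
  <p><name type="person" reg="Smith, John">Mr Smith</name> called on us;
  later <name type="person" reg="Smith, John">the Professor</name> dined with
  <name type="person">the Captain</name> near <name type="place">the quay</name>.</p>
</body>
"""


def person_index(xml_text):
    """Group person references by regularized form, else by surface text."""
    root = ET.fromstring(xml_text.strip())
    index = defaultdict(list)
    for name in root.iter("name"):
        if name.get("type") != "person":
            continue  # places, organizations, boats, etc. are indexed separately
        surface = "".join(name.itertext()).strip()
        key = name.get("reg") or surface
        index[key].append(surface)
    return dict(index)


for heading, forms in person_index(FRAGMENT).items():
    print(heading, "->", forms)
```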

5. Implications

This type of text encoding is time-consuming, especially if applied to all potentially relevant information. However, some texts and certain information in them warrant the attention of researchers, and rich encoding of core texts would not go amiss. Enriched electronic texts also have the potential to showcase the university as an electronic publisher. In an era when private companies work hard to come up with yet another value-added product, librarians do not have to look far to take advantage of growing opportunities for which their profession has so naturally prepared them.

The important question is not whether libraries should deal with document content -- they always have dealt with it. The questions are: how can librarians extend their skills to textual encoding, and who is going to support this kind of work? Although difficult to answer, these are not philosophical questions but rather are practical questions involving the use of standards and the availability of funds for the new task. Additionally, there is the strategic issue of how to form new alliances of different library specialists and researchers. Finding solutions to these issues could reap manifold benefits: significantly better information retrieval, electronic texts as better research tools, academics freed from doing groundwork themselves, raised status of research libraries, and a competitive product from the university.

With centuries of experience in reproducing, cataloguing, classifying and indexing documents, as well as in information design and retrieval, librarians are well positioned to take a role in text encoding -- to move beyond the scriptorium and beyond traditional library roles.


Copyright 2002 Suzana Sukovic

D-Lib Magazine Access Terms and Conditions

DOI: 10.1045/january2002-sukovic