Volume 5 Issue 2
A Tool for Building Digital Libraries
Senior Digital Conversion Specialist
National Digital Library Program
Library of Congress
Markup Languages: Theory & Practice
C.M. Sperberg-McQueen and B. Tommie Usdin, eds., Volume 1, Number 1, 1998
MIT Press, Cambridge, Massachusetts
Markup may not be a household word, but digital library patrons benefit from it every time they log on. Building digital libraries poses two data challenges: exchangeable data structures for digital library items, and descriptive and structural data adequate to support the exchange. Interoperability and metadata are primary concerns for building national and international digital libraries. Markup languages are proving to be flexible tools for accomplishing the task.
Migration from print formats to digital formats is not straightforward. What digital mechanisms can serve disciplines as diverse as mathematics and history, or literature forms as different as drama and the novel, or documents as small as a receipt or as large as an unabridged dictionary? What defines the boundaries of content in the way that the binding defines the book? How can historical content be converted into usable digital formats without destroying its historical form? Projects like the Making of America (MOA) at the University of Michigan and Cornell University, and American Memory at the Library of Congress are finding solutions to these challenges in Standard Generalized Markup Language (SGML).
How can digital libraries exchange their holdings? What is the meaning of interlibrary loan in a digital environment? How can digital libraries communicate the contents of the library to patrons -- the catalog? the shelf list? Developers of the Encoded Archival Description (EAD) and the Dublin Core Initiative are using markup schemes to create exchangeable descriptive records for library and museum materials.
A common data format and common elements of metadata are essential for libraries to exchange digital items and metadata about those items. Encoding for display is a solution that has proven successful for the World Wide Web. Documents intended for dissemination via the Web are formatted with a simple markup language, Hypertext Markup Language (HTML). The strength of HTML, its simplicity, has also been its weakness: it is inadequate for the diverse array of documents and information types that exist. Content creators struggle to fit content into a data model that is limited to display features such as headings, unordered lists, bold, or italics.
A more flexible and expandable markup language is now in development, eXtensible Markup Language (XML). It promises to solve the problems of diverse data types by allowing for user-defined markup rather than browser-defined markup. It allows markup that describes the content rather than the format. This description of content has implications for extracting and reusing the content in ways yet to be imagined.
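As a minimal sketch of what content-describing markup makes possible, the Python fragment below parses a small XML record and pulls out its fields by name. The element names (receipt, vendor, date, amount) are hypothetical, chosen only for illustration; they do not come from any project or schema mentioned in this review.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the element names are illustrative assumptions,
# not taken from any real schema or DTD.
RECORD = """<receipt>
  <vendor>Smith &amp; Sons</vendor>
  <date>1898-03-14</date>
  <amount currency="USD">12.50</amount>
</receipt>"""


def extract_fields(xml_text):
    """Return a dict mapping each child element's name to its text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


fields = extract_fields(RECORD)
```

Because the tags name the content rather than its appearance, the same record could feed a catalog entry, a search index, or a display page without being re-keyed.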
There are no simple solutions to the transformation of current print formats to digital formats, but many are grappling with the challenges using markup as a tool. A new quarterly journal from MIT Press, Markup Languages: Theory & Practice, has just been launched. It promises to provide "a more permanent record" of both the theory and the practical application of markup languages.
According to its editorial statement, Markup Languages is a "peer reviewed technical journal publishing papers on research, development, and practical applications of text markup for computer processing, management, manipulation, and/or display. The scope of the journal includes: design and refinement of systems for text markup and document processing; specific text markup languages; theory of markup design and use; applications of text markup; and languages for the manipulation of marked up text."
The editors, C.M. Sperberg-McQueen and B. Tommie Usdin, are both well known for their leadership and practice in the field of markup languages. Sperberg-McQueen has served as editor in chief of the Text Encoding Initiative and as a co-editor of the XML language specification. Usdin is president of Mulberry Technologies, a company that provides SGML and XML consulting services to both private and public sector organizations. She has served as chair of numerous conferences for markup languages.
Markup Languages is a print journal supplemented with a web site, http://mitpress.mit.edu/MLANG/. Letters to the editor can be submitted at this site. A threaded discussion forum will facilitate ongoing comments and questions about articles published in the journal.
If the contents of the first issue are an indicator, this is a journal for practitioners of markup. Three articles, a project report, a standards report, a review of two books, and a squib about character sets offer more practice than theory. The selection of topics affirms that while markup languages provide a common syntax, application of the syntax is creative and unique to each project or tool.
The lead article, "Document structure and markup in the FRESS hypertext system," by Steven J. DeRose and Andries van Dam, recounts the unfulfilled promise of early hypertext editors like FRESS (File Retrieval and Editing System), used at Brown University in the 1960s. Functionality to handle large documents with complex links and alternative views has not migrated to hypertext systems currently in use. Today’s hypertext viewers are limited by narrowly defined linking structures and a document design that constrains the size of documents. The article presents challenges for the expansion of hypertext systems to accommodate complex documents of any size.
The article "A new generation of tools for SGML" proposes a tool to expand complex Document Type Definitions (DTDs) that include exceptions. The tool builds an understandable view of the DTD that can aid in detecting errors created by incorrect use of those exceptions. Exceptions allow designers of DTDs to write concise element declarations including or excluding elements that may be allowed by other elements within the declaration. Although there are clear illustrations of exceptions, this article is not for novices: the author states that the paper assumes a knowledge of DTDs and exceptions.
"SGML documents: Where does quality go?" describes using a markup language syntax for validating content for database applications. Using parish records and archaeological site reports, the authors describe a methodology for verifying that data entered into the SGML document database conforms to valid rules. An example is defining a rule that birth dates are earlier than death dates.
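The birth-before-death rule can be illustrated with a short Python sketch. This is not the authors' methodology, which the review does not detail. It is an assumed stand-in that parses a hypothetical XML register and reports the records that break the rule. The element names, the id attribute, and the ISO date format are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical parish register: element names, ids, and the ISO date
# format are assumptions made for this illustration.
PARISH_RECORDS = """<register>
  <person id="p1"><name>Ann Hale</name><born>1802-05-01</born><died>1871-11-23</died></person>
  <person id="p2"><name>John Reed</name><born>1850-02-10</born><died>1849-07-04</died></person>
</register>"""


def find_date_errors(xml_text):
    """Return ids of person records whose birth date is not earlier than the death date."""
    errors = []
    for person in ET.fromstring(xml_text).iter("person"):
        born = date.fromisoformat(person.findtext("born"))
        died = date.fromisoformat(person.findtext("died"))
        # The rule as stated: birth must be strictly earlier than death.
        if not born < died:
            errors.append(person.get("id"))
    return errors


bad_ids = find_date_errors(PARISH_RECORDS)
```

A check like this expresses a constraint across two elements, which goes beyond the structural validation a DTD provides.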
"News you can Reuse," a case study of The Wall Street Journal Interactive Edition, demonstrates how markup languages, SGML and XML, make possible the reuse of financial and business news articles for a variety of web sites, for display on personal handheld devices and for infotainment video. The article includes many screen shots and examples of markup. It very clearly answers the question, "How did they do that?"
A report on the development of the Document Object Model (DOM) specification from The World Wide Web Consortium focuses on the rationale for its design. DOM is a platform- and language-neutral interface that allows HTML and XML marked up document content, structure and style to be accessed and updated dynamically via programs and scripts. It is anticipated that the interface will be implemented by browser software allowing users to customize tools for using web documents.
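To give a sense of what accessing and updating a document through such an interface looks like, here is a minimal sketch using xml.dom.minidom from the Python standard library, one later implementation of the DOM interfaces. It is offered as an illustration only, not as the design the report describes. The sample document and its element names are hypothetical.

```python
from xml.dom.minidom import parseString

# Hypothetical document; the element names are illustrative only.
SOURCE = "<article><title>Old title</title><para>Text.</para></article>"


def retitle_and_annotate(xml_text, new_title, note_text):
    """Change the first <title> and append a <note> element via DOM calls."""
    doc = parseString(xml_text)
    # Access: locate a node by element name and update its text content.
    title = doc.getElementsByTagName("title")[0]
    title.firstChild.data = new_title
    # Update structure: create a new element and attach it to the root.
    note = doc.createElement("note")
    note.appendChild(doc.createTextNode(note_text))
    doc.documentElement.appendChild(note)
    return doc.documentElement.toxml()


result = retitle_and_annotate(SOURCE, "New title", "Added by script")
```

The same method names (getElementsByTagName, createElement, appendChild) are available in other language bindings, which is the sense in which the interface is language-neutral.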
In the book review essay, author Chet Ensign makes the case that DTDs matter after all. The two volumes reviewed will help readers understand and use DTDs even in the XML world that touts DTD-less markup. The books reviewed are Developing SGML DTDs: From Text to Model to Markup by Eve Maler and Jeanne El Andaloussi, Prentice-Hall 1996; and Structuring XML Documents by David Megginson, Prentice-Hall 1998. Deborah Lapeyre’s annotated tables of contents for both volumes are quite helpful.
With detailed technical descriptions and the assumption that readers already know and use markup, Markup Languages promises to be a source of solid practical experience and thought-provoking ideas. The journal will be a welcome arrival in the mailboxes of those who have rolled up their shirtsleeves and are up to their elbows in the nitty-gritty work of making data accessible and usable.
D-Lib Magazine Access Terms and Conditions