Designing Documents to Enhance the Performance of Digital Libraries (Notes and References)

Other testbeds outside of Greco-Roman antiquity include the history of mechanics (in collaboration with the Max Planck Institute for the History of Science in Berlin: http://www.mpiwg-berlin.mpg.de), Shakespeare and Early Modern English (in collaboration with the Modern Language Association: http://www.mla.org), and reconstructing the ancient Egyptian site of Giza (with the Museum of Fine Arts, Boston: http://www.mfa.org).

Texts already in TEI-conformant form include two multi-volume works describing the history and topography of London and its environs (Thomas Allen's four-volume History and Antiquities of London, Westminster, Southwark and Parts Adjacent, completed in 1829, and the six-volume Old and New London, by Walter Thornbury and Edward Walford, published between c. 1872 and 1878), progressive exposés of London poverty (John Hollingshead, Ragged London in 1861; Thomas Archer, The Pauper, the Thief and the Convict, 1865), one luridly illustrated description of London as a whole (Gustave Doré and Blanchard Jerrold, London: A Pilgrimage, 1872), and a small sampling of literary works that reference the topography of London (Defoe's Journal of the Plague Year; Dickens' Our Mutual Friend and Bleak House). Another set of documents has been sent out for data entry: these include Charles Kingford's 1908 edition of Stow's Survey of London, Robert Wilkinson's Londina Illustrata (1819), and the first four volumes covering London Poverty from Charles Booth's Life and Labour in London (1902-1904). We are still determining which texts to send for data entry next.

See the http://www.bartholomewmaps.com/lond_5000_inf.htm web site.

See the http://www.esri.com/software/arcview/extensions/imageext.html web site.

For budgetary reasons, we have had to rely on OCR rather than professional data entry for our work so far. This affected our selection of texts (we focused on documents that would scan well) and greatly increased the amount of manual post-processing. Even with the most advanced OCR software available (http://www.primerecognition.com), the process was slow and the results uneven. We may have saved on the explicit data entry bill, but we spent at least as much in labor as we saved.

We plan to evaluate techniques for feature extraction such as those in References [2] and [3], as well as existing software applications.

See the http://shiva.pub.getty.edu/tgn_browser/about_tgn.html website.

See, for example, http://www.perseus.tufts.edu/cgi-bin/lexindex?lookup=pe/mpw. The Greek and Latin lexica, formerly separate, now form an integrated unit, so similar concepts can be studied across the two languages in ways that were not possible before.

The AMICO consortium, which makes art historical materials available, provides an excellent model for such an open licensing system. AMICO (http://www.amico.org) encourages multiple distributors to license access to, and provide distinct front ends for, their data.
