Volume 20, Number 9/10
Selecting Newspaper Titles for Digitization at the Digital Library of Georgia
Digital Library of Georgia
Newspapers have been a significant target for digitization over the last decade, and libraries, archives, and other cultural institutions must decide how best to utilize their limited funds to digitize a select number of newspaper titles for public consumption. This case study examines the Digital Library of Georgia's newspaper digitization selection process and how it incorporates national standards with its own project-specific criteria. The article includes a discussion of the roles played by user demand, content significance, funding, copyright, optical character recognition, and microfilm holdings in the decision making process, with the ultimate goal of creating highly used, well-regarded, and cost effective online newspaper archives.
Newspapers have been a significant target for digitization over the last decade. The wealth of information these materials provide serves multiple audiences and disciplines, making them a particularly valuable resource to make more widely available. Libraries, archives, and other cultural institutions must decide how best to utilize their limited funds to digitize a select number of newspaper titles. Grant-driven digitization efforts, like the Library of Congress' National Digital Newspaper Program (NDNP), provide clear and useful selection criteria for their participants. Organizations working to digitize newspapers outside of those programs share many of the same considerations, but they also deal with additional concerns unique to their situation. One such institution is the Digital Library of Georgia (DLG), whose newspaper digitization selection process will be examined as a case study in this paper.
In 2007, the DLG initiated a project to digitize the Red and Black, the student newspaper of the University of Georgia, from the microfilm holdings of the Georgia Newspaper Project. The venture served as a pilot project for a larger initiative to digitize the state's historical newspapers. Once a process was established, a set of criteria was needed to determine future newspaper digitization projects after the DLG completed the Red and Black Archive. Accordingly, the DLG, GALILEO, and Georgia HomePLACE collaborated to create a selection strategy that addressed that need, resulting in the digitization of over a half million pages during the first five years of the project and unprecedented usage numbers. That strategy incorporates demand, historical significance, funding, and availability, along with restrictions including copyright and technical concerns. These factors, in the context of the project, are discussed below.
The main resource for those researching selection criteria for newspaper digitization is the guidelines set forth by the National Digital Newspaper Program. Their publications are both practical and well-organized, with particular attention paid to the technical aspects of microfilm selection. The technical guidelines are updated annually and cover microfilm selection, scanning, OCR, and the creation of metadata. Although not all of the criteria covered in their publications are applicable to those working outside of the grant, the NDNP guidelines are a valuable starting point for establishing a method for newspaper title selection.
Molly Kruckenberg of the Montana Historical Society published guidelines for its Montana Newspaper Digitization Project Selection Advisory Board. She sets forth criteria specifically addressing the history and geography of the state, in addition to coverage, availability, copyright, and other more general factors. The recommended process results in the ranking of papers by priority, with the highest ranked titles examined for technical feasibility before being included in the final list of titles to be digitized.1
Ross Harvey has discussed the newspaper selection approach as an effort to find balance between preservation needs and user demands. He concludes that the physical safeguarding of the materials through digital preservation should take precedence over demands for popular newspaper titles, but he asserts that compromise can and should be established in the newspaper digitization selection process.2
While drafting standards for deciding which newspaper titles to digitize, the goal of the DLG was to create a selection method that would result in highly used, well-regarded, cost effective, and legally sound online newspaper archives. While no specific factor was necessarily given priority over another, some criteria were non-negotiable including copyright and title availability, due to their prohibitive nature. The overall intention was to find a balance among the considerations listed below in order to pinpoint the most suitable newspaper titles for digitization. As the DLG continued its digitization work, some of the criteria were given more emphasis to create balance over time.
One consideration not discussed in this paper is digitization as a method of preservation. Many organizations identifying archival materials for digitization incorporate physical concerns, including the condition and need for preservation of the documents, into their selection criteria.3 The newspaper digitization efforts at the DLG, however, utilize microfilm copies of the publications. The Georgia Newspaper Project (the source of newspaper microfilm copies used by the DLG for digitization) conducts its microfilming with preservation as the primary concern. This frees the DLG from integrating physical considerations into its selection criteria, putting the focus instead on concerns related primarily to access.
In order to optimize access, project organizers deemed the needs of the user a major consideration when selecting newspaper titles for digitization. The NDNP does not include user demand in its content selection guidelines, focusing instead on research value to drive usage. While the DLG also included content significance in its decision making process, as discussed below, the organization found added value in ensuring that newspapers were digitized to meet the demands of its users. Luckily, user interest often coincides with the research value of the newspaper titles. This interest in regard to newspapers frequently differs from that of other types of research materials, as the emphasis is more often on geographic rather than subject-based considerations. For this reason, it was necessary to examine user demand specifically in relation to newspapers. Fairly early in the process, Georgia HomePLACE, in conjunction with GALILEO, conducted an informal survey of librarians from around the state and requested information on which newspapers their users most often requested access to. Two general trends emerged. First and foremost, users are interested in the newspapers from where they live, regardless of their size or historical significance. Secondly, they want access to newspaper publications from the largest cities in Georgia. The findings confirmed conclusions drawn from previous interactions with librarians and users.4
These user predilections led the DLG to place early emphasis on the most populous cities in the state, including Atlanta, Athens, Columbus, and Macon. The digitization of titles from those cities would, according to the survey, draw heavy usage from both residents of those population centers and researchers from other parts of the state who are frequently interested in the history of those larger cities. This approach would seem to preclude the digitization of smaller city newspapers due to the potential for limited use, but additional considerations and approaches were taken into account to allow for the inclusion of other cities and titles.
Since most newspapers of the nineteenth and early twentieth century carried similar content, including national and local news, agricultural columns, serial literature, and ads, the historical and geographical importance of a city and its newspaper titles became one of the top content considerations for planning future projects. This approach required planners at the DLG to examine the history of the state and how that might affect what researchers are interested in using.
Much of Georgia's early colonial growth occurred in various locations along the state's fall line, the farthest navigable point up rivers. Some of Georgia's oldest and largest cities developed according to this geographical pattern, including Macon, Columbus, and Milledgeville. Because of their long histories as commercial centers within the state and their newspapers' coverage of some of the most significant events in the state's development, titles from those cities were among the first chosen. Moreover, organizers examined the political history of the state and its effect on nineteenth century journalism. The state of Georgia has had several capitals since its establishment and those cities, including Atlanta, Milledgeville, and Savannah, were also given high priority due to their historical importance.5
Milledgeville is an example of a city that was selected for digitization due largely to its historical significance. The state government established the city on the Oconee River along the fall line and it served as the capital of Georgia from 1804 until 1868. Milledgeville was a population center for much of the nineteenth century and hosted the state's government during a significant time in the state's history, which included the rise of plantations and slavery in the antebellum period, the Civil War, and a portion of Reconstruction. In addition, the city hosted Georgia's secession convention, served as a temporary headquarters for General William T. Sherman during his March to the Sea, and housed the largest mental hospital in the state. While its modern day population is modest, its importance as the state capital before and during the Civil War outweighed concerns about local usage. This assumption by project planners proved to be correct, as use of the site surpassed that of all previously released sites.6
Figure 1: Milledgeville Historic Newspapers Archive
Since 2007, the DLG's newspaper digitization project has been funded by Georgia HomePLACE with LSTA funds administered by the Institute of Museum and Library Services through the Georgia Public Library Service. To supplement the project's financial support, organizers decided to give consideration to projects that include a measure of local or additional resource support in accordance with the DLG's collection development policy. The decision was not meant to usurp practical and historical considerations; rather, it was intended to help materially support projects that were already considered significant and provide opportunities for digitization that would not normally be available. This approach is another example of a difference between the selection criteria of the DLG and the National Digital Newspaper Program. The NDNP need not include funding in their list of considerations due to the fixed nature of the grants provided to its participants.
Funding considerations have helped guide the DLG's decision making process in interesting and beneficial directions over the project's first five years. A project to digitize newspapers published in Athens received supplemental support from the local community, which allowed the DLG to add twenty thousand more newspaper pages and three more titles to the online archive. Supplemental funding has also allowed the project to digitize papers that cover underrepresented populations and time periods. The Southern Israelite archive, for example, was privately funded by the Breman Museum, which also aided the DLG in obtaining permission to digitize the mid-twentieth century publication produced for the Jewish community in Atlanta. The permissions, along with the private funding granted on that project, presented the DLG with an opportunity to digitize a valuable newspaper covering an underrepresented minority within the state, which had not previously been within the purview of the DLG's digitization efforts.
In addition to community interest and historical considerations, more concrete concerns were also taken into account. Since a majority of the projects' newspaper image scans would be derived from the Georgia Newspaper Project's microfilm collection, the availability and quality of their holdings had to be considered when deciding which projects to undertake. This limiting factor was discussed in conjunction with previously mentioned factors to ensure that not only were the digitized papers important and in demand, but also that there were enough issues in the collection to warrant undertaking such a project.
While the microfilm collections of the Georgia Newspaper Project are extensive, they sometimes contain omissions and gaps in title availability because the paper copies of specific titles were either inaccessible or preemptively filmed by a commercial organization. Without a significant number of microfilmed issues of a historical newspaper title available for digitization, the viability of creating an online archive for that title is reduced, regardless of its historical significance. The city of Louisville, for example, served as Georgia's capital in the late eighteenth and early nineteenth centuries and was one of the cities in the path of General William T. Sherman's March to the Sea during the Civil War.7 Despite the city's historical significance, the Georgia Newspaper Project holds fewer than three full reels of microfilm for the city, which hardly warrants its own archive. For this reason, early consideration was given to other cities, including Macon, Columbus, and Milledgeville, because of the completeness of their microfilm availability.
The DLG compensated for this limiting factor in later newspaper archives. Project organizers made plans to create both city and regional newspaper archives after priority was given to titles with greater microfilm availability. Archive sites containing an entire city's worth of newspaper titles allowed several small but significant title runs from a large city to be cobbled together into a sizable database, as was the case with the Atlanta Historic Newspapers Archive. The DLG was also able to digitize the publications of smaller cities by creating regional newspaper archives. This configuration allowed for the digitization of newspapers from Albany, Valdosta, Bainbridge, and several other cities, which were combined into a South Georgia newspaper archive.
The DLG scans newspapers from negative silver halide master copies of microfilm and the condition of that microfilm also carries weight in the decision making process, including the optical character recognition (OCR) accuracy that can be achieved from scanned images of the microfilm. This factor holds great significance because of user demand for keyword searchability in online archives. If OCR page readings are highly inaccurate, those pages might as well be invisible to the casual user. Taking this issue into consideration, the DLG decided that a publication's OCR accuracy must be tested before it is definitively selected for digitization. The project managers determined that readings should exceed ninety percent accuracy to be deemed appropriate for selection. If the OCR tests of a title produce results consistently below that threshold, that title is rejected in favor of a title with greater accuracy for full text searching.
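The threshold test described above can be illustrated with a short Python sketch. The article does not say how the DLG measured OCR accuracy, so everything here is an assumption made for illustration: accuracy is taken as character-level accuracy against a hand-corrected transcription (one minus the edit distance divided by the reference length), and a title passes when the mean accuracy of its sampled pages exceeds ninety percent. The function names are hypothetical.

```python
from statistics import mean

OCR_THRESHOLD = 0.90  # the ninety percent figure stated in the article


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, ocr_text: str) -> float:
    """Assumed metric: 1 - (edit distance / reference length), floored at 0."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return max(0.0, 1.0 - edit_distance(reference, ocr_text) / len(reference))


def meets_threshold(page_accuracies, threshold: float = OCR_THRESHOLD) -> bool:
    """Assumed decision rule: mean accuracy of sampled pages must exceed the threshold."""
    return mean(page_accuracies) > threshold
```

A title whose sampled pages scored 0.95, 0.92, and 0.91 would pass under this rule, while one scoring 0.85, 0.88, and 0.80 would be set aside in favor of a title with better full text search results.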
The National Digital Newspaper Program has published detailed guidelines on microfilm and the technical specifications required to select a newspaper title for digitization. They suggest that reduction rates, density variations, and resolution should all be examined in conjunction with the testing of OCR before selecting a title for digitization. Unfortunately, this can result in the delay and possible rejection of the digitization of valuable materials. The DLG, during the planning stages of its South Georgia newspapers archive, considered digitizing several titles from the city of Brunswick, ultimately selecting one title over another due to superior OCR results produced during testing.
Copyright law also must be taken into consideration when deciding which newspaper titles to digitize. According to United States copyright law, all works published and copyrighted before 1923 are now in the public domain.8 While many post-1923 newspaper publications have also passed into the public domain due to the publisher's failure to renew the copyright, confirming this would require the staff to conduct extensive research, and the online publication of these materials could lead to takedown notices. For this reason, project planners decided to initially concentrate almost exclusively on titles published before 1923, and priority has been given to titles with a larger nineteenth century presence. As the available nineteenth century titles are digitized and put online, the DLG can focus on researching the rights of more twentieth century titles.
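The copyright screening described above amounts to a simple triage rule, sketched here in Python. This is an illustration rather than the DLG's actual procedure: the 1923 cutoff is the one stated in the article, while the function name, the status labels, and the permission flag are hypothetical.

```python
PUBLIC_DOMAIN_CUTOFF = 1923  # cutoff year as stated in the article


def copyright_status(first_year: int, last_year: int,
                     has_permission: bool = False) -> str:
    """Classify a title run by the years it covers (hypothetical labels)."""
    if last_year < PUBLIC_DOMAIN_CUTOFF:
        # The whole run predates the cutoff.
        return "public domain"
    if has_permission:
        # The rights holder has granted permission to digitize.
        return "cleared by permission"
    if first_year < PUBLIC_DOMAIN_CUTOFF:
        # Only the early part of the run predates the cutoff.
        return "partially public domain"
    # Renewal records would need to be researched before digitization.
    return "needs rights research"
```

Under this rule a run covering 1826 to 1895 is classed as public domain, a run covering 1900 to 1950 is only partially so, and a later run can proceed only once permission is obtained or the rights are researched.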
This decision was complemented by the organization's desire to digitize newspapers of historical importance to the state and avoid the densely published titles of the early twentieth century; however, it limits the project's ability to cover significant events that occurred both nationally and within the state in the decades that followed, including the effects of the Great Depression, the early years of the Masters golf tournament, President Franklin Roosevelt's numerous visits to Georgia, World War II, and the three governors controversy. Copyright law has also restricted the DLG's ability to digitize newspaper titles published by racial minorities in the state, because with a few notable exceptions (the Cherokee Phoenix and the Colored Tribune), most of those materials were produced and published in the mid-twentieth century.
Chronological Density and Completeness
Whenever possible, the DLG placed emphasis on selecting titles with large date spans in relation to page count. Mid-nineteenth century Georgia newspaper titles were often circulated weekly in a four page format. The Macon Telegraph, for example, published a weekly edition from 1826 to 1895 with four pages before the Civil War and eight pages after, which amounts to between two and three years of issues on a reel of microfilm. This kind of chronological density gives a digitization project more to offer users in an efficient way.
Daily newspapers published in the late nineteenth and early twentieth centuries are often less feasible options for immediate digitization, particularly if the desire exists to digitize a complete run of a newspaper title. When the DLG examined Savannah newspapers for possible digitization, chronological density was an immediate concern. The Savannah Morning News, currently the city's largest newspaper, accounts for nearly two hundred reels of microfilm covering daily issues between 1868 and 1922 in the Georgia Newspaper Project holdings. Organizers took the large reel count into consideration and eventually selected titles from earlier in the nineteenth century, including the Savannah Republican, whose weekly publication between 1808 and 1865 amounted to fewer than fifty reels of microfilm.
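The Savannah comparison can be made concrete with a small calculation, sketched below in Python. Chronological density is expressed here as years of coverage per reel of microfilm, which is one possible measure and not necessarily the one the DLG used. The reel counts are the rounded figures given above ("nearly two hundred" and "fewer than fifty"), so the results are approximate, and the function name is hypothetical.

```python
def years_per_reel(first_year: int, last_year: int, reels: int) -> float:
    """Years of coverage per microfilm reel, counting both end years."""
    if reels <= 0:
        raise ValueError("reel count must be positive")
    span_years = last_year - first_year + 1
    return span_years / reels


# Rounded reel counts taken from the text; treat the results as estimates.
titles = {
    "Savannah Morning News (daily)": (1868, 1922, 200),
    "Savannah Republican (weekly)": (1808, 1865, 50),
}

for name, (start, end, reels) in titles.items():
    print(f"{name}: {years_per_reel(start, end, reels):.2f} years per reel")
```

On these figures the daily covers roughly a quarter of a year per reel, while the weekly covers more than a full year per reel, which is why the earlier weekly title offers more chronological coverage for the same scanning effort.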
As mentioned earlier, the DLG also considered the completeness of a newspaper run when selecting titles for digitization. Newspapers can have gaps in their microfilm accessibility either due to a lack of availability of issues for microfilming or because portions of the title run were digitized by a commercial entity, making those issues unavailable due to copyright considerations. Luckily, the Georgia Newspaper Project's holdings are by and large comprehensive for cities and titles of historical and geographical significance. An exception is the Macon Telegraph online archive, which has title gaps in the mid-1860s and early 1900s. Despite these interruptions in availability, the title was selected for publication due to its historical importance and user demand.
Repetition of Work
For obvious reasons, a major priority of the project was not to repeat the work of others. This consideration required the DLG to research the online newspaper landscape before coming to a final decision on which newspapers to digitize. Prior to the creation of the DLG's first newspaper archive, commercial entities had already digitized both the Atlanta Journal-Constitution and the Augusta Chronicle. For that reason, those two titles were avoided completely and the digitization of other titles from those two cities was delayed until titles representing some of the areas of the state with no digitized newspapers were added.
Initially, the DLG digitized newspapers one city at a time, and as a result, geographical distribution was not an immediate consideration. As the project addressed newspaper titles from some of the most populated and historically significant cities, it became apparent that the southern portion of the state was being underserved. In the nineteenth century, South Georgia (aside from Savannah) was a rural and sparsely populated area of the state that was devoted almost solely to agricultural enterprises, and this situation largely persists today. For this reason, newspaper journalism got a late start in the region and only began to blossom in some of the larger cities by the mid to late 1850s.9 This left the DLG with few options for newspaper titles from the area with long chronological runs that could support an archive by themselves.
In response, organizers planned a regional newspaper archive to include newspapers from several South Georgia cities. This archive would have the benefit of including a comparable number of newspaper pages to other archives from larger cities farther north. It would also draw similar user interest by attracting researchers from cities over a large portion of the state. The venture has proved successful and has led to work on a North Georgia newspaper archive that will include titles from smaller cities and towns in the mountainous areas of the state. This regional approach has allowed the DLG to address newspaper selection with geographical distribution in mind.
Not surprisingly, the Atlanta archive is the most frequently used newspaper site in the Digital Library of Georgia, as Atlanta is both the capital of Georgia and its most populous city. The South Georgia archive is the next most used site due largely to its inclusion of newspaper titles from ten different cities. Those ten cities were added to the archive gradually, leading to constantly renewed interest in the site, which has undoubtedly boosted its visitation numbers. The Athens, Macon, and Milledgeville archives share similar numbers as the third most visited sites. Athens and Macon are among the largest cities in Georgia and share significant roles in the history of the state. Milledgeville is significantly smaller than the other two cities, but its unique history as the state's capital during the Civil War increases interest in its newspaper content.
The Columbus newspaper archive does not garner as much attention as the sites mentioned above. This low usage is particularly surprising, because Columbus is the second largest city in Georgia. Although no concrete evidence exists as to the reasons behind the site's low visitation numbers, it could relate to the city's inconspicuous role in the Civil War, which is of particular interest to Georgia researchers. Many of the DLG's other newspaper websites, including the Southern Israelite and Mercer Cluster archives, also have lower visitation numbers due largely to their specialized content.
User interest in the Savannah newspapers archive has yet to be determined. It has only recently been released to the public, so its visitation numbers are not yet comparable to the other sites. Early results, however, suggest that it will be among the most popular newspaper archives in the DLG. Furthermore, Savannah is one of the largest cities in the state and has a long and storied history as Georgia's first capital and primary port. For these reasons, it should be of particular interest to users going forward.
The newspaper digitization efforts of the DLG have been successful due in large part to the careful selection of newspaper titles. This process shares much in common with the procedures followed by organizations participating in the National Digital Newspaper Program, including factors related to content significance, copyright law, title completeness, and technical considerations related to microfilm. The selection approach differs in several key respects, however, including emphasis on user demand and supplemental funding. The selection criteria established by project planners at the DLG have helped the organization achieve the goals set forth by the initiative, but this procedure will continue to change and adapt as demand for newspaper digitization increases in the years to come.
The Digital Library of Georgia's newspaper digitization projects can be found online at http://dlg.galileo.usg.edu/MediaTypes/Newspapers.html.
1 Molly Kruckenberg. Plan For Selecting Newspapers To Be Digitized. 2009.
2 Ross Harvey. "Selection of Newspapers for Digitization and Preservation: A User Perspective," International Newspaper Librarianship for the 21st Century. Hartmut Walravens, ed. (München: K. G. Saur, 2006).
3 Bart Ooghe and Dries Moreels. "Analysing Selection for Digitisation, Current Practices and Common Incentives." D-Lib Magazine. September/October, 2009. http://doi.org/10.1045/september2009-ooghe
4 Historical Newspapers Survey Report. GALILEO and Georgia HomePLACE. January 2009.
5 Ed Johnson, correspondence with the author, 27 August 2010.
6 Robert J. Wilson. "Milledgeville." New Georgia Encyclopedia. 6 December 2002; James C. Bonner. Milledgeville: Georgia Antebellum Capital (Athens: University of Georgia Press, 1978).
7 Carol Ebel. "Louisville." New Georgia Encyclopedia. 2005.
8 Peter B. Hirtle. Copyright Term and the Public Domain in the United States. 1 January 2014.
9 Louis Turner Griffith and John Erwin Talmadge. Georgia Journalism, 1763-1950 (Athens: University of Georgia Press, 1951).
About the Author
Donnie Summerlin is the Digital Projects Archivist at the Digital Library of Georgia, where he oversees the ongoing effort to digitize the state's historic newspapers. He has a B.A. in history from the University of Georgia, an M.A. in history from Georgia College & State University, and an MLIS from Valdosta State University, and he has been a certified archivist since 2010.