D-Lib Magazine
November 2001

Volume 7 Number 11

ISSN 1082-9873

DISA

Insights of an African Model for Digital Library Development

 

Dale Peters
DISA Project Manager, Campbell Collections
University of Natal
Durban, South Africa
[email protected]

Michele Pickover
Wits University
Johannesburg, South Africa
[email protected]

Abstract

Digital technology is a driving force behind many of the changes occurring globally in higher education. The pilot digital imaging project in South Africa, DISA, has developed tools for capacity building to support new ways of learning and conducting research to meet a changing model of education. This article defines an approach to building digital research resources appropriate to the needs of the African continent. Through the development of an open source environment that supports international standards for the capture and storage of digital data, it is possible to create a sustainable infrastructure for a digital learning environment.

1. Background

The Digital Imaging Project of South Africa, DISA [1], developed from a workshop on digital imaging sponsored by the Andrew W. Mellon Foundation and held in Johannesburg, South Africa, in September 1997. The DISA Project has aimed to investigate and implement digital technologies to enable scholars and researchers from around the world to access South African material of high socio-political interest that would otherwise be difficult to locate and use. In addition, DISA aimed to provide South African archivists and librarians with knowledge of, and expertise in, digital imaging.

A small committee, representing research collections, universities, the National Library of South Africa and the National Archives of South Africa, consulted scholars on the selection of project content. The committee then wrote a DISA proposal, the title of which is South Africa's Struggle for Democracy: Anti-Apartheid Periodicals, 1960-1990.

The three key decades covered by the serial literature selected for the project encompass the growth of opposition to apartheid rule, a period when the African National Congress (ANC), black consciousness, and other resistance movements were very active. Approximately forty serial titles were selected with a view to presenting not only a wide spectrum of political views published during these years, but also a diversity of subjects, such as trade unions, religion, health, culture, and gender, pertinent to the social and political history of the era of anti-apartheid resistance.

2. DISA Project Challenges and Objectives

2.1 Challenges

Some of the publications selected for the digital imaging project were short-lived serials and had been produced "underground" due to political repression. Many of these were of poor typographic quality and their distribution was limited. Because of these factors, the publications have not been well represented in research collections, and their rarity has created an increased urgency for their preservation.

Some of these rare publications were collated electronically for the first time from the holdings of various national collections, and attempts have been made to locate missing issues from the private collections of individuals, some of whom were supporters of the struggle in exile. The digital environment has provided a public arena in support of democratic principles, and despite copyright concerns, the DISA project has received overwhelming support from publishing organizations once prohibited from distributing materials espousing these principles.

As a high-tech digital imaging project in the context of South Africa, the DISA project has needed to take issues of access into consideration. These issues arise from a global socio-economic environment that has produced a digital divide between the rich-and-wired and the poor-and-unconnected -- who in underdeveloped democracies are often also the oppressed.

Since the transition to democracy, the need to redress past imbalances has resulted in phenomenal growth in the number of students attending universities. At the University of Natal [2], for example, the number of students has increased from 13,000 to 25,000 since 1994. Library staff and academic faculty numbers have not kept pace. Instead, restructuring of services has, perforce, drastically reduced staffing levels.

Furthermore, for reasons of financial constraint, 20% of the student population is located off-campus, and many of the off-campus students have never visited a library.

2.2 Objectives

Key objectives of the DISA project were to advance technologically, to build the capabilities of human resources, and to provide access to scholarly research materials on a sustainable basis. In addition, the DISA project aimed to address the digital divide by relying on open source software and calculated platform independence to provide a digital library model relevant to the African context and appropriate to developing countries.

3. Meeting the Objectives

3.1 Technological advancement

Among initial DISA objectives was the requirement that the project should be technologically advanced and should operate within internationally accepted standards. The nature of the materials selected for digital imaging indicated that the anticipated users would be a sophisticated scholarly audience with high expectations based on the increasing availability of electronic research resources.

Work processes focused on three areas: the scanning and Optical Character Recognition (OCR) of page images; XML mark-up and keyword indexing; and Web presentation. Page images are scanned at a high resolution (600 dpi, 8-bit), scaled to 50% to reduce file size, and saved as uncompressed .tiff files for enduring archival quality. Threshold manipulation to 128 allows optimum image quality for OCR of original documents with uneven typographic quality. Derivative .gif files are reformatted to a 700-pixel width that does not require scrolling to view, and are linked as XREFS in the XML file.
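
The capture parameters above can be illustrated with a minimal sketch in plain Python. It is not the DISA workflow itself: the function names are hypothetical, the pixel values are assumed to be 8-bit greyscale (0-255), and the choice to treat a value equal to the threshold as white is an assumption. The sketch only shows how the 50% scaling, the threshold of 128, and the 700-pixel derivative width relate to one another.

```python
def scale_dimensions(width, height, factor=0.5):
    """Return pixel dimensions after scaling by `factor` (50% in the text)."""
    return round(width * factor), round(height * factor)


def binarize(grey_pixels, threshold=128):
    """Map 8-bit grey values to black (0) or white (255) for OCR.

    Assumption: values at or above the threshold become white.
    """
    return [255 if value >= threshold else 0 for value in grey_pixels]


def fit_to_width(width, height, target_width=700):
    """Return dimensions of a derivative resized to `target_width`,
    keeping the aspect ratio of the source image."""
    return target_width, round(height * target_width / width)


# Hypothetical page of 8 x 12 inches scanned at 600 dpi.
scan = (8 * 600, 12 * 600)              # 4800 x 7200 pixels
archival = scale_dimensions(*scan)      # dimensions of the 50% archival image
derivative = fit_to_width(*archival)    # dimensions of the 700-pixel-wide copy
```

In a real workflow an imaging library would perform the resampling; the sketch deliberately stops at the arithmetic so that it stays independent of any particular tool.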

The full text of the periodical pages, rendered by OCR, is inserted automatically into the flat XML file at each page break marked up as <p=ocr>. This process allows both metadata and full text to reside in the XML files, indexed by a search engine for fast access, from which the image files would be referenced and retrieved as appropriate.
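
As a rough illustration of this step, the sketch below inserts OCR text after each page break in an issue file, using Python's standard xml.etree.ElementTree module. The element names are assumptions made for the example and are not the DISA schema: a page break is written as `<pb n="..."/>` and the OCR text is placed in `<p type="ocr">`. The function name `insert_ocr_text` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET


def insert_ocr_text(issue_xml, ocr_by_page):
    """Insert a <p type="ocr"> element after each <pb n="..."/> page break.

    `ocr_by_page` maps the page number (as a string) to the OCR text.
    Element and attribute names are illustrative assumptions.
    """
    root = ET.fromstring(issue_xml)
    # Snapshot the elements first, because the tree is modified in the loop.
    for parent in list(root.iter()):
        # Walk children from last to first so insertions do not shift
        # the positions of page breaks that are still to be processed.
        for index, child in reversed(list(enumerate(parent))):
            page = child.get("n")
            if child.tag == "pb" and page in ocr_by_page:
                paragraph = ET.Element("p", {"type": "ocr"})
                paragraph.text = ocr_by_page[page]
                parent.insert(index + 1, paragraph)
    return ET.tostring(root, encoding="unicode")


sample = '<issue><title>Example issue</title><pb n="1"/><pb n="2"/></issue>'
merged = insert_ocr_text(sample, {"1": "first page text", "2": "second page text"})
```

Keeping metadata and page text in one file, as the paragraph above describes, means a single index pass covers both.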

Various software applications have been investigated to develop the initial browsing facility to offer full text searching of XML files. The scholarly user community the project serves identified the need to retrieve articles, possibly referenced in related bibliographic research. A common layout feature of the original corpora is the article progression over multiple non-consecutive pages, an idiosyncrasy that speaks of production pressure under political repression. A solution has been found in a customized front-end integrated with an off-the-shelf search engine [3], which returns hits on articles -- not on pages -- while still allowing the user to navigate by page. The current development of XML software tools has delayed the implementation of a search facility, which is expected to be available by December 2001.
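
The idea of returning hits on articles while still navigating by page can be sketched with a small data structure. This is an illustration only: it does not use or imitate the search engine cited above, the names `Article`, `search_articles` and `next_page` are hypothetical, and it makes the simplifying assumption that all text on a page belongs to the article that lists that page.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    pages: list  # page numbers in reading order, possibly non-consecutive


def search_articles(articles, page_text, term):
    """Return (article, matching_pages) for every article containing `term`."""
    term = term.lower()
    hits = []
    for article in articles:
        matching = [page for page in article.pages
                    if term in page_text.get(page, "").lower()]
        if matching:
            hits.append((article, matching))
    return hits


def next_page(article, current_page):
    """Return the page that follows `current_page` within the article,
    or None when the article ends there."""
    position = article.pages.index(current_page)
    if position + 1 < len(article.pages):
        return article.pages[position + 1]
    return None


# Invented sample data: one article runs over pages 3, 4 and 9.
articles = [Article("Strike report", [3, 4, 9]), Article("Editorial", [2])]
page_text = {
    2: "Editorial comment",
    3: "Workers began a strike",
    4: "continued on page 9",
    9: "the strike ended",
}
hits = search_articles(articles, page_text, "strike")
```

Because each article carries its own page sequence, the reader can step from page 4 directly to page 9 without passing through unrelated material.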

To extend the standard search features on author, title and date, also currently in development are more detailed search capabilities, such as searches on relevant acronyms as well as concepts derived from a subject-specific thesaurus. These are by-products of the DISA project that constitute an element of indigenous knowledge, adding value to the functionality of online retrieval over the value of the original paper-based resource. Instead of the traditional database structure, the application of the international standards of SGML, now rendered in XML, provides both indexing of standard features (author, title, date, keywords), as well as a full text search facility of a highly specific research resource on the Apartheid era.
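
A search on acronyms and thesaurus concepts amounts to expanding the user's term before it is matched against the index. The following minimal sketch shows one way this could work; the function `expand_query` and both lookup tables are hypothetical, and the thesaurus entry is invented for illustration rather than taken from the DISA thesaurus.

```python
def expand_query(term, acronyms, thesaurus):
    """Return the set of lower-cased search terms implied by `term`.

    `acronyms` maps an upper-case acronym to its full form;
    `thesaurus` maps a lower-case concept to a list of related terms.
    """
    terms = {term.lower()}
    full_form = acronyms.get(term.upper())
    if full_form:
        terms.add(full_form.lower())
    terms.update(related.lower() for related in thesaurus.get(term.lower(), []))
    return terms


acronyms = {"ANC": "African National Congress"}
thesaurus = {"trade unions": ["labour organisations"]}  # invented entry

anc_terms = expand_query("anc", acronyms, thesaurus)
union_terms = expand_query("Trade unions", acronyms, thesaurus)
```

Each expanded term would then be passed to whatever full text search facility is in use.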

The data reflected in 50,000 imaged pages is encapsulated in the metadata describing each journal issue, and the whole comprises a manageable 2,500 XML files that are both platform independent and highly portable. Proposed software developments of an XML base class in the Greenstone Digital Library [4] software suite are eagerly anticipated as a promising open source solution to collections management.

3.2 Capacity building

The cooperative structure of the DISA project provided an opportunity to share limited existing human resources, while developing new skills towards an appropriate digital library model, to respond to new ways of learning and research in a changing model of education.

As a suitable pilot to test feasibility within the three-year timeframe, the scale of the project was limited in size to 50,000 page images, with underlying text, marked up in XML.

After two years' experience, interesting management issues have emerged that are specific to the local context. Most significant of these has been the slow response of institutions to the need for staff development. This concern has been addressed by way of ongoing training and consultation.

While the interest in staff development is high in South Africa, financial constraints in light of demands for basic library services in remote areas have limited the development of digital library skills to the tertiary education sector and national institutions. The demand for training from other African countries has been limited to the national libraries and national archives of Swaziland, Mozambique, Zimbabwe, Namibia, Zambia, Malawi, Tanzania, Kenya and Uganda. UNESCO [5] has provided support for this effort, and the Memory of the World Programme has initiated the Slave Trade Archives Project [6] to put into place a framework for skills development and necessary technical infrastructure in participating African countries. An initial training program has been conducted for representatives of the national archives of Ghana, Nigeria, Benin, Togo, Guinea-Bissau, Senegal, and The Gambia.

Since 1999, DISA has coordinated three training workshops in South Africa. The workshops progressively have served to develop the role of digital technologies: first in digital conversion; then as a preservation management strategy; and most recently, in the development of guidelines in accordance with international standards.

Seeking to support the implementation of digital technologies, participants of a national workshop held in October 2001 recommended that the DISA project should become a legal entity for greater advocacy, mobilizing constituencies to lobby on national digitization issues including the formalization of training with the South African Qualifications Authority [7]. Additionally, working groups have been formed around areas of interest to develop guidelines for policy development, minimum technical requirements, adherence to standards, and best practice in the conversion of various media, including sound and audiovisual material.

Currently, a member of the DISA project team is engaged in postgraduate research to evaluate the training that has resulted from the DISA initiative in order to determine the relationship between skills development and general information literacy skills of information professionals [8]. Training workshops have computer literacy requirements that are indicative of the general competence level of attendees. Greater success has been experienced when graduate students have participated, with obvious management implications for the integration of project activities into the normal workflow. The need for internal experience to envisage the impact of implementing digital technologies has led to the development of a dual support structure made up of technical and managerial operation layers in partner institutions.

The involvement of a variety of institutions in the DISA project enabled the transfer of conversion skills to five remote capture sites located at partner institutions. DISA made available online a comprehensive set of guidelines that serves as a complementary medium of instruction. For reasons of uniformity, indexing and mark-up are conducted centrally, but future distribution of these processes has been identified as needed for skills development.

3.3 Sustainability

A further DISA project objective has been technical sustainability, with appropriate metadata supporting forward migration when new developments take place. Although it was acknowledged that this initiative was unlikely to be fully financially sustainable, it was to be developed to have minimum requirements and the capacity to generate income.

Collaborative ownership suggested a self-contained technical solution to test the feasibility of a Web-based product that would not require an open-ended commitment to ongoing maintenance by the partner institutions beyond the duration of the project. Income generation has been introduced by writing individual titles to CD-ROM. This extension of a Web-based product simultaneously addresses the need for access in those areas in Africa where Internet connectivity is not assured.

3.4 Addressing the digital divide

Despite formidable financial constraints and limited opportunities for the development of human resources, South African libraries are in the process of changing the way they deliver information. Though their library services have traditionally served an educated elite, they now struggle to meet the burgeoning demand for free and open Internet access to essential resources for fundamental and lifelong learning. The development of DISA has been aimed at maintaining the highest intellectual and innovative quality expected in the academic environment. DISA's original vision of the network infrastructure development and increased bandwidth has been confirmed in the intervening years.

The project has set a standard for South Africa and leads the way in digital library development on the African continent. Although only 2% of the world's population of six billion are linked to the Internet, every African country now has a Web presence. This presence is expected to show dramatic future growth with wireless communication networking and the anticipated implementation of low orbit satellites to provide access to even the most remote areas. These advances potentially allow poorer countries to leapfrog whole stages of development and propel themselves into the information age.

The opportunity demonstrated in the DISA Project to negotiate strategic partnerships and to develop cooperative ventures in building digital collections of national importance is unprecedented in South Africa. It demonstrates clear advantages for the allocation of resources towards similar developments in other African countries. Digital technologies offer a new paradigm: preserving the original by providing access to the digital surrogate; separating the informational content from the physical medium; and liberating preservation management from the constraints of poor storage environments typical of the tropical and sub-tropical climates of the geographic region.

In the emerging global, knowledge-based economy, the issue of intellectual property rights plays a critical role in the relationship between various cultural interpretations of knowledge. Fortunately, the DISA project has experienced no pressure to relinquish intellectual property rights and strives to ensure that its image collection will be freely available to all South Africans.

Conclusions

Digital technology is a driving force behind many of the changes occurring in higher education. Distance learning demands online delivery of documents. DISA project planning was informed by a knowledgeable assessment of the value and informational content of the documents themselves, an understanding and appreciation for current and future users' needs, recognition of the country's potential technical capabilities, and commitment of partner institutions. A sober assessment of both the strengths and weaknesses of the current state of digital imaging technology in South Africa has resulted in a modest, but feasible, digital library model with due consideration to ethical issues and social context.

It is intended that the DISA project be extended in a series of other projects dealing with South Africa's fascinating social and political history. The extended projects would seek to provide access to further important collections of a proud cultural heritage, fulfill the social contract of a democratic future, and promote the development of digital library skills among information professionals in South Africa and beyond its borders.

References

[1] DISA: Digital Imaging Project of South Africa, <http://disa.nu.ac.za>.

[2] University of Natal, <http://www.nu.ac.za>.

[3] DtSearch search engine, <http://www.dtsearch.com>.

[4] Greenstone Digital Library, <http://www.greenstone.org/english/home.html>.

[5] UNESCO. Joint IFLA-ICA Committee for Preservation in Africa (JICPA), <http://epa-prema.net/jicpa/>.

[6] UNESCO, "Memory of the World. Slave Trade Archive Project," <http://webworld.unesco.org/slave_quest/en/>.

[7] South African Qualifications Authority. National Qualifications Framework, <http://www.saqa.org.za/>.

[8] Simpson, G. S. 2001. "An evaluation of DISA as a digital training and skills development project in South Africa." An MLS research proposal. School of Human and Social Studies (Information Studies), University of Natal, Pietermaritzburg. Unpublished.

 

Copyright 2001 Dale Peters and Michele Pickover

DOI: 10.1045/november2001-peters