How to Build a Digital Librarian

Kirk Hastings and Roy Tennant
Digital Library Research & Development
The Library, University of California, Berkeley
Berkeley, CA 94720-6000

D-Lib Magazine, November 1996

ISSN 1082-9873

  • Abstract
  • Introduction
  • What We Set Out To Accomplish
  • How We Did It
  • What We Learned

  • Abstract

    Digital libraries require digital librarians. Digital librarians are required to select, acquire, organize, make accessible, and preserve digital collections. Digital services must be planned, implemented, and supported. Unfortunately, there are presently very few opportunities for librarians to receive training in the new tasks and responsibilities that digital libraries demand. To help meet the demand for retraining, the University of California (UC) Berkeley Library submitted a successful grant proposal to the Department of Education to support an Institute on Digital Library Development. The Institute trained thirty-six library, museum, and archive professionals in digital library issues and techniques. Major topic areas covered in the Institute included digital library theory and practice, HyperText Markup Language (HTML), imaging, optical character recognition, access and indexing, selection of materials for digitizing, and effective training techniques. Many of the instructional materials are available online.


    Introduction

    Digital libraries require digital librarians. Digital collections must be selected, acquired, organized, made accessible, and preserved. Digital services must be planned, implemented, and supported. Computers are certainly essential as the primary tools with which digital libraries are built, but people are required to put it all together and make it work.

    Although the broad requirements of digital libraries may be the same as with non-digital collections, any similarity ends there. Organizing a digital collection has very little in common with organizing a print collection in terms of day-to-day work and the individual tasks that must be accomplished. Present-day digital librarians find themselves doing almost nothing they learned in graduate school and very little that is familiar. Furthermore, the technology is advancing at such a rapid pace that what is learned today will soon be outdated. Therefore, it is more important that digital librarians possess particular personal qualities (which are innate) than specific technical expertise (which can be learned).

    Digital librarians must thrive on change. They should read constantly (but selectively) and experiment endlessly. They need to love learning, be able to self-teach, and be inclined to take risks. And they must have a keen sense of both the potentials and pitfalls of technology.

    Any individual with those qualities (or some measure of them) is an excellent candidate to forge new methods for accomplishing the age-old mission of libraries to select, acquire, organize, provide access to, and preserve the intellectual and artistic record of humanity. We are, after all, making it up as we go along. We need professionals who don't need a lot of guidance or hand-holding. We need individuals with imagination and foresight and the ability to make their vision a reality.

    Individuals with these qualities were the ones we targeted when the UC Berkeley Library wrote a successful grant proposal for funding under the Department of Education Higher Education Act, Title II-B. We understood the need to retool the profession with the skills and experience required to build libraries for the 21st century, since we were experiencing the struggle to retool ourselves. We also realized that opportunities to acquire these new skills in a formal instructional environment (rather than learning on the job) were rare. This article describes our goals, how we planned the resulting Institute on Digital Library Development, what we taught Institute participants, and what we learned about training digital librarians.

    What We Set Out To Accomplish

    The grant proposal[1] states that "the Institute will train participants in mounting electronic resources using the World-Wide Web, while also preparing them to train others. A product of the Institute will be an electronic archive of Institute materials that can be accessed over the Internet by both Institute participants and the world at large. Therefore, besides having an immediate impact on the attendees, this Institute has the potential for a lasting impact on the ability of libraries around the world to utilize existing technologies to expand access to electronic information."

    Our primary goal was to train information professionals in practical techniques that they could use to create new kinds of collections and services using current technologies. Our belief is that massive digital libraries will be built through the cooperative activities of numerous institutions. We wanted to help "seed" such activities by providing some key training and instruction in current digital library practices. We decided that our basic format should be a mixture of lecture, demonstration, hands-on exercises, and open lab periods. Lecture material would introduce digital library principles and techniques. Demonstrations would illustrate the lecture as well as highlight specific tools and techniques. Hands-on exercises would take participants through tasks that would help them learn the tools and techniques that were demonstrated. Open lab periods would allow the instructors time to work with participants individually, and would give participants time to pursue their individual projects.

    One of our requirements for a physical facility to host the Institute was a dedicated workstation for each participant. Our premise was that the only way we could expect attendees to learn the material was to let them do it themselves while experts were present to help with questions and problems. In the grant, we specified that we would train 36 individuals, which at the time would have required three separate institutes of 12 participants each. In the intervening period between the grant award and the Institute (over a year) the UCB Library created a new instructional facility with 18 student workstations, thereby allowing us to train the same number of librarians in two Institutes.

    The Department of Education awarded us a grant of $49,918, which comprised an estimated 87% of the total expense. The remaining 13% or $7,530 was provided by the UC Berkeley Library. Since we wanted to attract librarians from the entire spectrum of the profession, we requested $10,000 to provide attendee scholarships. This allowed us to underwrite the participation of librarians who would not have been able to attend otherwise, and thereby guarantee participation of those who would benefit most regardless of the amount of support their institutions could provide.

    The goal of creating "an electronic archive of Institute materials" has been fulfilled by the creation of the Institute on Digital Library Development Web site, which provides nearly all of the instructional materials developed for the Institute. Lecture material is available both in native Microsoft PowerPoint form and in Adobe Acrobat form. Instructor's notes and class exercises are available in Adobe Acrobat form. A complete list of documents handed out in class is there, with links to their electronic equivalents when available.

    How We Did It


    The Announcement and Application Process
    Given the nature of the Institute and the audience we were targeting, we decided to announce the Institute only on electronic discussions appropriate to the type of training we were offering. We also specified that applicants must fill out an application form on our Web site, thereby ensuring that applicants would have at least some minimal familiarity with the basic tools we would be using in the Institute.

    Prior to the announcement of the Institute, we created an Institute Web site on the Berkeley Digital Library SunSITE that provided basic information about the Institute, a draft of the Institute syllabus, application information and form, and information about local arrangements.

    At the end of April 1996 we posted an announcement of the Institute on PACS-L, Web4Lib, and DIGLIB electronic discussions. The application deadline was May 17, 1996. We received over 100 applications from professionals in the library, museum, and archive communities working at corporate, public, and non-profit institutions around the United States (participation was limited to U.S. residents).

    We reviewed the applications and selected thirty-six participants and about half a dozen alternates in case of cancellation. In reviewing the applications, we realized that we had attracted a slightly different audience than we had originally envisioned. Most applicants had already used HTML to some degree, whereas we had thought that most would not yet have that experience. This information prompted us to revise the curriculum slightly. We decided to drop beginning HTML and send out a pre-Institute exercise to make sure that everyone would have the same base level of HTML knowledge. That allowed us to jump right into advanced HTML with a minimum of review.

    One of the UC Berkeley Library instructional rooms was scheduled for the entire week of each Institute. The room is equipped with eighteen student workstations (90 MHz Pentium computers with 16 MB of RAM and a 17" monitor), and one similar instructor's workstation, which is also attached to a projector. All workstations were running Windows 3.1, and application software included Adobe Photoshop (for image editing), OmniPage Professional (for optical character recognition), Netscape and NCSA Mosaic, and various Internet utilities (Telnet, FTP). All HTML editing was performed using a simple text editor (thereby allowing us to focus on the HTML rather than a particular software program).

    Since one of our goals was to maximize hands-on time for the students, we also brought in two other workstations (one Macintosh and one Windows 95 PC) and three scanners (attaching one scanner to the existing instructor's PC). We then had three scanning stations available throughout the week. To offer experience with digital cameras, we rented an Apple QuickTake digital camera during part of each Institute.

    The instructional room has both an inside and outside entrance. The outside entrance opens onto a patio area, which we used for serving break refreshments and lunch. This arrangement worked very well, as we did not need to allow time for participants to travel to a lunch location and back again. We could simply call everyone in from the patio when we were ready to resume.

    Instructional Aids
    As we developed the content for the course, we realized we needed to create a binder (see cover illustration) that would contain reference material for further study as well as exercises and other materials that we would hand out in class. The Institute binder contained printouts of Web accessible documentation, selected portions of books and journals, and material developed specifically for the Institute. We also purchased copies of the excellent Introduction to Imaging: Issues in Constructing an Image Database, published by the Getty Art History Information Program.

    Also included in the binder were evaluation forms (one for each day and another for an overall evaluation), brief biographies of faculty, a list of Institute participants, and local information (restaurants, entertainment, etc.).

    During the Institute, we also handed out exercises, slide thumbnails of lecture material, and other instructional aids. Participants would add these to their binders as we went along.

    The Institute

    The first Institute was held July 15-19, 1996 and the second followed two weeks later, July 29-August 2, 1996. The content was divided equally among lecture, discussion, and hands-on practice. It was co-managed and taught by Kirk Hastings and Roy Tennant. Guest lecturers included Howard Besser, John Ober, Barclay Ogden, and Ann Swartzell. Other library staff members provided logistical, organizational, and systems assistance.

    Day 1
    We began the Institute with an informal meeting in the UC Berkeley Morrison Library, which offered a comfortable and notably low-tech environment in which to begin. Participants were checked in and they picked up their binders. Following self-introductions we talked about Institute logistics and what we intended to accomplish in the coming week. We set a tone of practical informality to encourage a "roll up your sleeves" atmosphere.

    We then made our way to the Instructional Computer Facility in Moffitt Library, where participants would spend more than eight hours a day for the remainder of the week. After a brief introduction to the workstations and the campus network, we began with an introduction to digital library development.

    TOPIC: Introduction to Digital Library Development

    Although most of the participants were professional librarians, they were mostly unfamiliar with the major issues in digital library development and the underlying structures and systems on which the digital library relies. We began with the difficult task of defining "digital library". By practical example, we made it obvious how slippery a concept this can be and how it is likely to continue to change and evolve until some basic standards are developed.

    The bulk of our comments were devoted to describing the primary activities of selecting, acquiring, organizing, providing access to and preserving information in the digital world. We covered both the broad issues involved and the technical infrastructure necessary to accomplish these tasks. We discussed how the practical skills taught in the Institute would help participants in the development of their own specific digital library projects.

    Also covered were the necessary physical components of a digital library, including hardware, software, staff and appropriate collections. Many participants were at the beginning of project development and were eager for practical advice in bringing together all these elements into a working whole. As was frequently the case, we did not have as much time for discussion as we would have liked.

    TOPIC: Introduction to Structured Text

    It was not our intention to teach even basic SGML. However, it was necessary for participants to understand the differences between SGML and HTML and what the appropriate uses for each are. We covered descriptive markup, document types, and data independence, and then went on to describe some of the issues involved in implementing a project based on SGML. As an example, we used the quite successful Berkeley Finding Aids project.

    Although it was a requirement for the Institute that all participants be familiar with the basics of HTML, we felt it would be a good idea to make sure that everybody was in the same place. After describing the relationship of HTML and SGML, we quickly covered the fundamentals of HTML markup and then reviewed a pre-workshop HTML exercise that all participants had been required to complete (unless they had equivalent experience).

    Day 2
    Day 2 was devoted entirely to learning the more advanced features of HTML. Roy Tennant has had considerable experience teaching HTML, and so had a series of well-honed lectures and exercises that he had developed for teaching tables, forms, image mapping, and style and design. (Roy has collected these in a recently published book: Practical HTML: A Self-Paced Tutorial. The book versions of his presentations are available at:

    TOPIC: HTML Tables

    Because of the limited provisions for text formatting in HTML, tables have become the primary tool for changing the look and feel of web pages. We went into considerable detail on how to use the table tags to bring text and images into line. We then had the participants do a small practical exercise, based on a now infamous and overly-maligned Lemon Chicken recipe. Despite the careful selection of participants, it was inevitable that some would have a lot more experience than others. While those fairly new to markup struggled with this first attempt, the rest quickly elaborated on the theme and committed all sorts of atrocities.
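
    As a minimal sketch of the layout idea (not the Institute's actual exercise), the following Python fragment generates the table, row, and cell tags for a small two-column table. The helper name render_table and the sample rows are hypothetical, invented for illustration.

```python
from html import escape


def render_table(rows, header=None, border=1):
    """Return an HTML <table> string for a list of row tuples.

    Hypothetical helper for illustration; cell text is escaped so that
    characters such as & and < do not break the markup.
    """
    lines = [f'<table border="{border}">']
    if header:
        cells = "".join(f"<th>{escape(h)}</th>" for h in header)
        lines.append(f"  <tr>{cells}</tr>")
    for row in rows:
        cells = "".join(f"<td>{escape(str(c))}</td>" for c in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample data invented for this sketch.
    ingredients = [
        ("Chicken", "1 whole"),
        ("Lemons", "2"),
        ("Salt & pepper", "to taste"),
    ]
    print(render_table(ingredients, header=("Ingredient", "Amount")))
```

    Each tuple becomes one table row and each item one cell, which is the same row-by-row structure an author writes by hand in a text editor.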

    For the eighteen participants in each Institute, there were always two or more instructors available for help. Often this was not enough, as the questions came fast and furious. It was encouraging to see that those who finished easily, after a suitable period of creative expression, were soon helping those to whom this was a new universe.

    TOPIC: HTML Forms

    Near and dear to the hearts of front-line librarians is the idea that many of their most mundane tasks can be accomplished on-line. Many of the participants were thirsting to create forms for book renewal, patron complaint/suggestions, interlibrary lending, etc. After describing in detail the tags for creating forms, and doing our best to make obvious the differences between POST and GET, we turned them loose on an exercise to create a simple form. We had set up a simple script ahead of time on a Web server so that anything submitted using the exercise form would be posted on the instructor's workstation. For many participants, it was hearing us read out loud their predictably playful submissions that really brought home just how easy this all was.
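
    The difference between GET and POST can be illustrated with a short Python sketch that builds both kinds of request without sending anything over the network. The endpoint URL and the form fields are assumptions made up for this example; they are not the script used at the Institute.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form fields and endpoint, invented for this sketch.
FIELDS = {"name": "A. Patron", "comment": "More scanners, please"}
ENDPOINT = "http://www.example.org/cgi-bin/comment"


def build_get(url, fields):
    """GET: the encoded fields travel in the URL, after a '?'."""
    return Request(url + "?" + urlencode(fields), method="GET")


def build_post(url, fields):
    """POST: the same encoded fields travel in the request body."""
    body = urlencode(fields).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    get_req = build_get(ENDPOINT, FIELDS)
    post_req = build_post(ENDPOINT, FIELDS)
    # Nothing is sent; we only inspect how each request is put together.
    print(get_req.get_method(), get_req.full_url)
    print(post_req.get_method(), post_req.full_url, post_req.data)
```

    With GET the submitted values are visible in the URL itself, whereas with POST the URL is unchanged and the values are carried in the body of the request.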

    At this point, a certain deficiency in the Institute began to become obvious. Although it would be impossible to teach even the rudiments of Perl or some other scripting language in such a short time, it was obvious that we really needed to help them to gain a broad conceptual understanding of Common Gateway Interface (CGI) programming. If the Institute were to be repeated, we might include an exercise in creating a basic script along with enough of an overview of UNIX to facilitate putting it in the right place and seeing how it all works.
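
    For readers who want a concrete picture of what such a basic script involves, here is a minimal sketch in Python (our choice for illustration; the Institute did not use it). Under CGI the Web server passes request details in environment variables such as REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH, and the script writes headers, a blank line, and a document to standard output. The function name handle is hypothetical.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def handle(environ, stdin):
    """Build a CGI response from the environment and request body.

    For GET the form data arrives in QUERY_STRING; for POST the server
    sends CONTENT_LENGTH characters of form data on standard input.
    """
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        raw = environ.get("QUERY_STRING", "")
    fields = parse_qs(raw)

    # Echo each submitted field back as a list item, escaped for HTML.
    items = "".join(
        f"<li>{escape(name)}: {escape(', '.join(values))}</li>"
        for name, values in fields.items()
    )
    body = f"<html><body><ul>{items}</ul></body></html>"
    # A CGI response is headers, a blank line, then the document.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    sys.stdout.write(handle(os.environ, sys.stdin))
```

    Keeping the logic in a function that takes the environment and input stream as arguments makes it possible to try the script from a command line before placing it in a server's script directory.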

    TOPIC: HTML Image Mapping

    There seems to be a great fascination with the idea of image mapping. Many of the participants were interested in creating large image maps for navigating existing or potential sites. After introducing them to the basic tags, we demonstrated methods for determining the correct coordinates of image areas. As a segue to the next topic, we also discussed how the use of such gaudy features should be limited to those instances when they will be truly useful to the user.
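
    The coordinate bookkeeping can be sketched in a few lines of Python, assuming a client-side image map with rectangular areas. Coordinates are pixels measured from the top-left corner of the image. The banner dimensions, file names, and the helpers rect_area and hit are hypothetical, made up for this illustration.

```python
from html import escape


def rect_area(left, top, right, bottom, href, alt):
    """Return an <area> tag for a rectangular hot spot.

    Coordinates are pixels measured from the image's top-left corner,
    in the order left, top, right, bottom.
    """
    coords = f"{left},{top},{right},{bottom}"
    return (
        f'<area shape="rect" coords="{coords}" '
        f'href="{escape(href)}" alt="{escape(alt)}">'
    )


def hit(regions, x, y):
    """Return the href of the first region containing pixel (x, y)."""
    for left, top, right, bottom, href, _alt in regions:
        if left <= x <= right and top <= y <= bottom:
            return href
    return None


if __name__ == "__main__":
    # Hypothetical 400x100 navigation banner split into two halves.
    regions = [
        (0, 0, 199, 99, "catalog.html", "Catalog"),
        (200, 0, 399, 99, "hours.html", "Hours"),
    ]
    print('<map name="nav">')
    for region in regions:
        print("  " + rect_area(*region))
    print("</map>")
    print(hit(regions, 250, 40))
```

    The hit function mimics what a browser does with a click: it checks which rectangle contains the point, which is a quick way to confirm that the areas neither overlap nor leave gaps.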

    TOPIC: HTML Style and Design

    Anyone can throw together some HTML tags and stake their claim on a corner of the Web, but that doesn't make it right. Novice users of HTML often make flagrant design errors that can render their documents unusable if not downright annoying. To illustrate some important points about effective Web design, we used both participant- and instructor-identified Web sites as objects of ridicule or acclaim, as the individual case warranted.

    Day 3
    Days 3 and 4 were primarily devoted to the topic of digital imaging. By necessity, this section heavily emphasized practical learning experiences as there was much ground to cover. We also decided that the participants would get the most out of the digital imaging exercises if they were project-oriented. By having a finite project to focus on, we felt that they would have an easier time seeing how all these skills could be brought together to produce a single product.

    TOPIC: Preservation & Access Criteria in Selection for Digitization

    We began this part of the Institute by having librarians from here at Berkeley discuss the issues involved in selecting materials for digitization. Both Barclay Ogden and Ann Swartzell are involved in the development of large digital library projects. Their wealth of experience, combined with the fact that they are engaging speakers, soon had the class involved in a lively discussion of the issues that need consideration before a digital library project is initiated.

    TOPIC: Image Capture

    We began this section by having the participants define a small project they wished to work on for the remainder of the Institute. This was followed by a general discussion of digital imaging and the processes on which it is based. It proved very difficult to adequately explain resolution, dynamic range, file and image size, and how they all interrelate. Despite carefully prepared demonstration images, it was often only when we were working with people individually that we were able to get these basics across.
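
    One way to see how these quantities interrelate is a small calculation, sketched here in Python. It assumes uncompressed image data, so that size in bytes is pixel width times pixel height times bits per pixel, divided by eight; bit depth stands in for the range of tones or colors recorded. The page size and scanner settings are illustrative only, not recommendations from the Institute.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Pixel width and height of a scan at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate size of the raw image data, ignoring file headers
    and any compression the file format may apply."""
    w, h = pixel_dimensions(width_in, height_in, dpi)
    return w * h * bits_per_pixel // 8


if __name__ == "__main__":
    # An 8.5 x 11 inch page; the settings are illustrative only.
    for dpi, bits, label in [
        (300, 1, "bitonal"),
        (300, 8, "8-bit grayscale"),
        (300, 24, "24-bit color"),
        (600, 24, "24-bit color"),
    ]:
        size = uncompressed_bytes(8.5, 11, dpi, bits)
        print(f"{dpi} dpi, {label}: {size / 1_000_000:.1f} MB")
```

    Two relationships follow directly from the formula: doubling the resolution quadruples the number of pixels, and moving from 8-bit grayscale to 24-bit color triples the size at the same resolution.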

    After distilling these topics down into a few rules of thumb, we turned the participants loose on the scanners. Three scanning workstations were set up in the front of the class, and there was soon a line at each. Most participants had brought materials from their libraries to work on, ranging from botanical prints to slides to old newspapers. This proved very instructive, as many unforeseen issues soon cropped up with regard to how to best deal with materials other than an 8 1/2 by 11 inch sheet of paper.

    Learning scanning by doing.
    (Picture taken with an Apple QuickTake digital camera).

    TOPIC: Basic Photoshop

    Once we got through the trauma of transferring their multi-megabyte TIFF files from the scanning workstations to their own, we moved on to some basic image manipulation and enhancement skills. Almost none of the participants had any experience with Photoshop. The approach we tried was to simply take an image and run through as many features of the software as we possibly could in the time allotted. We emphasized those tools that are most useful for improving images without actually altering or losing any of the information contained in them: cropping, sharpening, adjusting color and tone, etc.

    Even before we had finished demonstrating, they were trying things out, often with disastrous and yet entertaining results. We were usually able to fix things, however, and soon everybody was mocking up web pages in which to place their images.

    In retrospect, a better approach to teaching Photoshop might have been to take a raw TIFF scan of a document or photograph and demonstrate the sequence of steps required to produce an acceptable image. By just dumping a bunch of tools in their laps, we left many participants in doubt about exactly where to begin.


    After class, many chose to spend most of the evening working on their projects. Having the facilities available to participants for extended use allowed them to make a lot of progress.

    Day 4
    Day 4 was devoted to adding more tools to their repertoire. Besides additional practice with Photoshop, participants were also introduced to optical character recognition and some of the subtleties of creating efficient images for the web.

    TOPIC: Advanced Image Enhancement in Photoshop

    Photoshop is one of those programs where there are at least a half dozen different ways to do anything. During this section, we built on the basic skills they had gained the previous day, by showing them more and increasingly sophisticated ways to manipulate and enhance an image. We still did not leave the realm of basic tools, preferring to leave such confusing issues as layers, channels and masks well enough alone.

    The participants were having none of it, however. After repeatedly showing people individually how to create things like drop shadows, we decided to at least demonstrate the use of layers, which can be an essential tool for image editing. Such techniques may go beyond the strict needs of the digital librarian, but people cannot resist the temptation to gussy things up. In the end, though, it helped the participants to better understand the software and what can be accomplished with it.

    TOPIC: Creating Images for the Web

    Many of the Institute participants had had prior frustrating experiences with placing images on their web pages. Striking a balance between image clarity and file size can be difficult. We began by describing the differences between the most useful file formats (TIFF, PhotoCD, JPEG, and GIF) and to what each was most suited. We then stepped them through the process of taking their initial scans and creating images for use on the web. This is a fairly simple process once you understand a few rules of thumb. We would also have liked to demonstrate, using digital library projects we had worked on, some of the choices we had made in particular situations, but at this point we were running out of time.

    TOPIC: Optical Character Recognition (OCR)

    Unfortunately, OCR ended up being crammed into whatever remaining time we had. Given the nature of the technology, we felt it necessary to point out that despite its great potential, text scanning is still a laborious and often frustrating experience. We quickly demonstrated the OmniPage software and let people give it a try during the lab period. We would have liked to discuss some of the procedures we had come up with to improve results, especially with archival materials, but time did not permit.

    TOPIC: Image Projects & Issues

    We are fortunate on the Berkeley campus to have Howard Besser as a visiting professor. After the previous two days of intensive nuts and bolts work, participants were ready to sit back and have one of the field's heavy hitters describe what was happening on the cutting edge. Howard took us on a wide ranging tour, via the web, of some of the most innovative image database projects presently under construction. He also discussed standards development, ethical and legal issues, technical protection schemes, and content based retrieval. All the sites he demonstrated are available from his rather massive Web page of Image and Multimedia Database Resources.


    We offered another lab period at the end of Day 4. Again, many participants elected to stay and work on their projects during this period. Kirk Hastings staffed both labs and offered individual assistance.

    Day 5
    The last day was devoted to a mix of topics and wrap-up activities. In the morning, we covered proven techniques for teaching others, since one of the goals of the Institute was to have participants return to their organizations and spread what they learned among their colleagues. In the afternoon we presented material on access and indexing issues, and offered an open "show and tell" period in which participants could share what they accomplished during the course of the week. At the end of the day we allowed time to evaluate the day (as we had throughout the week) as well as the Institute overall.

    TOPIC: Effective Training Techniques

    John Ober modeled, demonstrated, and discussed training techniques that he has found to be effective in the many years in which he has been an instructor and trainer. He talked about why teaching technical topics is difficult and what training strategies can help. He discussed appropriate learning theories and how they can inform the decisions you make about what material to present and how to present it.

    TOPIC: Digital Library Access, Indexing and Databases

    In discussing digital library access issues, our essential point was "more is better." One path to your information is two too few. Think of the different audiences that your information may have and their particular needs. The good news is that the information itself (your digitized images, documents, etc.) need only exist once to be referenced from several different perspectives. We then demonstrated what we meant by using online examples of different paths to the same digital file.

    Our brief discussion of indexing focused on ways in which you could provide basic keyword searching of your files. As an example we discussed how we had used the free software SWISH (Simple WAIS Indexing System for Humans) to index some of our collections.
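
    The general idea behind keyword searching can be sketched with a simple inverted index in Python. This is a generic illustration and makes no claim about how SWISH itself works; the file names and text are invented for the example.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of file names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted file names that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


if __name__ == "__main__":
    # File names and text invented for this sketch.
    docs = {
        "finding-aid.html": "Finding aid for the botanical prints collection",
        "newspapers.html": "Digitized newspapers from the archive collection",
    }
    idx = build_index(docs)
    print(search(idx, "collection"))
    print(search(idx, "botanical collection"))
```

    The index is built once, ahead of time, so that each search only has to look up the query words rather than reread every file.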

    Digital library databases are a complex topic, and unfortunately we did not have enough time to do the topic justice. We nonetheless spent some time discussing metadata and used existing library standards like AACR2 and MARC as well as emerging standards like the Dublin Core as examples. We discussed, but did not demonstrate, various database software packages that could be used and how Web access is being provided to them.
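
    To give a sense of what a simple metadata record looks like, the following Python sketch renders one as HTML meta tags. It is an illustration under stated assumptions: the element names (title, creator, subject, date, format, identifier) follow the Dublin Core element set as later published, the "DC." prefix is one common embedding convention, and the record values and the helper to_meta_tags are hypothetical.

```python
from html import escape

# A hypothetical record; element names are taken from the Dublin Core
# element set (title, creator, subject, date, format, identifier).
RECORD = {
    "title": "Botanical print, plate 12",
    "creator": "Unknown",
    "subject": "Botany",
    "date": "1996-07-17",
    "format": "image/jpeg",
    "identifier": "plate12.jpg",
}


def to_meta_tags(record, prefix="DC"):
    """Render a metadata record as HTML <meta> tags, one per element."""
    return "\n".join(
        f'<meta name="{prefix}.{escape(name)}" content="{escape(str(value))}">'
        for name, value in record.items()
    )


if __name__ == "__main__":
    print(to_meta_tags(RECORD))
```

    A record this small is far less detailed than a MARC record, which is part of the appeal of a simpler element set for describing large numbers of digital files.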

    Show and Tell of Participant Projects

    One of the most enjoyable parts of the Institute for us was seeing the results of everyone's work over the course of the week. It was truly remarkable what participants had been able to accomplish with what they had learned. The show and tell period was voluntary, but we were not above using threats to spur participants to share their accomplishments. One of the benefits of learning with a diverse group of individuals is the awareness of different perspectives and ideas that such diversity brings.

    The last part of the Institute was devoted to answering any lingering questions, assuring participants that despite their worst fears their heads had not, indeed, exploded, and collecting final evaluations. We also discussed methods of keeping up with technology and digital library issues.


    Screen Shot of McGreevy's Camp IDLD report

    Following the second Institute, we mounted almost all of the Institute instructional materials on the Institute Web site. Among the materials included are in-class handouts, Microsoft PowerPoint presentation files (in both native and Adobe Acrobat format), the lists of participants, a complete listing of the binder contents (with links to materials available online), and even instructor's notes (which were not handed out to Institute participants).

    Shortly after the Institute, Kathy McGreevy from Santa Rosa Community College Library in California created a Web page "report" of her experiences at the Institute, which she called "Skills I Learned at Camp IDLD". The page not only reports on her experiences but was itself built with the techniques she learned at the Institute (image editing, HTML tables, forms, etc.).

    A number of people have contacted us about offering the Institute again, but we have not yet made a decision. If we do offer the Institute again, or something like it, we will announce it to the same electronic discussions as we announced this one.

    Meanwhile, the Institute materials are available online for anyone to use in similar efforts, and we can be contacted with questions. We also started an electronic discussion, DigLibns, to carry forward our efforts to focus on practical, day-to-day issues and problems facing today's digital librarians. Anyone is welcome to subscribe and participate in the discussion.

    What We Learned

    We learned that teaching an institute is a great way to learn ourselves. There is no greater incentive to learn a topic thoroughly than having to teach it. We also learned a lot from interacting with all of the participants. Their needs, experiences, skills, and creativity were unique and added a great deal to the experience we have come to call "Camp IDLD".


    [1] Institute for Electronic Resource Development, grant application submitted to the U.S. Department of Education, Higher Education Act Title II-B, Library Education and Human Resource Development Program, November 28, 1994.

    Copyright © 1996 Kirk Hastings and Roy Tennant
