The Magazine of Digital Library Research

D-Lib Magazine

May/June 2017
Volume 23, Number 5/6


Expanding the Librarian's Tech Toolbox: The "Digging Deeper, Reaching Further: Librarians Empowering Users to Mine the HathiTrust Digital Library" Project

Harriett Green and Eleanor Dickson
University of Illinois at Urbana-Champaign
{green19, dicksone} [at]



This paper provides an overview of the IMLS-funded project "Digging Deeper, Reaching Further: Librarians Empowering Users to Mine the HathiTrust Digital Library," and explains how the project team developed a curriculum and workshop series to train librarians on text mining approaches and tools, in order to address the recognized skills gap between the needs of researchers pursuing digital scholarship and the services that librarians are traditionally trained to provide.

Keywords: HathiTrust Digital Library Project, Text Mining


1 Introduction

The roles of librarians are transforming as a growing number of researchers and instructors integrate data into their work and scholarship. As the Association of Research Libraries' Strategic Thinking and Design Initiative Report predicts,

In 2033, the research library will have shifted from its role as a knowledge service provider within the university to become a collaborative partner within a rich and diverse learning and research ecosystem. [1]

This futurist declaration frames how librarians increasingly are encountering new research questions and scholarly needs oriented around data and digital technologies — needs that push the boundaries of current skillsets, knowledge, and service scope of librarians and archivists today. And recent initiatives such as the Library of Congress's "Collections As Data" forum and the IMLS-funded "Always Already Computational: Collections as Data" project recognize today's essential role of libraries and archives in providing and curating much of the data being used in this new, emergent research. In light of the "computational turn" [2] across the disciplines and in libraries themselves, how can libraries prepare for supporting data-driven research?

The Digging Deeper, Reaching Further: Libraries Empowering Users to Mine the HathiTrust Digital Library Resources (DDRF) project aims to develop and disseminate a curriculum for librarians to build competence in skills and tools for digital scholarship that they then can incorporate into research services at their home institutions.


2 Background

Digital scholarship centers and research commons are emerging in more and more libraries as part of revised service models to address the research needs for digital humanities and data-driven scholarship. Still, not all academic libraries have (or need) centralized services, and even when they do, librarians from many different departments in the library and areas of expertise are being drawn into digital scholarship support [3].

Studies document how these dynamic, data-driven changes in how scholars pursue research often involve deeper collaboration between librarians and disciplinary researchers [4], and what Research Libraries UK's Re-Skilling for Research report called "a more proactive model of engagement with researchers." [5] Services such as research collaborations with faculty [6], building new models for scholarly communications and publishing in digital humanities [7], and offering tiered support services for digital scholarship projects encompassing digitization, multi-media publishing, and software development [8] are becoming increasingly standard in libraries. The recently published volumes Digital Humanities in the Library: Challenges and Opportunities for Subject Specialists [9] and Laying the Foundation: Digital Humanities in Academic Libraries [10] feature multiple case studies of new services and programs in academic libraries that address contemporary research needs in the area of digital humanities specifically.

But these rapidly growing areas of digital scholarship research, and the responding changes in library services and infrastructure, also highlight the key challenges that librarians face in gaining skills that enable them to engage with digital scholarship work [11]. Some centers have responded by offering training programs for librarians at their institutions to become more familiar with digital tools and methods. Notable efforts at the University of Maryland [12], Indiana University [13], and Columbia University Libraries' Developing Librarian program exemplify programs that re-skill librarians, especially subject librarians, to participate in new service models and the growing demand for digital scholarly support.

National and international initiatives to train those across the academy, from students and faculty to librarians, in strategies for incorporating digital methods and tools into research have proliferated in recent years. Programs such as the Humanities Intensive Learning and Teaching (HILT) institute prepare attendees, who include librarians, to engage in digitally intensive research. Other recent professional development opportunities for librarians on topics in digital scholarship have included the Digital Humanities Institute for Mid-Career Librarians at the University of Rochester and the Data Science and Visualization Institute for Librarians at North Carolina State University, as well as the Association of Research Libraries' newly launched Digital Scholarship Institute.

Our DDRF project aims to share and build upon the goals of many of these training initiatives, which are to address the recognized skills gap between the needs of scholarly research with computational tools and the services that librarians are traditionally trained to provide. Notably, these training initiatives for librarians employ a "train-the-trainer" model, by which librarians learn a new skillset that they, in turn, can introduce to local scholars. The newly released findings of the IMLS-funded Mapping the Landscapes: Continuing Education and Professional Development Needs for Libraries, Archives and Museums [14] attest in particular to the need for digital scholarship skills, as they note that of the core competency areas for professional development highlighted in their survey, "intermediate to advanced technology skills, digital collection management and digital preservation competency areas received the highest percentage of respondents indicating a need for significant improvement."

DDRF aims to empower librarians, especially those without local training programs, to become active in digital scholarship on their campuses. As such, our project seeks to build this capacity in support of the Institute of Museum and Library Services (IMLS) National Digital Platform initiative.

Funded by a 2015-2018 IMLS Laura Bush 21st Century Librarian grant award, DDRF is a partnership between five institutions: the University of Illinois at Urbana-Champaign, Indiana University Bloomington, Lafayette College, Northwestern University, and the University of North Carolina at Chapel Hill. Librarians and specialists from the partner institutions have been collaborating to develop a curriculum and training mechanism focused on preparing library and information professionals to engage in text analysis and to build core skills in supporting data-driven research. This project leverages the expertise of the HathiTrust Research Center, which is jointly based at the University of Illinois at Urbana-Champaign and Indiana University Bloomington. Many of the hands-on activities and examples presented in the curriculum are drawn from the workshops, tools, and research services provided by the HathiTrust Research Center for text analysis research [15]. The curriculum will be released as an open educational resource at the end of the grant.


3 Project update

We have drafted, delivered, and revised the initial version of the DDRF text analysis curriculum using an iterative instructional design process. Our process drew upon the inspiration and examples offered by other effective open training initiatives, including Software Carpentry, Data Carpentry, and Library Carpentry [16], as well as the New England Collaborative Data Management Curriculum [17]. The DDRF curriculum aims to be skills-oriented and centered on specific real-world use cases, as we describe later in the paper. The suite of teaching materials includes slide decks, instructor guides, and participant handouts. We continue to refine the materials after each pilot workshop, with the aim of teaching the final curriculum at regional and national workshops across the U.S. from 2017 through 2018.

Through the pilot workshops, we have learned that the skill needs for librarians around digital scholarship are varied and individually-driven. The five project partners represent colleges and universities with diverse constituents and approaches to supporting digital scholarship. As such, each partner institution has encountered unique experiences teaching the same curriculum to their different audiences, which have ranged from cohorts of public services librarians working in undergraduate-central communities, to information science researchers and librarians at large research universities. The richness of this participant diversity has meant that the project partners are able to provide feedback on the efficacy of the training materials for different audiences. Our experience teaching the workshops to date has influenced our approach to instructional design and curriculum development, both of which have also been shaped by participant feedback through formal assessment.


3.1 Instructional design

The multistage instructional design process applied in this project began in fall 2015 with definitions of learning goals and objectives for the curriculum. This stage involved identifying the requisite skills and knowledge for librarians from different areas of expertise to support text analysis research, and how to build a training program that would address those requirements. This process established a benchmark for the curriculum that project partners were able to reference as the materials took shape. As a part of iterating on the teaching materials, we have refined the learning goals and objectives based on feedback and teaching experiences.

Our learning goals and objectives address librarian-specific competencies to engage with digital scholarship, and we developed them with the approach of seeing text analysis tools and methods as a digital scholarship service supported by the library. We do not expect the learner to become an expert over the course of several hours, nor necessarily to formulate their own research project. Instead, we focus on fostering awareness of, and the ability to communicate about, key tools and methods in text analysis. The learning goals map to five training modules that follow the text analysis workflow, from finding textual data to managing and analyzing it. The modules also align with key points at which a librarian might be involved in the research process (Table 1). Each module incorporates skills-based competencies that are developed through hands-on activities. A sample reference question that could be addressed using text analysis runs through the modules and guides the hands-on activities and discussion. Where appropriate, the activities align with HathiTrust Research Center tools and services.

| Module | Primary learning goal | Skills developed |
|---|---|---|
| Introduction | Understand what text analysis is and how scholars are using it in their research. | Recognize research questions that may lend themselves to text analysis methods. |
| Gathering Textual Data | Differentiate the various ways textual data can be acquired and evaluate textual data providers. | Build a textual dataset and run a web scraping script. |
| Working with Textual Data | Distinguish cleaning and/or manipulating data as a part of the text analysis workflow. | Clean text data files using a Python script and/or OpenRefine. |
| Analyzing Textual Data | Recognize the advantages and constraints of web-based text analysis tools and programming solutions. | Run a web-based text analysis algorithm and extract token frequencies from a dataset. |
| Visualizing Textual Data | Identify data visualization as a component of data-driven analysis. | Practice exploratory data analysis using different tools for visualization. |

Table 1: Learning and Skill-Building Goals for DDRF Curriculum
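
To give a sense of the scale of the hands-on skills listed in Table 1, the following is a minimal sketch of two of them: cleaning a text and extracting token frequencies. It is not taken from the DDRF curriculum; the function names `clean_text` and `token_frequencies` and the sample sentence are hypothetical, and the cleaning rules (lowercasing, keeping only the letters a-z) are a simplifying assumption that would discard accented characters and apostrophes in real data.

```python
import re
from collections import Counter


def clean_text(raw: str) -> str:
    """Lowercase the text, replace anything that is not a-z or
    whitespace with a space, and collapse runs of whitespace."""
    lowered = raw.lower()
    letters_only = re.sub(r"[^a-z\s]", " ", lowered)
    return re.sub(r"\s+", " ", letters_only).strip()


def token_frequencies(raw: str) -> Counter:
    """Count how often each whitespace-separated token occurs
    in the cleaned text."""
    return Counter(clean_text(raw).split())


# Hypothetical sample text, used only for illustration.
sample = "The whale, the WHALE! Call me Ishmael."
freqs = token_frequencies(sample)

# Show the most frequent tokens with their counts.
for token, count in freqs.most_common(3):
    print(token, count)
```

A script of this size is the kind of thing a participant could run unchanged in a browser-based Python environment and then modify, for example by changing the sample text or the number of tokens displayed.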

We chose to use a modular format for the curriculum, so that the workshops could be adjusted for different settings. Some modules have been further broken down into "beginner" and "advanced" lessons, improving the flexibility of the teaching materials. In the second round of pilot workshops, we found that the partner institutions were interested in rearranging the content to suit their audiences. Some chose to teach the modules in order from one to five, while others taught the beginner lessons of multiple modules before moving on to the advanced lessons.


3.2 Teaching

We have now taught several iterations of the curriculum via pilot workshops at each of the partner institutions. The workshops have been open to librarians, library paraprofessionals, and students in library and information science departments. We have seen strong interest in the workshops from across the library: for all of the fall 2016 workshops combined, 32% of attendees self-reported as reference librarians, 21% as technical services librarians, 21% as "other" types of librarians, and 16% as digital humanities or digital scholarship librarians.

Between rounds of pilot workshops, the project team reviewed and updated the curriculum, based both on the attendees' evaluations and on the experiences of the partner instructors teaching the materials. The feedback from the project partners has revealed that it can be challenging to learn, digest, and teach materials that others have developed. In such cases, instructors found it helpful to team-teach the workshop so that the instructor team was better able to grasp the materials and answer attendee questions. Making it easier for others to pick up and teach the curriculum is one of our goals for the coming year. To this end, we are drafting in-depth instructor guides for each module that define vocabulary terms, outline the key points that should be addressed, and provide a slide-by-slide script from which the presenter can read.

An important component of our strategy thus far has been to limit technological barriers to participating in the workshop. The activities in several of the modules involve the participants executing Python programs to complete a task. Properly setting up a programming environment can take considerable time, especially in a workshop setting and when using machines in a computer lab. When possible, we have explored web-based tools for programming, such as PythonAnywhere, that allow participants to complete activities regardless of their operating system and without configuring their computer. We came to this decision by evaluating our learning goals and determining which aspects of the code-based activities were most important to meeting our objectives. We determined that streamlining the technical activities through web-based programming platforms lowers the cognitive load of learning a new concept, and allows attendees to focus on what happens when they run a script as opposed to the nuances of their programming environment.

While we have attempted to simplify the steps to successful completion of each activity, we have also learned to value creative and critical thinking in the hands-on sections. After the first rounds of workshops, project partners reported that they wished there were more opportunities in the curriculum for open-ended inquiry. They also reflected on the importance of play and experimentation for those learning digital scholarship competencies. The first iterations of the activities were straightforward, and we are exploring ways to make them more playful as a means of reinforcing the concepts in the activities [18]. We have also incorporated discussion questions into the most recent version of the teaching materials. We hope such discussion will provoke critical reflection on the skills and competencies addressed in each module within the context of the learning goals, as well as provide space for attendees to connect the workshop's content to their own teaching and learning.


3.3 Assessment

Following each workshop, participants complete an assessment form. From the assessment feedback, the project team has been able to glean that librarian learners appreciate learning by doing, and that they prefer depth over breadth of content in a workshop.

Attendee feedback shows that librarians value experiential learning. Responses gathered in the assessment form often related to the hands-on activities. For example, one workshop attendee wrote that they were "intimidated" before coming to the workshop because it would teach programming concepts, but that "the structure of the workshop which allowed us to focus on the conceptual capabilities of using Python and scripts to do text mining was very useful and interesting." Additionally, others noted that there should be even more time devoted to skill-based learning. One wrote, "When I sign up for a workshop, I expect that most of the time will be actual hands on activities." Current work on the curricular materials is focused on further developing the scope of the hands-on sections of each module to allow learners the opportunity to understand the process happening in each activity, in addition to fostering experimentation and discourse as mentioned above.

Workshop feedback also showed that early pilot workshops were too short relative to the amount of content we tried to teach. One attendee advised us to "Make it longer, with more time for exploring data. [There is] not nearly enough time to really dig deep." We intend for future workshop sessions to be longer and anticipate they will be less rushed. We are also devising ways to create paths through the content for shorter workshops: by highlighting key points for each module in the aforementioned expanded instructor's guide, we aim for instructors to feel empowered to condense content as needed for use in abbreviated workshops.


4 Next steps and conclusion

We continue to incorporate the feedback and assessment received into our curricular and programmatic development of the project materials, and strive to keep in mind the various user groups and skill levels that librarians and information professionals have today. Given the initial response to our workshops, we know that our colleagues are actively seeking training and instruction in these emergent skillsets for digital scholarship and data science. The next year will see a series of regional and national workshops where we will present the curriculum to larger, more diverse audiences from across North America. Through these workshops, we will gather additional responses from the librarian community that will allow us to refine the curriculum into a final open educational resource.

Our project is motivated by the potential to build new and interactive communities of practice in libraries around digital scholarship. Library and information professionals today, across areas of expertise, must grapple with questions such as:

The more that libraries proactively equip their staff to engage in more data-intensive research and teaching — in addition to developing new spaces and service models — the richer the future looks for the changing role of libraries and archives in higher education.



[1] Association of Research Libraries. (2016). Strategic thinking and design initiative: Extended and updated report. Washington, DC: Association of Research Libraries.
[2] Berry, D. M. (2011). The computational turn: Thinking about the digital humanities. Culture Machine 12.
[3] Mulligan, R. (2016). SPEC Kit 350: Supporting digital scholarship. Washington, DC: Association for Research Libraries.
[4] Green, H. E. (2014). Facilitating communities of practice in digital humanities: Librarian collaborations for research and training in text encoding. Library Quarterly 84(2), 219-234.
[5] Auckland, M. (2012). Re-skilling for research: An investigation into the role and skills of subject and liaison librarians required to effectively support the evolving information needs of researchers. London: Research Libraries UK.
[6] Alexander, L., Case, B., Downing, K., Gomis, M. & Maslowski, E. (2014). Librarians and scholars: Partners in digital humanities. EduCause Review; Nowviskie, B. (2013). Skunks in the library: A path to production for scholarly R&D. Journal of Library Administration 53(1), 53-66.
[7] Coble, Z., Potvin, S., & Shirazi, R. (2014). Process as product: Scholarly communication experiments in digital humanities. Journal of Librarianship and Scholarly Communication 2(3), eP1137.
[8] Vinopal, J. & McCormick, M. (2013). Supporting digital scholarship in research libraries: Scalability and sustainability. Journal of Library Administration 53(1), 27-42.
[9] Hartsell-Gundy, A., Braunstein, L. & Golomb, L. eds. (2015). Digital humanities in the library: Challenges and opportunities for subject specialists. Chicago: Association for College and Research Libraries.
[10] Gilbert, H. & White, J. eds. (2016). Laying the foundation: Digital humanities in academic libraries. Lafayette, IN: Purdue University Press.
[11] Posner, M. (2013). No half measures: Overcoming challenges to doing digital humanities in the library. Journal of Library Administration 53(1), 43-52.
[12] Munoz, T. & Guiliano, J. (2014). Making digital humanities work. Digital Humanities 2014 Conference Abstracts, EPFL-UNIL, Lausanne, Switzerland, 8-12 July 2014, 274-275.
[13] Courtney, A., Dalmau, M., & Minter, C. (2014). Research Now: Cross training for digital scholarship. Poster presented at 2014 DLF Forum.
[14] Drummond, C., Skinner, K., Pelayo, N., & Vukasinovic, C. (2016). Self identified library, archives, and museum professional development needs 2016 edition: Compendium of 2015-2016 Mapping the Landscapes project findings and data. Atlanta: Educopia Institute.
[15] Downie, J.S., Furlough, M., McDonald, R.H., Namachchivaya, B., Plale, B.A., & Unsworth, J. (2016). The HathiTrust Research Center: Exploring the full-text frontier. EduCause Review, May 2, 2016.
[16] Baker, J., et al. (2016). Library Carpentry: Software skills training for library professionals. LIBER Quarterly 26(3), 141-162.
[17] Lamar Soutter Library, University of Massachusetts Medical School. New England Collaborative Data Management Curriculum; Kafel, D., Creamer, A. T. & Martin, E. R. (2014). Building the New England Collaborative Data Management Curriculum. Journal of eScience Librarianship 3(1): e1066.
[18] For more about the concept of play in digital pedagogy, see Sample, M. (2016). Play. In Digital pedagogy in the humanities: Concepts, models, and experiments. New York: Modern Language Association.

About the Authors

Harriett Green is the interim Head of Scholarly Communication and Publishing, English and Digital Humanities Librarian, and associate professor, University Library, at the University of Illinois at Urbana-Champaign. Her research and publications focus on usability of digital humanities resources, digital pedagogy, digital publishing, and humanities data curation. She is Principal Investigator for the IMLS-funded "Digging Deeper, Reaching Further: Libraries Empowering Users to Mine the HathiTrust Digital Library" project.


Eleanor Dickson is the Visiting HathiTrust Research Center Digital Humanities Specialist at the University of Illinois at Urbana-Champaign. She supports outreach and training for the HathiTrust Research Center, as well as local digital humanities research at Illinois.