Carol Minton Morris
At a conference where ideas about the theory and practice of information engineering in digital libraries would be presented, it was particularly appropriate to meet in a city where both the practical and theoretical knowledge of engineering have played major roles. Conference Chair Ronald Larson introduced Dr. Robert Reagan, who welcomed institutional and digital library project leaders who had crossed at least one bridge to attend the Joint Conference on Digital Libraries (JCDL) in downtown Pittsburgh, June 16-20, 2008. Reagan highlighted some little-known facts excerpted from Bridges of Pittsburgh, his recently released book co-edited with Tim Fabian, which presents colorful details of the history surrounding the city's 446 bridges (even more than in Venice) that span the rivers where Pittsburgh nestles among hills and mountains.
Pittsburgh has emerged as the #1 Most Livable City in America (ranked by Places Rated Almanac) from a mixed history: in 1940 Frank Lloyd Wright suggested that the city be abandoned because of significant steel manufacturing pollution, while Andrew Carnegie founded a library knowledge empire that would spread 2,500 libraries to 5 countries.
International influence was apparent in conference Co-chair Andreas Paepcke's analysis of content from the 117 full and 61 short papers submitted for JCDL this year. Almost every continent was represented in the 28% of full and 34% of short papers accepted for this year's global conference.
Paepcke introduced Bill Buxton, Principal Scientist at Microsoft Research, Professor at the University of Toronto and principal in Buxton Design, who was the featured keynote speaker on day one of the conference.
Buxton wondered what Andrew Carnegie would think of the current nature of digital libraries. He believes that a sense of place is an inherent part of a bricks-and-mortar library where, in many communities, the library is one of the most significant public buildings. Carnegie's message was that 'architecture' mattered in the quest for, and preservation of, knowledge. Librarians were also an important part of the Carnegie library equation as 'human intermediaries' who were experts in specific collections or fields. Search mechanisms were library cards that contained a history of use and notations, and were themselves historic objects. Buxton feels that much of this context has been lost in going digital. Search and browse can map to concepts of place and history but, in Buxton's view, often fall short in modern digital libraries.
"The challenge of how to create use and access is not a technical problem. We must be vigilant in setting up social structures that support digital knowledge the way that Carnegie leveraged architecture, people and hands-on notation in support of scholarship," said Buxton.
As a designer Buxton believes that form follows function. If digital information models are not as complex and diverse as what we now have on paper, then the results will be impoverished. He feels that usability should be a basic right. The act of design can be reduced to, 'Design is choice.' Buxton believes that our children deserve a technology future that we have chosen for them, carefully.
On day two the plenary lecture entitled, "Shakespeare, God, and Lonely Hearts: Transforming Data Access With Many Eyes," was given by IBM Visual Communications researchers Fernanda Viegas and Martin Wattenberg. Both are new media artists who focus their work around data visualizations that allow anyone to "look at tons of numbers easily." They are currently transforming data access and analysis with a new web site called Many Eyes.
Viegas believes that visualizations can be more powerful when lay users begin to understand visualization and "the traces people leave online." Some of Viegas' work at MIT focused on "Getting a sense of the relationship you are carrying" by using text analysis and visualization of email archives. Her research project entitled Themail gathered archives from people who were guaranteed that their privacy would be protected. As it turned out the email contributors wanted to play with the visualizations even though the images often represented very private life events. The email analysis visualizations turned out to be social artifacts around which people wanted to gather for further conversation and reflection.
Wattenberg noted that tools like the Baby Name Wizard, a visual analysis tool that helps users assess the popularity of baby names over time, were being used by people who did not need to name a child or, in some cases, even like babies. People were interested in collaborating and building on each other's ideas and work. What was powerful was the number of people trying to make sense collectively. Viegas and Wattenberg wondered what might happen if the audience around visualizations was "scaled up."
At Many Eyes, users create visualizations using their own or provided data sets, discuss what they see, and share insights with their communities of friends, family and colleagues. Many Eyes encourages users to broadcast their data visualizations in blogs and other online media by providing HTML snippets to add to web pages.
Viegas and Wattenberg have developed a system that enables data analysis as a distributed social process, because they believe that the more you understand about context and use, the richer the understanding of the data will become. For example, each time a user comments in the Many Eyes blog, the system adds a notation to the data visualization. The system has proven to be particularly interesting for word analysis of political speeches, allowing users to "dive deeply and in a non-linear way."
The overall goal of Many Eyes is to multiply the impact of data. Viegas and Wattenberg believe that many more people can come to conclusions and form questions if they have tools to connect them to the meaning behind the data.
On June 19, Alex Szalay offered the final keynote address, "Scientific Publishing in the Era of Petabyte Data." He opened with a look at the evolution of science: 1,000 years ago science was empirical; during the last few hundred years, science was theoretical using models and generalizations; a computational branch emerged in the last few decades; and today, science is about data exploration.
Scientific data doubles every year, which has fundamentally changed the nature of scientific computing. Today scientific computing cuts across disciplines and has become unwieldy, making it more difficult to extract knowledge. He noted that 20% of the world's servers are feeding information to the big data centers of Google, MSN, Yahoo, Amazon, and eBay, so the problem is not just about scientific data.
Szalay has been personally involved in the exponential growth of astronomy data from the late 1990s to 2008 due to his role with the Sloan Digital Sky Survey (SDSS) that has been "mapping the universe" as part of the Virtual Observatory activities for the last ten years. SDSS is now complete, and is in the process of developing the final data release. The completed SDSS archive will contain over 100 terabytes and will be managed by Johns Hopkins University. Sky Survey user sessions show a constant and increasing use of the SDSS data.
Szalay believes that scientific discoveries are made at the edges and boundaries of large data sets, the places where you might not naturally be looking. The greater the number of connections that can be made among data sets, the more likely it is that something new will be discovered along the edges, which suggests that data federation is significant.
One successful experiment in scaling out the solution for analysis came about because the Sloan Digital Sky Survey generated more data than scientists have time to study or classify, coupled with the fact that astronomy is attractive to the public. Astronomers asked citizens for help in classifying over a million galaxies by establishing the Galaxy Zoo.
This public science analysis solution has received enormous publicity and has allowed 100,000 citizens from all over the globe to contribute to discovery by helping to classify galaxies online while viewing beautiful images of unknown locations in the universe. For example, a Dutch teacher found and called attention to an object that she had no experience in analyzing. Her observation turned out to be a significant discovery: the object now known as Hanny's Voorwerp. Szalay believes that the educational impact of this work is enormous.
Technology plus sociology plus economics must come together in order to continue to work on how to preserve our intellectual data resources. Any one discipline alone is not enough to solve the data deluge problem. Both the promise and the unpredictability of increased participation in citizen science are yet another unknown. If there are thousands of new discoveries each day in public science, is there any way to know how this will scale, or does this create a horrifying potential for even more data?
Best JCDL Papers
Catherine Marshall, Microsoft Research, received the 'Best Paper' award for her study of scholars' own perspectives on scholarly archiving. She mentioned that she had trouble getting to her slides in Microsoft Office 2007 as she showed a slide of a large "dust bunny" that she believes sums up scholars' general attitudes towards scholarly archives. In her view many scholarly materials are currently vulnerable. She suggests looking upstream to develop preservation methods closer to where notes and research findings are recorded for the first time.
Johan Bollen, Los Alamos National Laboratory, presented work focused on developing new modes of scholarly assessment. Networked systems can record a great deal about access to materials, more than traditional libraries can, and a scientific approach to scholarly assessment provides a great variety of metrics with which to measure access and use. Bollen suggested that the MESUR goal is to "Map science from the viewpoints of users."
The MESUR team led by Bollen obtained usage data covering 1,000,000,000 usage events. The MESUR database has now exceeded that number. The resulting networked usage map was generated by counting interactions within a single user session and citations to journals.
Michael Christel is a Principal Investigator for the Carnegie Mellon Informedia Project, which aims to achieve machine understanding of video and film media. His JCDL presentation focused on understanding how digital video representations for a life oral history collection impacted user satisfaction. He observed that there has been an explosive growth of video as a digital asset, coupled with a three-orders-of-magnitude drop in costs for video storage over the past decade. The HistoryMakers oral history corpus, 913 hours of interviews from 400 individuals, was used in the study.
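The idea of counting interactions within a user session can be pictured with a minimal Python sketch. It assumes, purely for illustration, that two journals are linked each time they are accessed within the same session; the function name, the session data, and the journal names are hypothetical, and this is not presented as the MESUR team's actual method.

```python
from collections import Counter
from itertools import combinations


def co_usage_counts(sessions):
    """Count how often two journals are accessed within the same session."""
    counts = Counter()
    for journals in sessions:
        # Each unordered pair of distinct journals in a session counts once.
        for pair in combinations(sorted(set(journals)), 2):
            counts[pair] += 1
    return counts


# Hypothetical session logs: each inner list is one user session.
sessions = [
    ["Journal A", "Journal B", "Journal C"],
    ["Journal A", "Journal B"],
    ["Journal C", "Journal C", "Journal D"],
]

edges = co_usage_counts(sessions)
for (left, right), weight in sorted(edges.items()):
    print(f"{left} -- {right}: {weight}")
```

Each weighted pair is one edge of a usage network; pairs that co-occur in many sessions end up close together when such a network is drawn as a map.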
A Few More JCDL Nuggets
The National Science Digital Library has been a functioning digital library since 2002 when, among other developments, the Nova Scotia Drama League agreed to transfer the "NSDL.org" domain name. During the JCDL 2008 Education session chaired by Richard Furuta on June 17, 2008, a series of presentations highlighted aspects of work with NSDL technical infrastructure and educational discipline communities. Later a panel of NSDL users and developers participated in a discussion entitled, "NSDL: Past, Present, Future" that focused on how NSDL has fulfilled its mission "to provide organized access to high quality resources and tools that support innovations in teaching and learning at all levels of science, technology, engineering, and mathematics education," and what direction NSDL will take in the future.
Dave McArthur credited Lee Zia, NSDL's NSF Program Officer, with guiding the project from its inception in 2000 and reminded the audience that the NSDL was not always the way it looks now at NSDL.org. Basic infrastructure and technical standards were developed in the early start-up phase from about 2000-2003. NSF provided funding for collections and research. NSDL Core Integration and NSDL projects developed library services. The first iteration of a web site at NSDL.org was launched. "Getting the stuff up and running" was the task at hand, and work with teachers and students was postponed as the library of educational resources continued to build. The NSDL library research track was significant as a continuation of Digital Libraries Initiative Phase 1 and Phase 2 (DLI1 and DLI2) research. About 40% of NSF-NSDL funded projects contributed to content, services and community building at NSDL.org in this early phase.
From 2004-2007 NSDL was refurbished and reorganized with the launch of new infrastructure based on Fedora open source repository software. The Collections track morphed into a new track called Pathways, which emphasized leveraging NSDL for different Science, Technology, Engineering and Mathematics (STEM) discipline communities. During this phase 70% of NSF-NSDL funded projects contributed directly to NSDL.org. Outreach and communications efforts increased as NSDL moved beyond prototyping and into classrooms. More projects went into the field and gathered data about how NSDL was being used.
Currently, the tools and services that power NSDL (NCore technology and standards for creating a dynamic information layer on top of library resources) provide STEM education projects with a "grand and deep" opportunity for personalization. McArthur pointed out that, by a 'back of the envelope' calculation, implementing a specialized educational digital library system for a big urban school district might cost about $2M. Using NCore brings the cost down to about a quarter of that.
Kim Lightle has been associated with NSDL since 2000. "We view NSDL as a platform," said Lightle, "We have been able to do amazing things with very little money using NSDL technology and services."
She demonstrated examples from the NSDL Middle School Portal (MSP) that has been online for three years. Last month the MSP recorded 85,000 page views. "Using NSDL tools brings a high Google ranking to all content," Lightle explained. "We are in the business of providing context around significant science and math themes by harvesting, highlighting, and repackaging content in our Explore in Depth online publications." All 2,500 NSDL resources that have been augmented by MSP have been put back into the NSDL data repository so that others may benefit from additional resource information.
MSP hosts several Expert Voices blogs and features the RSS feeds from those blogs on its web site: http://expertvoices.nsdl.org/connectingnews/; http://expertvoices.nsdl.org/middle-school-math-science/; and http://expertvoices.nsdl.org/polar/. Some postings are 'quick takes' on topics identified through analysis of search logs. The team writes short items based on what people are looking for in Expert Voices and uses the RSS feed on its homepage.
Additionally, NSDL outreach services and access to professional personnel have made it possible to create an entire online magazine, Beyond Penguins and Polar Bears, that would not have been otherwise possible financially or logistically.
Kathleen Koch explained that she has made use of the MS portal resources in the large, urban North Carolina school district where she helps coordinate curriculum development for teachers in 32 middle schools. Urban schools have many challenges; providing support for young and new teachers and achieving teacher retention are among them. At each grade level students learn integrated science and math. Teachers often lack content expertise. Koch, who is a classroom teacher, is trying to make their jobs easier by providing resources for instruction. Using district access to commercial Riverdeep2 education software, she links MSP resources. "Teachers think this is 'one-stop shopping,'" she says.
"Making learning relevant during a tricky age (early teen years) is key because they lose interest quickly," she said.
Dave Yaron, The Chem Collective, believes that digital libraries can help solve the problem of uninteresting and rote curricula at high school and undergraduate levels because digital libraries are cognitive tools that help novices construct expert mental models. By the time he retires, he would like to see some evolution with respect to changes in teaching and learning methods, particularly in chemistry education "from the bottom up".
Yaron believes that certain educational practices will increase the use of digital libraries in high school and undergraduate classrooms:
Video By the People, For the People
She conducted a study based on interviews and observations of 98 video-locating sessions. Her research group came up with 54 ethnographies that were based on assignments for a third-year HCI course. Their study direction was broad: students could seek anything that they considered to be 'video.' Observation sessions could be wherever the students wanted them to be. She noted that the interviews and observations were conducted in New Zealand, where Internet connections are slow.
Subjects most often went directly to YouTube looking for music, humor, movies, cars, sports, and how-tos, followed far behind by every other Internet venue where users can access video content. She noted that finding video content turned out to be a highly social activity.
It's the Metadata, Stupid
The basic tagging process involves retrieving a resource, adding content to it, and putting it where it can be made available for additional public comment. Hunter explored how to manage the trade-off between what tags have to offer and the quality of authoritative metadata.
She has developed an RDF model for annotation that allows user-created tags, which are sometimes relevant yet also inconsistent, to be combined with complex metadata to create an optimum description and discovery process.
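How user tags and authoritative metadata might sit side by side can be illustrated with a minimal Python sketch. It is not Hunter's model: triples are plain tuples rather than real RDF, the `ex:userTag` predicate, the resource URI, and the sample data are all hypothetical, and the only cleanup applied to tags is lowercasing and whitespace normalization.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is used.
RESOURCE = "http://example.org/resource/42"  # hypothetical resource URI


def authoritative_triples(resource, metadata):
    """Express curated metadata fields as Dublin Core style triples."""
    return {(resource, f"dc:{field}", value) for field, value in metadata.items()}


def tag_triples(resource, tags_by_user):
    """Express user tags as triples, normalizing case and whitespace."""
    triples = set()
    for tags in tags_by_user.values():
        for tag in tags:
            normalized = " ".join(tag.lower().split())
            if normalized:
                # "ex:userTag" is a made-up predicate that keeps tags
                # distinguishable from the authoritative fields.
                triples.add((resource, "ex:userTag", normalized))
    return triples


def search(triples, term):
    """Return resources whose metadata or tags mention the term."""
    term = term.lower()
    return sorted({s for s, _, o in triples if term in o.lower()})


metadata = {"title": "Coral Reef Survey", "subject": "Marine biology"}
tags = {"alice": ["Reef", "  great barrier  reef "], "bob": ["reef", "Snorkelling"]}

graph = authoritative_triples(RESOURCE, metadata) | tag_triples(RESOURCE, tags)
print(search(graph, "snorkelling"))
```

Keeping tags under their own predicate lets a discovery service search both kinds of description together while still knowing which statements came from curators and which from users.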
ORE Tutorial at JCDL
Carl Lagoze and Simeon Warner, both from Cornell University, focused on a well-understood scholarly use case in presenting the Open Archives Initiative Object Reuse and Exchange (ORE) during their June 16, 2008 tutorial. ORE is a method for describing aggregations of resources, using a resource map as a description of an aggregation that can be identified by its URI. Identified, bounded aggregations of related information units form a logical whole. An ORE resource map describes that bounded aggregation.
Carl Lagoze said, "It's this: even 120 years ago scholars needed to refer to external documents to understand scientific concepts. It [ORE] is really about getting materials out of a repository and evolving the web itself as a mash-up machine for scholars." ORE enables the many different parts that are necessary to understand scholarly ideas, found in many different places, to be aggregated. "Imagine site indexes as resource maps that could be indexed by Google," he continued. ORE Beta has been released and the specification is due out at the end of September 2008.
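The relationship between a resource map, the aggregation it describes, and the aggregated resources can be pictured with a minimal Python sketch. It assumes the `describes` and `aggregates` terms of the ORE vocabulary and represents statements as plain tuples rather than a real serialization; all example.org URIs and the helper function are hypothetical.

```python
# Namespace of the ORE vocabulary terms assumed in this sketch.
ORE = "http://www.openarchives.org/ore/terms/"


def resource_map(rem_uri, aggregation_uri, resource_uris):
    """Build the statements a resource map makes about one aggregation."""
    # The resource map, identified by its own URI, describes the aggregation.
    triples = [(rem_uri, ORE + "describes", aggregation_uri)]
    # The aggregation lists each resource that belongs to the logical whole.
    for uri in resource_uris:
        triples.append((aggregation_uri, ORE + "aggregates", uri))
    return triples


# Hypothetical scholarly article made of a PDF, a data set, and a figure.
rem = resource_map(
    "http://example.org/rem/article-1",
    "http://example.org/aggregation/article-1",
    [
        "http://example.org/article-1/paper.pdf",
        "http://example.org/article-1/dataset.csv",
        "http://example.org/article-1/figure-1.png",
    ],
)

for subject, predicate, obj in rem:
    print(subject, predicate, obj)
```

The bounded list of aggregated URIs is what makes the article's scattered parts addressable as a single unit.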
Attendees were treated to a preview of JCDL 2009. The conference is scheduled for June 15-19, 2009, in Austin, Texas, hosted by the School of Information at the University of Texas. Besides promising an increasingly international selection of papers, panels, keynotes and workshops, the presenters reminded the audience to expect nothing "small" from the state of Texas and noted that the city of Austin has one of the best live music scenes in the country, sporting the unofficial yet well-known slogan, "Keep Austin Weird." The call for papers is now available from the conference web site: <http://www.jcdl2009.org/index.html>.
1. Photographs by Carol Minton Morris.
Copyright © 2008 Carol Minton Morris