D-Lib Magazine
July/August 1999
Volume 5 Number 7/8
ISSN 1082-9873
Digital Libraries Initiative - Phase 2
Fiscal Year 1999 Awards
Stephen M. Griffin
National Science Foundation
The following list contains performer and abstract information for awards made in Fiscal Year 1999 as part of the Digital Libraries Initiative - Phase 2 (DLI-2). Spring 1999 actions are listed first, followed by earlier awards made in Fall 1998.
The Digital Libraries Initiative - Phase 2 consists of 3 major components: the Research, Testbeds and Applications component (http://www.nsf.gov/cgi-bin/getpub?nsf9863); an evolving Undergraduate Emphasis component (http://www.nsf.gov/cgi-bin/getpub?nsf9863 plus updates at http://www.dli2.nsf.gov/under.html); and the International Digital Libraries Collaborative Research component (http://www.dli2.nsf.gov/intl.html).
There are no additional general calls for proposals planned at this time. Future competitions for special emphasis activities are anticipated as the Initiative progresses.
Review panels scheduled for this summer and early fall may result in additional actions in this fiscal year. However, awards from proposals received for the May 17 deadline will be determined in fiscal year 2000, which begins October 1, 1999.
More complete information on the program, funded projects, and related activities in the broader digital libraries community (including earlier and non-US efforts) can be found at the DLI-2 web site: < http://www.dli2.nsf.gov >.
The Digital Libraries Initiative - Phase 2 is an interagency program sponsored by:
- National Science Foundation (NSF)
- Defense Advanced Research Projects Agency (DARPA)
- National Library of Medicine (NLM)
- Library of Congress (LOC)
- National Endowment for the Humanities (NEH)
- National Aeronautics & Space Administration (NASA)
- Federal Bureau of Investigation (FBI)
In partnership with:
- Institute of Museum and Library Services (IMLS)
- Smithsonian Institution (SI)
- National Archives and Records Administration (NARA)
Within the NSF, the Initiative receives support from the Directorates for Computer and Information Science and Engineering; Social, Behavioral and Economic Sciences; and Education and Human Resources.
NSF serves as the administrative agent for the Initiative's competitions and funded projects. Policy, planning and programmatic decisions are made by an interagency management group in which representatives of each sponsoring agency and the NSF directorate participate.
Spring 1999 Awards
A Patient Care Digital Library: Personalized Search and Summarization over Multimedia Information
Columbia University
Project Summary or Description: < http://www.cs.columbia.edu/diglib/PERSIVAL/#overview >
- Kathy McKeown, Principal Investigator
Computer Science Department
- Shih-Fu Chang, Co-Principal Investigator
Department of Electrical Engineering
- James J. Cimino, George Hripcsak, Co-Principal Investigators
Department of Medical Informatics
- Judith L. Klavans, Co-Principal Investigator
Center for Research on Information Access
Healthcare consumers and providers both need quick and easy access to a wide range of online resources. The goal of this project is to provide personalized access to a distributed patient care digital library through the development of a system, PERSIVAL (PErsonalized Retrieval and Summarization of Image, Video And Language resources). PERSIVAL will tailor search, presentation, and summarization of online medical literature and consumer health information to the end user, whether patient or healthcare provider. PERSIVAL will utilize the secure online patient records available at Columbia Presbyterian Medical Center (CPMC) as a sophisticated, pre-existing user model that can aid in predicting users' information needs and interests. Key features of the proposed work include personalized access to distributed, multimedia resources available both locally and over the Internet, fusion of repetitive information and identification of conflicting information from multiple relevant sources, and presentation of information in concise multimedia summaries that cross-link images, video, and text. When the latest medical information is provided at the point of patient care, it can help practicing clinicians to avoid missed diagnoses and minimize impending complications. When expressed in understandable terms, it can empower patients to take charge of their healthcare.
Informedia-II: Integrated Video Information Extraction and Synthesis for Adaptive Presentation and Summarization from Distributed Libraries
Carnegie Mellon University
Project Summary or Description: < http://www.informedia.cs.cmu.edu/ >
- Howard D. Wactlar, Principal Investigator
School of Computer Science
- Takeo Kanade, Yihong Gong, Co-Principal Investigators
Robotics Institute
- Christos Faloutsos, Alexander Hauptmann, Michael Christel, John Lafferty, Co-Principal Investigators
Computer Science Department
- Yiming Yang, Co-Principal Investigator
Language Technology Institute & Computer Science Department
The Informedia-II Project continues the pursuit of search and discovery in the video medium. This phase will transform the paradigm for accessing digital video libraries through meaningful, changeable overviews of video document sets, multimodal queries, and adaptive summarizations of very large amounts of video from heterogeneous distributed sources. Video information collages are the key technology in Informedia-II and will be built by advancing information visualization research to effectively deal with multiple video documents. A video information collage is a presentation of text, images, audio, and video derived from multiple video sources in order to summarize, provide context, and communicate aspects of the content for the originating set of sources. The collages to be investigated include chrono-collages emphasizing time, geo-collages emphasizing spatial relationships, and auto-documentaries which preserve video's temporal nature. Users will be able to interact with the video collages to generate multimodal queries across time, space, and sources.
Together with external partners, the project will also create an accessible, lasting digital video archive of historical, political and scientific relevance. Vast collections of video and audio recordings have captured the events of the last century, yet these remain a largely untapped resource of historical and scientific value.
The Alexandria Digital Earth Prototype (ADEPT)
University of California at Santa Barbara
Project Summary or Description: < http://www.alexandria.ucsb.edu/adept/overview.pdf >
- Terrence Smith, Principal Investigator
Computer Science Department, Geography Department
- Mike Goodchild, Co-Principal Investigator
Geography Department
- Anurag Acharya, Divyakant Agrawal, Co-Principal Investigators
Computer Science Department
- James Frew, Co-Principal Investigator
Donald Bren School of Environmental Science and Management
- Bangalore Manjunath, Co-Principal Investigator
Electrical and Computer Engineering Department
- Richard Mayer, Co-Principal Investigator
Psychology Department
- Christine Borgman, Co-Principal Investigator
Department of Information Studies, University of California at Los Angeles
- Richard Lucier, Co-Principal Investigator
California Digital Library
- Reagan Moore, Co-Principal Investigator
San Diego Supercomputer Center
- Robert Nideffer, Co-Principal Investigator
Department of Sociology, University of California at Irvine
- Amit Sheth, Co-Principal Investigator
Department of Computer Science, University of Georgia
This Project is a component of a collaboration between the University of California at Berkeley, the University of California at Santa Barbara, and Stanford University. The combined technologies will be demonstrated on the emerging California Digital Library (CDL), and on a testbed developed by the San Diego Supercomputer Center.
The Alexandria Digital Earth Prototype (ADEPT) Project will develop digital library environments and services that are based on the Digital Earth Metaphor. The services will support access to, and use of, heterogeneous digital information distributed across the Internet on the basis of georeference as well as other criteria. In particular, the system will support the construction and use of personalized digital information collections called Iscapes (Information Landscapes). A variety of services will be provided that allow Iscapes to be developed as information service layers in which diverse information resources can be organized, accessed, and used. A characteristic feature of Iscapes is the creation of special meta-information resources indicating the joint usability of the items in the personalized collections. The project will focus on developing services that support the construction and use of Iscapes in learning contexts and for the creation of knowledge across a range of disciplines, including the arts, humanities, and social, physical, and biological sciences. The Project will focus specific attention on evaluating the effect of ADEPT services on learning in undergraduate classroom situations.
Stanford Digital Libraries Technologies
Stanford University
- Hector Garcia-Molina, Principal Investigator
- Terry Winograd, Dan Boneh, Co-Principal Investigators
Department of Computer Science
This Project is a component of a collaboration between the University of California at Berkeley, the University of California at Santa Barbara, and Stanford University. The combined technologies will be demonstrated on the emerging California Digital Library (CDL) and on a testbed developed by the San Diego Supercomputer Center.
The Stanford Project will continue to develop base technologies to overcome critical barriers to effective digital libraries. These include: heterogeneity of information and services; lack of powerful filtering mechanisms that let users find truly valuable information; insufficient availability of interfaces and tools that effectively operate on portable devices; and lack of a solid economic infrastructure that encourages providers to make information available and gives users privacy guarantees.
Re-inventing Scholarly Information Dissemination and Use
University of California at Berkeley
Project Summary or Description: < http://elib.cs.berkeley.edu/%7Ewilensky/dli-2.pdf >
- Robert Wilensky, Principal Investigator
- David Forsyth, Co-Principal Investigator
Computer Science Division, School of Information Management and Systems
This Project is a component of a collaboration between the University of California at Berkeley, the University of California at Santa Barbara, and Stanford University. The combined technologies will be demonstrated on the emerging California Digital Library (CDL) and on a testbed developed by the San Diego Supercomputer Center.
The Project will attempt to develop tools and technologies that support highly improved models of information dissemination and access. A goal is to facilitate moving from the current centralized, discrete publishing model, to a distributed, continuous, self-publishing model, while at the same time preserving and enhancing the best aspects of the current model. In the envisioned model, information can be disseminated prior to publishing; it can be disseminated and composed continually; it will also have a significant non-textual data component. The model is consistent with the changing economics of academic publishing, yet has the potential to drastically alter the cost structure of scholarly information dissemination.
To promote such an improved paradigm, it is planned to (i) develop a set of enabling technologies, (ii) develop related technologies that exploit the paradigm to support functionality not readily available in the traditional model, (iii) experimentally develop publishing models and digital collections in line with the new paradigm, (iv) conduct studies on economic models of alternative information paradigms, and (v) conduct user studies to help evaluate the impact of the work.
An Operational Social Science Digital Data Library
Harvard University
Project Summary or Description: < http://www.dli2.nsf.gov/projects/harvardproposal.html >
- Sidney Verba and Gary King, Principal Investigators
Department of Government
- Dale Flecker, Nancy M. Cline, Co-Principal Investigators
University Library
- Micah Altman, Director and Co-Principal Investigator
This proposal is for developing a Virtual Data Center (VDC) for managing and sharing numerical social science data for teaching and research purposes across multiple institutions. This project will refine and extend the prototype data server developed by the Harvard-MIT Data Center and turn it into a free, portable software product that will integrate with other data centers and library databases by supporting a variety of communication and interoperation protocols.
The VDC will address some of the problems associated with electronic data including the length of time it can take to access online data-sets and the unavailability of the data that form the basis of many research publications. Data owners will be able to deposit data in many formats and set the terms of access to their data. Users will be able to search for and download data in many formats and will be able to request only the specific variables they need. The Center will provide access to both public domain and proprietary data and will be a launch pad to statistical data stored all over the world.
Security and Reliability in Component-based Digital Libraries
Cornell University
- Carl Lagoze, Principal Investigator
- Kenneth P. Birman, Fred B. Schneider, Co-Principal Investigators
Computer Science Department
- Anne Kenney, Sarah Thomas, Co-Principal Investigators
Cornell University Library
Before the advent of digital information, attention to information integrity was the charge of a number of institutions -- among them research libraries, publishers, and legal authorities. A major challenge in the digital age, and essential to the creation of digital libraries, is the creation of new mechanisms to ensure information integrity and new methods to administer those mechanisms. Information integrity has three major characteristics: 1) reliability, which ensures that information is available where and when people want it; 2) security, which protects both the privacy rights of users of information and the intellectual property rights of content creators; and 3) preservation, which ensures the longevity of intellectual content for use by future generations.
Failure to create these will inevitably threaten the viability of all institutions -- government, business, education, and defense -- that rely on digital technology for their mission-critical information resources.
The Cornell Digital Library Project will investigate and develop working prototypes of a digital library architecture with particular attention to supporting these integrity issues. The architecture will build on the notion of reusable components, which focus on the critical realities and benefits of the networked environment, global distribution, federation of content and services distributed among multiple administrative entities, and extension -- where new components and capabilities can be added to the architecture to suit community-specific requirements or in response to new technologies.
Founding a National Gallery of the Spoken Word
Michigan State University
Project Summary or Description: < http://www.ngsw.org/app.html >
- Mark Kornbluh, Principal Investigator
History Department
- Jack Deller, Co-Principal Investigator
Department of Electrical and Computer Engineering
- Joyce Grant, Co-Principal Investigator
Department of Teacher Education, College of Education
- Michael Seadle, Co-Principal Investigator
Michigan State University Libraries
- Douglas Greenberg, Co-Principal Investigator
Chicago Historical Society
- John Hansen, Co-Principal Investigator
University of Colorado
- Jerry Goldman, Co-Principal Investigator
From Thomas Edison's first cylinder recordings, to the voices of Babe Ruth and Florence Nightingale, and Studs Terkel's timeless interviews -- the National Gallery of the Spoken Word (NGSW) will preserve and, within the limits of copyright law, make these and other historically significant voice recordings freely available and easily accessible via the Internet. The NGSW will create a significant, fully searchable, online database of spoken word collections that span the 20th century. A collaborative project among the humanities, engineering, education and library science, this gallery will provide the first large-scale repository of its kind.
By identifying and digitally preserving crucial materials in voice libraries throughout the United States, the NGSW will provide storage for these digital holdings and public exhibit "space" for the most evocative collections, not unlike physical museums. However, unlike a physical museum, the NGSW faces no space limitations and never needs to rotate items out of the exhibited collection. All exhibits in the NGSW will remain on display permanently, freely available to all visitors.
This endeavor provides an important opportunity for research and education to suit a range of fields and interests. While much work has been done to develop better methods for preserving text and graphical images, many critical technical problems remain unsolved when it comes to digitally preserving sound and delivering it via the WWW. Analog versions of speech resources suffer from machine noise, copying distortion, background sound and deterioration. And while there are a number of search techniques that work well for written text, such tools do not yet exist for large-scale collections of spoken materials. The NGSW will address all these concerns. Participants in this project include researchers who are recognized leaders in the development of aural search capabilities. The NGSW will also create a repository of high quality digital versions of key spoken material with standard bibliographic and metadata access, while developing a set of best practices for future development of sound on the web, including methods for conversion, preservation, access, and copyright compliance.
A Digital Library for the Humanities
Tufts University
Project Summary or Description: < http://hydra.perseus.tufts.edu/Props/DLI2/dli2.html >
- Gregory Crane, Principal Investigator
Department of Classics
- Robert Jacob, Co-Principal Investigator
Electrical Engineering and Computer Science Department
- Holly Taylor, Co-Principal Investigator
Psychology Department
- Ross Scaife, Co-Principal Investigator
Kentucky Classics, University of Kentucky
- Nancy Allen, Co-Principal Investigator
Museum of Fine Arts, Boston
This project is focusing on developing the foundations of a scalable, broad-based, interdisciplinary digital library for the humanities. The principal investigators for this project include not only humanists but also specialists in computer-human interface design and in cognitive science. The goals will be both to improve the ways that humanists can perform their intellectual work and to design materials that are more accessible to the vastly expanded audience already reached by the World Wide Web. The Perseus Digital Library for the Humanities brings together specialists in the humanities, computer science, and cognitive science to research methods and structures for building interdisciplinary humanities documents into components of scalable, integrated digital libraries. The project team will study the effect of new electronic publications on a wide range of audiences, ranging from the general public to scholars conducting research. The Perseus Project (www.perseus.tufts.edu), an extensive digital library on Greco-Roman culture, will serve as a substantial laboratory for human-centered and technical research. Partners include the Max Planck Institute in Berlin, the Modern Language Association, the Museum of Fine Arts, Boston, and the Stoa electronic publishing consortium. Special collections at three libraries (Brandeis University, the University of Pennsylvania and Tufts University) will offer new content and allow development of new testbeds in areas that include ancient Egypt, the texts of Shakespeare, and 19th century London.
A Software and Data Library for Experiments, Simulations and Archiving
University of South Carolina
Project Summary or Description: < http://www.dli2.nsf.gov/narrdli2.pdf >
- David Willer, Principal Investigator
Department of Sociology
- E. Elisabet Rutstrom, Co-Principal Investigator
Department of Economics
This proposal is to build, maintain and evaluate a software and data library for experiments, simulations, and archiving primarily for the social and economic sciences. It will serve as a "Web-Lab Library" and multi-functional knowledge center. There will be a library of software for experiments at the Website to support theoretically driven experimentation and a library of simulation programs for research and education. Data from current experiments will be recorded and automatically archived. The archiving format will be extensible to support inclusion of data from prior experiments. Innovative data retrieval and display systems will be developed.
The Web-Lab Library will be developed by a Hub at the University of South Carolina and two associated Collaboratories at the University of Iowa and Georgia State University. The Hub supports programmers with substantial knowledge and experience of social science research. The social scientists at the Hub and Collaboratories will develop designs for the Web-Lab Library. All will conduct experiments-at-a-distance to test software as it is developed.
Digital Workflow Management: Lester S. Levy Collection of Sheet Music
Johns Hopkins University
- Sayeed Choudhury, Principal Investigator
- Cynthia Requardt, Co-Principal Investigator
Digital Knowledge Center
This project will seek to enhance the use and usability of the Eisenhower Library's Lester S. Levy Collection of Sheet Music and similar collections located elsewhere. The Eisenhower Library previously digitized this collection of more than 29,000 pieces of American popular sheet music spanning the years 1780 to 1960. The sheet music in this collection provides a social commentary on American life and a distinctive record of its time.
The project will create sound renditions and enhanced search capabilities for the collection. Audio files and full-text lyrics are being created using optical music recognition software written by staff from the Peabody Conservatory at Hopkins. Workflow managing tools will be developed to reduce and focus human labor. The activities will result in a tested process, framework, and set of tools transferable for use with other large-scale digitization projects.
A Multi-tiered Extensible Digital Archive of Folk Literature
University of California at Davis
Project Summary or Description: < http://philo.ucdavis.edu/SEFARAD/projdesc/projdesc.html >
- Samuel Armistead, Principal Investigator
Department of Spanish
- Bruce Rosenstock, Co-Principal Investigator
Classics, Religious Studies
The Armistead-Silverman collection at the University of California at Davis contains fifteen hundred "Judeo-Spanish" narrative ballads, together with other genres, including lyric poetry, folktales, proverbs, and riddles. The oral traditions preserved in the language, also known as "Ladino" but called "Judeo-Spanish" in this grant proposal, were gathered by Professors Armistead, Katz, and Silverman during the years 1957-1980 from informants from Bosnia, Macedonia, Bulgaria, Greece, Turkey, Morocco, Israel, Spain, and the United States. This material is the largest collection of Judeo-Spanish oral literature in North America, and one of the three largest in the world. The Judeo-Spanish oral tradition preserves a cultural legacy for the study of Sephardic Jewry as well as for researchers in the history of pan-Hispanic and pan-European balladry. This oral tradition, with roots extending back into the Middle Ages, provides a unique matrix within which Hispanic written literature was created.
The technical goals of the project are to continue conversion of this material to a multi-media digital corpus so that these materials can be made more widely available, with increased access and analytic capabilities. Textual transcriptions will be tagged using a number of markup methods, especially XML, and a digital audio database will be created. A variety of approaches will be tested to make the archive fully extensible. The project will build on earlier research products from other digital libraries projects, including the University of California, Berkeley digital libraries group.
The Digital Atheneum: New techniques for restoring, searching, and editing humanities collections
University of Kentucky
Project Summary or Description: < http://www.dli2.nsf.gov/scu.pdf >
- William Brent Seales, Principal Investigator
- James N. Griffioen, Co-Principal Investigator
Department of Computer Science
- Kevin S. Kiernan, Co-Principal Investigator
Department of English
This work will develop new digital libraries from aging and damaged portions of the Cottonian Collection at the British Library, tailored to the requirements of scholars in the humanities. The result of this project will be state-of-the-art technical approaches, tools that incorporate those new approaches, and a widely distributed digital library of restored, previously inaccessible manuscripts. In particular, the technical focus will encompass the following important research areas:
- Continued development of new illumination techniques for damaged and aging manuscripts using novel lighting methods to make it possible to recover markings and information that would otherwise be invisible
- Creation of a semantic object model and framework for creating digital collections that will support domain or data-specific restoration and content-based search/access
- Incorporation of novel processing techniques for digitally restoring, enhancing, and searching/annotating manuscripts that have suffered damage from fire, water, and aging
The project has strong support from IBM through the Shared University Research (SUR). Likewise, partnership with the British Library provides privileged access to high-quality collections, manuscript and curator expertise, and digitization facilities.
Data Provenance
University of Pennsylvania
Project Summary or Description: < http://db.cis.upenn.edu/%7Ewctan/DataProvenance/precis/ >
- Peter Buneman, Principal Investigator
- Val Tannen, Susan B. Davidson, Chris Overton, Co-Principal Investigators
Department of Computer and Information Science
- Mark Liberman, Co-Principal Investigator
Department of Linguistics
This project will address issues associated with data provenance. Provenance is concerned with how information has arrived at the form in which it appears -- who produced it, who has corrected it, how old it is, how it was originally produced, and so forth. Understanding provenance has occupied scientists, historians, textual critics and other scholars for centuries.
The provenance of data in databases is a newer and larger problem, because one is interested in data at all levels of granularity -- from a single pixel in a digital image to a whole database. Just as scholars comment on documents by attaching annotations (marginalia) to text, part of the solution to recording provenance is the attachment of annotations to components of databases. Database researchers have recently considered loosely structured forms of data and have developed software systems for querying and storing such data. This work is closely related to new formats that have been developed for structured documents on the Web. It is expected that this technology will provide the substrate for recording and tracking provenance by advancing new data models, new query languages and new storage techniques.
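The idea of attaching annotations to database components at any level of granularity can be illustrated with a minimal sketch. This is not the project's data model or query language; the class `AnnotatedStore`, its methods, and the path-tuple addressing scheme are hypothetical names chosen only for illustration. Each annotation is attached to a path, and the provenance of a component is taken to be its own annotations plus those of every enclosing component.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedStore:
    """Hypothetical store of provenance notes keyed by component path.

    A path is a tuple such as ("images", "scan_01", "pixels", 3);
    the empty tuple () stands for the whole database.
    """
    notes: dict = field(default_factory=dict)  # path tuple -> list of notes

    def annotate(self, path, note):
        """Attach a provenance note to the component at `path`."""
        self.notes.setdefault(tuple(path), []).append(note)

    def provenance(self, path):
        """Return (path, note) pairs for the component and all of its
        enclosing components, outermost first."""
        path = tuple(path)
        found = []
        for depth in range(len(path) + 1):
            prefix = path[:depth]
            for note in self.notes.get(prefix, []):
                found.append((prefix, note))
        return found


store = AnnotatedStore()
store.annotate((), "database assembled from two source archives")
store.annotate(("images", "scan_01"), "digitized by curator A")
store.annotate(("images", "scan_01", "pixels", 3), "pixel value corrected by B")

for where, note in store.provenance(("images", "scan_01", "pixels", 3)):
    print(where, "->", note)
```

Looking up a single pixel returns the pixel-level correction together with the image-level and database-level notes, while a component with no annotations of its own still inherits the whole-database note.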
DL of Vertebrate Morphology using a new High Resolution X-ray CT Scanning facility
University of Texas at Austin
- Timothy Rowe, Principal Investigator
Department of Geological Sciences
This project is an intensive application of high-resolution X-ray Computed Tomography scanning (X-ray CT) to the study of the vertebrate skeleton. These instruments are descendants of medical diagnostic CT scanners, and they enable the non-destructive inspection of tiny 3-dimensional objects in unprecedented detail. We will build an unprecedented digital library of high-resolution X-ray CT images and 3-D models. The library will enable far more detailed and comprehensive analyses of vertebrate structure than was ever before possible, by a global networked audience of researchers, educators, and students. We will examine the skeleton in all of its forms, from fossils to embryos and adults of living species. We will survey a broad taxonomic diversity that includes important laboratory and research species, and that samples the smallest four orders of vertebrate size-magnitudes.
We envision an interactive digital library that will accelerate education as it fosters fundamental new research discoveries in vertebrate structure, function, embryology, bioengineering, and evolution. The library core will be distributed over the Web. We will also expand our partnership with distinguished academic publishers of books and journals, to distribute selected high-resolution datasets on CD-ROM, via established peer-reviewed mechanisms that reach large professional societies and educational audiences. We believe that our prototype library design will be readily exportable across the community of engineers, physicians, and natural historians already using CT and other types of 3-D tomographic data.
This project is a collaboration among 24 researchers at leading research universities and natural history museums around the world. We believe that the digital library may eventually transform the study of vertebrate morphology. We expect it to foster fundamental new discoveries, accelerated communication and education, the formation of collaborations among widely distributed individuals, and new digital alliances among engineers, scientists, and publishers.
Using the Informedia Digital Video Library to Author Multimedia Material
Carnegie Mellon University
- Brad Myers, Principal Investigator
School of Computer Science
This project will create a comprehensive Intelligent Video Editor that will allow people without special training to author interesting compositions using digital video. In particular, the editor will support sophisticated interactive behaviors for the videos and for extra graphical drawings (called synthetic graphics) layered on top of the videos. For example, users might specify which objects in the video can be clicked on to choose the next video clip, or that an arrow should be drawn that shows the path that an object will follow, or that the video is part of a lesson and a viewer's answer to a question determines the next action. There will also be high-level facilities for searching and organizing videos, video editing, demonstrating behaviors, writing scripts in a more natural programming language, and testing and debugging the code. Children and their teachers will be able to create interesting interactive compositions using videos. The tools we create will be continuously tested with school children and adults to evaluate and refine the various features. The goal is to make it as easy to use the video material found in a digital library as it is to use textual material found in today's libraries.
High-Performance Digital Library Classification Systems: From Information Retrieval to Knowledge Management
University of Arizona
Project Summary or Description: < http://www.dli2.nsf.gov/projects/chen.pdf >
- Hsinchun Chen, Principal Investigator
- Robin Sewell, Co-Principal Investigator
Artificial Intelligence Lab, Department of Management of Information SystemsThe proposed research aims to develop an architecture and the associated techniques needed to automatically generate classification systems from large domain-specific textual collections and to unify them with manually created classification systems to assist in effective digital library retrieval and analysis. Both algorithmic developments and user evaluation in several sample domains will be conducted in this project. Scalable automatic clustering methods including Ward's clustering, multi-dimensional scaling, latent semantic indexing, and self-organizing map will be developed and compared. Most of these algorithms, which are computationally intensive, will be optimized based on the sparsity of common keywords in textual document representations. Using parallel, high-performance platforms as a time machine for simulation, we plan to parallellize and benchmark the above clustering algorithms for large-scale collections (on the order of millions of documents) in several domains. Results of these automatic classification systems will be represented using several novel hierarchical display methods.
The research testbed will include three application domains that consist of both large-scale collections and existing classification systems: (1) medicine: CancerLit (700,000 cancer abstracts) and the NLM's UMLS (500,000 medical concepts), (2) geoscience: GeoRef and Petroleum Abstracts (800,000 abstracts) and the GeoRef thesaurus (26,000 geoscience terms), and (3) Web application: a WWW collection (1.5M web pages) and the Yahoo! classification (20,000 categories). Medical subjects, geoscientists, and WWW search engine users will take part in the evaluation.
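To illustrate why keyword sparsity matters for the clustering methods named in this abstract, the following is a minimal sketch, not the project's own algorithm. It assumes documents are represented as sparse keyword-count vectors and computes cosine similarity only for document pairs that share at least one keyword, using an inverted index. The function names and the sample keywords are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def sparse_vector(keywords):
    """Map a list of keywords to a sparse term-frequency vector (dict)."""
    vec = defaultdict(int)
    for word in keywords:
        vec[word] += 1
    return dict(vec)


def pairwise_cosine(docs):
    """Cosine similarity for every document pair that shares a keyword.

    An inverted index (keyword -> documents containing it) lets us
    accumulate dot products only over shared keywords, so pairs with
    no keyword in common are never visited.
    """
    index = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for word, weight in vec.items():
            index[word].append((doc_id, weight))

    dots = defaultdict(float)
    for postings in index.values():
        for i in range(len(postings)):
            for j in range(i + 1, len(postings)):
                (a, wa), (b, wb) = postings[i], postings[j]
                dots[(a, b)] += wa * wb

    norms = [sqrt(sum(w * w for w in vec.values())) for vec in docs]
    return {pair: dot / (norms[pair[0]] * norms[pair[1]])
            for pair, dot in dots.items()}


# Hypothetical sample documents, for illustration only.
docs = [
    sparse_vector(["tumor", "therapy", "tumor"]),
    sparse_vector(["tumor", "screening"]),
    sparse_vector(["seismic", "basin"]),
]
similarities = pairwise_cosine(docs)
```

Only the pair (0, 1) appears in `similarities`, because the third document shares no keyword with the others. A clustering method such as Ward's could consume these similarities, treating missing pairs as zero, though how the project itself exploits sparsity is not specified in the abstract.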
A Distributed Information Filtering System for Digital Libraries
Indiana University Bloomington
Project Summary or Description: < http://shakti.slis.indiana.edu/%7Ejm/nsf/nsfdl2.pdf >
- Mathew J Palakal, Principal Investigator
- Rajeev R. Raje and Snehasis Mukhopadhyay, Co-Principal Investigators
Department of Computer and Information Science
- Javed Mostafa, Co-Principal Investigator
School of Library and Information Science
The popularity and growth of the Internet and associated networking technologies are allowing a rapidly increasing number of users, representing diverse segments of society, to access an enormous amount of geographically dispersed information available in different electronic forms and media. With the successful completion of prominent efforts, such as the Digital Library Initiative, this volume of information will grow at a phenomenal rate. Without effective automated support systems to access and filter such information, an average user runs the risk of being overwhelmed by the sheer volume of irrelevant and possibly unwanted information. Unlike traditional information systems, digital libraries are inherently dynamic and distributed in nature. Providing personalized, efficient, adaptive, and intelligent access to this plethora of information, without creating an "information overload" on the users, is a major challenge right now, and will become increasingly urgent as we head into the next millennium.
The proposed research is aimed at designing and developing a distributed intelligent information distribution and filtering system that provides personalized information services to the user while minimizing direct user involvement. The system will weed out unwanted (irrelevant) incoming information and traverse the network to retrieve relevant information of interest to the user. The filtering system will be realized using a collaborative framework of a multitude of information agents, and will involve integration of advanced concepts and techniques from the domains of artificial intelligence, information retrieval, and distributed object computing.
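The filtering idea in this abstract can be pictured with a small sketch. It is an assumption-laden illustration, not the system the investigators propose: a single user profile is a dictionary of term weights, feedback on a document nudges those weights, and incoming documents are kept only if their score reaches a threshold. The function names, the learning rate, and the threshold are all hypothetical.

```python
def score(profile, document_terms):
    """Sum the profile weights of the distinct terms in a document."""
    return sum(profile.get(term, 0.0) for term in set(document_terms))


def update_profile(profile, document_terms, relevant, rate=0.5):
    """Nudge term weights up or down after feedback on one document."""
    delta = rate if relevant else -rate
    for term in set(document_terms):
        profile[term] = profile.get(term, 0.0) + delta
    return profile


def filter_stream(profile, documents, threshold=1.0):
    """Keep only documents whose score reaches the threshold."""
    return [doc for doc in documents if score(profile, doc) >= threshold]


# Hypothetical feedback history: two relevant documents, one irrelevant.
profile = {}
update_profile(profile, ["wavelet", "image", "retrieval"], relevant=True)
update_profile(profile, ["image", "retrieval", "index"], relevant=True)
update_profile(profile, ["stock", "market", "index"], relevant=False)

incoming = [
    ["image", "retrieval", "survey"],
    ["stock", "market", "report"],
    ["index", "funds"],
]
kept = filter_stream(profile, incoming)
```

Here only the first incoming document passes the filter. The abstract describes a collaborative framework of many information agents; this sketch shows one agent's scoring step in isolation and says nothing about how agents would cooperate or traverse the network.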
Fall 1998 Award Actions
Automatic Reference Librarians for the World Wide Web
University of Washington
Project Summary or Description: < http://www.dli2.nsf.gov/projects/washdescript.pdf >
- Oren Etzioni, Principal Investigator
- Dan Weld, Co-Principal Investigator
Department of Computer Science
By all accounts, the Web is humanity's largest and fastest growing repository of digital information. Many collections of information are Internet-accessible, and most will provide a searchable Web interface. While some collections have a broad array of materials, trends show an explosion in the number of specialized collections with narrow but very deep content. Thus a principal challenge facing users will be the selection of Web information sources capable of answering their query. In a physical library, users rely on a reference librarian to help point them at the correct resource, but while human librarians are becoming increasingly sophisticated in their use of the Web, they are only part of the solution. We need more powerful automatic reference tools to help people efficiently retrieve high quality information from the Web.
Typically, reference librarians are not specialists in the topic of inquiry (e.g., computational fluid dynamics) but they are expert at identifying relevant resources (e.g., The International Journal of Fluid Dynamics) and at appropriate strategies for obtaining the necessary information. The central objective of this proposal is to create software agents that possess reference intelligence -- a limited understanding of complex technical topics, but a very sophisticated understanding of how and where to find high-quality information on the World Wide Web.
Tracking Footprints through a Medical Information Space: Computer Scientist-Physician Collaborative Study of Document Selection by Expert Problem Solvers
Project Summary or Description: < http://www.dli2.nsf.gov/projects/ohsu.pdf >
Oregon Health Sciences University
Oregon Graduate Institute of Science and Technology
- Paul Gorman, Principal Investigator
Biomedical Information Communication Center, Oregon Health Sciences University
- David Maier, Lois Delcambre, Co-Principal Investigators
Department of Computer Science and Engineering, Oregon Graduate Institute of Science and Technology
The goal of this project is to help expert problem solvers find needed information in a large, complex information space. The focus is on one example of expert problem solving: the health care field. Sorting through such a heterogeneous collection of electronic and other media materials to find needed information, sometimes under time duress, can be formidable.
This project proposes to capture the trace of information used by experts -- to monitor the paths taken and collection resources used by experts (in this case, physicians) in moving from observation, to information gathering, to solution of a given health care problem. By capturing the artifactual trace information associated with information seeking and selection, it is hypothesized that greater insight can be gained into behaviors of users and patterns of usage. This knowledge can then be fed back into the design and development of new information environments.
The work will be conducted by a cross-disciplinary team composed of an MD focusing on information seeking behaviors of physicians, and a group of computer scientists focusing on extracting and using regularly structured information. The usefulness of the approaches will be tested in domains other than health care, in particular the aircraft design industry through the active support of the Boeing Corporation.
Image Filtering for Secure Distribution of Medical Information
Stanford University
Project Summary or Description: < http://www.dli2.nsf.gov/wiederhold.pdf >
- Gio Wiederhold, Principal Investigator
Department of Computer Science
An increasing amount of information being transmitted over the Internet is in image form. This trend includes medical images used in diagnosis and research, and other materials for which it is desirable to avoid violations of security and privacy. While privacy and security control of textual materials has long been a focus of research activities, images present new and more challenging problems. Filtering of images in addition to text becomes more essential as modern computing and communications facilitate the use of information in image form.
This project proposes to provide image filtering capabilities to complement other means of checking the contents of documents. The domain of interest is electronic medical records, but the research products are expected to be generalizable to other domains of interest. The effort will focus on developing further wavelet-based algorithms for searching medical image databases and retrieving relevant information from multimedia medical databases; extracting textual information from images; advancing practices for the protection of privacy and implementing a security mediator; and exploring WWW interfaces for security mediators.
Fall 1998 Undergraduate Emphasis Awards
Using the National Engineering Education Delivery System as the Foundation for Building a Test-Bed Digital Library for Science, Mathematics, Engineering and Technology Education
University of California Berkeley
Project Summary or Description: < http://www.dli2.nsf.gov/projects/berkeleydescribe.pdf >
- Alice Agogino, Principal Investigator
College of Engineering
Two key National Science Foundation reports, "Systemic Engineering Education Reform: An Action Agenda" and "Shaping the Future: New Expectations for Undergraduate Education in Science, Mathematics, Engineering, and Technology," urge the formation of a national resource to provide access to quality courseware and to disseminate successful educational practices. Since the early 1990s, NEEDS -- the National Engineering Education Delivery System -- has provided these services for the engineering education community. Building on this base, this project will:
- Develop a test-bed digital library for science, mathematics, engineering, and technology education (SMETE). The library will provide courseware, cataloging, indexing, searching, and downloading to the science and mathematics communities.
- Initiate the development of a SMETE digital library community.
- Evaluate the test-bed library.
- Develop recommendations for the continued development of a SMETE digital library based on a needs assessment and test-bed evaluation.
Planning Grant for the Use of Digital Libraries in Undergraduate Learning in Science
Old Dominion University
Project Summary or Description: < http://www.dli2.nsf.gov/projects/odu.pdf >
- Kurt Maly, Principal Investigator
- Mohammed Zubair, Stewart Shen, Steven Zeil, Co-Principal Investigators
Department of Computer Science
Instructional methods in academe are shifting from a teacher-centered paradigm to a user-centered paradigm. Advances in networking, digital libraries, and digital media technology are making the World Wide Web an effective framework for supporting this type of active learning. This project will develop a set of prototype tools, processes, and an environment to provide preliminary answers to a set of questions that underlie the design and implementation of a digital library for science, mathematics, and engineering education. In particular, we will develop, run, collect data from, and analyze one student-centered computer science course. This project builds on experience with the Networked Computer Science Technical Report Library (NCSTRL) and work at Old Dominion University to develop NCSTRL+.
Virtual Skeletons in 3 Dimensions: The Digital Library as a Platform for Studying Web-Anatomical Form and Function
University of Texas at Austin
Project Summary or Description: < http://www.dli2.nsf.gov/projects/texas.html >
- John Kappelman
Department of Anthropology
Recent developments in three-dimensional digitizing hardware and software make it possible, practical, and economical to scan and archive complex-shaped objects, including a range of skeletal elements from a variety of large and small-sized species, into a digital library for study and research. Making anatomical materials, including elements from species commonly used in education and rare or even endangered species, widely available has far-reaching implications for research and for education from grade school through graduate school.
This project will begin the creation of such a library, starting with chimpanzees and baboons and using both low and high resolution technologies. It will also design and implement a "discovery interface" that will provide an interactive framework for investigation, benefiting both beginning and advanced users. The project builds on work at the University of Texas, Austin, including the course Introduction to Physical Anthropology and Human Evolution and the CD-ROM Virtual Laboratories for Physical Anthropology.
DOI: 10.1045/july99-griffin