D-Lib Magazine
The Magazine of Digital Library Research

T A B L E   O F   C O N T E N T S
J U L Y / A U G U S T   2 0 1 2
Volume 18, Number 7/8

ISSN: 1082-9873




Science, Publishing, and Digital Libraries
by Laurence Lannom, Corporation for National Research Initiatives


Special Issue on Mining Scientific Publications
Guest Editorial by Petr Knoth and Zdenek Zdrahal, KMi, The Open University and Andreas Juffinger, The European Library



TeamBeam — Meta-Data Extraction from Scientific Literature
Roman Kern, Graz University of Technology; Kris Jack and Maya Hristakeva, Mendeley Ltd.; Michael Granitzer, University of Passau

Abstract: An important aspect of the work of researchers as well as librarians is to manage collections of scientific literature. Social research networks, such as Mendeley and CiteULike, provide services that support this task. Meta-data plays an important role in providing services to retrieve and organise the articles. In such settings, meta-data is rarely explicitly provided, leading to the need for automatically extracting this valuable information. The TeamBeam algorithm analyses a scientific article and extracts structured meta-data, such as the title, journal name and abstract, as well as information about the article's authors (e.g. names, e-mail addresses, affiliations). The input of the algorithm is a set of blocks generated from the article text. A classification algorithm, which takes the sequence of the input into account, is then applied in two consecutive phases. In the evaluation of the algorithm, its performance is compared against two heuristics and three existing meta-data extraction systems. Three different data sets with varying characteristics are used to assess the quality of the extraction results. TeamBeam performs well under testing and compares favourably with existing approaches.

Semantic Enrichment of Scientific Publications and Metadata: Citation Analysis Through Contextual and Cognitive Analysis
Article by Marc Bertin and Iana Atanassova, STIH-LaLIC Laboratory, Paris-Sorbonne University

Abstract: The last several years have seen the emergence of digital libraries from which documents are harvested using the OAI-PMH protocol. Considering the volume of data provided by these repositories, we are interested in the exploitation of the full text content of scientific publications. Our aim is to bring new value to scientific publications by automatic extraction and semantic analysis. The identification of bibliographic references in texts makes it possible to localize specific text segments that carry linguistic markers in order to annotate a set of semantic categories related to citations. This work uses a categorization of surface linguistic markers organized in a linguistic ontology. The semantic annotations are used to enrich the document metadata and to provide new types of visualizations in an information retrieval context. We present the system architecture as well as some experimental results.

Domain-Independent Mining of Abstracts Using Indicator Phrases
Article by Ron Daniel, Jr., Elsevier Labs

Abstract: Abstracts contain a variety of domain-independent indicator phrases such as "These results suggest X" and "X remains unknown". Indicator phrases locate domain-specific key phrases (the Xs) and categorize them into potentially useful types such as research achievements and open problems. We hypothesized that such indicator phrases would allow reliable extraction of domain-specific information, in a variety of disciplines, using techniques with low computational burden. The low burden and domain-independence are major requirements for applications we are targeting. We report on an analysis of indicator phrases in a collection of 10,000 abstracts, and a more detailed analysis of the automated tagging of 100 abstracts from ten different disciplines. We found that a modest number (18) of regular expressions can achieve reasonable performance (F1 ≈ 0.7, precision ≈ 0.8) in extracting information about achievements, problems, and applications across the ten different disciplines.
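
Editorial illustration: the indicator-phrase idea described in the abstract above can be sketched with two regular expressions built from the two phrases it quotes. This is a minimal sketch, not the article's method: the 18 expressions used by the author are not reproduced here, and the pattern list, category labels, and the function name tag_abstract are all hypothetical.

```python
import re

# Hypothetical patterns based only on the two example phrases quoted in the
# abstract; the category names are illustrative assumptions.
INDICATOR_PATTERNS = [
    # "These results suggest X" -> X is treated as a research achievement
    ("achievement",
     re.compile(r"\bthese results suggest (?:that )?(?P<x>[^.]+)", re.IGNORECASE)),
    # "X remains unknown" -> X is treated as an open problem
    ("open_problem",
     re.compile(r"(?P<x>[^.]+?) remains? unknown", re.IGNORECASE)),
]


def tag_abstract(text):
    """Return (category, key phrase) pairs for each indicator phrase found."""
    found = []
    for category, pattern in INDICATOR_PATTERNS:
        for match in pattern.finditer(text):
            found.append((category, match.group("x").strip()))
    return found


sample = ("These results suggest that protein A binds B. "
          "The mechanism of binding remains unknown.")
for category, phrase in tag_abstract(sample):
    print(category, "->", phrase)
```

Each pattern captures the key phrase (the X) in a named group, so the same matching step both locates the phrase and assigns its category. Patterns of this kind need no domain vocabulary, which is the property the abstract emphasizes.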

Identification of User Facility Related Publications
Article by Robert M. Patton, Christopher G. Stahl, Thomas E. Potok, Jack C. Wells, Oak Ridge National Laboratory

Abstract: Scientific user facilities provide physical resources and technical support that enable scientists to conduct experiments or simulations pertinent to their respective research. One metric for evaluating the scientific value or impact of a facility is the number of publications by users as a direct result of using that facility. Unfortunately, for a variety of reasons, capturing accurate values for this metric proves time-consuming and error-prone. This work describes a new approach that leverages automated browser technology combined with text analytics to reduce the time and error involved in identifying publications related to user facilities. With this approach, scientific user facilities gain more accurate measures of their impact as well as insight into policy revisions for user access.

Visual Search for Supporting Content Exploration in Large Document Collections
Article by Drahomira Herrmannova and Petr Knoth, KMi, The Open University

Abstract: In recent years a number of new approaches for visualising and browsing document collections have been developed. These approaches try to address the problems associated with the growing amounts of content available and the changing patterns in the way people interact with information. Users now demand better support for exploring document collections to discover connections and to compare and contrast information. Although visual search interfaces have the potential to improve the user experience in exploring document collections compared to textual search interfaces, they have not yet become as popular among users. The reasons for this range from the design of such visual interfaces to the way these interfaces are implemented and used. In this paper we study these reasons and determine the factors that contribute to an improved visual browsing experience. Consequently, by taking these factors into account, we propose a novel visual search interface that improves exploratory search and the discovery of document relations. We explain our universal approach, and how it could be applied to any document collection, such as news articles, cultural heritage artifacts or research papers.

Extraction and Visualization of Technical Trend Information from Research Papers and Patents
Article by Satoshi Fukuda, Hidetsugu Nanba, Toshiyuki Takezawa, Hiroshima City University, Hiroshima, Japan

Abstract: To a researcher in a field with high industrial relevance, retrieving and analyzing research papers and patents are important aspects of assessing the scope of the field. Knowledge of the history and effects of the elemental technologies is important for understanding trends. We propose a method for automatically creating a technical trend map from both research papers and patents by focusing on the elemental (underlying) technologies and their effects. We constructed a method that can be used in any research field. To investigate the effectiveness of our method, we conducted an experiment using the data in the NTCIR-8 Workshop Patent Mining Task. The results of our experiment showed recall and precision scores of 0.254 and 0.496, respectively, for the analysis of research papers, and recall and precision scores of 0.455 and 0.507, respectively, for the analysis of patents. Those results indicate that our method for mapping technical trends is both useful and sound.
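
Editorial illustration: the abstract above reports recall and precision separately. For readers who want a single figure, the standard F1 measure (the harmonic mean of precision and recall) can be computed from those reported values. The sketch below is arithmetic on the published numbers only; the F1 values are not reported in the abstract, and the function name f1_score is a hypothetical helper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recall and precision values as reported in the abstract.
papers_f1 = f1_score(precision=0.496, recall=0.254)
patents_f1 = f1_score(precision=0.507, recall=0.455)

print(f"research papers: F1 = {papers_f1:.3f}")
print(f"patents:         F1 = {patents_f1:.3f}")
```

Under this measure the patent analysis scores higher than the research paper analysis, mainly because its recall is higher.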

Specialized Research Datasets in the CiteSeerx Digital Library
Article by Sumit Bhatia, Cornelia Caragea, Hung-Hsuan Chen, Jian Wu, Pucktada Treeratpituk, Zhaohui Wu, Madian Khabsa, Prasenjit Mitra and C. Lee Giles, The Pennsylvania State University

Abstract: We provide an overview of some of the specialized datasets that were created for various projects related to the CiteSeerx digital library. These datasets are not those usually available from CiteSeerx and awareness of these datasets may further advance state-of-the-art research in academic digital library data management and analysis.

Automatic and Interactive Browsing Hierarchy Construction for Scientific Publication Collections
Article by Grace Hui Yang, Georgetown University

Abstract: Pre-constructed browsing hierarchies are often incapable of supplying the right set of terms to describe a new scientific publication collection. Even if a browsing hierarchy contains descriptive terms, they may not be organized in the same way as they are presented in the collection. Browsing hierarchies derived directly from the collection can be far more effective than pre-constructed ones. In this paper, we present a novel automatic browsing hierarchy construction algorithm which can derive browsing hierarchies that match the content of a collection of scientific publications. It also allows librarians or others who construct browsing hierarchies to interactively modify the hierarchies and, to some extent, teaches the algorithm to predict further human modifications. A user study and experimental results show that our algorithm is effective in creating hierarchies to support browsing activities for arbitrary collections.


N E W S   &   E V E N T S


In Brief: Short Items of Current Awareness

In the News: Recent Press Releases and Announcements

Clips & Pointers: Documents, Deadlines, Calls for Participation

Meetings, Conferences, Workshops: Calendar of Activities Associated with Digital Libraries Research and Technologies

F E A T U R E D   D I G I T A L


The Tapestries Called Sheldon


[William Sheldon's tomb at Beoley Church. Copyright Hilary L. Turner. Used with permission.]


[Honeysuckle. Copyright Hilary L. Turner. Used with permission.]


[Viola. Copyright Hilary L. Turner. Used with permission.]


The Tapestries Called Sheldon shows, stage by stage, how new evidence has changed the assumptions about the group of tapestries called Sheldon. Mostly small furnishing items rather than large wall hangings, they were so named only in the 1920s, following the discovery of the will (1570) of the English gentleman William Sheldon. The will outlined a plan to introduce weaving of tapestry, arras and cloth fabrics at his manor house at Barcheston, Warwickshire, some one hundred miles from London.

There is no documentary evidence that the venture, unnoticed by contemporaries, was as successful as once thought. The stylistic characteristics which might define the workshop's style were derived by analogy from five tapestries found nearby and claimed as Barcheston work, again without any evidence. There is therefore no certainty that any piece bearing this label in fact originates with this enterprise. Themes seen in the greater number of tapestries now so called are heavily indebted to contemporary prints by Flemish and German artists and cannot, as formerly, be claimed as original designs.

More than one hundred tapestry weavers resident in London between 1559 and 1619 can be named. All were émigrés from the traditional continental tapestry weaving centres. Their presence, ignored in the 1920s, means that Barcheston can no longer be regarded as the only sixteenth-century English production centre and that therefore these tapestries might originate elsewhere.

The site displays more than fifty images together with a bibliography of dedicated research, a catalogue of known examples and places where tapestries called Sheldon can be seen. It also presents biographies of tapestry weavers in England.


D - L I B   E D I T O R I A L   S T A F F

Laurence Lannom, Editor-in-Chief
Allison Powell, Associate Editor
Catherine Rey, Managing Editor
Bonita Wilson, Contributing Editor
