September/October 2016
Volume 22, Number 9/10
Guest Editorial

Current Research on Mining Scientific Publications

Drahomira Herrmannova and Petr Knoth
Knowledge Media Institute, The Open University, Milton Keynes, UK
{drahomira.herrmannova, petr.knoth}@open.ac.uk

DOI: 10.1045/september2016-guest-editorial


The articles in this issue of D-Lib Magazine were selected from papers presented at the 5th International Workshop on Mining Scientific Publications (WOSP 2016), organised by the Open University and OpenMinTeD. The workshop was held in conjunction with the Joint Conference on Digital Libraries (JCDL 2016) in Newark, just outside New York City. This was the fifth time the Open University has organised the workshop, which featured a variety of speakers from academia and industry who presented their text and data mining research and results.

Through a peer-review process, the programme committee selected five long papers and four short papers to be part of this D-Lib issue. The papers fall into three general topics: semantic enrichment (two papers), tools and datasets (two papers), and citation analysis and research impact (five papers).

A significant proportion of the papers presented at the workshop this year was concerned with citation analysis and research evaluation. The first two papers focus on analysing citations found in academic papers. The paper Rhetorical Classification of Anchor Text for Citation Recommendation examines how to use citation context to increase relevance in citation recommendation, while the paper Temporal Properties of Recurring In-text References analyses in-text citations, particularly those that appear repeatedly. This session also included articles which look at how the international mobility of researchers influences the quality of university graduate programs (The Impact of Academic Mobility on the Quality of Graduate Programs), at whether curating papers stored in scholarly databases influences their citation rates (Preliminary Studies on the Impact of Literature Curation by Model Organism Databases on Article Citation Rates) and at the question of how to measure academic impact (Measuring Scientific Impact Beyond Citation Counts).

In the past, similar studies were limited by the ability of researchers to access and mine metadata and citation information related to academic papers. Significant progress has been made in this area in recent years, with both new datasets and new tools being released. The article An Analysis of the Microsoft Academic Graph demonstrates the strengths and limitations of the Microsoft Academic Graph dataset, which contains over 120 million metadata records of academic papers, and compares it to other available sources. A tool for crawling scientific repositories (Scraping Scientific Web Repositories: Challenges and Solutions for Automated Content Extraction) was also presented at the workshop.

The third workshop session featured approaches that address different problems in classifying and categorising the content of research publications. The work presented in this session focused on capturing novelty (Quantifying Conceptual Novelty in the Biomedical Literature) and interdisciplinarity (Capturing Interdisciplinarity from Academic Abstracts) from research publications.

We believe that the articles included in this special issue of D-Lib Magazine will help to motivate further research in this important domain. We hope readers will enjoy reading them and will find them useful.


About the Guest Editors

Drahomira Herrmannova is a Research Student at the Knowledge Media Institute, Open University, working under the supervision of Professor Zdenek Zdrahal and Dr Petr Knoth. Her research interests include bibliometrics, citation analysis, research evaluation and natural language processing. She completed her BS and MS degrees in Computer Science at Brno University of Technology, Czech Republic. Alongside her PhD, she participated in research projects at the Knowledge Media Institute (CORE, OU Analyse).


Petr Knoth is a Senior Data Scientist at Mendeley, where he develops text-mining tools to support researchers' workflows. Dr Knoth is also the founder of the CORE system, which aggregates millions of open access publications from repositories and journals and makes them freely available for text-mining. Previously, as a researcher at the Open University, he acted as the principal investigator on a number of national and international research projects in the areas of Text Mining and Open Science.

