OAI7 - CERN Workshop on Innovations in Scholarly Communication


D-Lib Magazine

November/December 2011
Volume 17, Number 11/12

OAI7 — CERN Workshop on Innovations in Scholarly Communication

Paola Castellucci
Università di Roma "La Sapienza"
paoloa.castellucci@uniroma1.it

Elena Giglia
Università degli Studi di Torino
[email protected]

doi:10.1045/november2011-castellucci


Abstract

Since 1999 the CERN Workshop on Innovations in Scholarly Communication, known as the OAI conference series, has been an important resource for the international community involved in the Open Access movement. Librarians, researchers, administrators, and information scientists can all share experiences and opinions by attending the Plenary Sessions or the more specific and technical Tutorials. This was the format for the most recent of the OAI series (CERN Workshop on Innovations in Scholarly Communication — OAI7, University of Geneva, 22-24 June 2011). "Openness" was considered from many points of view, taking into account current research, best practices and technical innovations as well as economic and political aspects.

Introduction

The CERN Workshop on Innovations in Scholarly Communication (best known as the "OAI series") takes place every other year. It is one of the most renowned international meetings in the field of scholarly communication and offers a unique opportunity to learn and exchange ideas for those involved in the Open Access "philosophy": not only scholars and researchers, but also technical developers of Open Access (OA) repositories and services, research information policy developers, funding bodies, and publishers. The spirit of the meeting is sharing best practices, experiences and ideas, as well as connecting people and creating new possibilities for the achievement of "openness" within national and international institutions.

The latest event confirms this reputation. The quality of the OAI7 workshop was outstanding (programme and slides are available here). Plenary Sessions were both technically and politically oriented, as is evident just from the list of session titles:

Towards machine-actionable scholarly communication
Aggregation
Advocacy
Open Access Publishing
Open Science
Research data

Practical tutorials and breakout sessions meant for smaller groups also followed an interdisciplinary approach:

Memento & Open Annotation
Everything you always wanted to know about OA and OAI but were afraid to ask
CDS Invenio
Harvesters and subject based repositories / harvester
Creating and managing OA Journals with OJS
MarcXimiL: near duplicates detection (and similarity analysis)
Open Science
E-publishing research (quantitative and qualitative methods)
Next Generation OAI-PMH

It was proudly announced that this event had the largest number of participants ever, coming from all over the world. (See photographs). The common feeling shared by all is that the big change taking place in scholarly communication is inevitable. "Open Access should become the default mode in scholarly communication" is the tweet Paul Ayris [LIBER Ass., UK] quoted in his final remarks to OAI7/CERN Workshop on Innovations in Scholarly Communication, and we can certainly agree.

Means and models of sharing knowledge are changing, fostered by unprecedented technical developments (new kinds of "scientific articles", blogs, research social networks, mobile devices). Furthermore, there is a revolutionary perception as far as texts and data are concerned: exchange, reuse, mixing and matching, mining, aggregating, are the new keywords. Principles like transparency, accountability, public availability, social benefits, have recently been pushed further and supported by funding agencies, but there is no doubt this evolving context will continue to require a big effort, involving policies, copyright permissions, publishing platforms, search and retrieval tools, library services, and research evaluation criteria.

Advocacy

The big change concerns not just scientific communities and students, but also ordinary users and citizens. New policies are needed at the local, national and international levels. That implies Advocacy, as Heather Joseph [SPARC, USA] stated: "Set the default to 'Open'". The invitation is particularly important considering Joseph's institutional role: she is now negotiating the terms of the policies that national and federal agencies should accept in order to promote Open Access. Joseph's efforts are directed towards obtaining wider political awareness and better infrastructures for an open system of scholarly communication. Open should also include taxpayers as active members of the "big conversation". The OA movement is based on ethical values that support national research as a common good, stimulating social growth by means of innovative research projects. That must be reaffirmed where the decisions are taken. We must "take a seat at the table" not merely as accepted guests but as leading actors. Furthermore, it should be remembered that decision-makers are mainly economists. For that reason, words such as innovation, sustainability, and positive effects on the global market must be underlined. Sharing a common language will surely prove effective. The argument must be formulated so that it supports institutional repositories as well as open licences for the reuse of documents and data.

Institutional policies have often been adopted at the end of a long and tough "conversation", as Monica Hammes [Univ. of Pretoria, South Africa] illustrated. But achieving positive results has not interrupted the conversation itself. Every single stakeholder has been approached using an appropriate language: economic reasons (for funders and administrators); visibility and impact (for researchers and scholars); innovation (for librarians and technical staff). That is believed to be the correct approach for discussing central topics such as copyright legislation, sustainability and the involvement of external funders. Practical questions have also been taken into consideration. Users tend to prefer other tools to Institutional Repositories. Dorothea Salo offered a possible explanation, saying "what institutional repositories offer is not perceived to be useful, and what is perceived to be useful, institutional repositories do not offer". (Salo, D., "Innkeeper at the Roach Motel", Library Trends, 57, Fall 2008, pp. 98-123.) That is why the University of Pretoria created an Open Scholarship Unit implementing services that researchers largely consider "useful" and that are well integrated into their ordinary workflow.

A similar perspective was offered by William Nixon [University of Glasgow, UK]. Nixon presented the institutional repository Enlighten and singled out the reasons for its success: plug-ins to the most used platforms (PubMed, Nature); authoritative control lists; experimental data mining techniques. Enlighten offers many searchable fields, e.g. the funder's code at the level of metadata, as well as frequently updated researchers' curricula and lists of publications. It has also implemented business intelligence tools, performance indicators, metrics and statistics for the evaluation of research. Finally, whereas Glasgow's CRIS (the institutional research register) manages administrative data, Enlighten deals with submitted works and services. The so-called "3P integration" (People, Processes, Policies) is effective, as is shown by the current national research evaluation REF (Research Excellence Framework, 2014). The winning formula could be summarised in five points: continuous advocacy at every level; creating a network of human relationships; respecting the differences between disciplines; giving answers to real needs; and taking advantage of previous experiences.

Towards machine-actionable scholarly communication

At the more technical sessions, Herbert Van de Sompel [Los Alamos Lab, USA] presented two current research projects:

Memento: a framework that provides a kind of time machine for the Web. Memento (winner of the 2010 Digital Preservation Award) envisions "a Web with a historical memory" by allowing links between the different URIs that identify the same electronic resource at different times. In that way Memento addresses the complex question of versioning.

Open Annotation Collaboration: an innovative project meant to enhance nanodocuments, a way of making notes on a document. Open Annotation is meant for "the two cultures", both the humanist and the scientist: it could be used for transcriptions in philological editions as well as for comments on raw data. Notes as a whole constitute a corpus and can be managed in a collaborative way using a Data Model.
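The idea behind Memento can be illustrated with a short Python sketch. It is not the project's implementation: the archive URIs and capture dates are invented for the example. It formats a date as it would appear in the `Accept-Datetime` request header used by the Memento protocol, and then picks the archived copy closest to that date, which is roughly the job of a Memento "TimeGate".

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical archived copies ("mementos") of one resource: (capture time, URI).
MEMENTOS = [
    (datetime(2009, 5, 1, tzinfo=timezone.utc), "http://archive.example.org/2009/page"),
    (datetime(2010, 3, 15, tzinfo=timezone.utc), "http://archive.example.org/2010/page"),
    (datetime(2011, 6, 22, tzinfo=timezone.utc), "http://archive.example.org/2011/page"),
]


def accept_datetime_header(when):
    """Format a UTC datetime as an HTTP date for an Accept-Datetime header."""
    return format_datetime(when, usegmt=True)


def closest_memento(when, mementos=MEMENTOS):
    """Return the (capture time, URI) pair whose capture time is nearest to `when`."""
    return min(mementos, key=lambda m: abs(m[0] - when))


wanted = datetime(2010, 1, 1, tzinfo=timezone.utc)
header_value = accept_datetime_header(wanted)
captured, uri = closest_memento(wanted)
print("Accept-Datetime:", header_value)
print("Closest copy:", uri)
```

A real client would send the header to a TimeGate over HTTP and follow the redirect it returns; the selection is done locally here only to keep the example self-contained.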

Van de Sompel also coordinated the first plenary session, Towards machine-actionable scholarly communication, where new web site architectures were presented following the 12 "Rs" (De Roure):

Repeatable — run the experiment again.
Reproducible — enough information for an independent experiment to reproduce the results.
Reusable — use as part of new experiments. One experiment may call upon another.
Repurposable — reuse the pieces in a new experiment.
Reliable — to trust the Research Object we must be able to verify and validate it.
Referenceable — to have an identity, so we can cite it and ensure probity. Implicit in this may be versioning.
Re-interpretable — useful in and across different research communities.
Respectful and Respectable — with due attention to credit and attribution for the component parts and methods and their assembly, to the flow of intellectual property in generation of results, to data privacy, and with an effective definition of the policies for reuse.
Retrievable — if a Research Object can never be found it may as well not exist.
Replayable — a comprehensive record enables us to go back and see what happened.
Refreshable — updating a Research Object with ease when something changes.
Recoverable and reparable — when things go wrong we need automatic roll-back to retrace our steps.

(De Roure D., "Replacing the Paper: The Twelve Rs of the e-Research", Nature Blog, Nov 27, 2010)

Sean Bechhofer [Manchester Univ., UK] stressed that knowledge is lost during the paper publishing and text mining cycle, estimating the loss at around 40% of the information over the life cycle of a document. By contrast, OA principles emphasise dissemination and reuse. Bechhofer gave particular importance to keeping track of the research process and to reusing and sharing workflows (e.g. myExperiment or Wf4Ever). The prototype is based on a Research Object (RO), on shared files (Dropbox) and, of course, on OAI-ORE. Bechhofer also argued that Tim Berners-Lee's Linked Data alone would not guarantee reuse, reliability or repeatability. (Bechhofer et al. "Why Linked Data is not Enough for Scientists", Sixth IEEE e-Science Conference, 2010)

Reuse is not only for technical data. Jon Deering [Saint Louis Univ., USA] presented TPEN Transcription for Paleographical and Editorial Annotations, following the Open Annotation Collaboration (OAC) framework. The software manages manuscript transcriptions as images and provides line segmentation. The original document (image) is connected to transcriptions (text). The result is that images, comments and annotations are linked in a permanent but also open way.

Nanopublications, illustrated by Barend Mons [Netherlands BioInformatics Centre, NL], deal with the serious problem of managing huge amounts of data. The risk is that data could go unnoticed. Nanopublications attempt to solve this problem by creating semantic triples (subject-predicate-object) which summarise concepts extracted from the original document. Triples, expressed in RDF, are machine readable for text mining and data mining. Nanopublications can be cited, aggregated, and mapped, and they create new knowledge. (Mons, B., Velterop, J., Nanopublications in the e-science era, 2011)
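A minimal sketch of what such a triple might look like in practice is given below, in plain Python. Everything in it is hypothetical: the `http://example.org/` namespace, the resource names and the `wasExtractedFrom` provenance predicate are invented for illustration and are not taken from the Nanopublications project. The sketch writes one assertion and one provenance statement as lines in the N-Triples serialisation of RDF.

```python
# Hypothetical namespace; every resource name below is invented for illustration.
EX = "http://example.org/"


def iri(local_name):
    """Wrap a local name in the example namespace as an N-Triples IRI."""
    return f"<{EX}{local_name}>"


def to_ntriples(triples):
    """Serialise (subject, predicate, object) IRI triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# The assertion itself: one concept extracted from a source document.
assertion = [(iri("GeneX"), iri("isAssociatedWith"), iri("DiseaseY"))]

# A provenance statement recording where the assertion came from,
# so that it can be traced back and cited (an assumption of this sketch).
provenance = [(iri("assertion1"), iri("wasExtractedFrom"), iri("article123"))]

nanopub = to_ntriples(assertion + provenance)
print(nanopub)
```

Because each statement is a self-contained line of subject, predicate and object, a program can aggregate statements from many documents simply by concatenating them and then querying the result.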

Aggregation

One of the crucial issues in the open environment — "Make data work harder" — implies that Aggregation is not the goal in itself but the starting point. In the breakout session dedicated to upgrading and synchronising aggregated resources, Van de Sompel proposed a further development of OAI-PMH using tools such as RSS and Atom to ensure better integration between systems, services and resources. A "one size fits all" solution is impossible. Answers must be tailored to specific domains and needs.
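For readers unfamiliar with the protocol under discussion, the following Python sketch shows how a harvester can already ask an OAI-PMH repository only for records changed since a given date. It illustrates the existing protocol, not the proposed RSS/Atom-based development. The verb `ListRecords` and the arguments `metadataPrefix` and `from` belong to OAI-PMH; the repository address is a hypothetical placeholder.

```python
from urllib.parse import urlencode


def list_records_url(base_url, since, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request for records changed since `since`."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,  # oai_dc: unqualified Dublin Core
        "from": since,                      # date in YYYY-MM-DD form
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, used only for illustration.
url = list_records_url("http://repository.example.org/oai", "2011-06-01")
print(url)
```

The harvester has to poll each repository with requests of this kind, which is one reason why feed-style mechanisms are attractive for keeping aggregated resources synchronised.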

Paul Walk [UKOLN, UK] focused on usability. His motto "Be open, usefully!" introduced his talk on providing services: aggregating and re-exposing; collecting from offline sources and making available online; harvesting and building services; rethinking the distributed computing framework and making it work better. It is also important to create a widespread feeling of trust. "Open" does not necessarily mean "permissively licensed". Walk argued for specific licences and for a developer-friendly attitude, encouraging the use of JSON instead of the more verbose XML.

Niamh Brennan [Trinity College, IRL] discussed RIAN, an Irish national portal aggregating research data (taken from CRIS) and more specific disciplinary metadata. RIAN generates lists of publications and statistics. These ranking lists generate positive competition between universities. In addition, the "Live Traffic" usage statistics give authors data in real time. The degree of granularity is considerable: it is possible to search by grant number or by single researcher or institution, but it is also possible to expand the search to the general disciplinary domain. RIAN is successful because of its extremely intuitive interface. Important features are its management system, which combines aggregated data with distributed administration, and its portfolio of services meant for both researchers and administrators.

OA publishing

Innovation plays a central role in OA publishing. PLoS (Public Library of Science) is an outstanding example. New platforms, new services, and new metrics demonstrate that quality and OA can certainly go together. For Mark Patterson [PLoS, UK], online media allow the process of scholarly communication to be reinvented and re-engineered. Statistics on the growth of the three main OA publishers (PLoS, BioMed Central, Hindawi) speak for themselves, and Mikael Laakso recently published data and trends covering the period from 1993 to 2009. (Laakso M, et al. "The Development of Open Access Journal Publishing from 1993 to 2009". PLoS ONE, 2011, 6(6): e20961)

PLoS initially focused on sustainable OA publishing, but is now exploring new alternatives. For instance, PLoS ONE (over 1.5% of all the articles indexed in PubMed) is based on an editorial process that filters content a posteriori and not a priori. Traditional peer review ensures quality and consistency, but the actual value of an article lies in the comments, ratings, and notes, made after publication (in blogs, academic networks or social bookmarking tools). This is what "organising" the content after publication might mean; new metrics could define and measure the real impact, well beyond the debatable Impact Factor. "Enhancing" the content after publication is what PLoS Hubs are intended for. They are subject-based gateways, aggregating OA content and adding value by linking (e.g. PLoS Hub Biodiversity).

Salvatore Mele [CERN, CH] presented the final findings of the European project SOAP (Study of Open Access Publishing). SOAP revealed a situation much more complex than expected regarding business models and intellectual property rights management; 120,000 articles (roughly 8% of the annual production) are now published in OA journals at a ratio of 3:1 (science/technology/medicine to social science and humanities). Only 72% of these articles, however, have Creative Commons (CC) licences. Availability, sustainability and Impact Factor are counted as the most important factors in choosing an OA journal, though there are still many "barriers". Finding funds for the article processing fees (39%) and concerns about quality (31%) are two important examples. Mele argued that if 89% of researchers recognise the benefits of OA journals but only 8% of them publish in OA journals, there is something wrong. OA is an alternative model but it has also created new barriers: the so-called "author pays" logic (stressed by 39% of the surveyed researchers). This issue deserves attention in the next few years.

Outlining the impact of Open Access practices on the scholarly communication system is precisely what the current European project PEER (Publishing and the Ecology of European Research) is doing. Data are being collected on the effects of a massive self-archiving system. PEER is a joint project between traditional publishers, OA publishers, libraries and economic institutions. Christoph Bruch [Max Planck, D] and Barbara Kalumenos [STM, UK] spoke respectively from the point of view of information scientists and traditional publishers. They agreed about the main principles of OA, but revealed some disagreements about questions such as mandates and self-archiving.

Open science and research data

During the two sessions dedicated to the future of data-intensive science, the general message was to call for a fully and really "open" scholarly communication system. This requires significant investments in trustworthy and flexible infrastructure. Additionally, beyond the technical issues there is also a great need for cultural infrastructure that could support effective communication.

Legal tools and systems are required to ensure the practice of free reuse. Cameron Neylon [Science and Technology Facilities Council, UK] emphasized that the cultural change towards efficient use and reuse is the most challenging one. The principle of knowledge as a commons, and ensuring the availability of OA content, together with technical and legal infrastructures, should create a climate of trust. In this scenario, OA tools (most importantly, Institutional Repositories) ought to be embedded in researchers' daily workflows. The JISC project DepositMO could be seen as a good example of integration. The challenge now is to change the mindset and to foster repository deposit as a standard practice for a wider number of disciplines.

Mendeley, presented by its CEO and co-founder Victor Henning [Mendeley, UK], is another good example. It combines the features of a reference manager system with those of an academic social network. It creates bibliographies, manages digital archives, generates citations and connects users with their own documents and data. In the logic of use and reuse, Mendeley automatically extracts data from articles, aggregates it, and offers the data via tag clouds. The highly positive result is that 90 million documents have been uploaded in the last two years. This is the best evidence that the supply of services responding to real demands is successful, even without a mandate.

Citizen cyberscience (the direct participation in science via the Web) is just as fascinating. It exploits and enhances the diverse competencies of millions of citizens in a long tail of projects. More and more leading researchers use citizen cyberscience systems in areas such as epidemiology, climate science and molecular biology. Millions of volunteers are contributing to those projects by donating time or participating directly in data analysis, or even by collecting data from the field with their smartphones. François Grey [Citizen Cyberscience Centre, CH] underlined how this epistemological approach shares with OA the same principles: inclusion and participation.

Linked Data is equally useful and innovative. It is an approach to connecting data, proposed by Tim Berners-Lee within the framework of the Semantic Web, that provides a standard interface for accessing and sharing data. Anja Jentzsch [Freie Universität Berlin, D] detailed its rules and technical aspects, which are based on a number of common standards (URI, HTTP, RDF). The general philosophy of Linked Data is different from the proprietary interfaces of APIs that exclude hyperlinks between different APIs' data: "Web APIs slice the web into walled gardens". (See also: the Linking Open Data project, which publishes existing open-licence datasets. As of September 2010, it contained over 24.7 billion RDF triples and over 436 million RDF links between data sources. DBpedia, relying on the Linked Data schema, extracts structured information from Wikipedia, publishes it under an open license on the Web, and generates links to other data sources).
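The difference between linked datasets and "walled gardens" can be sketched in a few lines of Python. The two datasets, their URIs and their property names are hypothetical, and the `sameAs` key merely stands in for a linking predicate such as `owl:sameAs`. Because resources are identified by URIs, a statement in one dataset can point to a resource in another, and a client can follow that link to gather facts from both.

```python
# Two hypothetical datasets, each mapping a resource URI to (predicate, value) pairs.
LIBRARY = {
    "http://library.example.org/author/42": [
        ("name", "Example Author"),
        ("sameAs", "http://encyclopedia.example.org/resource/Example_Author"),
    ],
}
ENCYCLOPEDIA = {
    "http://encyclopedia.example.org/resource/Example_Author": [
        ("field", "Information science"),
    ],
}


def describe(uri, datasets, link_predicate="sameAs"):
    """Collect the facts about `uri`, following links across all datasets."""
    seen, queue, facts = set(), [uri], []
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # avoid loops when datasets link back to each other
        seen.add(current)
        for dataset in datasets:
            for predicate, value in dataset.get(current, []):
                if predicate == link_predicate:
                    queue.append(value)  # follow the link into another dataset
                else:
                    facts.append((predicate, value))
    return facts


facts = describe("http://library.example.org/author/42", [LIBRARY, ENCYCLOPEDIA])
print(facts)
```

With two separate Web APIs the client would need dedicated code for each service; here the link itself tells the client where to look next.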

As web search is evolving into query answering, search engines will increasingly rely on structured data extracted from the Web. An active role in creating a hub of the Web of Data could be played by librarians since they are the gatekeepers of a very large amount of structured data, often already standardised via interoperable systems. Cultural heritage institutions could be a major provider of authoritative datasets for the Web of Data, re-orienting the library perspective. The European Commission report Riding the wave: how Europe can gain from the rising tide of scientific data is clear evidence of the growing interest in this issue at all levels.

Open Data is a concept that fully meets the Digital Agenda for Europe. As Peter Wittenburg [Max Planck, NL] highlighted, the scholarly communication system is more and more data-intensive. But there are a lot of ambiguities in search of a balance: flexibility/reliability; quality/openness; local/global integration; affordability/high performance. A Collaborative Data Infrastructure (CDI) should be the architecture for the future, introducing new levels of responsibility. CDI implies three layers: researchers as data generators and users; research infrastructures offering disciplinary data services; and data-oriented e-Infrastructure offering common data services. The whole process should be sustained by the ecological principles of OA, avoiding useless redundancy. Data sharing is a pillar of competitiveness, but it needs investments and business models to ensure sustainable availability, long-lasting access and preservation. It also requires a suitable system of incentives and rewards in order to accelerate the expected cultural change. The 2030 vision forecasts that "Researchers and practitioners from any discipline are able to find, access and process the data they need. They can be confident in their ability to use and understand data and they can evaluate the degree to which the data can be trusted". (Riding the wave, p. 34). This could indeed be taken as the final message of OAI7.

Acknowledgements

We are deeply indebted to Alma Swan for her valuable assistance.

About the Authors

Paola Castellucci is Associate Professor of Documentation, Department of "Storia dell'arte e spettacolo", University of Rome, "La Sapienza". Her primary areas of research are digital humanities, the "two cultures", databases, and information retrieval. In her book Dall'ipertesto al Web. Storia culturale dell'informatica (Laterza, 2009) she advocates a cultural approach to the history of the Web, focusing on the evolution of the idea of textuality into hypertextuality. Ms. Castellucci also supports Open Access and translated the Budapest Declaration into Italian ("Nuovi Annali della Scuola Speciale per Archivisti e bibliotecari", 2010). She also published Letteratura dell'assenza (Bulzoni, 1992) and Un modo di stare al mondo. Italo Calvino e l'America (Adriatica, 1999).

Elena Giglia has been working in academic libraries since 1991, at the University of Milan and then at the University of Turin. She is now Open Access Project Leader at the University of Turin Library System. Ms. Giglia writes and lectures about Open Access, her main interests being the OA citation advantage and the OA economic sustainability.