Volume 19, Number 3/4
Adding Value to Electronic Theses and Dissertations in Institutional Repositories¹
Charles de Gaulle University Lille 3
Part of the grey literature, electronic theses and dissertations (ETDs) represent a growing segment of open, available content in institutional repositories (IR) where they contribute to the impact and ranking of their institution. More than half of all IRs listed in the Directory of Open Access Repositories contain ETDs. Most of these open access projects have similarities and common features, such as access to full text and compliance with the OAI metadata harvesting protocol. But more important are the differences, with regard to metadata, policy, access restrictions, representativeness, file format, status, quality and related services. In this paper, we investigate what can be done to improve the quality of content and service provision in an open environment, in order to increase impact, traffic and usage. Based on a review of 54 recent communications and articles on PhD theses in institutional repositories, this paper shows five ways in which institutions can add value to the deposit and dissemination of electronic theses and dissertations and describes two developments that are challenging institutional repositories.
Institutional repositories (IRs) are "tools (...) for collecting, storing and disseminating scholarly outputs within and without the institution" (Jain, 2011), in other words "a set of services (...) for the management and dissemination of digital materials created by the institution and its community members" (Lynch, 2003), i.e. they are open archives "serving the interests of faculty researchers and teachers by collecting their intellectual outputs for long-term access, preservation and management" (Carr et al., 2008). One of their main characteristics is their great diversity. Institutional repositories have different policies, procedures, functionalities, services and metadata; they have different business models and funding strategies (Swan & Awre, 2006), and their content may include more than current output from faculty. Smith (2008) details a "wide variety of materials in digital form, such as research journal articles, preprints and post-prints, digital versions of theses and dissertations, and administrative documents, course notes, or learning objects." Other repositories include datasets, multimedia or cultural and scientific heritage.
A steadily growing corpus of electronic theses and dissertations (ETDs) is also available through institutional repositories (IRs). At the end of 2012, the international directory of open archives OpenDOAR listed more than 1,100 institutional repositories with ETDs, representing roughly half of all registered open archives. The Registry of Open Access Repositories (ROAR) lists more than 200 services with 100% ETDs. Some people consider electronic PhD theses to be a Trojan horse that pushes institutions to launch open repositories, even without full compliance or mandate from the scientific community, mainly because they need a platform for preservation and diffusion of their theses. Following published journal articles in pre- or post-print formats, ETDs are the most important document type in open archives, significantly more important for instance than working papers, reports, book chapters or conference papers.
It is difficult to estimate how many ETDs are actually available through IRs. The Union Catalogue of the Networked Digital Library of Theses and Dissertations (NDLTD) contains more than one million records of ETDs. One segment of them (in fact, probably the greatest) is in IRs. This is good. This is clearly a success of the green road access to scientific information (Harnad et al., 2008), and in our case, to scientific grey literature (Schöpfel, 2010). But open is not enough. In order to increase acceptance, access and impact, it is not enough to simply upload content on servers. There must be at least a minimum of added value (Schöpfel et al., 2011). So our questions are: How can an institution add value to ETDs in its IR? What can be done to improve quality of content and service provision in an open environment?
This paper is based on the state-of-the-art of scientific and professional literature on PhD theses in institutional repositories. In particular, it reviews recent communications of thirteen conferences on electronic theses and grey literature:
- ETD: Six international symposia on electronic theses and dissertations from 2007 to 2012, organized by the Networked Digital Library of Theses and Dissertations (NDLTD).
- USETDA: Two national conferences 2010 and 2011 of the US ETD Association (part of NDLTD).
- GL: Five international conferences on grey literature 2007 to 2011, organized by the Grey Literature Network Service (GreyNet).
The review includes other published studies on open access and grey literature and case studies on ETDs in IRs, identified via Google Scholar, Scirus and the EmeraldInsight platform. The corpus of reviewed papers contains 54 references².
Based on this literature review, we came to understand which features make a difference. The reviewed papers, mostly case studies, present good practices, success stories, challenges and problems. But what are best practices? Which aspects are critical in distinguishing excellent IRs from good ones and in adding value to the dissemination of ETDs?
Results: Five ways to add value to PhD theses
The reviewed sample provides very different solutions, from various countries, institutions and environments. National open access networks like the Peruvian Network of Digital Theses (Huaroto, 2008) cohabit with a central 'hub' or single point access interface to all kinds of theses, such as the British Library EThOS service (Gould & Rosie, 2012); local ETD programs, as on the Emory campus in Atlanta (Halbert, 2007); large document servers such as e-doc at the Humboldt University of Berlin (Schirmbacher, 2009); and international solutions for repositories like the European ETD portal DART (Moyle, 2009). All of these solutions in our sample have two features in common: they are (or give access to) open institutional repositories, and they contain electronic theses. Yet, when they are compared to each other, one can identify five specific characteristics that make a noticeable difference and provide scientific excellence.
The first feature is quality of content. "The content (...) is the most important factor that has been cited by researchers to show the success of a repository" (Macha & de Jager, 2011). A good IR not only defines a set of standards and criteria for the selection and validation of deposits but also communicates and promotes this editorial policy. For ETDs, this means only validated versions, with minor or major revisions if required by the PhD jury, but without preprints or non-controlled self-deposits. This also means that the IR is either part of the local ETD workflow or that it includes a point of formal and institutional validation during the deposit process.
Independent of the institutional validation, and in addition, the IR standards could hold some minimal requirements for acceptance, such as unrestricted access to full text, copyright clearance, or limit acceptance to research theses only. A specific review system can be helpful for quality assurance. The same selective quality criteria should apply to retrospective digitization of print theses, when the IR contains both born-digital and older repurposed digital materials. In networked systems such as the European DART, minimal quality standards are vital for the global service.
A second feature that makes a difference is the description of the content and context of the ETD files, i.e. metadata. "Considerable variations in practice exist (...). Many of these variations involve local qualifiers and/or the use of metadata elements to record information that does not correspond to the prescribed meaning of the elements" (Park & Richard, 2011). A good and rich bibliographic description increases the searchability of deposited items and allows for scientometric analysis, book publishing (print on demand) and other functionalities and services.
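A simple automated check can make such variations in practice visible before a record is published. The following Python sketch is only an illustration: the list of required elements is a hypothetical local policy, not a set prescribed by any of the reviewed papers, and the record is modelled as a plain dictionary of Dublin Core style element names.

```python
from typing import Dict, List

# Hypothetical minimal element set; a real repository would define its own policy.
REQUIRED_ELEMENTS = ("title", "creator", "date", "type", "identifier")


def missing_elements(record: Dict[str, List[str]]) -> List[str]:
    """Return the required elements that are absent or contain only blank values."""
    missing = []
    for name in REQUIRED_ELEMENTS:
        values = record.get(name, [])
        # An element counts as present only if at least one value is non-blank.
        if not any(value.strip() for value in values):
            missing.append(name)
    return missing


# Invented sample record: the date is blank and the identifier is absent.
sample = {
    "title": ["A study of grey literature"],
    "creator": ["Doe, Jane"],
    "date": [""],
    "type": ["Thesis"],
}

print(missing_elements(sample))
```

For the invented sample record, the function reports `date` and `identifier` as missing. A check of this kind only verifies that elements are present; it cannot tell whether an element is used with its prescribed meaning.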
Generally speaking, metadata of ETDs in IRs should be rich and standard, and the use of standards should be promoted by recommendations (good practice). Metadata standards are especially important for small and medium-sized institutions and in networks such as DART. The DART network applies the simplified version of the OAI Dublin Core as the basic format and for use as a kind of backbone for all members, while the French national ETD infrastructure chose the more sophisticated TEF (Ducloy et al., 2006) format. Other metadata standards are the NDLTD standards for DC/MARC (ETD-MS), the ProQuest schema UMI XML DTD (Marsh & McLean, 2008) and the Czech EVSKP-MS. Whichever metadata schema is implemented, it should remain flexible, facilitate easy conversion to other formats and be able to change with new policies (Heyse, 2007).
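To illustrate what conversion to a shared backbone format can look like, the sketch below serializes an internal record into a simple (unqualified) Dublin Core XML document in the `oai_dc` container used for OAI harvesting. It is a minimal example under stated assumptions: the function name `to_oai_dc` and the sample values are invented, and richer schemas such as TEF or ETD-MS would need their own mappings.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespaces of the oai_dc container and the Dublin Core element set.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def to_oai_dc(record: Dict[str, List[str]]) -> str:
    """Serialize a dictionary of Dublin Core elements as an oai_dc XML string."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in record.items():
        # Dublin Core elements are repeatable, so each value gets its own node.
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Invented sample values for illustration only.
thesis = {
    "title": ["An example thesis"],
    "creator": ["Doe, Jane"],
    "type": ["Thesis"],
    "date": ["2012"],
}

xml_text = to_oai_dc(thesis)
print(xml_text)
```

Keeping the internal record independent of the output schema is one way to stay flexible: adding another target format then means writing another serializer rather than re-describing the theses.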
Good institutional repositories contain full text, not just metadata. They offer different file formats for different usage and purpose. Deposit formats should be searchable, open, and appropriate for long-term preservation and intelligent exploitation of the content (review, bibliography...). "In the sciences theses should be regarded not simply as objects destined for preservation and archival status, but rather as unique resources containing potentially valuable data that must be made extractable and re-usable" (Morgan et al., 2008). Accepted standard formats for deposit and dissemination are MS Office Word and Adobe PDF. But some repositories accept and process different formats, such as XML, LaTeX, Postscript or RDF, more suitable for some disciplines, semantic querying and reuse. For retro-digitized theses, this means OCR and production of text files before submission.
To increase discoverability and availability of ETDs, repositories should network and interconnect. This interconnection can be on different levels, e.g. regional or state-wide, national or international. The Networked Digital Library of Theses and Dissertations is the best-known example of an international network. The UK EThOS hosted by the British Library is a national network, like the French STAR system. These and other initiatives exhibit the main characteristics and conditions for networking and interconnection: a shared standard format for deposit, exchange and harvesting (OAI-PMH), an explicit policy, i.e. commitment to the network and collective will for sharing and interconnection, a state-wide (regional, national...) architecture, and a single hub for research published in PhD theses.
Describing a US project, Dowling & Steans (2011) point out that the principal objective is to "demonstrate research efforts of the state (through a) central finding tool for research". Another approach to increase and guarantee interoperability of local repositories is the certification of the local system and/or the networked architecture (Dobratz & Scholze, 2006).
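The role of the shared harvesting protocol can be sketched in a few lines of Python. The example below builds an OAI-PMH `ListRecords` request and extracts titles and the resumption token from a response. The base URL, the set name `theses` and the sample response are invented for illustration, and the response is parsed from a string rather than fetched over the network; a real harvester would also handle errors, deleted records and repeated requests.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_titles(response_xml: str) -> Tuple[List[str], Optional[str]]:
    """Return the dc:title values and the resumption token, if any."""
    root = ET.fromstring(response_xml)
    titles = [el.text for el in root.iter(f"{{{DC_NS}}}title")]
    token = root.find(f".//{{{OAI_NS}}}resumptionToken")
    # An empty or missing token means the list is complete.
    return titles, (token.text if token is not None and token.text else None)


# Invented, heavily shortened response for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://repository.example.org/oai", set_spec="theses")
titles, token = parse_titles(SAMPLE_RESPONSE)
print(url)
print(titles, token)
```

Because every member repository answers the same request in the same way, a central hub can aggregate records from many local systems without knowing anything about their internal software.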
Certainly, any institutional repository offers some basic functionality, e.g. basic and advanced search facilities, different browsing options, visualisation, downloading, etc. Yet, if the hosting institution or service provider wants to add value, if they want to increase usage and the impact of their production and content, they should develop innovative complementary and peripheral services and practices, what Halbert (2007) calls a "user-centred process of service development prioritization". In her study on open repositories, Bester (2010) distinguishes nine browsing options, nine search options and ten options for customisation and reference management, such as export of bibliographic listings in different formats, including XML. Our own survey reveals some other interesting initiatives, listed below with some examples:
- Social media tools: interactivity, discussion forum and comments (Cocciolo, 2010; Millard et al., 2010; Waddington et al., 2012)
- Federated search and more sophisticated discovery tools (Dowling & Steans, 2011).
- Usage statistics and metrics, citations (Huaroto, 2008; Walker, 2011).
- Video with presentation of thesis (Huaroto, 2008).
- Print on demand in book format (Rajendiran et al., 2005).
- Options for copyright protection or Creative Commons licensing (Hagen, 2007; Harper, 2011).
- Preservation in multiple copies (Mikeal et al., 2009).
Some of these developments are possible even with a "very modest programming staff available to deploy on ad hoc projects" (Halbert, 2007) while others ask for more investment and resources. On a more general level, the IR services for the dissemination of ETDs should be flexible, with a capacity for rapid adaptation; the software should be user-friendly and reliable, perhaps also open to other service providers and/or integrated in another service environment, such as extended (distance) learning.
Discussion: Two ways to do better
So far we have described five features that add value to the dissemination of electronic theses via institutional repositories. All of these features have already been tested and implemented but are not generalized and depend on the political, commercial and technical choices of the hosting institution and the service provider; on their commitment to open access; and on their willingness and capacity to invest. Beyond these features, we distinguish two other challenges for ETDs in institutional repositories: being future-oriented and anticipating the transformation of scientific communication to come. These are global challenges for the whole academic publishing landscape, not only for ETDs, but we can already find some studies and examples in our field.
We stated previously that the text format should allow for extraction and reuse of content. More generally, PhD theses are regularly deposited accompanied by supplemental material, e.g. video and multimedia, images, spreadsheets, music, etc. In the digital environment of open repositories and added value services, this material becomes a rich resource of research results and datasets.
"It may be possible to enhance both the research options and the richness of access. 'Deep access' to the research could include data, visual and aural representations, links to supporting material such as notebooks or collaborative activity" (White, 2007).
The impact of the dissemination of this material on institutional repositories is at least threefold:
- The IR must support other and non-textual formats, such as AVI for video or MP3 for music files.
- The IR host must develop an innovative service environment, extending the basic and added value services for text files to datasets.
- The institution needs to review the legal condition of the deposit and dissemination of datasets and other material. Some content may be protected by privacy or confidentiality concerns. On the other hand, the institution and/or author must apply an open licence (CC-BY or similar) that allows the maximum reuse and exploitation of these files.
Applying the logics of e-science, supplemental material should not only be available as an appendix to, or illustration in, the related PhD thesis but also extractable and reusable when not linked to the thesis, as an independent dataset and interconnected to other data (Morgan et al., 2008; Ross, 2008; Ubogu & Sayed, 2008).
Current Research Information Systems (CRIS)
Last but not least, electronic theses and dissertations in institutional repositories should and will contribute to the evaluation of scientific production in the emerging environment of current research information systems (CRIS). "A well managed repository can simplify the task for the researchers and their respective departments, their university administrators and (...) their assessors" (Hey & Hey, 2006). Publications are essential elements of a CRIS. Institutional repositories contain the metadata for their institutions' outputs and can make it available to the CRIS for monitoring and evaluation of scientific production and research trends (Lambert et al., 2005).
For the connection of the IR data silo to the CRIS and the data exchange, standard formats (such as the Common European Research Information Format (CERIF)) and rich and valid metadata are crucial. Also, the assignment of unique identifiers for authors (PersID, DAI...), structures (OrgID) and publications (RefID, DOI...) is necessary, especially when local repositories are integrated in a national infrastructure. In such an eScience and evaluation environment, PhD theses are considered part of relevant material (Hey et al., 2006, see also Sugita & Murakami, 2007) and would become significant sources of scientometric studies and research evaluations, for scientists and research managers.
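The practical value of unique identifiers can be shown with a small Python sketch. It normalizes DOIs so that the same publication is recognized across systems and groups thesis records by author identifier. The record structure, the field names and the sample identifiers are hypothetical; this is not the CERIF data model, only an illustration of why stable identifiers matter for data exchange.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ThesisRecord:
    """Hypothetical, simplified repository record."""
    title: str
    doi: Optional[str]
    author_id: Optional[str]


def normalize_doi(doi: str) -> str:
    """Strip common resolver prefixes and lowercase the DOI for comparison."""
    doi = doi.strip().lower()
    prefixes = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")
    for prefix in prefixes:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def group_by_author(records: List[ThesisRecord]) -> Dict[str, List[ThesisRecord]]:
    """Group records by author identifier; records without one are skipped."""
    grouped: Dict[str, List[ThesisRecord]] = {}
    for record in records:
        if record.author_id:
            grouped.setdefault(record.author_id, []).append(record)
    return grouped


# Invented identifiers for illustration only.
records = [
    ThesisRecord("Thesis A", "http://dx.doi.org/10.1234/ABC", "author-001"),
    ThesisRecord("Thesis B", None, "author-001"),
    ThesisRecord("Thesis C", "doi:10.1234/xyz", None),
]

print(normalize_doi(records[0].doi))
print({key: len(value) for key, value in group_by_author(records).items()})
```

Records that lack an identifier, like the third one here, drop out of the grouping, which is exactly the kind of gap that makes rich and valid metadata crucial for evaluation purposes.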
When compared to articles published in academic journals, PhD theses are sometimes considered to be second-level information, for at least two reasons. PhD theses do not undergo peer review, and they are produced early in an academic career. Yet, not only do these documents often produce results that reflect years of intensive research combined with a rich literature review, they also represent a growing part of available open content in institutional repositories (IR) where they contribute to the overall impact and ranking of their institution.
We showed five ways in which institutions can add value to the deposit and dissemination of electronic theses and dissertations via their open repositories. We also described two developments that are challenging institutional repositories. Institutional repositories can have very different characteristics, and any one solution may not apply to all institutions. Our research findings do not promote a must-do list, but instead proffer some perspectives and options that should be adapted to the specific context of a given institution.
It is crucial for the success of a repository that the institution clearly defines its objectives in line with its scientific strategy and environment. "Each of the reasons for setting up a repository carries implications for the content, design and funding of a repository, and the institution needs to be clear about the implications of different roles for a repository, while being prepared to change or add roles as the scholarly communication environment develops" (Friend, 2011).
We began our paper with the question: How can an institution add value to ETDs in its IR? Our answer is that there are at least five different ways, but that an institution, beyond its commitment to the open access principle, must make informed and conscious choices, on the technical level but also, and above all, on the political level.
1 This paper is an augmented version of a poster presented at the Fourteenth International Conference on Grey Literature: Tracking Innovation through Grey Literature, 29-30 November 2012, Rome, Italy.
2 The entire sample is available on CiteULike.
 E. Bester (2010). 'Les services pour les archives ouvertes: de la référence à l'expertise'. Documentaliste Sciences de l'Information 47(4): 415.
 L. Carr, et al. (2008). 'Institutional Repository Checklist for Serving Institutional Management'. In Third International Conference on Open Repositories 2008, April 1-4, 2008, Southampton, United Kingdom.
 A. Cocciolo (2010). 'Can Web 2.0 Enhance Community Participation in an Institutional Repository? The Case of PocketKnowledge at Teachers College, Columbia University'. The Journal of Academic Librarianship 36(4):304-312. http://dx.doi.org/10.1016/j.acalib.2010.05.004
 S. Dobratz & F. Scholze (2006). 'DINI institutional repository certification and beyond'. Library Hi Tech 24(4): 583-594.
 T. Dowling & R. Steans (2011). 'Managing Consortial ETD Repositories'. In USETDA 2011: The Magic of ETDs... Where Creative Minds Meet, May 18-20, Orlando, Florida.
 . Ducloy, et al. (2006). 'Metadata towards an e-research cyberinfrastructure: the case of French PhD theses'. In DCMI '06: Proceedings of the 2006 international conference on Dublin Core and Metadata Applications, pp. 133-148. Dublin Core Metadata Initiative.
 F. Friend (2011). 'Open Access Business Models for Research Funders and Universities'. Briefing Paper, Knowledge Exchange, Copenhagen.
 S. Gould & H. Rosie (2012). 'EThOS: the UK national E-Theses Online Service'. In USETDA 2012: A revolution in scholarship A Commonwealth of Knowledge, June 13-15, Quincy, Massachusetts.
 J. H. Hagen (2007). 'Building Effective Discovery Tools for Academic Promotion and Tenure Evidence: The Added Value of ETD and Institutional Repository Metadata, Citations and Access'. In ETD 2007 10th International Symposium on Electronic Theses and Dissertations, June 13-16, 2007, Uppsala, Sweden.
 M. Halbert (2007). 'Integrating ETD Services into Campus Institutional Repository Infrastructures Using Fedora'. In ETD 2007 10th International Symposium on Electronic Theses and Dissertations, June 13-16, 2007, Uppsala, Sweden.
 S. Harnad, et al. (2008). 'The Access/Impact Problem and the Green and Gold Roads to Open Access: An Update'. Serials Review 34(1): 36-40.
 G. Harper (2011). 'ETDs, Open Access and Intellectual Property Issues'. In USETDA 2011: The Magic of ETDs... Where Creative Minds Meet, May 18-20, Orlando, Florida.
 T. Hey & J. Hey (2006). 'e-Science and its implications for the library community'. Library Hi Tech 24(4): 515-528.
 J. M. N. Hey, et al. (2006). 'Leveraging the Institutional Research Repository: harnessing the drive for quality assessment'. In A. Asserson & E. J. Simons (eds.), Enabling Interaction and Quality: Beyond the Hanseatic League 8th International Conference on Current Research Information Systems, pp. 89-98, Leuven. Leuven University Press.
 W. Heyse (2007). 'Supporting change and diversity for theses in an institutional repository'. In ETD 2007 10th International Symposium on Electronic Theses and Dissertations, June 13-16, 2007, Uppsala, Sweden.
 L. Huaroto (2008). 'Cybertesis as a cooperative process for the implementation of the Digital Thesis Peruvian Network'. In ETD 2008 11th International Symposium on Electronic Theses and Dissertations, June 4-7, 2008, The Robert Gordon University, Aberdeen, UK.
 P. Jain (2011). 'New trends and future applications/directions of institutional repositories in academic institutions'. Library Review 60(2): 125-141. http://dx.doi.org/10.1108/00242531111113078
 S. Lambert, et al. (2005). 'Grey literature, institutional repositories and the organisational context'. In Seventh International Conference on Grey Literature: Open Access to Grey Resources, Nancy, December 5-6, 2005.
 C. A. Lynch (2003). 'Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age'. Tech. Rep. 226, ARL Association of Research Libraries.
 A. Macha & K. de Jager (2011). 'A comparative overview of the development of the institutional repositories at the University of Cape Town and at the University of Pretoria'. In ETD 2011 14th International Symposium on Electronic Theses and Dissertations, September 13-17, 2011, Cape Town, South Africa.
 C. Marsh & A. McLean (2008). 'Ensuring Discovery of ETDs: The Hong Kong University of Science & Technology & ProQuest/UMI Case Study'. In ETD 2008 11th International Symposium on Electronic Theses and Dissertations, June 4-7, 2008, The Robert Gordon University, Aberdeen, UK.
 A. Mikeal, et al. (2009). 'ETD Management in DSpace: A Report from the Texas ETD Repository Project'. In ETD 2009 12th International Symposium on Electronic Theses and Dissertations, June 10-13, 2009, Pittsburgh, Pennsylvania.
 D. E. Millard, et al. (2010). 'MePrints: Building User Centred Repositories'. In 5th International Conference on Open Repositories, Madrid, Spain, July 6-9, 2010.
 P. Morgan, et al. (2008). 'Extracting and re-using research data from chemistry e-theses: the SPECTRa-T Project'. In ETD 2008 11th International Symposium on Electronic Theses and Dissertations, June 4-7, 2008, The Robert Gordon University, Aberdeen, UK.
 M. Moyle (2009). 'The DART-Europe E-theses Portal: helping the discovery of Europe's open access research theses'. In ETD 2009 12th International Symposium on Electronic Theses and Dissertations, June 10-13, 2009, Pittsburgh, Pennsylvania.
 E. G. Park & M. Richard (2011). 'Metadata assessment in e-theses and dissertations of Canadian institutional repositories'. The Electronic Library 29(3): 394-407. http://dx.doi.org/10.1108/02640471111141124
 P. Rajendiran, et al. (2005). 'http://pandora.nla.gov.au/pan/63039/20060926-0000/adt.caul.edu.au/etd2005/papers/026Rajendiran.pdf'. In ETD2005 8th International Symposium on Electronic Theses and Dissertations, September 28-30, 2005, Sydney, Australia.
 A. Ross (2008). 'The changing landscape of dissertation supplementary and supporting content: ETDs in transformation'. In ETD 2008 11th International Symposium on Electronic Theses and Dissertations, June 4-7, 2008, The Robert Gordon University, Aberdeen, UK.
 P. Schirmbacher (2009). 'From an ETD-Collection to a Visible Open Access Repository'. In ETD 2009 12th International Symposium on Electronic Theses and Dissertations, June 10-13, 2009, Pittsburgh, Pennsylvania.
 J. Schöpfel (2010). 'Towards a Prague Definition of Grey Literature'. In Twelfth International Conference on Grey Literature: Transparency in Grey Literature. Grey Tech Approaches to High Tech Issues. Prague, December 6-7, 2010.
 J. Schöpfel, et al. (2011). 'Open is not enough. A case study on grey literature in an OAI environment'. In Thirteenth International Conference on Grey Literature: The Grey Circuit. From Social Networking to Wealth Creation. Washington, DC, December 5-6, 2011.
 K. Smith (2008). 'Institutional Repositories and E-Journal Archiving: What Are We Learning?'. Journal of Electronic Publishing 11(1).
 I. Sugita & Y. Murakami (2007). 'Dissertations and Theses in Institutional Repositories: Case Study in Japan'. In ETD 2007 10th International Symposium on Electronic Theses and Dissertations, June 13-16, 2007, Uppsala, Sweden.
 A. Swan & C. Awre (2006). 'Linking UK Repositories: Technical & Organisational Models to Support User-Oriented Services Across Institutional & Other Digital Repositories'. Tech. rep., JISC, London.
 F. N. Ubogu & Y. Sayed (2008). 'Management of Research Data in ETD Systems'. In ETD 2008 11th International Symposium on Electronic Theses and Dissertations, June 4-7, 2008, The Robert Gordon University, Aberdeen, UK.
 S. Waddington, et al. (2012). 'CLIF: Moving repositories upstream in the content lifecycle'. Journal of Digital Information 13(1).
 E. P. Walker (2011). 'What We Can Learn from ETDs: Using ProQuest Dissertations & Theses as a Dataset'. In USETDA 2011: The Magic of ETDs... Where Creative Minds Meet, May 18-20, 2011, Orlando, Florida.
 W. White (2007). 'Opening access and closing risk: delivering the mandate for e-theses deposit'. In ETD 2007 10th International Symposium on Electronic Theses and Dissertations, June 13-16, 2007, Uppsala, Sweden.
About the Author
Joachim Schöpfel is lecturer of Library and Information Sciences at the University of Lille 3, Director of the French Digitization Centre for PhD theses (ANRT) and member of the GERiiCO research laboratory. He was manager of the INIST (CNRS) scientific library from 1999 to 2008. He teaches Library Marketing, Auditing, Intellectual Property and Information Science. His research interests are scientific information and communication, especially Open Access and Grey Literature.