Gerard van Westrienen
Clifford A. Lynch
On May 10-11, 2005, the Coalition for Networked Information (CNI), the UK Joint Information Systems Committee (JISC), and the SURF Foundation in the Netherlands hosted an international conference titled "Making the Strategic Case for Institutional Repositories". The purpose of this conference was to take a broad look at the current state of deployment of institutional repositories (IRs) in the academic sector, and to explore how national policies and strategies were shaping this deployment. In preparation for the meeting, the organizers solicited data on institutional repository deployment from thirteen nations: Australia, Canada, the United States, and ten European countries (Belgium, France, the United Kingdom, Denmark, Norway, Sweden, Finland, Germany, Italy and the Netherlands). We used a common template (reproduced in Appendix: Questionnaire) for the national reports, although many nations were unable to address all of the questions in the template, and we presented the collected data as part of the conference program. After the conference, representatives of each nation were given an opportunity to revise their submissions in order to clarify differences of interpretation that became evident when the data from different countries were compared in Amsterdam.
The complete data from each nation can be found at <http://www.surf.nl/download/country-update2005.pdf>. Other materials from the conference can be found at <http://www.surf.nl/en/bijeenkomsten/index2.php?oid=6>.
To the best of our knowledge, this data collection is the first effort to gather comparative international data about institutional repository deployment systematically. The purpose of this article is to summarize and comment upon the findings; our comments are, of course, also informed by presentations at the Amsterdam meeting and at other conferences, as well as by the literature and our knowledge of specific deployment developments.
In hindsight, it is clear that some of the questions might beneficially have been formulated in a different, or more detailed, way in order to obtain comparable data from different nations (or indeed, in many cases, data that is comparable simply from one institution to another). It is also clear that we were ambitious in our hopes about data availability. Information available in one country was not easily obtainable in another; many aspects of the system of higher education vary enormously from nation to nation, with extensive and complex implications not only for the shape of the evolving national collection of institutional repositories but also for the understanding and characterization of this collection of repositories and its relationship to scholarly communication. Ultimately, in developing the survey template, we chose to take an exploratory view, asking for a wide range of information and accepting data in whatever metrics the participating nations could supply.
The data collection process was also problematic. It was not always clear what organization (or organizations) to ask about the picture in a given nation, and each national respondent scoped and approached the data collection and reporting a bit differently. Indeed, within some nations, there may well be disagreement between different parties on the current state of deployment, and processes for resolving these disagreements or developing consensus in each participating country around a data collection strategy and report were far beyond the scope of the conference or the data gathering process. We urge the interested reader to review the individual country reports, which often describe limitations of specific national data gathering processes or presentation. In our analysis, we have tried to respect the data reported by each nation and keep second-guessing to a minimum, although in the particular cases of the Netherlands, the United Kingdom, and the United States, the three co-sponsor organizations were able to provide additional interpretation that helped to clarify the data reported and analyzed here. Undoubtedly, some of the national data reported here is incomplete and almost certainly underestimates the state of repository deployment, but there is every reason to believe that the overall trends remain valid. Finally, the reader should keep in mind that deployment seems to be moving fast in most nations, which means that some of this data will age rapidly.
Despite these limitations, we think the exercise was very valuable and resulted in interesting and thought-provoking data and information. It certainly raises questions about policies and strategies that national higher education, research funding and policy-making bodies, and individual institutional communities within the higher education sector will want to consider. It also frames questions about the possible value of developing some guidelines or standards related to data gathering and reporting, in order to be able to monitor the growth and development of IRs.
2. The nature of institutional repositories
We observed great diversity among IRs. We looked at the total number of IRs; the percentage of universities in a country with an IR; the number and type of objects in an IR; the disciplinary coverage of IRs; and the IR software used. We were also interested in information about the academics who have become involved with IRs.
Number of IRs
The deployment percentage estimate is derived by dividing the number of repositories by the total number of reported institutions. This is only an estimate: it is clear from the data that some institutions have more than one IR. Some academic institutions consist of different colleges and schools, each with its own repository, or have set up repositories more closely tied to the departments within the institution. Data from some nations count these repositories as institutional repositories, while other nations have tried to count only repositories supported on an institution-wide basis. Some institutions have also established IRs for particular types of material, such as dissertations, working papers, or video materials, so a single institution may have several repositories for different types of materials; this is another way in which a single institution might report more than one institutional repository.
The estimate shows a spread from around 5% in a country like Finland, where repositories are just getting started, to essentially 100% deployment in countries like Germany, Norway and the Netherlands, where it is clear that repositories have already achieved some status as common infrastructure across the relevant national higher education sector and, hence, can form the basis of other initiatives that presuppose the near-universal availability of institutional repositories.
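To make the over-counting problem concrete, the following Python sketch contrasts the survey-style estimate (repositories divided by institutions) with the share of institutions that have at least one institution-wide repository. It is a minimal illustration under stated assumptions: the Repository record and its institution_wide flag are hypothetical, and none of the national surveys recorded data in exactly this form.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Repository:
    # Hypothetical record: the hosting institution and whether the repository
    # is supported institution-wide (False for departmental or
    # material-specific repositories such as a theses-only archive).
    institution: str
    institution_wide: bool


def naive_deployment(repos: List[Repository], n_institutions: int) -> float:
    """Survey-style estimate: repositories divided by institutions, in percent.

    Institutions with several repositories inflate this figure, which can
    therefore exceed 100%.
    """
    return 100.0 * len(repos) / n_institutions


def institutional_deployment(repos: List[Repository], n_institutions: int) -> float:
    """Percentage of institutions with at least one institution-wide repository."""
    covered = {r.institution for r in repos if r.institution_wide}
    return 100.0 * len(covered) / n_institutions


# Hypothetical example with four institutions, A to D.
repos = [
    Repository("A", True),   # institution-wide IR
    Repository("A", False),  # separate theses repository at A
    Repository("B", True),
    Repository("C", False),  # departmental repository only
]
print(naive_deployment(repos, 4))          # 100.0
print(institutional_deployment(repos, 4))  # 50.0
```

In this invented example the naive estimate reports full deployment, while only half of the institutions actually run an institution-wide IR; the gap is exactly the ambiguity the national counts leave open.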
Number of objects
The average number of records per IR in the countries we looked at, to the extent that we could compute it, is typically a few hundred, with the exception of the Netherlands, where we see an average of 12,500 records per IR. Note that we did not have the data to do this computation for some nations, such as the USA.
It also became clear in analyzing the data that a count of "records" meant very different things in different nations. In some countries, such as the USA, there is a very strong assumption that institutional repositories contain full source objects such as papers, images or datasets; in other nations, such as the Netherlands, some records in institutional repositories were only metadata (essentially bibliographic entries). For the Netherlands we were able to obtain additional data to clarify the interpretation: of the average of 12,500 records in a Dutch IR, around 3,000 also have the full object file available. If we conduct additional surveys in future, we will clearly need to be much more specific about this metadata/full object dichotomy in our requests for data. It is also important to recognize that this is only part of the problem of how to count objects in repositories: as objects become complex, with versions, hierarchical structures, or composite data streams, many different legitimate interpretations are possible.
Another observation came from the USA, where respondents suggested measuring the size and growth of IRs not only by number of records but also by storage volume in gigabytes or even terabytes, and collected data of this type. While counting disk space is also problematic, it at least provides another perspective to complement the count of objects or records.
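As a sketch of how the record count, the metadata/full-object split, and storage volume could be reported side by side, the Python code below summarizes a list of records. The Record structure is a hypothetical assumption, not the schema of any particular repository software; on the Dutch averages reported above, 3,000 full-object records out of 12,500 would give a full-object share of 0.24.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    # Hypothetical repository record: metadata is always present; zero or
    # more attached files carry the full object (sizes in bytes).
    identifier: str
    file_sizes: List[int] = field(default_factory=list)

    @property
    def has_full_object(self) -> bool:
        return bool(self.file_sizes)


def summarize(records: List[Record]) -> Dict[str, float]:
    """Report record counts, the metadata/full-object split, and storage volume."""
    total = len(records)
    full = sum(1 for r in records if r.has_full_object)
    total_bytes = sum(sum(r.file_sizes) for r in records)
    return {
        "records": total,
        "full_object_records": full,
        "metadata_only_records": total - full,
        "full_object_share": full / total if total else 0.0,
        "gigabytes": total_bytes / 10**9,  # decimal gigabytes
    }


records = [
    Record("oai:example:1", [500_000_000]),
    Record("oai:example:2"),                            # metadata only
    Record("oai:example:3", [250_000_000, 250_000_000]),
    Record("oai:example:4"),                            # metadata only
]
print(summarize(records))
```

Reporting all three figures together avoids the situation seen in this survey, where one nation's "records" and another's were not the same unit.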
Type of objects
The findings are shown in Table 2.
What comes across clearly from this data is that, in the countries reporting, the holdings of current IRs focus mainly on textual material. Within this type of material, however, we see strong differences between countries: in Norway, 90% of the current records are for books and theses, while in France an estimated 80% of the current records are for articles. It is also worth noting that the 25% "other" category for Germany is textual material (proceedings), as is the 40% "other" for the Netherlands (mainly research reports).
In contrast to the other nations reporting, in Australia 83% of the records are primary data, though the Australian country report cautions that the interpretation of percentages here is somewhat arbitrary (for the reasons discussed earlier in relation to measuring repository size). It is also clear from the data provided by the United States, which included counts of the number of repositories storing a wide range of material types, that US repositories hold a significant amount of non-textual content.
This data is very important because it offers critical insight into the ways in which various nations are thinking about the role of institutional repositories. Our findings suggest that, with the possible exceptions of Australia and the United States, institutional repositories currently house mostly traditional (print-oriented) scholarly publications and grey literature: journal articles, books, theses and dissertations, and research reports. From this we can at least speculate that, again outside Australia and the United States, open access issues in scholarly publishing may well be the key drivers of institutional repository deployment, at least in the very short term, rather than the new demands on scholarly communication related to e-science and e-research. Of course, this may well shift over time, and the reported data may also be skewed by the strategies being used to expedite the initial population of institutional repositories.
A full understanding of this also requires an appreciation of the entire national context for managing scholarly and scientific information. For example, the UK has an extensive, sophisticated, and well-developed system of national repositories for data in various areas, so scholars producing digital datasets would not need to rely on a local institutional repository infrastructure to store data; rather, they would deposit the data in the appropriate national repository, and then perhaps deposit publications in the local institutional repository that linked to the datasets stored in the national repository. By contrast, the United States has very limited national-level data repositories.
For some of the countries that were able to indicate the disciplinary coverage, we see strong differences: Australia and Italy apparently have a focus on the Humanities and Social Sciences while in a country like the UK, almost two thirds of the focus is currently on Natural Science and Engineering. In other countries, like Sweden and the Netherlands, the distribution among disciplines is more evenly spread. (Interestingly, the definitions of the broad disciplinary categories were clearly subject to some interpretation: the data from France (reflected in Table 3) indicates that the 67% "other" is primarily physics and mathematics; in Germany, the 25% "other" is evidently mainly computer science.)
Few countries could answer the question about the number of academics involved with IRs, but the indication is clear that both the number and the percentage of all academics involved are still very low, with two exceptions. In the Netherlands, it is estimated that at least one record from around 40% of all academics has been deposited in an IR. We also see an exception for a specific type of material: dissertations. In Germany, for example, it is estimated that, depending upon the discipline, between 2% and 62% of all academic dissertations have been deposited in IRs. These are very interesting figures.
We also asked for estimates of the national coverage of yearly research output, by broad discipline, that was going into the national collection of IRs. Again, while only a few nations attempted to provide such estimates, they correlate reasonably well with the observations about the extent of participation from the academic community. The Netherlands estimates that about 25% of the national research output, across a wide range of disciplines, is now going into its institutional repositories. The French institutions in Belgium also give large estimates: 33% in Humanities and Social Sciences, 39% in Life Sciences, 16% in Natural Sciences, and 11% in Engineering. With the exception of a 10% estimate for Natural Sciences in Germany and a 15% estimate for Engineering and Computer Science in the UK, the other estimates (when supplied) were negligible.
From the various remarks on the question regarding the delivery process, we may conclude that in most cases intermediaries, like librarians, are doing the depositing, although in Italy some universities declare that most archiving is done by academics themselves.
Other remarks indicated that in some cases, as in the UK, a large part of the items currently in IRs were inherited from previous systems. This indicates that many institutions are still in the process of setting up an IR and that filling the IR is not yet incorporated into the standing workflows and processes of the institutions. We can see similar patterns, at least anecdotally, in some of the strategies US institutions are employing to build up content in their IRs rapidly.
3. Federated and cross-repository services
We asked about the kinds of services that were being created on top of the institutional repositories, either on a national or multi-national basis. It is clear that while many experiments are being launched (most commonly federated searching, print-on-demand, or automatic replication of metadata and/or content from one system to another), these services are still highly experimental, and it is too early to tell which ones will become successful and widely adopted. It seems clear, however, that various forms of cross-repository searching (either through federated search or harvested union catalogs) will be needed, and there are many such services, at least in prototype. But the first and main focus at the moment is on setting up the infrastructure and getting the content into the institutional repositories. Further, there is the question of which organizations should build (or at least fund and promote the building of) cross-institutional services; this seems less problematic in nations with strong central planning and funding organizations, such as JISC or SURF, than in other nations without such organizations.
Support for standards related to harvesting, notably the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), seems widespread; support for search interfaces such as SRU/SRW/Z39.50 is less universal but still fairly common. This suggests that when the time comes to develop cross-institutional services, the repository infrastructure will already have in place at least some of what are expected to be the key standards.
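For readers unfamiliar with OAI-PMH, the following Python sketch shows the basic harvesting loop a cross-repository service such as a harvested union catalog might run: issue a ListRecords request, collect the identifiers of non-deleted records, and follow the resumptionToken until the repository signals the end of the list. It uses only the standard library and is a sketch under stated assumptions: the repository base URL is a hypothetical placeholder, the fetch function is passed in so the loop can be exercised without network access, and OAI-PMH error responses and HTTP retries are not handled.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, resumption_token: Optional[str] = None,
                     metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return identifiers of non-deleted records and the next resumptionToken."""
    root = ET.fromstring(xml_text)
    ids = []
    for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
        if header.get("status") == "deleted":
            continue  # deleted records carry only a header
        ids.append(header.findtext("oai:identifier", namespaces=OAI_NS))
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty resumptionToken means the list is complete.
    token = (token_el.text or "").strip() if token_el is not None else ""
    return ids, token or None


def harvest(base_url: str, fetch: Callable[[str], str]) -> List[str]:
    """Follow resumptionTokens until the repository reports the end of the list."""
    ids: List[str] = []
    token: Optional[str] = None
    while True:
        page_ids, token = parse_list_records(fetch(list_records_url(base_url, token)))
        ids.extend(page_ids)
        if token is None:
            return ids
```

Against a live repository, fetch could be something like lambda url: urllib.request.urlopen(url).read().decode("utf-8"). Sending the resumptionToken as the only argument besides verb follows the protocol rule that resumptionToken is exclusive; a production harvester would also record datestamps so that later runs can request only new or changed records.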
Specifics of many of the prototype projects (including links) are available in the country reports on the SURF site <http://www.surf.nl/download/country-update2005.pdf>.
There are a few specific areas of standards and best practices where a number of contributors noted problems and suggested that coordinated work might be beneficial, most notably related to persistent identifiers for objects stored in repositories.
4. National and institutional policies and organizations
We asked each country about the existence of national policies related to institutional repositories. While there are only a few actual national (governmental) policies, there is also a growing set of reports, declarations, policy directives and similar activities at a national level or across major groups of higher education institutions within a nation that increasingly lend support to institutional repositories (often specifically in the context of infrastructure to advance open access). Note that the relationships between higher education and government, and thus the locus of policy-making in these areas, are complex, sometimes subtle, and highly variable from country to country. The nation-by-nation list of specific relevant developments in the country reports probably gives a better picture of the current status than any simple set of counts.
In some countries, like the UK and the Netherlands, large national programmes are advancing the deployment of IRs as well as standards and best practices surrounding their implementation; elsewhere we also see coordinating and stimulating bodies, workgroups or consortia at levels ranging from national to fairly small multi-institutional collaborations. The amount of central national leadership (and funding) driving institutional repositories seems to vary greatly. Details or links are presented in the questionnaire responses.
A special and interesting situation exists in Germany, where a national body was set up to certify IRs according to certain standards. In other countries, like the Netherlands, national coordinating or funding bodies also try to reach national agreements on standardization.
It is clear that we are starting to see many policy statements regarding scholarly communication and access to knowledge being developed at the institutional level, as well as collective action through adoption of the Berlin Declaration and similar documents. While many of these are general in nature, several countries provided specific pointers to institutions that had policies related to deposit of materials in institutional repositories.
5. Stimulating and inhibiting factors in IR deployment
When we asked country-respondents to list the main inhibitors or bottlenecks for establishing, filling and maintaining IRs, we got a long list of answers. Many of them were not unexpected, and revolved around resource constraints and the difficulties of informing faculty about the value of institutional repositories and convincing faculty to contribute. It is clear that there is confusion, uncertainty and fear about intellectual property issues (not just getting copyright permissions to deposit, but questions about who will use material that has been deposited, how it will be used, and whether it will be appropriately attributed), about impact factors and scholarly credit, and related matters. Oddly, there seems to be a persistent myth circulating that material in institutional repositories is of low quality; some of this may be connected to attempts to portray open-access material as being of low quality, since as we have seen the near-term driver in most countries for institutional repositories appears to be open access.
It is very clear that cumbersome and time-consuming submission procedures are a major barrier, and that every effort needs to be made to minimize the amount of work faculty must do to submit their work to the institutional repository, and to maximize the benefits (for example, through the automatic linked maintenance of bibliographies or dissemination of pre-prints or offprints).
A last point frequently cited is the lack of mandatory provisions in the policies of institutions or funding organizations to deposit the outcome of academic research into repositories such as IRs, though the establishment of such policies, particularly at an institutional level, continues to have controversial elements as well.
The stimulating factors mainly mirrored the inhibitors. Some nations, for example Australia and Belgium, stressed transparent and simple submission processes. Others mentioned the strong involvement and support of libraries in the submission process. Smart propagation of materials from the institutional repository to national or disciplinary repositories, without the need for additional faculty intervention, is another way to add value. (For example, in Denmark, universities need to document scientific publications in the Danish National Research Database; work is underway to propagate records to this database automatically from the local institutional repositories.)
Many of the other stimulating factors involved ways to address faculty concerns and to provide faculty with compelling value from their submission. Getting repository contents indexed in Google and similar search services makes faculty work more visible and accessible. Arguments that open access content is more heavily read and cited also help build the case for depositing work into IRs. A project like "Cream of Science" <http://www.creamofscience.org> in the Netherlands, a systematic effort to populate the national repository system with as many as possible of the publications of several hundred of the nation's leading scholars, lends prestige and legitimacy to institutional repositories. In several countries, programmes and policies around electronic theses and dissertations have stimulated the establishment and population of IRs.
In Belgium a stimulating factor is the service for authors to set up and maintain their list of publications, and in the Netherlands a stimulating factor for authors is the link to the E-Depot of the Royal Library for the preservation of objects (one of the very few mentions of preservation).
A last point we want to make here is the clear trend in many nations towards greater accountability and evaluation of research (like the Research Assessment Exercise in the UK) and competition for funding. In Belgium, lobbying is under way to make the IR the only source that academic authorities use to decide how to allocate funding to the different centers and departments within the university. To the extent that IRs are directly linked to research funding and research evaluation (at the individual or institutional level), faculty have a very compelling reason to deposit material into them.
What was striking, particularly when contrasted to some of the US anecdotal experience, was that non-US respondents wholly identified relationships between (rather traditional) faculty scholarly publishing and the repository as the locus for barriers and stimulating factors in the repository's establishment. We did not hear issues raised about the need to manage, preserve and provide access to large, complex, inherently digital objects such as datasets, software, simulations and the like that constituted fundamentally new forms of scholarly communication not accommodated by the existing scholarly publishing system. We did not hear about the impact of e-science and e-research on scholarly communication. We heard little about the need to manage digital learning objects, or digital materials created by students as part of their academic work. We did not hear about the need to manage institutional records in digital form or the need for repositories to support activities to capture, organize and provide access to records of the intellectual, cultural and artistic life of academic institutions. As discussed earlier, we have to be very cautious in interpreting this: in some nations, the materials may be handled through other means such as national-level repositories. And the way in which the various national data collection efforts were formulated and conducted may have focused attention away from some of these issues (for example, learning materials were specifically excluded from the UK survey). Clearly, if the survey is repeated, this is an area that will call for some additional thought and analysis.
This survey represents what we believe is the first broad-based international inquiry into a still very immature and rapidly evolving part of the infrastructure for scholarship and scholarly communication. For all of its imperfections and shortcomings (many of which could only be fully recognized and understood in hindsight) we believe that the survey provides some very significant insights into the state of the international repository infrastructure.
It is clear, at least among the nations surveyed, that institutional repositories are becoming well established as campus infrastructure components. They are broadly deployed in many of the countries surveyed, and essentially universally available in a few already. We have every reason to believe that deployment rates will continue to increase; since the data was gathered, we know of developments in at least the US and the UK that are likely to significantly increase the number of deployed institutional repositories over the next year or two. It seems very likely that over the next few years this infrastructure will become extensive enough, and strong enough, to support a growing layer of both services and policies that assume the infrastructure's widespread presence.
The acquisition of content is still the central issue for most institutional repositories. Except perhaps for the United States and Australia, the focus seems to be almost exclusively on faculty publications, and we are seeing some very innovative strategies such as "Cream of Science" in the Netherlands to increase faculty participation and acceptance. The growing emphasis on research evaluation in some nations may also accelerate the deposit of faculty publications, as will the adoption of open access policies on a national, institutional or funding agency basis.
It will be very important to gain a better ongoing understanding of the extent to which institutional repositories are necessary to support developments related to e-science and e-research, or indeed for a wide variety of other purposes beyond managing and providing access to relatively traditional faculty publications, and how actively they are being used for these purposes. It is clear that this is happening in the United States, at least at some institutions, and in Australia; we know of active work at various institutions in the United Kingdom, and these are objectives being explicitly advanced nation-wide in the JISC repositories development programme. But the evidence from our country surveys does not suggest that this is happening to a great extent in other nations that contributed data. As discussed earlier, this needs to be understood in the context of overall national (and, where relevant, international) strategies for the management of scholarly materials in the broadest sense: we do not have a good nation-by-nation picture of this, and we need one to complement our comparative understanding of the various national institutional repository deployments. An additional way to cross-validate our understandings here would be to directly examine the information management practices of leading e-research efforts in various nations to find out what they actually are doing with the various kinds of content that their work is producing.
Finally, it is worth noting that our questionnaire tried to gather information on some of the interplay between published national research output and institutional repositories, though with limited success. Those concerned with understanding the developments in open access are trying to collect closely related information. As more and more attention is focused on research evaluation and management, these broader issues of characterizing national research output will take on additional importance. Of particular relevance here, the potential use of appropriately structured IR measurements as tools to help understand these broader issues deserves careful examination, as do ways in which we might better understand IR developments through efforts to characterize the broader landscape.
A number of the participating nations urged that the survey be repeated, or perhaps even done on a regular annual (or more frequent) basis. The value of the survey would be enhanced if we could make collective progress on approaches to measuring repository size and rate of growth that would facilitate inter-institutional and inter-national comparisons, and also the development of useful trend data across time. Clearly, it would also be useful to extend the survey to cover developments in additional countries. Future work in these areas is under active discussion by CNI, JISC and SURF at the present time.
We thank all those who collected and contributed data to support this study, as well as our colleagues at JISC, SURF and CNI who worked with us on the project. A special thanks goes to Neil Jacobs of JISC for a careful reading and some very important comments on an earlier draft of this article.
(On 9/16/05, a correction was made to Table 4 to move the number and kind of software packages (10) from the iTOR column to the DIVA column.)
Copyright © 2005 Gerard van Westrienen and Clifford A. Lynch