In the spring of 2005, the Coalition for Networked Information (CNI) surveyed its academic member institutions to examine the current state of institutional repositories (IRs) in the US. This effort was coordinated with the work CNI performed in compiling information for the United States country report in preparation for an international conference titled "Making the Strategic Case for Institutional Repositories," held in Amsterdam on May 10-11, 2005. This conference was jointly sponsored by CNI, the Joint Information Systems Committee (JISC) in the United Kingdom and the SURF foundation in the Netherlands. Thirteen nations participated by providing data on the state of repository deployment in their countries. The CNI survey of US institutions was informed by the information requested as part of a common report template developed for the Amsterdam meeting, but it also included questions devised to gather additional information geared to further illuminate the state of institutional repository development in the US. The information presented in this article complements the companion article by van Westrienen and Lynch, in this issue of D-Lib Magazine, that examines the results across all 13 countries. The full data submitted by each nation can be found at <http://www.surf.nl/download/country-update2005.pdf>. Other materials from the conference can be found at <http://www.surf.nl/en/bijeenkomsten/index2.php?oid=6>.
To the best of our knowledge, there has been relatively little systematic examination of the actual state of deployment of institutional repositories in higher education (or even among research universities) across the United States, although there has been considerable attention focused on specific leading-edge deployments such as the DSpace implementations, first at the Massachusetts Institute of Technology and subsequently at other institutions. Along with these institution-specific case studies, there has been a good deal of speculation about the viability and importance of institutional repositories and particularly about the imputed relationships of institutional repositories to the open access movement (and thus the extent to which the success or failure of institutional repositories provides evidence of either the success or failure of the open access movement). There has also been extensive investigation of the role of various types of repositories in the scholarly communications process, particularly in the context of e-prints and author self-archiving, and even, more recently, with respect to institutional policies about author self-archiving; however, these studies really don't illuminate the full range of developments surrounding institutional repository planning and deployment.
The Coalition for Networked Information <http://www.cni.org>, a joint program of the Association of Research Libraries (ARL) and EDUCAUSE, has been highlighting, analyzing, and promoting awareness of the development and roles of institutional repositories in academic institutions in recent years through publications and conferences. CNI has viewed repositories very broadly both in the context of its overall programmatic interests in the management of digital knowledge assets and in the evolution of scholarly communication in the digital world.
Survey of US Higher Education Institutions: CNI Members
The Coalition for Networked Information is an institutional membership organization dedicated to advancing the transformative promise of networked information technology for the advancement of scholarly communication and the enrichment of intellectual productivity. Our approximately two hundred member institutions come from a variety of sectors, including higher education, libraries, information technology, government, foundations, networking, and non-profits. Most of our member institutions are in the US, but we also have significant representation from Canada, as well as members from the UK, Europe, and Australia.
In February 2005, CNI sent a survey on the topic of institutional repositories to the 124 individual higher education institutions in the US in our membership. In addition, we sent the survey to a group of 81 liberal arts colleges that have a consortial membership in CNI. The survey was deliberately kept brief (11 questions) and was in e-mail format. After one follow-up message, we had responses from 97 of the 124 individual higher education institutions surveyed (78.2%). All of these 97 institutions are in the "doctoral universities" categories of the Carnegie classification. In addition, we had 35 responses from the 81 liberal arts institutions surveyed (43.2%).
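The response rates quoted above can be rechecked directly from the counts. The following is a minimal sketch for readers who wish to verify the arithmetic; the function and variable names are illustrative assumptions, and the combined rate across both groups is a derived figure that was not reported in the survey itself.

```python
def response_rate(responses: int, surveyed: int) -> float:
    """Return the response rate as a percentage, rounded to one decimal place."""
    if surveyed <= 0:
        raise ValueError("surveyed must be a positive count")
    return round(100 * responses / surveyed, 1)


# Counts taken from the survey description above.
doctoral_rate = response_rate(97, 124)       # individual higher education members
liberal_arts_rate = response_rate(35, 81)    # consortial liberal arts members

# Derived figure (not reported in the article): both groups pooled together.
combined_rate = response_rate(97 + 35, 124 + 81)

if __name__ == "__main__":
    print(f"Doctoral universities: {doctoral_rate}%")
    print(f"Liberal arts colleges: {liberal_arts_rate}%")
    print(f"Combined:              {combined_rate}%")
```

Because the two groups were recruited differently and differ sharply in response rate, the pooled figure is best read as a rough summary rather than a meaningful statistic in its own right.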
There was no attempt to survey a statistical sample of the full universe of US higher education institutions. According to US Department of Education statistics, there are some 2,364 postsecondary degree-granting, four-year institutions in the US (Pocket Guide, 2005). Of these, the Carnegie classification <http://www.carnegiefoundation.org/Classification/> lists 261 institutions as "doctoral research universities." Our 97 responding institutions over-represent US doctoral research universities with commitments to a strong research program.
Our impression (based, admittedly, on informed anecdote rather than systematically collected survey data) is that deployment of institutional repositories beyond the doctoral research institutions in the United States is extremely limited; as far as we know, most of the engagement with institutional repository planning and deployment beyond the doctoral research level group is at colleges and universities where students and faculty have strong commitments to locally created materials for teaching and learning or that document student research. Our hope was that the data we gathered from the liberal arts colleges would at least provide some insight into current developments in this sector, though we had no illusions that a sample from a set of some 80 institutions that had self-selected through commitments to exploring the use of digital content and advanced information technology within their institutions would yield any statistically meaningful information about four-year colleges in general. It is worth noting that while we believe that current deployment and deployment planning for institutional repositories is concentrated in the institutional groupings we have just described, there are several trends that may over time broaden the base. These include the growing adoption of electronic student portfolios and of electronic theses and dissertations, and perhaps faculty demand at non-research institutions for access to institutional repository services (likely provided in some outsourced or consortial framework) in support of the dissemination of their own research.
As we prepared the survey, we debated whether to provide a working definition of "institutional repository" in the instructions for completing the survey. There are two views of institutional repositories that differ somewhat in emphasis (though they are not ultimately inconsistent, as one is a subset of the other). One characterizes an institutional repository as primarily addressing dissemination of various forms of e-prints for faculty work; sometimes but not always this is explicitly linked to objectives about providing open access to faculty publications. (This also seems to be the typical view of the purpose of a repository at the sub-institutional level, e.g. a department or school repository.) The second approach conceives of an institutional repository as broadly housing the documentation of the intellectual work (both research and teaching) of the institution, records of its intellectual and cultural life, and supporting evidence for present and future scholarship. Such an institutional repository will include e-prints, certainly, but also datasets, video, learning objects, software, and other materials. This is the vision of institutional repositories that was addressed by Lynch as he offered this definition in his 2003 article:
"a university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution." (Lynch, 2003)
Our concern in debating whether to include a working definition was that we clearly communicate that we wanted to capture information based on the broadest possible view of institutional repository initiatives. However, for the purposes of the survey, we ultimately chose not to provide a definition and instead requested that the respondents complete the survey with their own view of institutional repository as the parameter in the hopes that this would provide more insight into how institutions were thinking about the roles, purposes and scope of institutional repositories. (As we will discuss later, this strategy worked perhaps too well, raising issues about how to differentiate institutional repository efforts from other institutional efforts on digital libraries, collections, and archives.)
It is important to recognize that our survey collected data specifically on institutional implementation and planning work, typically based on information gathered from senior library executives (possibly in consultation with the CIO and other campus executives); because of the organizational structure of CNI we were particularly well positioned to reach out to the appropriate leadership within our member institutions and to be able to get a very high response rate. There is clearly a great deal happening at our member institutions below the institutional level; individual departments are setting up e-print repositories, for example, as are school-level groups (particularly in professional schools such as business or engineering). We made no effort to collect information on these activities. Spot comparisons between data that we collected and the data reported on the deployment of the Southampton EPrints software <http://archives.eprints.org/> underscore this situation. Indeed, there is good reason to wonder about the level of coordination or even awareness in some cases between departmental or school level activity on one hand and truly institutional (central) activity on the other, and certainly to ask questions about strategies to coordinate these diverse strands of activity. This is an area that would clearly benefit from focused future exploration.
Extent of implementation of repositories
While there are thousands of degree-granting institutions in the US, there are only about 250 research universities. Based on our survey of roughly half of these US doctoral-granting institutions (with a response rate of around 80%) it seems very clear that institutional repositories are becoming well-established as campus infrastructure components: around 40% of the respondents have some type of institutional repository operating, and 88% of those that do not yet have a repository have planning work underway for an institutional repository or participation in some form of consortial repository system. This group of universities produces the majority of the research in the US. As expected, we are also seeing some scattered deployment of institutional repositories among four-year colleges and other non-research-intensive higher education institutions, though this is not widespread and we suspect in many cases it is intended to support the management of teaching materials, materials to support research that is closely tied to teaching, or student and faculty materials that come out of course projects. Of the 81 liberal arts institutions that were sent our survey, we had responses from 35 (43%), and only 2 of those (6% of respondents) reported operational institutional repositories.
It is also clear that among these non-research-intensive institutions there is real concern about the fixed cost of operating a repository at the institutional level and about levels of use at which economies of scale can be realized, and there seems to be a great deal of interest in either purchasing repository services from some other entity or becoming part of consortial, multi-institutional repositories as a way of sharing these fixed operating costs. Of the universities with no repository at present, 28% plan to use a consortial repository as their implementation strategy in the future or use both institutional and consortial repositories; 21% of the liberal arts institutions plan to use a consortial repository as their implementation strategy or a component of their implementation of repositories. Many of the policy questions, particularly about the operation of consortial repositories, remain to be fully explored. It seems natural that these consortial repositories will tend to be deployed more slowly than the leading wave of single-institutional repositories, since the planning is more complicated and the experience of the leadership in single institutions will help to frame the functional requirements for the consortia-based services. However, once implemented, one consortial repository deployment simultaneously delivers an IR to many institutions, and thus we believe that in future we will see a multiplier effect from these consortial deployments, especially beyond the research universities. It is also important to recognize that policy choices made at the consortial level (for example, about what sorts of content objects a repository service will support) will also have multiplier effects in future. Because of this, we believe a follow-on focused survey of emerging developments in consortium-based repository services would be very timely.
Size of repositories
The survey asked respondents to indicate the size of their institutional repositories using any type of measure that they track, e.g. number of objects or space occupied. At the time we developed the survey instrument, it was already clear from discussion with managers of institutional repositories that no standard way of counting the content exists at present. The results that we obtained on the size of the current holdings in various institutional repositories were very problematic; it is clear that no two institutions are counting the same things. We received reports of the number of objects ranging from hundreds of thousands to, at the low end, a few dozen. The diversity in both the definition of what constitutes an "object" and in the nature of the objects being stored (massive videos or groups of datasets as opposed to individual articles or images) makes repository size very hard to interpret, or to relate to space measurements. Some institutions described repository size by estimating the amount of mass storage dedicated to their repositories, with reports running from over ten terabytes on the high end to less than a gigabyte on the low end. Again, it's clear that using this data to make comparisons would require exquisitely careful definition, which we did not attempt: it would be necessary to consistently account for mirroring, RAID overhead, object replication and similar factors in the definition of storage space for a repository.
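To illustrate why raw storage figures are so hard to compare, the sketch below converts a logical collection size into the physical disk capacity it might occupy once replication, mirroring and RAID parity are taken into account. This is only an assumed accounting model for illustration; the class name, fields and example configuration are hypothetical and do not describe any respondent's actual installation.

```python
from dataclasses import dataclass

TERABYTE = 10**12  # decimal terabyte, in bytes


@dataclass
class StorageProfile:
    """Hypothetical description of how a repository's content is stored."""
    logical_bytes: int           # size of the unique content itself
    replicas: int = 1            # number of full copies of each object kept
    mirrored: bool = False       # True if every volume is mirrored to a second one
    raid_data_disks: int = 1     # data disks per RAID group
    raid_parity_disks: int = 0   # parity disks per RAID group (1 for RAID 5)

    def raw_bytes(self) -> float:
        """Estimate the physical capacity consumed by the logical content."""
        mirror_factor = 2 if self.mirrored else 1
        # Parity overhead: total disks divided by the disks that hold data.
        raid_factor = (self.raid_data_disks + self.raid_parity_disks) / self.raid_data_disks
        return self.logical_bytes * self.replicas * mirror_factor * raid_factor


# Example: 1 TB of content, two replicas, RAID 5 groups of 4 data + 1 parity disk.
example = StorageProfile(
    logical_bytes=1 * TERABYTE,
    replicas=2,
    raid_data_disks=4,
    raid_parity_disks=1,
)

if __name__ == "__main__":
    print(f"Logical size: {example.logical_bytes / TERABYTE:.2f} TB")
    print(f"Raw capacity: {example.raw_bytes() / TERABYTE:.2f} TB")
```

Under these assumptions, two institutions holding identical content could report sizes that differ by a factor of two or more, depending on whether they quote the logical or the raw figure.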
While comparing the size of repositories between institutions is clearly a very complex problem, probably intractable in the short term, it would be relatively easy to collect estimated rate of repository growth data from institutions, and this would be helpful in understanding the landscape. We should have done this in the survey but didn't think of it, and if we repeat the survey, we certainly will ask about it. We have some anecdotal evidence that at least a few newly established repositories are growing at rates measured in terabytes per year or even per month.
Types of materials in repositories
In order to develop a list of types of content currently in repositories or planned for inclusion in repositories in the near future (e.g. 1-3 year time horizon) for our US survey, we started with the categories from the national template prepared for the CNI/SURF/JISC meeting and added many others based on our own analysis of the websites of some major US institutional repositories that we knew about. An examination of these websites had provided us with insight into the diversity of formats and types of materials that institutions are including in their repositories. One key result of our survey is that it seems clear a significant number of institutions are committed to institutional repositories that go far beyond e-prints. We can see that both in the kind of materials that are in repositories today and the kinds of materials that are planned for inclusion in the near future: not only e-prints and electronic theses and dissertations, but digitized special collections materials, multimedia, course materials, and datasets (Table I). Support for this conclusion is also provided by some of the information on software being used to support institutional repository work. Rather than the Southampton EPrints system or bepress, MIT's DSpace is the dominant package, and one also sees a lot of locally developed systems and use of various content management packages; all of these suggest a desire for flexibility in the types of content being housed in the repository (though, in fairness, the bepress software did show a significant following; see below). Additionally, the sheer size of some of the repositories makes it clear that they are dealing with a very broad array of materials and not just with e-prints; there is no way one can fill tens of terabytes with institutional e-prints, no matter how prolific the faculty.
Repository Software Platforms
Of the 38 respondents to the question requesting information on the software the institution was using for its repository, 22 (58%) indicated that they were using DSpace. The next highest number was for bepress, with 8 institutions (21%). Other software mentioned, used by fewer than five respondents each, included CONTENTdm, the Virginia Tech-developed ETD software, DigiTool, and locally developed systems. Some institutions reported using more than one type of software for their repositories or noted that they would soon change the type of software they were using. We did not attempt to collect software platform information from institutions in the planning phase.
Administrative Responsibility and Policy-Setting
It is clear that, in general, research libraries have the leadership role in operating institutional repositories, and also the leadership role in formulation of policy for such repositories. Of those institutions responding to the question about who has administrative responsibility for the institutional repository, close to 80% indicated that the library has the sole responsibility. A few institutions indicated that the responsibility was jointly held by the library and the information technology unit, library and instructional technology, library and academic administration, an archives unit, or some other multi-organizational arrangement. A smaller percentage, around 60%, indicated that the library had the sole responsibility for setting policy for the institutional repository. We would be cautious of the word "sole" here; while policy determination might technically fall to the library, it seems likely that the library in question employs some kind of advisory committee or consultative structure in formulating its policy. Other institutions indicated participation in policy decisions by faculty senates, departments or schools, the information technology unit, academic administration, and various combinations of those units.
We did not ask about financial responsibility; this would be an interesting area to explore in future, particularly as institutions move beyond the one-time arrangements such as grants or special allocations that are often being used to help fund development and start-up costs.
Federating Repositories and Deploying Cross-Repository Services
While the international country template included information about cross-repository services, we compiled this information from our own knowledge of current developments and through personal contact and conferences, rather than from the CNI survey. Our observations are that first, most institutional repositories want to be "good citizens" in the networked information landscape, and thus, for example, they support the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). While there are a few experimental services that are harvesting from institutional repositories (OAIster at the University of Michigan, or the efforts to export content from DSpace repositories to Google and other search engines), these efforts are still in very early stages, and it's not clear what organization(s) should take leadership in advancing them. Put another way, it appears that those institutions deploying institutional repositories are ready to become components in larger national or international-scale systems, but that developing these systems is not a local priority. This should be fertile ground for creative work by others.
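For readers unfamiliar with OAI-PMH, the sketch below shows roughly what a harvesting service does: it issues an HTTP request carrying a protocol verb such as ListIdentifiers, then reads record identifiers and an optional resumption token from the XML response. The base URL, identifiers and sample response here are hypothetical and invented for illustration; only the verb, the metadataPrefix parameter, the oai_dc prefix and the XML namespace come from the OAI-PMH 2.0 specification. No network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical repository endpoint, used only to show the request format.
BASE_URL = "https://repository.example.edu/oai"

# Invented sample of a ListIdentifiers response, trimmed to the essentials.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:repository.example.edu:1234</identifier>
      <datestamp>2005-03-01</datestamp>
    </header>
    <header>
      <identifier>oai:repository.example.edu:1235</identifier>
      <datestamp>2005-03-02</datestamp>
    </header>
    <resumptionToken>token-0001</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""


def build_request(base_url: str, verb: str, **params: str) -> str:
    """Build an OAI-PMH request URL for the given verb and arguments."""
    return f"{base_url}?{urlencode({'verb': verb, **params})}"


def parse_identifiers(xml_text: str):
    """Return (identifiers, resumption_token) from a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        element.text
        for element in root.findall(".//oai:header/oai:identifier", OAI_NS)
    ]
    token_element = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty token means the list is complete.
    token = token_element.text if token_element is not None and token_element.text else None
    return identifiers, token


request_url = build_request(BASE_URL, "ListIdentifiers", metadataPrefix="oai_dc")
identifiers, resumption_token = parse_identifiers(SAMPLE_RESPONSE)

if __name__ == "__main__":
    print(request_url)
    print(identifiers, resumption_token)
```

A real harvester would repeat the request with the returned resumption token until none is supplied, which is how a cross-repository service can page through large collections.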
It's worth noting that we did not uncover any evidence of operational federations (in the organizational sense) to replicate the contents of repositories (much less the ongoing curation of these contents) in a distributed fashion, though there are some experiments underway to look at technical as opposed to organizational or economic issues, notably the integration of data grid technologies such as the San Diego Supercomputer Center's Storage Resource Broker (SRB) with the DSpace software platform.
National Policies and Institutional Repositories
The relationship between national policies in areas such as access to publicly funded research results or evaluation of the productivity and quality of research institutions was of great interest as a topic for the international conference.
While the United States does not currently have policies explicitly directing or supporting the development of institutional repositories (the much-discussed National Institutes of Health request for deposit of NIH-funded research articles for open access relates primarily to PubMed Central, a disciplinary repository), it is important to note that there is a growing interest among research funding agencies in data management, curation and archiving that is not necessarily closely coupled to the open access debates. The National Institutes of Health put regulations in place in 2004 requiring a data management plan as part of all grants over $500,000 (US), and the National Science Board, which sets policy for the US National Science Foundation (NSF), has recently issued a report, Long-Lived Digital Data Collections (National Science Board, 2005), that calls for a similar data management plan requirement to be part of the NSF grant process; it seems clear from discussions around drafts of the report that similar issues will likely come into play for research funded by other federal agencies (at least in the sciences, and perhaps ultimately beyond). Institutional repositories will, in our view, be an important vehicle for addressing these data curation obligations.
We also note, anecdotally, considerable interest in institutional repositories in the context of public, state-supported institutions as a vehicle for public engagement and for communicating the intellectual and artistic contributions of the university to the people of the state; these have clear parallels to the national-level discussions taking place outside the United States about the role of the institutional repository in structuring information flow and communication between universities and the publics that support them. And, of course, open access remains on the policy agenda: future developments either in requirements for open access to publications based on publicly-funded research or for open access to research results and data directly could certainly influence the future development of institutional repositories.
Other Issues and Trends
As universities develop institutional repositories, the unit providing the infrastructure, most typically the library, needs to articulate a case for why faculty should deposit their materials within the repository. In contrast to some other national situations, in the United States use of institutional repositories is likely to be completely voluntary for faculty at virtually all institutions; faculty need to be persuaded about the benefits rather than intimidated by the consequences of not contributing to the repository. Those institutions that have made a concerted effort to understand their faculty needs and to reach out systematically to their faculty seem to have been more successful in attracting content for their repositories.
Because the outreach to faculty can be a slow, incremental, somewhat piecemeal process, some institutions begin populating their institutional repositories with the work of their students, rather than their faculty, as a quick means of acquiring a substantial body of a specific type of content. An electronic theses and dissertations (ETD) program is one such approach. In other cases, local records management needs may provide an important part of the initial impetus for an IR program. At still other institutions, strategies have included programs to ingest existing bodies of technical reports and other materials held at the department, laboratory or other organizational unit level rather than by individual faculty. But whatever the strategy for initially populating the institutional repository, in the US, most IRs have been established because the institution's library took the leadership in establishing an IR infrastructure, e.g. the hardware, software, and institutional policies to support the IR.
Concerns about the scholarly communications system (pushback against the high cost of scholarly journals, or support for the goals of the Open Access movement) may also prompt increasing numbers of faculty to deposit the products of their research in institutional repositories. In the past year, a number of faculty senates have made statements or passed resolutions advocating open access policies, although generally they do not make specific reference to institutional repository strategies. There is some reason to believe that, at least for faculty publications, institutional norms (as promulgated by the academic senate) may increasingly encourage faculty to place their writings into institutional repositories. For the other products of e-scholarship and e-research (the datasets, software, simulations, and related materials) we believe that for the foreseeable future the case will still need to be made in terms of continuity, quality and consistency of access, preservation, curation and similar issues.
The responses to our survey also underscore the confusing relationships at many institutions among digital libraries, digital research collections and collections of materials in institutional repositories, and the ways in which all of these relate to the scholarly communications process. A number of respondents identified materials being accessioned into the institutional repository that we would have thought of as digital library collections. This is an area that demands careful and thoughtful analysis; we would speculate, tentatively, that a key distinguishing characteristic of digital collections and digital libraries is one of institutional rather than faculty-initiated accession and organizational efforts.
While the movement to establish institutional repositories in research-intensive US higher education institutions is still in its early stages, there are a few conclusions that we can draw, and which we think are important.
Institutional repositories are now clearly and broadly being recognized as essential infrastructure for scholarship in the digital world. This is evident based on the level of actual deployment and planned adoption within the research university sector. Consequently, it seems highly probable that the next few years will see growing connections between institutional repositories as infrastructure and the broader issues that are emerging about strategies and infrastructure necessary to support the management, dissemination and curation of research data (at the national, disciplinary and institutional levels).
It seems clear that, at least in the United States, institutional repositories are being positioned decisively as general-purpose infrastructure within the context of changing scholarly practice, within e-research and cyberinfrastructure, and in visions of the university in the digital age. Institutional repositories are not being deployed simply in response to concerns about the existing scholarly publishing system, the cost of journals, and the open access movement, although they certainly are being used to support agendas related to open access to the traditional scholarly literature.
Research libraries have taken on a leadership role in both policy formulation (including the framing of the necessary campus-wide conversations) and operational deployment roles for institutional repositories at our research universities. (We did not explore funding questions, which will be crucial going forward.) Clearly, while an institutional repository is recognized as an institutional service, library leadership is generally unquestioned; what varies from university to university is the extent of active collaboration by other campus units. Institutional repositories represent a critically important new policy and operational role for research libraries, and one that renews their connection with the core academic processes of the university.
For a broad view of CNI's activities related to repositories, see the CNI 2004-2005 Program Plan available at <http://www.cni.org>. See also, for example: Lynch, Clifford. "Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age." ARL Bimonthly Report, No. 226, February, 2003, <http://www.arl.org/newsltr/226/ir.html>; "Institutional Repositories: A Workshop on Creating an Infrastructure for Faculty-Library Partnerships" <http://www.arl.org/IR_agenda.html>; "Institutional Repositories: What Does Your Institution Need to Know?" <http://www.educause.edu/LibraryDetailPage/666?ID=EDU0327>.
See, for example: Cornell <http://www.library.cornell.edu/scholarlycomm/resolution.html> and University of Kansas <http://www.provost.ku.edu/policy/scholarly_information/scholarly_resolution.htm>.
National Science Board. Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century. Draft Report of the National Science Board. National Science Foundation, 2005. <http://www.nsf.gov/nsb/documents/2005/LLDDC_report.pdf>.
Pocket Guide to U.S. Higher Education 2005. (Compiled by EDUCAUSE)
Copyright © 2005 Clifford A. Lynch and Joan K. Lippincott