Issues in Federating Repositories: A Report on the First International CORDRA™ Workshop


D-Lib Magazine
March 2005

Volume 11 Number 3

ISSN 1082-9873

Wilbert Kraan
UK Centre for Educational Technology Interoperability Standards (CETIS)
<w.g.kraan@bangor.ac.uk>

Jon Mason
education.au limited
<jmason@educationau.edu.au>

Abstract

One in the IDEA Summer 2005 [1] series of e-learning technology interoperability events, the First International CORDRA Workshop brought together a range of communities with an interest in repositories. Because the Content Object Repository Discovery and Registration/Resolution Architecture (CORDRA) is essentially an agreed, scalable way to join up different repositories, it draws attention to a wide and diverse set of issues that are not limited to the initiative itself. The workshop's purpose was to gather these issues and, where possible, to address them.

Introduction

The First International CORDRA Workshop [2] was a joint initiative of four organisations: Australia's Department of Education, Science and Training (DEST) [3]; the UK's Centre for Educational Technology Interoperability Standards (CETIS) [4]; and, in the US, the Corporation for National Research Initiatives (CNRI) [5] and the architects of the CORDRA initiative, the Learning Systems Architecture Lab (LSAL) [6] at Carnegie Mellon University. That roll-call already encompasses a technology infrastructure provider, a research and development team and two sectoral representatives, but the audience in Melbourne was wider still. From national policy-makers to institutional librarians, and technology vendors to corporate clients, any group with a stake in finding, managing and exposing learning content was represented. In all, eighty people participated in the workshop.

In opening the workshop, Neil McLean of DEST described 2004 as a "year of the repository" and said that the task now was to look at the more difficult questions associated with building sustainable infrastructure.

The CORDRA initiative has its roots in the US Department of Defense's Advanced Distributed Learning (ADL) [7] initiative, well known for the Sharable Content Object Reference Model (SCORM) [8] that has become the most widely used specification for e-learning content.
ADL plans to complete the first CORDRA instance this year. Two further installations by other organisations are to follow shortly thereafter. Completion of the first batch of instances does not mean that the CORDRA model is done and dusted, however, and gathering requirements and comments to inform the next phase of development was one of the aims of the Melbourne meeting. Other aims were to provide an opportunity for a wider audience to learn about the initiative and to discuss the broader topic of federating repositories.

CORDRA

As outlined in the introduction by LSAL's Dan Rehak [9], CORDRA is a high-level model that aims to improve access to learning by making content ubiquitous in ways that are determined by a particular community. For example, CORDRA relies upon persistent, resolvable, and actionable identifiers but does not otherwise require a specific schema. It assumes that much of the infrastructure is already there and that interoperability standards in the repository area are maturing. Even so, we do not yet have the inter-repository interoperability needed to realise the desired ubiquity of content, and that is what CORDRA is designed to address.

CORDRA's strategy is to address the interoperability issues associated with federating repositories by providing "an open, standards-based model for how to design and implement software systems for the purposes of discovery, sharing and reuse of learning content through the establishment of interoperable federations of learning content repositories" (Rehak & Blackmon, 2005) [9]. Key to this strategy is the recognition that repository interoperability must scale.

In his presentation at the CORDRA workshop, Dan Rehak highlighted some of the salient lessons from history for CORDRA to emulate.
Generally speaking, he argued, history has shown us that successful infrastructure (such as telephone and transportation infrastructures) evolves from local to global, and grows in size and importance with demand by using scalable, reliable existing technology and relying on open connections with minimal interoperability standards. Such infrastructures operate from 'source to sink', enable systems to develop differentiated as well as value-added services, and can handle both peak demand and fractional uses. These historical systems required appropriate policies for use and governance, and trended toward ubiquity. In other words, CORDRA needs to "leverage what we know from history, knowing that we have not yet been successful."

In this sense, CORDRA aims to be three things: a model for federating repositories; a project that iteratively designs and documents the model; and a working infrastructure that includes multiple implementations, purpose-built tools, and the federation between implementations.

In a technical sense, the CORDRA model assumes a set of locally managed repositories, augmented by an object identifier infrastructure, common services and applications, and three kinds of repositories: a master catalog that registers the content held in the federation, a registry of participating repositories, and a system registry. Since implementations can vary in how all of these entities are instantiated, the system registry holds a full description of a particular CORDRA implementation. Common services can include such things as authentication or a Digital Rights Management system. Applications are the tools used to operate the CORDRA implementation (Rehak & Blackmon, 2005) [9].

That's all very well, but a number of workshop participants argued that, for CORDRA to be successful, the model must be able to present its value proposition to a broad constituency. Of these participants, James Dalziel of MELCOE [10] put it simply: "Why is CORDRA appreciably better than Google?"
Liddy Neville of IMS Australia responded by noting that one reason might be that Google "doesn't do accessibility". CORDRA could address this issue because its identifier infrastructure is designed to make it possible to source appropriate versions of a resource on the fly, wherever it may reside, provided that a contributing system is aware of the accessibility preferences of a given user. Both Allyn Radford of HarvestRoad and Nigel Ward of the Learning Federation also argued that for CORDRA to work better than Google, there is a considerable onus on the managers of constituent repositories to put in place robust quality control mechanisms. Otherwise, very large repository federations could return as many useless hits as an average Google search.

One community for which the CORDRA case is clear is the training sector. Jack White and Trey Cooper of Boeing outlined some of the work they are doing on a "next generation performance support system" for a new class of destroyers for the British navy. One of the critical requirements of such a system is an ability to pinpoint the one relevant resource for a specific system in the ship, in its latest version, adapted to the competency set of the learner. The learning resource itself will also have to be persistent for the life of the ship, which might mean thirty years or more. Such a use-case underscores the need for an underlying sustainable infrastructure that involves more than Google-like resource discovery. Also, as Dan Rehak pointed out, "Google doesn't treat learning content in any special way. It also doesn't see the deep web, doesn't index all it harvests, doesn't do versions, provenance or archives, all of which are needed in a case like this and many others."

Boeing's Trey Cooper was able to provide a further use-case that appealed to many of the participants: he likes to know that the pilot who is flying the plane he just boarded has access to the latest training in real-time problem solving!

Further to distilling that central value proposition, Eduworks' Robby Robson, who is also chair of the IEEE LTSC [11], wondered whether it would help to clarify the scope and purpose of CORDRA further. He thought the central question should be: "What is CORDRA going to do that is fundamentally different, and will make our lives/work easier as a result?" Given its current documentation, he was not sure whether it was primarily about offering management functionality for existing communities or building an infrastructure for everyone in the future.

At the same time, Robson and a number of others felt that CORDRA's limitation of scope to accommodating (static, completed) learning objects was perhaps too narrow. Andrew Treloar of Monash University reported that a number of use-cases from the higher education communities of practice at the meeting involved managing a breadth of information that spans things other than "learning objects" (official documents, datasets, and research material, for example). While acknowledging this issue as something that CORDRA needs to solve in the longer term, Dan Rehak commented that there is no real constraint on the kinds of objects a CORDRA implementation can deal with, but from a practical perspective he considered it strategic to limit the human scope of the project in its initial stage.

On the more technical side, the presence of quite a few implementers of federated search mechanisms testified to the fact that building interoperability between repositories is not new, and, as Kerry Blinco from IMS Australia pointed out, libraries have been dealing with issues of scale and repository interoperability for a considerable time. Discussion of this issue drew many comments from the implementer community, covering three broad categories: the need for simplicity, the merits of harvesting over direct search, and matters of control and policy.
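
The second of these categories, harvesting versus direct search, can be made concrete with a small sketch. It is only a toy illustration under stated assumptions: the class and function names, the `example:` identifiers and the in-memory dictionaries are all hypothetical, and nothing here is taken from the CORDRA model, SQI or OAI-PMH themselves.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical locally managed repository: identifier -> title metadata."""
    name: str
    records: dict[str, str] = field(default_factory=dict)

    def search(self, term: str) -> list[str]:
        """Direct search: the repository answers the query itself."""
        needle = term.lower()
        return sorted(oid for oid, title in self.records.items()
                      if needle in title.lower())

    def export_metadata(self) -> dict[str, str]:
        """Harvesting: the repository hands over a copy of its metadata."""
        return dict(self.records)


def direct_search(repositories: list[Repository], term: str) -> list[str]:
    """Ask every repository at query time and merge the hits."""
    hits: list[str] = []
    for repo in repositories:
        hits.extend(repo.search(term))
    return sorted(hits)


class HarvestedCatalog:
    """Hypothetical central catalog that is filled by harvesting."""

    def __init__(self) -> None:
        self.index: dict[str, str] = {}

    def harvest(self, repositories: list[Repository]) -> None:
        """Copy the current metadata of every repository into the index."""
        for repo in repositories:
            self.index.update(repo.export_metadata())

    def search(self, term: str) -> list[str]:
        """Answer from the local copy without contacting any repository."""
        needle = term.lower()
        return sorted(oid for oid, title in self.index.items()
                      if needle in title.lower())


if __name__ == "__main__":
    repo_a = Repository("repo-a", {"example:a/1": "Intro to Hydraulics"})
    repo_b = Repository("repo-b", {"example:b/1": "Hydraulics Maintenance"})
    catalog = HarvestedCatalog()
    catalog.harvest([repo_a, repo_b])

    # A record added after the harvest is visible to direct search only.
    repo_b.records["example:b/2"] = "Advanced Hydraulics"
    print(direct_search([repo_a, repo_b], "hydraulics"))
    print(catalog.search("hydraulics"))
```

In this sketch, direct search contacts every repository for each query, so its cost grows with the size of the federation, while the harvested catalog answers from its own copy but only reflects what was present at the last harvest.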

The concern for simplicity is to some extent related to the question of the value proposition, but also concerns the overheads of system implementation. The simpler the systems are, the more likely it is that communities will implement them, and therefore the more likely the model is to reach critical mass. Both LSAL's Dan Rehak and Lorna Campbell from CETIS [4] conceded that the message to various audiences could do with some simplification, which is one of the reasons why the workshop was set up. Furthermore, a simple user interface is more important than the simplicity of the infrastructure behind it.

CORDRA's choice of an infrastructure that relies on the federation of metadata raises a number of questions that are not new, but are still pertinent to CORDRA itself. One question is whether querying is better than harvesting. Erik Duval of the K.U. Leuven and the Ariadne Knowledgepool presented an overview of GLOBE [12], a new federation that relies on querying. The GLOBE federation works by putting an agreed "wrapper", the Simple Query Interface (SQI) [13], around the search interfaces the constituent repositories already have. Though that works very well right now for the five GLOBE partners, Kerry Blinco of DEST pointed out that the technique may not scale very well over larger numbers of repositories. Since GLOBE's scope is limited to providing operational interoperability among the five partners right now, this is not a major problem for the GLOBE project. However, Erik Duval identified an important tension that exists across many current e-learning infrastructure initiatives: balancing "top-down by design" with "bottom-up by evolution".

Another question raised by the choice of harvesting is that of control and accountability. For example, MERLOT's Martin Koning-Bastiaan had some concerns about the fact that MERLOT holds metadata about learning resources, but does not hold the resources themselves.
Just giving away all the metadata to a higher-level CORDRA instance would be problematic, since that would mean giving away the whole value-add of MERLOT. Similarly, Marek Hatala of Simon Fraser University was concerned about what kinds of control a repository owner would have to give up in practice when joining a CORDRA instance. The Splash and LionShare personal, peer-to-peer repositories on which he has worked rely heavily on the fact that their owners retain control over what gets exposed to whom. Nonetheless, fellow LionShare developer Mike Halm from Penn State University argued that there is a need to include these kinds of repositories if one wants to provide any access to the large number of resources presently stored on people's PCs.

Such issues were identified by the CORDRA designers at the outset. According to Rehak, CORDRA has been designed to let particular communities make these decisions, in recognition that communities can and will have specific requirements. Such a strategy is also designed to help with the implementability of a system, since even the choice of object identifier scheme is something for the community to decide. Though the ADL implementation will be built around CNRI's Handle System® [14], it is recognised that de jure standards in this and related areas are not yet fully de facto. For example, even though there seems little doubt that most communities will go for OAI-PMH [15] for the harvesting of metadata, given its successful uptake worldwide, different options are available.

Reporting for the content providers community of practice parallel session, Sandy Britain of Tairawhiti Polytechnic in New Zealand saw the potential of this flexibility, but wondered whether it might lead to a "CORDRA bind": the larger the CORDRA instance, or the higher up a hierarchy of CORDRA federations, the harder it becomes to reach satisfactory agreement between all stakeholders on these technology choices and policy decisions.
This is not a major issue for the ADL CORDRA instance, for the simple reason that the US Department of Defense has issued a decree that, as of this year, all e-training materials shall be SCORM compliant and accessible in searchable repositories. No new content shall be developed before reusable content has been examined [9]. For other communities there is, at least, the possibility of determining access rules at each level of a hierarchical federation of federations, but that still does not make the political job of negotiation disappear.

Repository federations generally

The aspect of control and responsibility is something that a model like CORDRA can only partially address. Many more such generic issues, however, are opened up by the very possibility of having widely federated repositories. Easily the biggest, and currently most controversial, of these is the question of Digital Rights Management (DRM). JISC's Sarah Porter [16] reported that the national policy-makers in their parallel session considered this mostly to be a problem for the commercial content vendors, but the policy-makers recognised that there were further implications.

Indeed, James Dalziel recounted some rather interesting use-cases on which he is working in MELCOE's [10] Meta Access Management System (MAMS) project [17]. One involves the exposure of some Australian Aboriginal artifacts that have traditional access restrictions based on a combination of kinship, tribal affiliation and gender. Another involves the exposure of certain types of medical content; in some jurisdictions, even the knowledge that such research exists can be dangerous to both the researcher and the patients. Consequently, access restrictions would have to be applied even to the metadata.

The general issue of DRM is made even thornier by the fact that some large IT vendors are locked in an epic battle over who gets to supply the basic technical building blocks for a global DRM solution.
One of the unfortunate side-effects of that ongoing wrangle is that important areas such as Digital Rights Expression Languages (DREL) are subject to untested patents of very wide scope and uncertain licensing conditions.

Yet some sort of solution to the general question of who gets to access what, and under which conditions, was felt to be necessary, because it has a direct bearing on a secondary issue that reappeared many times in the workshop discussion: whether (higher education) academics are fundamentally willing to share learning materials. Maxine Brodie, university librarian for Macquarie University [18], reported that the Higher Education (HE) parallel session fully acknowledged that introducing a culture of sharing learning materials was a matter for the HE institutions themselves. Evan Arthur of DEST thought that academics, in that sense, are a little schizophrenic. On one hand, they want to share their research as far and wide as possible, but on the other hand, they appear unwilling to share learning materials even as far as the next door down the hall. The key question, he felt, is: how can you set things up so that there is both an ego reward and a formal reward for sharing learning output?

Mike Halm argued that part of the answer could lie in shifting the focus from abstract access rights associated with a resource, to letting people decide for themselves who has access to the collection of resources under their control. The central idea here is that people are willing to share in self-selected communities. Focussing on the community building aspect could also neatly side-step some of the more tangled patent issues in the access control area.

Yet the question of control and responsibility goes further than regulating access to the resources.
For Adrienne Kebbell of the National Library of New Zealand (NLNZ) [19], and many others, governing a repository federation or federations, CORDRA or not, raises issues such as who guarantees authenticity, who is responsible for tasks such as archiving and quality control, and how such federations can be made sustainable. The federated search implementers went one step further on the last point and concluded that for an initiative like CORDRA to succeed in its goal of global ubiquity, it needs to be open, international, and funded. Dan Rehak acknowledged that the issue of sustainability was the subject of one of the main ongoing debates in the ADL instance project. Moreover, he explained, more of the effort in that project was actually spent on issues of policy and rule setting than on matters of technical implementation. The question of governance of the CORDRA model itself is no exception in this regard, and various ways are being explored to open the initiative up to the wider community.

While CORDRA was the main focus of the workshop, a number of related initiatives were presented that helped facilitate an exchange of ideas and an opportunity to build a bigger picture. Among these was the ELF (E-Learning Framework) [20] project, another work-in-progress that involves representatives from DEST, JISC-CETIS, and LSAL. Unlike CORDRA, ELF doesn't aim to deliver specifications itself. Rather, ELF aims at providing both a common vocabulary and a roadmap for the development of the component services that need to be present in an evolving e-learning infrastructure. Such work helps clarify the broader context for CORDRA and is complementary to it. Likewise, CORDRA also informs ELF. One important difference between CORDRA and ELF is that, while CORDRA assumes there are sufficient standards for its purposes, ELF exposes the fact that standardisation in e-learning has only just begun.
David Massart, from the European SchoolNet, echoed this from the perspective of an implementer trying to deliver a narrowly scoped service.

Over the course of two days, the workshop raised many more issues, with many participants diving straight into the details of the possibilities that the CORDRA model opens up. The identifier resolution pattern, for example, makes it generally possible for schools with poor bandwidth to cache copies of resources locally, but how would that work in detail? Versioning of resources is also possible, but how would you actually trace the constituent parts of objects back to their original aggregations? Managing context in metadata to an acceptable degree is another complex challenge, as is matching content with learning objectives. Then there's the issue of dealing with all the legacy content, much of it without sufficient metadata and some with its own, dependent infrastructure. Achieving a common purpose between CORDRA adopters is one challenge, but providing tools to those communities of practice that sit on the perimeter is an even bigger one. Moreover, moving forward with CORDRA will necessarily bring in its wake big challenges for business, and policy development for a steadily widening constituency.

Conclusion

The First International CORDRA Workshop demonstrated two things very clearly: the sheer volume of interest in an interoperable repository federation model among all stakeholders, and the wisdom of designing CORDRA in a modular fashion. As a reference model and a means for implementing federations of searchable repositories, CORDRA promises big things. The range of people present and their willingness to engage in the often abstract and complex issues of the field testifies to the fact that the need for solutions in this domain is keenly felt. As the organisers intended, there is now a much richer view of what those solutions need to entail than there was before the event.

While the "willingness to engage" is itself a useful outcome of the CORDRA workshop, it needs to translate into well-articulated case-studies, and any technical standards need to be complemented with guidelines. Defining a set of business rules about how to share resources across diverse repositories is a key issue that the CORDRA work has brought into clear focus. Harnessing this intent, and building a community around such a common purpose, is indeed as crucial as getting the technical specifications right.

The modular nature of the CORDRA model vindicated itself under close scrutiny as well. Most issues, in one way or another, have come down to either the needs or preferences of a particular community, or simply have not been solved to universal satisfaction yet. The pattern nature of CORDRA means that each community has plenty of opportunity to determine its own practices, rather than be straitjacketed into a more prescriptive set of standards. CORDRA's modular nature means that the general problem of achieving repository interoperability can be broken down into more manageable chunks.

Consequently, Dan Rehak's summary emphasized that CORDRA will not be finished for a while yet. But critical to its success will be determining the minimal pieces that make CORDRA technically feasible. Many more issues will arise over time, important among them the issue of stewardship. For now, the challenge is to build some initial implementations. As practice grows in multiple communities, different solutions to each of the issues can be found and disseminated. For that reason, the workshop's organisers consider it vital for the dialogue that started in Melbourne to continue, and therefore ways of keeping it going will be found [21].

References

[1] IDEA Summer Events 2005, <http://standards.edna.edu.au/idea>.
[2] First International CORDRA™ Workshop, <http://cordra.net/cordra/calendar/events/workshop20050204/>.
[3] Australian Government Department of Education, Science and Training, <http://www.dest.gov.au/>.
[4] Centre for Educational Technology Interoperability Standards, <http://www.cetis.ac.uk/>.
[5] Corporation for National Research Initiatives, <http://www.cnri.reston.va.us/>.
[6] Carnegie Mellon Learning Systems Architecture Lab (LSAL), <http://lsal.org/>.
[7] Advanced Distributed Learning (ADL), <http://www.adlnet.org/>.
[8] Sharable Content Object Reference Model (SCORM), <http://www.adlnet.org/index.cfm?fuseaction=scormabt>.
[9] Daniel R. Rehak and William Blackmon, LSAL, An Introduction to CORDRA (2005), PowerPoint presentation, <http://cordra.net/cordra/information/presentations/2005/intro/intro20050204.ppt>.
[10] Macquarie University's E-learning Centre of Excellence (MELCOE), <http://www.melcoe.mq.edu.au/>.
[11] IEEE Learning Technology Standards Committee (LTSC), <http://ltsc.ieee.org/>.
[12] Global Learning Objects Brokered Exchange (GLOBE), <http://taste.merlot.org/initiatives/globe.htm>.
[13] Simple Query Interface (SQI), <http://www.prolearn-project.org/lori>.
[14] Handle System® home page, <http://www.handle.net>.
[15] OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting), <http://www.openarchives.org/OAI/openarchivesprotocol.html>.
[16] Joint Information Systems Committee (JISC), <http://www.jisc.ac.uk>.
[17] Meta Access Management System (MAMS), <http://www.melcoe.mq.edu.au/projects/MAMS/>.
[18] Macquarie University, <http://www.mq.edu.au>.
[19] The National Library of New Zealand (NLNZ), <http://www.natlib.govt.nz>.
[20] ELF (E-Learning Framework) project, <http://www.jisc.ac.uk/index.cfm?name=elearning_framework>.
[21] Presentations from the CORDRA workshop can be found at <http://cordra.net/cordra/calendar/events/workshop20050204/positionpapers.php>.

(At the request of the author, on March 17, 2005, the URL in Reference 12 was changed, and on March 28, 2005, the name of the participant from the National Library of New Zealand was corrected to read Adrienne Kebbell.)

Copyright © 2005 Wilbert Kraan and Jon Mason


doi:10.1045/march2005-kraan