Theo van Veen
On March 3, 2006, the Koninklijke Bibliotheek (KB) and the SRU Implementers Group held a workshop in The Hague with the theme "Integration of Services; Integration of Standards", following a two-day SRU (Search and Retrieval via URL) Implementers Group meeting. The purpose of the workshop was to hear about and discuss what should be the next steps to improve integration of services and applications, with the main focus on integration with or via SRU. Two mechanisms were discussed: (1) protocols that have been standardised or formalised to some degree (for example, SRU, OpenURL and OAI) and (2) other services that might benefit from the standards or could be used in conjunction with the standards.
The reason for organizing this workshop as an extension of the SRU meeting was threefold:
It could not be expected, however, that directly at the end of the workshop there would be a clear vision on the next steps to be taken, because integration of services as was discussed at the workshop is quite new. In general, a workshop such as this generates discussions and thoughts in the minds of the participants that become clearer after time for reflection. Therefore, besides describing the individual presentations, this workshop report also includes the thoughts of the presenters that arose after the workshop. It is expected that this will help to provide insight into the issues playing a role in service integration. In addition, this workshop report also includes a brief report by Ray Denenberg on the SRU Implementers Group meeting (see Appendix 1).
The subjects of the "Integration of Services; Integration of Standards" workshop presentations were balanced such that they covered the different stages in the process of 1) finding resources, 2) searching in resources, and then 3) linking to related or specific resources. These subjects were:
(The acronyms used above will be explained in a glossary (see Appendix 2) at the end of this report.)
Summary of the Presentations
Below are the summaries of the different presentations. The PowerPoint and HTML presentations are at: <http://www.loc.gov/standards/sru/march06-meeting/presentations.html>.
CQL (Mike Taylor, Index Data)
Specifications such as OpenSearch facilitate interoperability by providing standardised syntax for searching. But the higher goal of interoperability at the semantic level further requires a common means of expressing rich queries. CQL (the Common Query Language of SRU) provides this.
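To make the idea of a rich query concrete, here is a minimal sketch of an SRU searchRetrieve request carrying a CQL query. The base URL is a placeholder, and the `dc.title` and `dc.creator` index names assume that the target server supports the Dublin Core context set; the function name is purely illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sru_search_url(base_url, cql_query, start_record=1, maximum_records=10):
    """Build an SRU 1.1 searchRetrieve URL that carries a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    # urlencode percent-encodes the quotes, spaces and '=' inside the query.
    return base_url + "?" + urlencode(params)


# Two index/relation/term clauses joined by a boolean operator.
query = 'dc.title = "service integration" and dc.creator = "veen"'
url = sru_search_url("http://sru.example.org/catalogue", query)  # placeholder host

# Decoding the URL again shows that the CQL string survives the round trip.
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["query"][0])
```

A plain keyword query would fit in the same `query` parameter; the CQL form adds the ability to name indexes and combine clauses.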
OpenURL and COinS (Ross MacIntyre, MIMAS)
The OpenURL Framework for Context-Sensitive Services has now been endorsed as a NISO standard (Z39.88-2004). This new standard has broadened the potential scope of OpenURL implementation beyond the scholarly information community, with the possibility of extension by registration of new formats and profiles for new domains as well as the introduction of an XML format. Furthermore, the OpenURL Framework has separated the details of the reference and its context, known as the ContextObject, from the means of transporting it across the network, which is the OpenURL. This separation enables use of the ContextObject within other applications. For example, if a ContextObject were to be embedded in a web page, other applications, such as web browser extensions (e.g., Openly's OpenURL Referrer), could provide extra functionality. This has led to the recent development of the COinS ('ContextObject in SPANs') specification, which embeds a ContextObject within an HTML 'span' element.
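As an illustration of COinS, the following sketch serialises a ContextObject in key/encoded-value form and places it in the title attribute of a span with the class Z3988. The metadata values are made-up sample data and the helper name is hypothetical; only the journal metadata format is shown.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(metadata):
    """Return an HTML span carrying a KEV-encoded ContextObject (COinS)."""
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    # Referent metadata keys are prefixed with "rft.".
    pairs += [("rft." + key, value) for key, value in metadata.items()]
    # The ampersands of the KEV string must be escaped inside an HTML attribute.
    title = escape(urlencode(pairs), quote=True)
    return '<span class="Z3988" title="' + title + '"></span>'


# Sample values for illustration only.
span = coins_span({"atitle": "Integration of Services", "aulast": "van Veen", "date": "2006"})
print(span)
```

A browser extension that recognises the Z3988 class can read the title attribute and turn it into an OpenURL for the user's own resolver.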
IESR: A Registry of Collections and Services (Ann Apps, MIMAS)
The UK JISC Information Environment Service Registry (IESR) (http://iesr.ac.uk) publicises collections of resources, along with details of services that provide access to them, in a machine-readable format, and also provides standalone 'transactional' services. It is a central registry, a middleware shared service intended primarily for machine-to-machine use, within the architecture of the Information Environment. Apps explained how the content of IESR is described, based on metadata standards, giving some examples of current descriptions. The IESR API provides access to the records in IESR via several interfaces. She indicated some possible ways in which IESR could be used. A dynamic portal could discover, then provide an SRU metasearch over, collections appropriate to an end-user, without the need for manual intervention to build resources into the portal. This potentially would widen the user's landscape of useful knowledge. Use of IESR descriptions by harvesting, or by human discovery preparatory to manually plugging a resource into an application, is also expected. Apps finished by indicating future envisaged developments for sharing records across distributed service registries, and she also mentioned some integration issues that have arisen during the development of IESR.
Formal Descriptions of Non-standardised Services (Theo van Veen, Koninklijke Bibliotheek)
At KB we are exploring how to formalise the description of different kinds of services. The purpose is to let the presence of metadata terms in the output of one service trigger another service and use those metadata terms as input for that other service. A user agent will interpret these service descriptions and offer the user the functionality to link to other services using the metadata in a previous result as input. As a proof of concept, this has been demonstrated by means of an SRU client running in the browser and containing a user agent. Clicking on, for example, a creator field in the SRU response generates links to several services with the creator as input. The user may specify another file with service descriptions to control which services are offered for different metadata terms. It is expected that these service descriptions, when formalised, can be exchanged easily and may be useful for other applications as well.
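The mechanism can be pictured with a small sketch. This is not the KB format: the structure of the service descriptions, the service names and the URLs below are all hypothetical, chosen only to show how a metadata term in a result could trigger links to other services.

```python
from urllib.parse import quote

# Hypothetical service descriptions: the metadata term that triggers the
# service, and a URL template that takes the value of that term as input.
SERVICE_DESCRIPTIONS = [
    {"name": "Other works by this creator", "trigger": "creator",
     "template": "http://catalogue.example.org/search?creator={value}"},
    {"name": "Web search", "trigger": "creator",
     "template": "http://search.example.com/?q={value}"},
    {"name": "Journal holdings", "trigger": "issn",
     "template": "http://resolver.example.org/?issn={value}"},
]


def links_for(field, value, descriptions=SERVICE_DESCRIPTIONS):
    """Return (name, url) pairs for every service triggered by this field."""
    encoded = quote(value, safe="")
    return [
        (d["name"], d["template"].format(value=encoded))
        for d in descriptions
        if d["trigger"] == field
    ]


# Clicking on a creator field would offer both services registered for it.
links = links_for("creator", "Veen, Theo van")
for name, url in links:
    print(name, url)
```

Passing a different list as `descriptions` corresponds to the user supplying another file of service descriptions.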
Shibboleth and SRU (John Paschoud, PERSEUS Project)
Subtitled "You can't have everything you want, but the Web should know what you can have", this presentation examined possible interactions and mutual benefits between SRU and Shibboleth (for access management of non-public, Web-based resources). Paschoud explained how Shibboleth works over HTTP, and explored the questions:
SRU Record Update (Janifer Gatenby, OCLC Pica)
The presentation by Gatenby covered various aspects of SRU Record Update, indicating its niche as an interactive protocol alongside other mechanisms such as the OAI-PMH push mechanism and batch loading. Interaction scenarios with SRU/SRW were examined and the current development between OCLC and OCLC Pica was covered.
OpenSearch, SRU and Google/Widgets: Database Considerations and Experience (Derek Lane, CSC)
EIMS is a catalog for EPA work products, projects and data. Providing access to catalog records in an efficient and accessible fashion has required us to track emerging standards for web-based search and provide commonly accepted simple XML representations. OpenSearch, A9's evolving standard for describing simple searches, has fit into existing RSS work easily. Lane described OpenSearch's new extensions for multi-field search and compared the technical and implementation properties of SRU and OpenSearch.
Library 2.0 (Ian Davis, Talis)
Fundamental to the concept of Library 2.0 is the shift from delivery of library services solely within the library building, or via the library's own web site, towards the embedding of discrete library functions within a range of contexts. This presentation and demonstration illustrated how providing library services using SRU and related technologies can sustain an ecosystem of new and innovative applications.
D+ : A Common Server for SRU and OpenURL (Robin Taylor, Edinburgh University Library)
D+ is a software framework that brokers the searching of resources in distributed repositories. It is based on, and extends, the open source SRW/U Server developed at OCLC. In addition, the server also acts as an 'OpenURL friendly' target by supporting queries conforming to version 0.1 of the OpenURL standard, rather than CQL. Taylor's presentation demonstrated the use of both query types in the context of a resource list application using D+ as the search web service.
Metasearch and SRU: MXG, the Metasearch XML Gateway (Ray Denenberg, Library of Congress)
NISO defines metasearch as "search and retrieval spanning multiple databases, sources, platforms, protocols, and vendors at one time." It cites the problem as follows: "Current systems require users to know how to select, access and search specific databases", and the goal is: "To create an environment that helps users find what they are seeking while minimizing what they need to know".
In a more detailed elaboration, NISO attributes goals to metasearch entities as follows:
Of these entities, the main focus is on interaction between the metasearch engine and content provider, rather than between the library and metasearcher. The NISO Metasearch initiative has been charged with identifying/developing standards/best practices to improve interoperability between metasearch engines and content providers, and identifying a simple search/retrieve protocol to help database providers more effectively interoperate with metasearching applications.
As part of the latter charge, task group 3 of the metasearch initiative has been charged with evaluating SRU for suitability as a protocol between metasearcher and content provider. As part of this process, the MXG (Metasearch XML Gateway) specification has been developed, for communication between metasearcher and content provider, based on SRU and CQL.
WSDL, UDDI, SOAP, REST: SOA Acronym Soup (Matthew Dovey, Oxford University)
There has been a lot of activity on Web Services and now Service Oriented Architectures over the past five years. Neither of these terms is particularly well defined. "Web Services" might be SOAP-based or REST-based; the OASIS definition of Service Oriented Architectures could also describe CORBA and DCOM. REST itself is often vague as to its meaning (e.g., SRU, whilst often described as REST, is really only REST-like!). Efforts such as the Web Service Interoperability Profile have tried to rectify some of the interoperability issues surrounding Web Services (particularly in the Web Service Description Language), but issues remain, especially as you move higher up the Web Service stack (UDDI, WS-Addressing, etc.). This presentation described what all these acronyms mean and which ones are "safe" or "risky" from an interoperability perspective.
Use of ZeeRex (Z39.92) to Describe Search and Retrieve Services (Robert Sanderson, University of Liverpool)
ZeeRex is an XML schema developed over the last 5 years to describe the semantics of a service that supports retrieval, and typically search. It models only the interactions, not the protocol's syntax (which is left up to ASN.1, WSDL or similar), and hence can be used to describe different methods of doing the same thing: issuing a search and retrieving matching records. It could be used to describe, for example, Z39.50, SRU, OpenSearch, OAI and even such things as FTP, not typically thought of as a type of information retrieval protocol. This presentation primarily discussed modifications from the current SRU version ZeeRex 2.0 to the standardised Z39.92 and the advantages these modifications provide.
Conclusions and Thoughts Generated after the Workshop
Below are thoughts contributed by some of the workshop presenters after they had time to reflect on the issues discussed at the workshop. This is perhaps one of the most interesting results of the workshop, and it is hoped that the views expressed will contribute to new ways of integrating services and information to the user.
The information environment includes a diverse range of technologies. Attempts to persuade people to converge on a single service protocol are likely to be futile. Activities aimed at encouraging interoperation between services of different types would seem a better use of effort. The ability to discover within a registry a wide range of resources and their service connection details should assist in the eventual integration of different service protocols within a general service oriented architecture.
Context management issues (loss of default technology implementation; loss of HTTP sessions) were echoed in the discussion. For example, record ids allow one to ignore HTTP sessions; XPath for sort is expensive if you have to do it at the boundary of a large system with a non-XML internal structure; base URLs maintain context for resolving relative URLs outside of HTTP sessions; hit counts for subqueries are cheap for post-indexes but insanely expensive for SQL. As SRU is deployed in new environments I expect to see more issues of this type appear. OpenSearch integration seems possible, at least so far. There is a profile of CQL that can be mapped to OpenSearch, so that an OpenSearch implementor (of which I expect there to be many) can have a simple implementation story: use the generic CQL to OpenSearch mapping and edit the code for formatting results one more time. This kind of implementation is attractive, but will not last over the years without some level of coordination between SRU and OpenSearch.
It is also possible to add capability to SRU servers to speak OpenSearch. This could be worthwhile for sophisticated SRU servers to do (it increases the number of clients for existing implementations), but it will not directly affect the number of SRU servers available.
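A rough sketch of the simplest part of such a bridge follows: filling an OpenSearch URL template from SRU-style request values. It assumes OpenSearch 1.1 template syntax ({searchTerms}, {startIndex}, {count}, with a trailing "?" marking optional parameters), handles only plain search terms rather than a real CQL profile, and uses a placeholder URL and a hypothetical function name.

```python
import re
from urllib.parse import quote_plus


def fill_opensearch_template(template, search_terms, start_record=1, maximum_records=10):
    """Fill an OpenSearch URL template from SRU-style request values."""
    values = {
        "searchTerms": search_terms,
        "startIndex": str(start_record),      # SRU startRecord
        "count": str(maximum_records),        # SRU maximumRecords
    }

    def substitute(match):
        # Template parameters this sketch does not know are left empty.
        return quote_plus(values.get(match.group(1), ""))

    # Matches {name} and the optional form {name?}.
    return re.sub(r"\{([A-Za-z:]+)\??\}", substitute, template)


template = "http://search.example.org/?q={searchTerms}&start={startIndex?}&n={count?}"
request_url = fill_opensearch_template(template, "service integration", 11, 5)
print(request_url)
```

Mapping a fuller CQL query (indexes, relations, booleans) onto OpenSearch would need an agreed profile, which is the coordination the text above calls for.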
It would be interesting to explore the utility of SRU outside of its traditional focus, perhaps as a search interface to some of the larger community publishing or discussion sites. Aside from RSS and OpenSearch, there appears to be no standard machine interface for searching these types of content. One tactic could be to donate an SRU interface to a selected few of the open source CMSs. I also think more work has to be undertaken around the area of security and identity, and I was very interested to see the presentation of Shibboleth. I would like to see how SRU can fit better with these kinds of mechanisms and also with future efforts such as the IETF DIX work.
Overall I was pleased to learn that there are few interoperability issues between different implementations of the standard, and there even seems to be some scope for interoperating with other protocols that address the same space. Future work should seek to preserve and build on this level of interoperability, and it should feature strongly in the prioritisation of new features.
The exact mechanics of SRU itself are not fundamentally important; what is important is to ensure that the wisdom gained over 20 years of Z39.50 and SRU is not lost. The related semantic aspects (ZeeRex and CQL) are usable outside of the SRU protocol and this can be encouraged without fear of somehow lessening SRU. Quite the opposite, in fact: syntactic interoperability is comparatively easy, whereas ensuring that communities have access to the same semantics is the challenge about which the library world has years of useful experience to share.
Theo van Veen
With OpenSearch on one end of the spectrum, SRU on the other end, and MXG in the middle, we might end up with clients and services that need to support more than one protocol, and that is certainly not an ideal situation. The problem seems to be related to the support of CQL. Clients should be able to recognize that a service does not support CQL from the explain record (or in the worst case from the absence of an explain record). SRU services should recognize queries that are not CQL and treat them simply as a list of terms. In this way queries that are not CQL queries can be broadcast to all services, even those that don't support CQL.
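On the client side, this could look like the sketch below, which prepares one query string per service depending on whether that service supports CQL. How a client derives that flag from an explain record is not shown; the boolean, the service names and the function names are assumptions made for illustration.

```python
def quote_term(term):
    """Quote a term for CQL, escaping backslashes and double quotes."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def query_for_service(terms, supports_cql):
    """Return a CQL query for CQL services, else a plain list of terms."""
    if supports_cql:
        return " and ".join(quote_term(term) for term in terms)
    return " ".join(terms)


def broadcast(terms, services):
    """Map each service name to the query string it should receive.

    `services` maps a service name to True when the service is assumed to
    support CQL (for example, as judged from its explain record).
    """
    return {
        name: query_for_service(terms, supports_cql)
        for name, supports_cql in services.items()
    }


queries = broadcast(
    ["service", "integration"],
    {"sru-catalogue": True, "simple-search": False},
)
print(queries)
```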
Another interesting item is the relation SRU has to Shibboleth. SRU responses, explain as well as searchRetrieve, might depend on the credentials obtained from authentication services. An interesting question is whether SRU could carry authentication information by using SRU's extension mechanism. For example, a requested service has to redirect the user to a "where are you from" service to obtain the address of the authentication service of the user's institution. When the requested service is an SRU service and when authentication information is available to the SRU client, it can be passed directly to the SRU service as extra parameters. Especially when a local SRU client is being used, it is convenient that the SRU client can rely on XML responses rather than being redirected to another page for authentication purposes.
Appendix 1: Report of the SRU Implementer Group Meeting (Ray Denenberg)
The SRU Implementer Group meeting preceding the Integration of Services; Integration of Standards workshop was very fruitful.
There will be a much-needed bibliographic index set developed, based on MODS semantics. There will be an OpenURL profile, which will prescribe a mapping from these bibliographic indexes to OpenURL keys. The profile may also specify how an SRU response can facilitate the client process of formulating an OpenURL: An SRU client receives a record and wants to create an OpenURL where the object described by that record will be the referent. The client could request the record for that item in the appropriate OpenURL metadata format, which could then be used directly as the context object.
The "sort" proposal was accepted. It is felt to be a major improvement over the way sorting is done in SRU 1.1.
SRU via Post is defined. SRU now has three forms: (1) via URL (as originally), (2) via Post, and (3) SRU over SOAP (formerly SRW, and the SRW acronym will be dropped). CQL (currently known as the "Common Query Language") will instead be the "Contextual Query Language".
There is progress towards aligning SRU and OpenSearch. The strategy we discussed is to make OpenSearch requests legitimate SRU requests. (See: <http://www.loc.gov/standards/sru/march06-meeting/report.html>.) Then an SRU-friendly OS server will be able to do something intelligent when it gets an SRU-loaded OS request. There are clear advantages of SRU over OpenSearch: CQL, schemas, scan, and diagnostics.
An OAI over SRU profile will be defined. It will specify that a server support three indexes necessary for OAI: identifier, last modification date, and collection identifier.
A basic agreement was reached on how to incorporate various bits of information in a request or response. This would include hit and term counts returned by the server. A client will be able to indicate that it does not care whether or not the server includes the record count in the response. The reason for this is the concern that in some environments, counting the records accurately is expensive. A diagnostic will be defined to indicate that the reported record count is approximate.
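For the OAI over SRU profile mentioned above, the three indexes would let an OAI-style selective harvest be expressed as a CQL query. The sketch below assumes hypothetical index names (`oai.identifier`, `oai.datestamp`, `oai.set`), since the profile had not yet been defined; only the general shape of the mapping is intended.

```python
def cql_quote(value):
    """Quote a value for CQL, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def oai_request_to_cql(identifier=None, from_date=None, until_date=None, set_spec=None):
    """Translate OAI-style harvest arguments into a CQL query string.

    The index names are placeholders for the three indexes of the profile:
    identifier, last modification date, and collection identifier.
    """
    clauses = []
    if identifier is not None:
        clauses.append("oai.identifier = " + cql_quote(identifier))
    if from_date is not None:
        clauses.append("oai.datestamp >= " + cql_quote(from_date))
    if until_date is not None:
        clauses.append("oai.datestamp <= " + cql_quote(until_date))
    if set_spec is not None:
        clauses.append("oai.set = " + cql_quote(set_spec))
    if not clauses:
        raise ValueError("at least one argument is needed to build a query")
    return " and ".join(clauses)


# Records in one collection modified since the start of 2006.
harvest_query = oai_request_to_cql(from_date="2006-01-01", set_spec="maps")
print(harvest_query)
```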
The basic standardization plan presented was approved in principle, to take SRU to OASIS. The philosophical basis for this decision is as follows: The world clearly needs a single, well-defined, powerful protocol for searching by URL with results returned in XML. Competing protocols are being developed; one of these will drive this standardization effort if SRU does not, and if so, it won't meet our needs. We conclude that SRU needs to drive this effort, and needs to involve the other interested communities. It follows that SRU standardization needs to occur in a mainstream standards body. OASIS is probably the only mainstream standards body whose scope covers SRU.
OASIS is a neutral ground for merging competing de facto standards into an industry standard. It uses a lightweight process to promote industry consensus and unite disparate efforts.
We would first form a public discussion list to determine whether to form an OASIS Technical Committee, based on the likelihood that a standard would actually emerge from an OASIS TC: whether there are intrinsic, insurmountable differences of opinion; and whether other parties (A9, etc.) will participate. The discussion would also seek to determine how much change input from other parties will introduce, and how long it will take to get to a committee draft (the version prior to public comment and a vote of all OASIS members).
The public list process might take roughly three months, the technical committee six months, and then it might take another three months for a standard to emerge. We will likely first formalize the "easy" changes into SRU version 1.2 and take the more complex problems into the standardization process. The result of the OASIS standardization process would be version 2.0.
Included for standardization along with SRU would be CQL, Scan, the Explain Operation (but not the Explain specification itself), and mappings: SRU over SOAP (i.e., SRW), and SRU Post.
Appendix 2: Acronyms Used in this Workshop Report
Copyright © 2006 Theo van Veen