James R. Davis
Judith L. Klavans
Center for Research on Information Access
D-Lib Magazine, June 1997
In a hallway meeting at the ACM digital libraries conference in March 1996, the two of us were discussing how the absence of adequate protections for intellectual property was inhibiting the growth of digital libraries. We both felt reasonably sure that something would emerge from the marketplace, given the powerful forces that desire intellectual property control, but we were less sure that something useful for digital libraries would emerge. We imagined five or six incompatible protection schemes emerging, each with its own idiosyncratic procedures and concepts that we would have to explain to our patrons. Surely, we reasoned, computing's strategy of abstraction should allow us to define a high-level representation for terms and conditions, independent of any particular form of enforcement -- indeed, independent of any particular legal regime -- that would allow us to build our libraries without concern for what might eventually win out in the market.
The example of the Metadata Workshops suggested a process of collecting general requirements and principles, postponing details, followed by movement towards more definite proposals. With this model in mind, and with funding from the National Science Foundation, we held a workshop on September 24th through 26th, 1996. We invited 30 experts from the relevant fields: law, publishing, libraries, economics, and policy as well as technology. We agreed to rule out of scope several potentially contentious areas (criticism of current or planned US law in cryptography or intellectual property; discussion of whether "information should be free", and specific mechanisms for enforcement of terms and conditions or payment) so that we could concentrate on topics where we felt we could be effective, in the hopes of defining a clear research agenda for the next few years.
A brief summary appeared in the October 1996 issue of this magazine. In this article, we discuss the major findings; for the complete report, which includes interpreted transcripts of sessions, summaries of break-out sessions, and follow-up material, see the Workshop Report, which was published as part of the NSF-Digital Libraries Initiative Workshop Series, Number 7. Additional information can be found at the Workshop home page.
What are the general requirements for a language for terms and conditions? What functions must it represent? What are the required syntactic components, and what is the semantic interpretation of these components? A minimal "language" would be a set of conventions to establish where and how to attach labels to digital objects. At the very least, it must inform users of a digital library what they can and cannot do with a digital object. Such labels might be statements in plain text or they could be machine interpretable. However, a more effective language would actively help patrons to comply with terms and conditions of use and payment, e.g. by facilitating payments or establishing credentials, on the view that most people will comply with the law if it is not too difficult to do so. For example, if a faculty member can be confident that students will have no difficulty establishing their right to access an electronic reserve reading room, they will be less likely to resort to bootleg photocopying. A still more expressive language will allow software agents to act on behalf of library patrons, e.g. to choose whether and in which manner patrons will access materials.
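As a minimal sketch of the simplest case described above, a machine-interpretable label attached to a digital object, consider the following. The class name `TermsLabel`, its fields, the action names, and the object identifier are all hypothetical; the workshop did not propose any concrete syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermsLabel:
    """Hypothetical label attached to one digital object."""
    object_id: str
    permitted: frozenset   # actions the label explicitly allows
    prohibited: frozenset  # actions the label explicitly forbids
    notice: str = ""       # plain-text statement shown to human readers

    def describe(self, action: str) -> str:
        """Tell a user (or a software agent) what the label says about an action."""
        if action in self.permitted:
            return "permitted"
        if action in self.prohibited:
            return "prohibited"
        # The label is silent; this sketch does not guess.
        return "unspecified"


# Illustrative values only.
label = TermsLabel(
    object_id="example-object-001",
    permitted=frozenset({"view", "print"}),
    prohibited=frozenset({"redistribute"}),
    notice="You may view and print this item; redistribution is not allowed.",
)
```

The label only informs; it does not enforce anything. A more expressive language of the kind discussed above would add payment and credential information that an agent could act on.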
Workshop participants identified a number of requirements for a language, which we divide into categories of technical, legal, and social. (These divisions are somewhat forced, as there is a great deal of interaction among them.) In some instances, these requirements are not features that a language must possess but take the form of obstacles to be overcome before a technology for terms and conditions can be created and deployed.
Most of the technical requirements we enumerated are the same as those for other systems intended for broad use by a large community, e.g., that it be easy to install, simple to use, able to connect to legacy systems, flexible, extensible, inexpensive, and so on. We will not mention these here since they are assumed to be indispensable for all formulations of technical requirements.
Legal experts provided an unexpected requirement, that the language be able to represent ambiguity and vagueness. This surprised the organizers at least, as we had imagined that the law sought precision. However, it appears that the legal community prefers to leave room for interpretation in definitions, meaning that any language for terms and conditions will have to accommodate intentional ambiguity and vagueness.
This need for ambiguity leads to a closely related issue, the danger of mechanical enforcement of terms and conditions. In fact, there are two related yet distinct ways that a language for terms and conditions might be used. First, it might inform users of their rights and obligations in a transaction. This information might simply be displayed, or might be provided to software agents acting on users' behalf. The second form of use occurs when terms and conditions constitute input into a system that mediates all access to an object, i.e. determines absolute control over access and is thus not subject to user control. The latter form, if it is to be executed by a machine ("mechanical enforcement"), requires unambiguous definitions.
The connection with the need for ambiguity comes through the concept of fair use, which has no precise definition. Indeed, fair use is often determined by interpretation in context. If fair use cannot -- and indeed, must not -- be defined precisely, then no purely mechanical system of enforcement can be formulated that will permit full compliance with conditions. If access to on-line material is only possible through mechanical enforcement systems, then fair use will become literally impossible. The tension between the need for ambiguity and vagueness and the need for explicitness creates a challenge in the design of languages for terms and conditions.
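The tension described above can be illustrated with a small sketch. It assumes, purely for illustration, a three-valued evaluation in which some actions are left to interpretation; the `Decision` values, the rule table, and the function names are hypothetical and are not a proposal from the workshop.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_INTERPRETATION = "needs interpretation"  # intentionally vague cases


# Hypothetical rule table for one object.
RULES = {
    "view": Decision.ALLOW,
    "resell": Decision.DENY,
    "quote_excerpt": Decision.NEEDS_INTERPRETATION,  # might be a fair use
}


def evaluate(action: str) -> Decision:
    """Look up an action; anything unlisted is treated as open to interpretation."""
    return RULES.get(action, Decision.NEEDS_INTERPRETATION)


def advise(action: str) -> str:
    """First form of use: inform the user and leave the choice to them."""
    return f"{action}: {evaluate(action).value}"


def enforce_mechanically(action: str) -> bool:
    """Second form of use: a gate that must answer yes or no.

    A machine cannot weigh context, so in this sketch every vague case
    collapses to a refusal.
    """
    return evaluate(action) is Decision.ALLOW
```

In this sketch, `advise("quote_excerpt")` preserves the vagueness, while `enforce_mechanically("quote_excerpt")` refuses the action outright, which is the sense in which purely mechanical enforcement leaves no room for fair use.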
The interface between contract and copyright law is presently not well-defined. This is particularly important for developing terms and conditions, because the terms and conditions are contractual although they are often enforced through the power of copyright. If the licensee exceeds what the license allows, and the use transgresses one of the copyright owner's exclusive rights, the copyright holder can pursue the licensee with a copyright claim instead of a contract claim.
Some distributed digital libraries are already international, and even a regional library on the Web might be used from anywhere in the world. It is therefore necessary to understand legal systems and concepts as they are imposed globally. For example, European copyright law awards authors so-called "moral rights", which are generally non-assignable, except to heirs. Moral rights pertain to individual authors, rather than publishers, and guarantee that action can be taken to prevent illegal reutilization of materials. Unlike other rights, depending on the national law, they may not be waivable. Furthermore, a number of uncertainties in the regulatory environment impede electronic commerce in general, and digital libraries in particular. It is unclear which authorities have the right to tax transactions in cyberspace, or who has jurisdiction over which transactions. Political pressures to regulate different kinds of content vary within and between countries. The goal of recent legislative efforts is to establish some minimal norms which apply at the international level.
We also identified needs for consumer protection. At the least, consumer business protection laws must migrate to the on-line world. We noticed that nearly all the discussion of terms and conditions, both in the workshop and outside it, concerned protecting the property owner's rights, but that little attention had been paid to the symmetric problem of the user's rights, for example, the right to privacy. Librarians have a long tradition, for example, of protecting patrons' circulation records. We identified a corresponding need to protect the browsing, borrowing and purchasing records of users of digital libraries. As an example, consider the recent controversies about the use (and abuse) of Web browser "cookies". A "cookie" is a mechanism by which server side operations (such as CGI scripts) can store and retrieve information on the client side of the connection. Thus, information submitted by a web browser to a web server via an interactive method can be stored and used. Typical uses of this data are for customer tracking and user surveys. It is unclear how to enforce the protection of browsing rights, i.e. to ensure the user of the digital library has the same browsing privacy rights as in the traditional library, or to permit the purchaser of digital information in the digital library the right to negotiate. As a second example, while there is considerable effort to establish "non-repudiable" object delivery (which makes it impossible for a customer to deny having placed an order or received goods) there has been no attention, so far as we know, to the symmetric problem of providing the customer with non-deniable terms and conditions. Imagine that terms and conditions become highly dynamic objects. A future customer should be able to show that he or she did have the rights he or she expected to have, at the time of purchase, even if the vendor subsequently alters them for other purchasers.
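One small ingredient of non-deniable terms and conditions can be sketched as follows: the customer keeps a snapshot of the terms as they stood at purchase, together with a digest of that snapshot. This is only an illustration under stated assumptions. The record layout, function names, and sample terms are hypothetical, and a digest held by the customer alone does not prove that the vendor issued those terms; that would require something further, such as a vendor signature or a timestamp from a trusted third party, which this sketch does not implement.

```python
import hashlib
import json


def terms_digest(terms: dict) -> str:
    """SHA-256 digest of the terms in a canonical JSON form (sorted keys)."""
    canonical = json.dumps(terms, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_receipt(terms: dict, purchased_at: str) -> dict:
    """Record the terms in force at the time of purchase (hypothetical layout)."""
    return {
        "purchased_at": purchased_at,
        "terms": dict(terms),          # snapshot kept by the customer
        "digest": terms_digest(terms),
    }


def receipt_matches(receipt: dict, claimed_terms: dict) -> bool:
    """Check whether some later statement of the terms equals the recorded one."""
    return receipt["digest"] == terms_digest(claimed_terms)


# Illustrative values only.
original = {"object": "example-object-001", "rights": ["view", "print"]}
receipt = make_receipt(original, "1997-06-01")

# The vendor later narrows the terms for other purchasers.
altered = {"object": "example-object-001", "rights": ["view"]}
```

Here `receipt_matches(receipt, original)` holds while `receipt_matches(receipt, altered)` does not, so a later change to the terms is at least detectable against the customer's record.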
Policy for subsequent use is unclear. Publishers vary in their approaches to giving electronic access; some are willing to negotiate and others are firmly holding ownership in all formats. The situation faced by libraries is forcing the formation of licensing consortia, which is in turn affecting the decisions of publishers. Licensing variations create challenges for the explicit formulation of a language for terms and conditions.
A need for trusted third parties was discussed for a variety of roles. First, the language standard must be open and non-proprietary, which implies a role for some organization to develop and maintain the standard. The language (in at least some conceptions) might well need a large number of "external" concepts, terms such as "US Citizen" or "resident of a country recognizing the Berne convention" that are defined independent of any particular contract, implementation, or standard. Some custodial body would be needed to maintain these definitions.
Given the value placed on intellectual property, any technical scheme for protection, regardless of whether it uses a language of terms and conditions, is going to raise issues of liability. We may see use of "theft insurance" for electronic goods. Who will pay for such insurance? The information provider, system vendor, or the consumer? The protection infrastructure itself will require funding, and again it is unclear who will pay for it.
Finally, the social acceptance of payment schemes was discussed. How do people value information? Will people accept micropayments for content? At what point will the public refuse to pay for information? How much will a junior high student pay to read the full text of the New York Times article assigned for a civics class?
Workshop participants identified a number of issues that require further work. Here we name three; the full set can be found in the Digital Libraries Initiative Publications Workshop Series Report.
1. Understand information use
Librarians have a notion of how information is used in a library, but they have little understanding of how materials are used in personal and workplace settings. Media people know what TV shows are watched, and the music people know what songs are played. If we build a truly interactive internet, these passive user behavior/payment models may not transfer. Anthropologists have studied, in the paper world, how people use information in the workplace and how people gather information before making medical decisions. How, if at all, will the network change everyday information-seeking behavior (for example, personal newspapers)? Will traditional social groups and physical community places be affected, and how? Online information is transforming how science research is performed (physics, biochemistry and genetics). We do not know the potential impact on the Social Sciences or the Humanities. What will the acceptance rate be among people whose lives depend on less complicated information? Will people use the network to buy groceries? To buy cubic zirconia? Will the network replace the telephone?
2. Education on law and technology
Most of our current, and nearly all of our potential, digital library patrons are ignorant of copyright law, contract law, and the social issues we outlined above. We need to find ways to educate these populations so they can make informed decisions. This includes information for content creators and providers as well as for content users.
3. Can a sufficiently rich database about users support mechanical access?
Clauses in contracts (e.g. "You can display this document for educational purposes") are not computable and hence not automatically enforceable. While one might find practical approximations for circumscribed domains (e.g., limiting access to a fixed set of users operating at certain places, etc.), this is unlikely to scale. A single individual might want to access a document under different terms and conditions, say sometimes as a law student, sometimes as an intern in a law firm. A useful direction might be to develop a more sophisticated "user metadata" infrastructure, in which user roles are assigned from a (multiple) hierarchy of user authorization classes. Purposes may then be derived from the interaction between the authorization classes and the desired use. Institutions acting as user metadata authorities could perform a number of useful functions: from assigning and guaranteeing user roles, to protecting user privacy, to delegation of subordinate authority.
To develop such a database raises issues of scaling, as the number of users (and the number of separate domains from which it is drawn) grows. Even within a single domain (such as a university library's user database) the constant updating of user information may be difficult, and these difficulties may be compounded by scale (as when, for example, attempts are made to represent the users of all university libraries in a region or country). To the extent that user databases representing different domains are managed differently for updating purposes, the scope of the problem grows still further. There are also considerable privacy issues to address.
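The "user metadata" idea above, in which roles come from a multiple hierarchy of authorization classes and purposes are derived from the interaction between class and desired use, can be sketched as follows. All class names, the parent table, and the purpose table are hypothetical examples chosen to mirror the law student and law firm intern case; they are assumptions for illustration, not a design from the workshop.

```python
# Hypothetical multiple hierarchy: each class lists its parent classes.
PARENTS = {
    "law_student": {"student"},
    "law_firm_intern": {"employee"},
    "teaching_assistant": {"student", "employee"},  # two parents
    "student": {"educational_user"},
    "employee": {"commercial_user"},
    "educational_user": set(),
    "commercial_user": set(),
}

# Hypothetical table: (authorization class, desired use) -> purpose.
PURPOSES = {
    ("educational_user", "display"): "educational",
    ("commercial_user", "display"): "commercial",
}


def ancestors(role: str) -> set:
    """Return the role together with every class reachable through its parents."""
    seen = set()
    stack = [role]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen


def derive_purposes(role: str, use: str) -> set:
    """Derive the purposes a role may claim for a desired use."""
    return {
        PURPOSES[(cls, use)]
        for cls in ancestors(role)
        if (cls, use) in PURPOSES
    }
```

With these tables, the same person acting as `law_student` derives an educational purpose for `display`, and acting as `law_firm_intern` derives a commercial one. The sketch says nothing about who assigns or guarantees the roles, which is the part the text assigns to institutions acting as user metadata authorities.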
The diversity of backgrounds brought each of us perspectives we did not have in our home communities. Further progress requires more interdisciplinary work. In the digital library community, librarians and computer scientists have begun to learn to work together, but now we know that we will have to enlarge that circle considerably.
The final workshop report, which includes reports from our four breakout groups, was published as part of the NSF-DARPA-NASA Digital Libraries Initiative Workshop Publication Series. It is available in hard copy from the Digital Libraries Initiative Publications Workshop Series at the University of Illinois at Urbana-Champaign, or from the Terms and Conditions home page. We will continue to update this page as a reference source for ongoing work on the technologies of terms and conditions.
We thank Steven M. Griffin and our advisory board (Carl Lagoze, David Millman, and William Y. Arms) for guidance in planning and executing the workshop, the participants for attending, Deborah Wolfe for transcriptions, and the Information Technology and Organizations Program in the Division of Information, Robotics and Intelligent Systems of the Directorate for Computer and Information Science and Engineering of the National Science Foundation for its sponsorship. The opinions expressed herein are those of the authors alone, not the government.