Image Description on the Internet

A Summary of the CNI/OCLC Image Metadata Workshop
September 24 - 25, 1996
Dublin, Ohio

Stuart Weibel and Eric Miller
Office of Research
OCLC Online Computer Library Center, Inc.
Dublin, Ohio

D-Lib Magazine, January 1997

ISSN 1082-9873

September 24-25, 1996. Seventy practitioners in the area of networked image description attended a two-day workshop sponsored by the Coalition for Networked Information (CNI) and the OCLC Online Computer Library Center in Dublin, Ohio. This third in the series of metadata workshops addressed the application of the Dublin Core element set to image resource description (see the Dublin Core Homepage for more detailed information about this workshop and others in the series).

The two-day workshop reached consensus, supporting the notion that the Dublin Core, within the context of the Warwick Framework, affords a foundation for the development of a simple resource description model to support network-based discovery of images. As Charles Rhyne, Chair of Art History at Reed College, observed:

"I was not especially surprised that we concluded that the elements needed to discover text and images on the internet are similar. The text and the images themselves are radically different and require different types of expertise to study and interpret them, but most of the primary categories under which we classify and search for them are similar."

Given the original objective of the Dublin Core element set -- to define a simple, easily understood semantic core for network resource discovery -- satisfying core description requirements for both textual and visual information with a single element set is attractive indeed. The enthusiasm for settling on a single set was modulated with a strong recommendation to make the labels for existing elements more amenable to the dual purpose of text and image description.

Is an image a document-like-object?

The abstraction of a document-like-object emerged in the first workshop as a way of sidestepping differences in individual notions of what constitutes a discrete object worthy of separate description. One of the first issues addressed in the Image Metadata Workshop was whether an image is a document-like-object or is so different that an alternative framework for description is required.

Consensus emerged around the idea that images are not so different from the document-like-objects of the first workshop. The expectation that a set of image-specific elements (an Image-Core) would emerge from the workshop gave way to the idea that the application of a slightly modified Dublin Core element set might serve as well. As Jennifer Trant, of the Arts and Humanities Data Service in the UK, wrote after the workshop:

"That images are 'document-like' was to me one of the more significant contributions of the meeting. We went into the discussion assuming that there would be an 'image core', expressed as a separate box within . . . the initial model for our discussions.

"We emerged from our two days of discussion with only one, slightly extended, set of core elements to support the discovery process, a set which seems to me to reflect the various conceptual categories researchers bring to their search for information. These categories did not change based on the media of the information resource (visual or otherwise) that might satisfy the query.

"After spending so long thinking that images were 'special' [to use museum-like assumptions] it was fascinating for me to have a group of image specialists say that in most content terms fixed/static/bounded images really are a lot like text-based document-like objects."

The single, slightly extended, set of core elements for image discovery emerged from two days of discussion as a set which seems to reflect the various conceptual categories researchers bring to their search for information. These categories were judged by the image specialists in attendance not to differ significantly based on the media (visual or otherwise) of the information resources that might satisfy the query.

The defining characteristic of a document-like-object is not its textual versus graphical content, but rather whether or not the resource is bounded, or fixed, in the sense that the resource looks the same to all users. Thus, images, movies, musical performances, speeches and other information objects which are characterized by being fixed (i.e., having identical content for each user) can also be thought of as document-like-objects.

Non-document-like objects, on the other hand, include such resources as virtual experiences, databases (including ones that generate document-like outputs), business graphics, CAD/CAM or geographic information generated from database values, and interactive applications which might have different content for each user. In the context of image discovery, these sources do not "contain" images as much as they "generate" images. The images they generate may be described as fixed document-like objects, but the metadata required to describe them (the systems doing the generating) are distinct.

Consider the example of the Visible Human Project (described in a workshop plenary talk by Earl Henderson of the National Institutes of Health). More than a collection of fixed images, the Visible Human Project at the National Library of Medicine is a collection of applications unified by a data set that is nothing if not visual in character. The scope of the project itself is dynamic and evolving rapidly, and the character of the visual outputs of any of the many applications growing up around this data set defies simple description and certainly is not bounded in the sense understood in this workshop. Such applications are systems, rather than collections of images.

A Model for Metadata

Much of the consensus-building surrounding the Dublin Core has involved accommodating pragmatic stakeholder concerns born of long-standing experience with legacy description models. It is helpful to have a conceptual model to guide this pragmatism, and just such a model developed in the course of the workshop. This model is an outgrowth of previous work of Bearman (see A Reference Model for Business Acceptable Communications) and provides conceptual support for both the Dublin Core and Warwick Framework by illustrating the transactional relationship of metadata and the research process.

The research process can be thought of as a series of interactive processes, which can provisionally be described as including:

Discovery: the identification of relevant resources
Retrieval: the transfer of resources to a local site
Collation: the aggregation and organization of selected resources
Analysis: the intellectual and/or computational analysis of resources
Re-presentation: the formulation of derivative intellectual artifacts based on the resources and previous processes in the sequence

These processes involve events and resources distributed among institutions, machines, networks, and the minds of individuals. Metadata, then, become any one set of elements drawn from the many kinds of information necessary for decision-making within this matrix of minds, machines, and networks.

For example, access to discovery metadata may lead to the return of terms and conditions elements, necessary for retrieval. Retrieval metadata might include the network address of a resolver from which the resource may be accessed or the publisher of an item with whom a usage agreement must be transacted. Collation metadata might include data about an image collection schema or the provenance of an item. Analysis might require a color map for the item. Re-presentation could involve information validating credit to rights holders, and might well require a link to update use history of the source object.

A variety of metadata will be needed to satisfy the requirements of each stage, and hence the functional requirements of metadata packages might well be defined by these requirements. To be used effectively, elements of metadata must be readily available as required by each stage in the research process in which the user is engaged (though different implementations might deliver some metadata at stages prior to its being needed). It is recognized that the pragmatics of collection and management of metadata will likely compromise this ideal, but the model can nonetheless inform our thinking and design.
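The stage-by-stage view described above can be sketched as a simple lookup from research stage to the kinds of metadata that stage needs. This is only an illustration: the names `STAGE_METADATA` and `metadata_for` are hypothetical, the entries restate the examples given in the text, and the discovery entry is an assumption standing in for a core discovery element set. It is not a schema defined by the workshop.

```python
# Hypothetical mapping from research stage to example metadata kinds.
# Entries restate the examples in the text; this is not a workshop-defined schema.
STAGE_METADATA = {
    # Assumption: discovery relies on a core description record.
    "discovery": ["core discovery elements"],
    "retrieval": ["terms and conditions", "resolver network address", "publisher"],
    "collation": ["image collection schema", "provenance"],
    "analysis": ["color map"],
    "re-presentation": ["rights-holder credit", "use-history link"],
}


def metadata_for(stage):
    """Return the example metadata kinds for a stage (case-insensitive).

    Unknown stages yield an empty list rather than raising an error.
    """
    return STAGE_METADATA.get(stage.lower(), [])


if __name__ == "__main__":
    for stage in STAGE_METADATA:
        print(stage, "->", ", ".join(metadata_for(stage)))
```

A real implementation might deliver some of these elements earlier than the stage that needs them, as the text notes; the lookup above only records where each kind of metadata is required.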

One need not imagine all possible linkages to recognize the complexity of such a model, nor is it necessary to accommodate at the outset all possible elements, packages, and necessary infrastructure. But in the search for appropriate compromises, it is helpful to see the larger picture that this model attempts to capture. What is necessary, though, is an agreement as to the notion of assembling sets of descriptive elements, which enables extensibility and forward compatibility.

Like the Warwick Framework, this model explicitly recognizes that metadata will be created and managed by a variety of agents, for different reasons, at different times in the life of the object. This implies an infrastructure and architecture that does not now exist, but that will evolve, driven by the marketplace of information access. The modest achievement of this workshop is to reaffirm the semantic characteristics of but a single variety of metadata package -- the core elements of a resource discovery element set -- and to assert its suitability for both textual and visual resources.

How are Images Different?

It is gratifying that the workshop reached agreement that text and images could be classified using similar categories, but just as clearly, images offer a number of technological and descriptive challenges peculiar to themselves.

Textual materials can be indexed, often simplifying or partially automating the task of description, whereas most of the descriptive elements of images are extrinsic to the work (or are not easily extracted from the work).

Encoding schemes are critical for using images. This can be true for textual materials as well, but there are fewer varieties of textual representation, and at least some of them fail gracefully (HTML or SGML, for example, are hard, but possible, to read without a suitable rendering program).

Rendering of images is unforgiving and the variant forms are combinatorially overwhelming. Commonly encountered web graphics display by default and presumably tolerate wide differences in display characteristics. As more sophisticated imaging applications populate the Internet, metadata will play a more important role in discovery and selection. Information necessary for rendering may include:

type (bit-mapped, vector, video)

format (TIFF, GIF, JFIF, PICT, PCD, Photoshop, EPS, CGM, TGA . . .)

compression schemes and ratios (JPEG, LZW, QuickTime. . .)

dimensions

dynamic range

color lookup tables and related metrics (CMYK, RGB. . .)
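The rendering categories listed above could be gathered into one small record. The sketch below is illustrative only: the class name `RenderingInfo`, its field names, and the use of bit depth as a stand-in for dynamic range are all assumptions, not part of the Dublin Core or of any workshop proposal.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RenderingInfo:
    """Hypothetical record of the rendering categories listed in the text."""

    image_type: str                               # "bit-mapped", "vector", or "video"
    file_format: str                              # e.g. "TIFF", "GIF", "JFIF"
    compression: Optional[str] = None             # e.g. "JPEG", "LZW"
    dimensions: Optional[Tuple[int, int]] = None  # (width, height) in pixels
    bit_depth: Optional[int] = None               # assumption: proxy for dynamic range
    color_model: Optional[str] = None             # e.g. "RGB", "CMYK"

    def missing_fields(self) -> List[str]:
        """Return the names of fields that were left unrecorded (None)."""
        return [name for name, value in vars(self).items() if value is None]


if __name__ == "__main__":
    info = RenderingInfo("bit-mapped", "TIFF", compression="LZW", dimensions=(640, 480))
    print(info.missing_fields())
```

Making every field after the first two optional reflects the tension the article goes on to describe: a casual description may record only type and format, while an archival one fills in the rest.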

Characteristics of original image capture, while less critical for the casual user, may be of overwhelming importance to the archival or research significance of the image or collection. This sort of information is also, for the most part, irrecoverable if not recorded at the time of capture.

Categories of information about the scanning process include: light source (full spectrum or infrared, for example), resolution, dynamic range, type of scanner, date of scan, journal/audit trails, and digital signatures for authentication.

Variant forms of the image content are also important (the Versioning Problem writ large): source image, different views of the same object, different scans of the same object, different resolutions of the same image, details of the same image, source ID, responsible institution. All these categories may be critical elements of metadata for a particular image or collection.

The complexity of adequately capturing and encoding such information conflicts with one of the original design goals of the Dublin Core: simplicity. If the Dublin Core is to be applied in some useful way to the domain of images, it will be necessary to isolate the essential core of information appropriate to a simple description record and to identify a graceful extension mechanism that supports encoding of the richer array of descriptors hinted at in the preceding paragraphs.

Modifications to the Dublin Core

The workshop consensus and subsequent animated discussions on the META2 list (the primary forum for discussion of Dublin Core issues) resulted in a number of changes to the Dublin Core element set (see Table 1). Several element names were modified slightly to make them less text-centric, and two elements were added to the original thirteen. The reference description of the elements resides at http://purl.org/metadata/dublin_core_elements.

Subject and Description Separated

SUBJECT and DESCRIPTION are now separate elements in the core, partly because of the judgment among the image specialists that these are quite distinct concepts for images. Other participants in the metadata discussions on META2 agreed that such a distinction is also useful for other media.

Thus, SUBJECT is intended to include keywords, controlled vocabulary terms, and formal classification designators, while DESCRIPTION is to be used for descriptive prose or content description (in the case of images) and affords a natural place for abstracts in the case of textual documents.

A Rights-Management Field

A simple rights-management field is perceived by many as a necessary component of a core description record. While arguably not an intrinsic dimension of discovery, it is of such importance to the use of images that failure to include such an element may hinder wide deployment. This is a good example of the imprecise lines of demarcation between different varieties of metadata that will inevitably blur the idealized functional boundaries one might hope for among metadata packages. Resource description is a messy business -- ask any cataloger.

The digital world requires a sophisticated language for expression and negotiation of intellectual property rights; the evolution of the supporting infrastructure is well underway (some of these developments have been reported in this journal). This element should not be construed as a substitute for such a language or metadata structure, but rather as a means for communicating simple terms and conditions where they exist or providing a link to more complex information as it evolves.

One proposed application of the field is as follows:

null - there may or may not be restrictions on use, and users have to figure it out independently, outside the context of this particular collection of metadata.
the string "No Restrictions on Reuse" - there are no restrictions on re-use.
URI or other pointer - there are restrictions on use, and users can follow the link to find out more information.

This approach addresses several implementation issues. The metadata could be used to retrieve materials with no restrictions on use at a top-level search, without getting into any subsidiary packages of metadata. Second-level packages of rights-management metadata could be retrieved automatically or presented to the user as links within the search results. All records in a single collection could share a single value in the rights management field. Additionally, managers that don't fill in the rights-management field, or that have rights issues but have no on-line access to that information, enjoy the presumption that a null response means there might be restrictions.
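The three-way convention proposed above is simple enough to sketch in a few lines. The function names, the `"rights"` dictionary key, and the returned labels are hypothetical; the sketch also assumes that any non-empty value other than the literal string is a URI or other pointer, which a real system would need to check.

```python
# The literal string proposed in the text for unrestricted material.
NO_RESTRICTIONS = "No Restrictions on Reuse"


def interpret_rights(value):
    """Classify a rights-management value under the proposed convention.

    Returns "unknown" for a null/empty value, "unrestricted" for the
    agreed literal string, and "restricted" for anything else (assumed
    here to be a URI or other pointer to further information).
    """
    if value is None or value.strip() == "":
        return "unknown"
    if value.strip() == NO_RESTRICTIONS:
        return "unrestricted"
    return "restricted"


def freely_reusable(records):
    """Top-level filter: keep only records with no restrictions on reuse."""
    return [r for r in records if interpret_rights(r.get("rights")) == "unrestricted"]


if __name__ == "__main__":
    sample = [
        {"id": "a", "rights": None},
        {"id": "b", "rights": "No Restrictions on Reuse"},
        {"id": "c", "rights": "http://example.org/terms"},  # placeholder URL
    ]
    print([r["id"] for r in freely_reusable(sample)])
```

Because a missing value maps to "unknown" rather than "unrestricted", the filter never returns material whose status was simply left blank, which matches the presumption described in the text.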

Open Issues

Surrogates and Objects

Among the most important impediments to coherent deployment of a metadata element set is the confusion between description of the object and description of the digital surrogate of that object. This can be a problem with text, but in general, it is the intellectual content rather than its presentation that is of primary importance with text, and increasingly the primary version of a text is its electronic form.

With images, the variety of forms an image may assume in its life cycle is liable to be greater than for a piece of text, and the relationships among these versions are intrinsically more complex. The degree to which such information is captured and the means of encoding it are difficult problems, the solutions to which must evolve in tandem with the pragmatics of implementation.

Collection Versus Item-level Description

Collection descriptions and the schemas that account for the aggregation of images in such collections are essential for effective collection discovery. Early discussions embraced the possibility of a separate element for addressing this, though ultimately consensus emerged around the idea of capturing this sort of information in existing fields. The simplest possibility is to include a Resource Type or RELATION flag (COLLECTION | ITEM). Further explorations are necessary to determine whether this is sufficient or whether there might be other sensible values for this sub-element. This is part of the larger relation problem, which requires elaboration for visual and textual materials alike.

SOURCE is Dangerously Recursive

SOURCE information is potentially recursive (and probably complex) for any object, but especially with images. How can such object-surrogate-derivative relationships be expressed to both aid in discovery and to explicate intellectual property lineage?
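To make the recursion concrete, the sketch below follows SOURCE links from a derivative back toward the original object. It does not answer the open question above; it only shows one way such a chain might be walked. The function name `source_chain`, the identifier-to-source dictionary, and the sample identifiers are all hypothetical, and the sketch assumes each resource names at most one source.

```python
def source_chain(identifier, sources):
    """Follow SOURCE links from `identifier` back toward the original.

    `sources` maps a resource identifier to the identifier of the
    resource it was derived from. Traversal stops when a resource has
    no recorded source, or when a link would revisit a resource
    (a guard against circular SOURCE references).
    """
    chain = [identifier]
    seen = {identifier}
    current = identifier
    while current in sources:
        current = sources[current]
        if current in seen:  # circular reference: stop rather than loop forever
            break
        chain.append(current)
        seen.add(current)
    return chain


if __name__ == "__main__":
    # Hypothetical lineage: thumbnail <- scan <- slide <- painting
    lineage = {
        "thumbnail.gif": "scan.tiff",
        "scan.tiff": "slide-35mm",
        "slide-35mm": "painting",
    }
    print(" <- ".join(source_chain("thumbnail.gif", lineage)))
```

Real lineages may branch (several scans of one slide, several details of one scan), in which case a single chain like this is only one path through a larger graph.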

Mapping of Dublin Core to Other Element Sets

One of the first tangible outcomes of the first metadata meeting was the mapping of Dublin Core elements to MARC fields by Rebecca Guenther of the Library of Congress. This discussion paper contributed substantively to the community awareness of the emergence of Dublin Core as a model for network resource description, and fed back into the change process for MARC. Similar mapping between Dublin Core elements and existing image description standards will clarify the role of the various elements and provide guidance for the application of the Dublin Core to image collections. As has been suggested on the workshop discussion list, existing standards or practices can serve as templates for the development of guidelines, thereby jump-starting interoperability and reducing the effort necessary to develop description standards.

Viewing Requirements

The bandwidth and time penalty for retrieving images is often high, making it desirable to have some indication of usability prior to retrieval. It was agreed that an existing element (the FORMAT element, previously called FORM) could probably be used to express this information, but a standard of best practice needs to evolve through real-world implementation.

The most problematic aspect of this issue is where to stop. An archival site using the Dublin Core to describe items at a deep level might want to include a large set of image-related descriptors but this would hardly be expected to be the norm for broad deployment. A flexible means for including such information in the FORMAT element should be proposed in user guidelines, with an eye towards the evolution of a Warwick Framework-style package to support such needs in the future.

Future Developments

The Dublin Core is a high-level reference model. In and of itself, it does not provide guidance for cataloging or searching, nor is it a blueprint for system development. Rather, it provides guidance for the semantic content of a simple resource description model that may profitably be applied to visual as well as textual resources. The consensus developed around this model is the major product of the workshops. This consensus is the result of many people of standing who have used their own experiences and the collective intelligence of the communities they represent to arrive at a common foundation for networked resource description.

These workshops are but tentative steps in bringing this collective intelligence to bear on the difficult problem of resource discovery on the Internet. The achieved consensus is important but incomplete. It requires integration of detail. It requires elaboration and extension. It requires building a small community into a larger one. Most importantly, it requires sharing this vision with the system designers, authors, and information managers who must, through application and use, turn the model into applications that will help real users solve real problems.

The path forward will be charted through the collective means of the workshop mailing lists and subsequent workshops that will refine and elaborate this work-in-progress (see http://www.dstc.edu.au/DC4/ for information on the upcoming fourth Dublin Core Workshop). Slightly less than two years after the original workshop, prototype applications of Dublin Core are emerging, and recognition of the Dublin Core as a foundation for discovery-oriented resource description is growing. Building upon these prototypes and refining this consensus will provide the foundation for a network-wide body of practice that can help rationalize resource description across domains and make Internet resources more accessible.

Table 1: Abbreviated Description of Dublin Core Elements

(the reference description of the elements resides at http://purl.org/metadata/dublin_core_elements)

Element Descriptions

TITLE
The name given to the resource by the CREATOR or PUBLISHER.
AUTHOR OR CREATOR
The person(s) or organization(s) primarily responsible for the intellectual content of the resource.
SUBJECT AND KEYWORDS
The topic of the resource, or keywords, phrases, or classification descriptors that describe the subject or content of the resource.
DESCRIPTION
A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources.
PUBLISHER
The entity responsible for making the resource available in its present form, such as a publisher, a university department, or a corporate entity.
OTHER CONTRIBUTORS
Person(s) or organization(s) in addition to those specified in the CREATOR element who have made significant intellectual contributions to the resource but whose contribution is secondary to the individuals or entities specified in the CREATOR element.
DATE
The date the resource was made available in its present form.
RESOURCE TYPE
The category of the resource, such as home page, novel, poem, working paper, technical report, essay, dictionary. It is expected that RESOURCE TYPE will be chosen from an enumerated list of types.
FORMAT
The data representation of the resource, such as text/html, ASCII, Postscript file, executable application, or JPEG image. FORMAT will be assigned from enumerated lists such as registered Internet Media Types (MIME types).
RESOURCE IDENTIFIER
String or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented).
SOURCE
The work, either print or electronic, from which this resource is derived, if applicable.
LANGUAGE
Language(s) of the intellectual content of the resource.
RELATION
Relationship to other resources. Formal specification of RELATION is currently under development.
COVERAGE
The spatial locations and temporal durations characteristic of the resource. Formal specification of COVERAGE is currently under development.
RIGHTS MANAGEMENT
The content of this element is intended to be a link (a URL or other suitable URI as appropriate) to a copyright notice, a rights-management statement, or perhaps a server that would provide such information in a dynamic way.
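As a small illustration of how the fifteen element names in Table 1 might be used, the sketch below checks a description record against them. The constant and function names are hypothetical, the sample record is invented for illustration, and representing a record as a dictionary keyed by element name is an assumption; the article does not prescribe any encoding.

```python
# The fifteen element names exactly as they appear in Table 1.
DUBLIN_CORE_ELEMENTS = (
    "TITLE",
    "AUTHOR OR CREATOR",
    "SUBJECT AND KEYWORDS",
    "DESCRIPTION",
    "PUBLISHER",
    "OTHER CONTRIBUTORS",
    "DATE",
    "RESOURCE TYPE",
    "FORMAT",
    "RESOURCE IDENTIFIER",
    "SOURCE",
    "LANGUAGE",
    "RELATION",
    "COVERAGE",
    "RIGHTS MANAGEMENT",
)


def unknown_elements(record):
    """Return record keys that are not Table 1 element names, sorted.

    Matching is case-insensitive, so "format" and "FORMAT" are treated alike.
    """
    return sorted(key for key in record if key.upper() not in DUBLIN_CORE_ELEMENTS)


if __name__ == "__main__":
    # Invented sample record; "Photographer" is deliberately not a Table 1 element.
    record = {
        "TITLE": "Harbor view",
        "FORMAT": "image/jpeg",
        "Photographer": "unknown",
    }
    print(unknown_elements(record))
```

A check of this kind flags local additions without rejecting them, which leaves room for the extension mechanisms the article discusses.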
