My mail in the last month has been quite lively. Not all of it shows up on D-Lib Magazine's "Comments" page, which is itself interesting. We set up the "Comments" page so that readers might communicate with me privately, and many of the messages I receive are not of general interest. For example, I am not convinced that you all really want to see comments about D-Lib's choice of fonts. On the other hand, the page has elicited responses about relevant projects that have led, or are leading, to research stories. One example is Renato Iannella's message last winter about TURNIP, written in the wake of our February story on uniform resource names. Dr. Iannella subsequently wrote a briefing for us on his interoperability project.
But one private message this month does open up a possible Pandora's box. The writer points out that a link in Lorcan Dempsey and Stuart Weibel's story on the Warwick Metadata Workshop (July/August 1996) no longer works, but that the referenced document appears to be available at another location. Note the verb in that last clause, "appears to be": my correspondent did not, or could not, verify that the two documents are identical.
Three important issues arise: maintenance of the magazine and its archive of back issues; naming; and version control. Of these, I am presently most concerned about version control, because it both protects the integrity of authors' contributions (including their citations and links) and conveys integrity to the magazine. It is also an issue for which the magazine can now legitimately accept responsibility on the server side.
D-Lib does not alter the substance of contributed stories. As any of our writers can attest, we send all changes, except corrections of spelling and minor punctuation, back to them for approval. This means that we do not retrospectively change a story's HTML, for example to add anchors or to update addresses, with two exceptions. First, within the first month, we correct errors in stories at the request of the authors; such changes are noted and dated at the foot of the story. Second, we intend to add metadata to older stories, but this metadata will supplement the existing stories, not alter the integrity of the contributed material. Like any publication, we are also prepared to publish notices in the magazine when we discover, or are made aware of, inadvertent errors not already caught and corrected within the first month.
With each new issue, the old one goes into the archives. What happens in D-Lib's archives sits at the juncture of what we recognize as "publishing" and curatorial practice. Consider my correspondent's problem: maintaining links. To update a link, we must verify two things: that only the address has changed, and that the content of the object referenced is stable. Authenticating remote content is itself feasible with a document comparison tool such as Stanford's SCAM. But I am not convinced that D-Lib wants to be in the position of guaranteeing remote content.
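To make the two checks concrete, here is a minimal sketch, assuming we hold an archived copy of the cited document and have fetched the candidate at its new address. It first compares cryptographic digests, which confirm only byte-for-byte identity, and then falls back to a rough word-shingle overlap score so that trivial markup or whitespace changes are not mistaken for new content. This is not SCAM's algorithm; the function names and the 0.9 threshold are hypothetical choices for illustration.

```python
import hashlib


def digest(content: bytes) -> str:
    """SHA-256 digest: equal digests mean byte-identical documents."""
    return hashlib.sha256(content).hexdigest()


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences, ignoring case and spacing."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two documents' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def classify_relocation(archived: bytes, candidate: bytes,
                        threshold: float = 0.9) -> str:
    """Hypothetical check: is the relocated document the one originally cited?"""
    if digest(archived) == digest(candidate):
        return "identical"
    score = overlap(archived.decode("utf-8", "replace"),
                    candidate.decode("utf-8", "replace"))
    return "similar" if score >= threshold else "different"
```

For example, `classify_relocation(b"a b c d", b"a  b c d")` returns `"similar"`: the digests differ because of the extra space, but the word sequences match. Note that even an "identical" result says nothing about whether the author would want the link updated; that remains an editorial decision.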
Imagine, for example, that the address is stable but the content of the item referenced has changed. In this case, the original conceptual relationship is meaningless, and we wouldn't even know it if we reduced the question of hyperlinks to maintaining stable addresses. In the optimistic scenario, the link changes and catches our attention; we are able to verify the location and content of the original item (which may itself have undergone subsequent versioning); and the author agrees to let us update the link. Just as newspapers maintain morgues, we can envision a future service that overlays the item we define with updated information, including pointers and perhaps threads to relevant material, but that preserves the integrity of the original so that the future reader can discriminate among versions of the item. At present, this service is beyond our resources, but its time is coming, abetted by new notions and functionalities of electronic or digital documents (as distinct from conventional print documents) now being explored by Berkeley, Xerox, and others.
Models of documents are one part of the solution; the other is naming. Solving the naming problem is a system-wide answer to D-Lib's narrow problem of maintaining hyperlinked citations. Managing addresses through globally unique, location-independent, persistent names, so that the name or identifier is abstracted from both location and content, shifts the burden of work to services that promise to maintain themselves. In this scenario, D-Lib need only register a location-independent, unique, persistent name (such as CNRI's handle, OCLC's PURL, or another scheme) and record any changes that we ourselves make, perhaps using associated metadata to document alterations.
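The division of labor described above can be sketched as a toy, in-memory registry: citations carry a persistent name, and when an object moves, only the registry entry changes, with the old location kept as a dated trail. This is an assumption-laden illustration of the general idea, not the API of CNRI's Handle System or OCLC's PURL service; the class, the name `dlib:1996/july/weibel`, and the example.org URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NameRecord:
    location: str
    # Previous locations with a short note, oldest first (hypothetical metadata).
    history: list[tuple[str, str]] = field(default_factory=list)


class NameRegistry:
    """Toy resolver mapping persistent names to current locations."""

    def __init__(self) -> None:
        self._records: dict[str, NameRecord] = {}

    def register(self, name: str, location: str) -> None:
        # A persistent name is assigned once and never reused.
        if name in self._records:
            raise ValueError(f"name already registered: {name}")
        self._records[name] = NameRecord(location)

    def relocate(self, name: str, new_location: str, note: str = "") -> None:
        # Only the registry changes; citations holding the name stay valid.
        record = self._records[name]
        record.history.append((record.location, note))
        record.location = new_location

    def resolve(self, name: str) -> str:
        return self._records[name].location

    def history(self, name: str) -> list[tuple[str, str]]:
        return list(self._records[name].history)
```

Usage: after `registry.register("dlib:1996/july/weibel", "http://example.org/old/weibel.html")` and a later `registry.relocate(...)` to a new address, every citation that stores the name resolves to the new location. As the next paragraph notes, this helps only for objects whose owners register them.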
But this strategy still applies only to entities that D-Lib controls or manages, like stories and columns. It doesn't really help us with remote objects unless their owners also register them within one of the naming systems, presumably one that is interoperable and one that users actually employ. We are, after all, in the communications business. A beautiful world in which publishers talked only to each other would be one of the worst outcomes of opening Pandora's box. But a networked, interoperable world of authentic information might well be among the best.