D-Lib Magazine
The Magazine of the Digital Library Forum
September 1995

To the Editor

The What versus the How


"Retrieval as a process"

I was delighted by Amy Friedlander's editorial, "When Is Honesty the Best Policy?" because it focuses on issues of integrity of service that can tend to recede from attention out of pure preoccupation with technology.

I'd go even further than she in suggesting that we have to be careful of the idea that whatever access the system may yield is enough. It's a standard experience for those of us who labor over the keyboard to tolerate anomalies in system function. We begin to take them for granted, part of what it means to depend on electronics instead of older modes of doing knowledge work. But no physical library worthy of the name would so cloud the range of available resources and the quality of the retrieval as to leave its users unsure of the thoroughness of the search--the precision and recall as we have come to say--they had been permitted to conduct, even having done all the "right" things as users.
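For readers less familiar with the terms, precision and recall can be computed directly from the set of items a search returned and the set that were actually relevant. The following is a minimal sketch only; the function name and the sample document identifiers are hypothetical and chosen purely for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = share of retrieved items that are relevant
    recall    = share of relevant items that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four items came back, three were truly relevant,
# and two of those three are among the four returned.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc7"]
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A user who sees only the four returned items can judge precision but has no way to judge recall, which is the uncertainty about thoroughness described above.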

Scrupulous attention to interfaces is just as important to the utility of the digital libraries we are creating as, for example, effective naming systems: interfaces that induce users to formulate their queries accurately, on the one hand, and interfaces that represent accurately the functional consequences of searches on the other--what worked as planned and what didn't.

Having exerted ourselves to develop broad and portable access, we still have to be relentless in our evaluation of the quality of access we have created. If access gives me without notice a different cut every time I slice, that's not so good. If it gives me volume without indicators of value, that also is not good. If I can ask for what I need only approximately, and I know what I've got back only approximately, that's bad.

Until recently I was manager of an advanced development group in Digital Equipment Corporation working in conjunction with a number of universities on systems for semiautomatic retrieval and presentation of human-readable information, and before that (I say still with some surprise) I was a book editor and publisher. I've been associated with a number of research efforts to develop effective retrieval engines. Real progress has been made, and no doubt more will occur in time, but my inclination now is to believe that we ought to focus on the effective application of the best of such engines: especially on designing interfaces for them--probably a set of interfaces to accommodate differing user styles, subjects, search structures, and degrees of interaction. Features that try to capture semantic relationships, like PhraseFinder in the INQUERY system, could have great utility in query expansion. So could accommodation of relevance feedback as a kind of on-going query by example ("This is an instance of what I had in mind, find me more," or "This is not, drop this line of approach").
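One common way to realize this kind of "find me more" / "drop this line of approach" feedback is the Rocchio formulation, sketched below. This is offered as an illustration of the general idea, not as a description of how INQUERY or any other particular engine works. The function name, the term-weight dictionaries, and the alpha, beta, and gamma values are all assumptions made for the example.

```python
from collections import defaultdict


def rocchio(query, liked, disliked, alpha=1.0, beta=0.75, gamma=0.15):
    """Adjust a query's term weights using example documents.

    query, and each document in liked/disliked, is a dict of term -> weight.
    Terms from liked documents are pulled into the query; terms from
    disliked documents are pushed out. The weights are illustrative.
    """
    updated = defaultdict(float)
    for term, weight in query.items():
        updated[term] += alpha * weight
    for doc in liked:
        for term, weight in doc.items():
            updated[term] += beta * weight / len(liked)
    for doc in disliked:
        for term, weight in doc.items():
            updated[term] -= gamma * weight / len(disliked)
    # Keep only terms that still carry positive weight.
    return {term: w for term, w in updated.items() if w > 0}


# Hypothetical session: the user marks one hit as "what I had in mind"
# and another as "not this".
query = {"digital": 1.0, "library": 1.0}
liked = [{"digital": 1.0, "retrieval": 1.0}]
disliked = [{"library": 1.0, "furniture": 1.0}]
print(rocchio(query, liked, disliked))
```

Each round of feedback yields a new query, so the exchange becomes a progressive refinement rather than a single transaction.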

Similarly, much more attention needs to be paid to the expression of the results of the retrieval process. Just as in real life, there's rarely a single answer to a sophisticated retrieval question. My experience with the work of the late Muriel Cooper at the MIT Media Lab and the viewers conceived and developed at the Maya Design Group in Pittsburgh leads me to believe that lurking within reach are graphical ways of expressing retrieval relationships that could be of major value in helping users of digital libraries understand the richness of what is returned from what should be a rich process of seeking and finding (or to be aware of the lack of richness, if the system goes awry).

A closely related area of need and opportunity is the graphical representation of information itself. Again, those of us who began pressing the keys at a point when even the limited graphics of current versions of HTML, with tables or in-line bit maps, were not yet available to us have learned perhaps too much tolerance of such limitations. Good layout and good design are not decorations. They are part and parcel of what communication is (as you would expect someone who had been a print publisher to say, I suppose).

Of course there are constraints to the representation of human-readable information by computers, but there are also vast potentials, going far beyond hyperlinking. For bite-sized information units, hyperlinking is certainly useful, but it is terribly limited as a cognitive technique for information structuring. For sustained knowledge work, casual hyperlinking is like having a conversation interrupted every minute or so by someone who wants to talk about something slightly different. We need to understand that the design of information presentation on the screen bears some resemblance to design for presentation on bound paper, but it is only approximate, and most human-readable information should in fact be redesigned for the screen.

Well done, that redesign can give users perspective on a range of information units, allowing them access to a much larger volume on the axis of what is current, or to a much larger volume on the axis of what is "historical," or both. It can allow them to interact with the information in far more sophisticated ways than linking permits. It can free them from the tiresome navigational conventions of the usual PC displays. It can allow multithreaded information to weave a highly personal fabric of specific meaning for individual users.

People cherish serendipity, but for most of us, at least, there is only a cause-and-effect world. There are no real accidents. But there are certainly happy surprises, and I propose that we can preserve and indeed enhance such surprises in an electronic information environment. I have done some work in the graphical display of relevance-scored "documents," and I suggest, as an example, that the mind quickly discerns what are fascinating patterns in the scattering of hits, particularly if they are accompanied by more information than is provided by simple binary, there or not there, results. Dimensional use of color, annotations (of the kind put forward by Roscheisen, Winograd, and Paepcke in one of your recent issues), machine-generated hyperlinks (of the kind suggested by David Evans of Claritech)--we are merely beginning to learn what I suspect are many modes of useful surprise. I have personally been interested in what I have called "information mentoring" by persons whose knowledge and inclinations of mind attract us to them as "alerters" of what is absorbing, new, amusing, and so forth.

Users, then, should have the choice of approaching query formulation and the interpretation of the results of retrieval as a process, not a kind of brief binary transaction--a highly individual process of progressive refinement until we get where we want to go or have had enough. Years ago, scholars like R. A. Fairthorne, and their disciples like Kenneth Warren and William Goffman, were beginning to think about what it means to rationalize the information system. They were arguing that it is a myth to assume that because that system, or any system, is allowed to be fully accidental, it will therefore be fully free and fully functional. I believe that we have sufficient base technology to shape the tools that will enable what I have called a richer process of seeking and finding, and a better appreciation of the products of that process, expected and unexpected. I think it very important that we do so.

Howard Webber
Chairman, FutureTense, Inc.
[email protected]

"Lengthy download" or server "breakdown"?

I have been meaning to write you with congratulations on the first issue of "d-lib magazine." I found it highly informative and useful. Now with the appearance of the second issue, which even exceeds the first in value, I could not, in good conscience, hesitate longer. "d-lib magazine" will become an essential source for those seriously engaged in the creation of digital libraries. The quality and pertinence of your reports from the frontiers of d-lib technology are unparalleled, and the linkages to other relevant sites are extremely valuable. The new services, like the Playpen, have their own special value as well.

All of which is not to say, of course, that you have yet achieved perfection. For example, the first article in the August issue ("Content Ratings...") has much of interest, but the lengthy download required to fetch it caused me to wonder if my PC, the telecommunications link, or your server had suffered a breakdown -- and my connection was at 10 megabits per second! I am grateful I did not attempt to view the article from home, where it would have crept in at 14.4 kilobits per second. Please provide within the text of the article links to the images, or in other ways break up the downloading -- if this is not done, excellent articles, like "Content Ratings...", will go unread by many who would profit from them.

Mainly, however, let me repeat my congratulations and express my gratitude for work exceedingly well done.

Best regards,

Bob Zich
Director
Electronic Programs, Cultural Affairs
Library of Congress
Washington, DC

D-Lib was most concerned when we received this letter because we had believed that downloading Content Ratings would require approximately 90 seconds using modem connections. Indeed, we had checked this estimate with the authors before running the story. So, we re-checked the story and discovered that downloading via Netscape v. 1.0 using a 14.4 V.42 bis modem required approximately 100 seconds. Since we are an experimental magazine, we are inviting our readers to try this out and tell us the results. We would like to know:
How long it took to download the story
The type and speed of your connection
The time of day
Whether you thought the story was worth the wait
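As a rough cross-check on the figures above, the arithmetic can be sketched as follows. The calculation assumes the nominal line rate only: it ignores V.42bis compression, protocol overhead, and server or network delays, so it gives a ballpark rather than a measurement. The function names are hypothetical.

```python
def bytes_in(seconds, bits_per_second):
    """Bytes a link can carry in the given time at its nominal rate."""
    return int(seconds * bits_per_second / 8)


def transfer_seconds(size_bytes, bits_per_second):
    """Seconds needed to move size_bytes at the nominal rate."""
    return size_bytes * 8 / bits_per_second


MODEM_BPS = 14_400          # 14.4 kbit/s modem, uncompressed
NETWORK_BPS = 10_000_000    # 10 Mbit/s connection

# Roughly how much data a 100-second modem download represents.
payload = bytes_in(100, MODEM_BPS)
print(f"about {payload} bytes")

# The same payload over an otherwise idle 10 Mbit/s link.
print(f"{transfer_seconds(payload, NETWORK_BPS):.3f} seconds")
```

On these assumptions the raw transfer over a 10 Mbit/s connection would take well under a second, which suggests that a long wait at that speed comes from something other than link bandwidth. Reports of time of day and connection type should help narrow that down.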

Please send messages to the editor at: [email protected]. In the next several months, we plan to establish a subscription list for notifying subscribers when the next issue is available. We will let you know when the subscription process is in place. The magazine will continue to be publicly accessible.



hdl://cnri.dlib/september95-messages