Late last month, I went -- along with about 2,000 other people -- to the annual fall symposium of the American Medical Informatics Association here in Washington. Since the conference's theme was exploiting Internet/World Wide Web technologies, many of the technical discussions echoed ideas you might expect in any subject domain. But two characteristics of medical applications -- the need to support clinicians and the intricacies of patient records -- suggest ways in which applications are pushing technology. They also illustrate the contrast between managing information flows in bustling corporate settings and a model based on the published or "grey" literature, where most information is already "packaged".
Medicine offers a rich environment for deploying digital library technologies. The field relies heavily on visual observation, whether of tissue samples, MRI scans, or X-rays; generates complex databases of patient records, which are a potential research resource; and requires juggling multiple and evolving information sources from billing to genetic sequences to drug interactions. Finally, hospitals require complex information flows and represent settings that seem ripe for intranet applications.
Indeed, the appeal of browsers and web technologies in local settings was obvious. Nearly every presentation assumed that the systems for information capture and retrieval would be embedded in clinical settings. Not surprisingly, then, there is intense interest in natural language retrieval systems and their relationship to numerous controlled vocabularies and systems. The expectation of clinical use also means, says the program chair, James J. Cimino of Columbia University, that physicians "want to extract relevant information when they need it." While the web has provided a much easier environment in which to do this, he looks to the research community to provide "tools that integrate across resources and across functionalities." He expects an informational rather than a directional response -- this protein has these properties, not a list of 3 or 300 citations to the literature -- and his examples are based on access to medical reference texts, which quickly provide authoritative answers to questions.
But the medical literature advances rapidly. The most current information is probably not in the authoritative textbooks, and information overload, says Stanford's Gio Wiederhold, is still a significant problem. What is needed is "rapid abstraction of significant results," recognizing that "significance" is context-dependent. The goal is "many smarter tools" that can, for example, organize results hierarchically. Not surprisingly, there were numerous sessions devoted to expert systems, an area in which medical applications were early; witness MYCIN in the 1970s.
Dr. Wiederhold, a major figure in digital libraries research, himself gave a paper on an architecture for data security that illustrates the nuances of security and confidentiality. Much of the discussion of security in electronic publishing, for example, revolves around access: based on stated terms and conditions of use, which can include strategies for payment and protection of intellectual property, authorized users are granted access to a collection. By contrast, patient records are a thicket of information to which many people can have legitimate access but for different purposes, from billing to research. The transaction can be made secure and the parties authenticated; more difficult, Wiederhold says, are the problems that arise when individuals with legitimate access for one purpose use it for another, or inadvertently violate the privacy of a patient. In this sense, the medical model of information is more similar to corporate or government information, where hierarchies of access to partitioned subsets of information are common.
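To make the idea of access to partitioned subsets concrete, here is a minimal sketch in Python. It is an illustration only, not anything presented at the symposium: the purposes, the field names, and the `POLICY` table are all hypothetical. The same record yields a different view depending on the stated purpose of the request.

```python
# Hypothetical policy: the fields of a patient record that each purpose may see.
# Purposes and field names are illustrative assumptions, not from any real system.
POLICY = {
    "billing": {"patient_id", "insurer", "procedure_codes"},
    "research": {"age", "diagnosis", "procedure_codes"},
}


def view_for(record, purpose):
    """Return the subset of the record permitted for the given purpose.

    An unrecognized purpose gets an empty view rather than the full record.
    """
    allowed = POLICY.get(purpose, set())
    return {field: value for field, value in record.items() if field in allowed}


record = {
    "patient_id": "P-001",
    "insurer": "Acme Health",
    "age": 54,
    "diagnosis": "hypertension",
    "procedure_codes": ["99213"],
}

print(view_for(record, "billing"))
print(view_for(record, "research"))
```

A table like this controls what is released for each purpose. It does nothing about the harder problem of someone who holds a legitimate view and then uses it for a different purpose.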
In principle, some information in patient records, like social security numbers, can be de-coupled from the records. But excessive partitioning of the records by all possible symptoms, relationships, or diagnoses is neither practically feasible nor necessarily desirable, since a potential value of patient records is that they can support data mining, epidemiological research, or simulations of clinical trials. As a first step, Wiederhold proposes an architecture based on a human "Security Mediator", located in the firewall and equipped with a number of tools, enabling him or her to evaluate a request. Another approach, described by MIT student Latanya Sweeney, is "scrubbing" the records of identifying information, which reduces, although it does not necessarily eliminate, the problem of breaches based on inference.
Sweeney and her colleagues are dealing with the problem of removing identifying information from physicians' notes, correspondence, and discharge records. Such loosely structured information is inherent in the clinical setting; as a practical matter, its writers rely heavily on context, employing jargon, nicknames, private shorthand, and acronyms. The MIT researchers have used a series of detection algorithms and replacement mechanisms which, they say, allow them to "reliably remove explicit personally identifying information." One of the interesting features of the Scrub research is that the system recognizes types of documents by their characteristics. This is potentially a step toward managing unstructured text, which is common in medical records but also proliferates in offices, labs, archives, and libraries. Such information should preferably be handled on the fly, without extensive editing, and it poses different problems from the formal structure of books, monographs, and articles, which lend themselves to SGML or related schemas.
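The general shape of detection followed by replacement can be sketched in a few lines of Python. This is a toy under stated assumptions, not the Scrub system: the three patterns, the `DETECTORS` list, and the bracketed placeholder labels are hypothetical, and a real system would need far more than fixed patterns to cope with the nicknames and shorthand described above.

```python
import re

# Hypothetical detectors: each pairs a placeholder label with a pattern.
# These patterns are illustrative and cover only rigidly formatted identifiers.
DETECTORS = [
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
]


def scrub(text):
    """Replace every match of every detector with its bracketed label."""
    for label, pattern in DETECTORS:
        text = pattern.sub("[" + label + "]", text)
    return text


note = "Pt seen 11/02/96, SSN 123-45-6789, call 202-555-0143."
print(scrub(note))  # Pt seen [DATE], SSN [SSN], call [PHONE].
```

Patterns like these catch only explicit, well-formed identifiers. A name, a nickname, or a telling combination of details would pass straight through, which is one reason scrubbing reduces the risk of identification by inference without eliminating it.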
Andreas Paepcke has already observed in this magazine that "searching is not enough", and digital library technologies must handle a broad variety of information capture, storage, and retrieval problems in many settings. Hospitals are interesting places to start.
Note: spelling correction, November 19, 1996.