    What Is Information Discovery About?
    H. A. Proper and P. D. Bruza

    Proper and Bruza believe that information discovery is an attempt to model information retrieval broadly, outside the context of any operational retrieval model. They define user need in terms of supply and demand for something undefined called infions, or information particles. Relevance is meeting a set of requirements stated in the same terms. Aboutness is apparently a relation on a set of key words, but satisfaction is a relationship on information carriers and their logical descriptions. A distinction is made between system and user satisfaction, which will be close in factual descriptions but not necessarily so in cases of aboutness.

    Text Segmentation for Chinese Spell Checking
    Kin Hong Lee, Qin Lu, and Mau Kit Michael Ng

    Since Chinese text has no natural delimiters, text must be segmented into valid words before error correction can take place. Many words are represented by single characters, but others require multiple-character strings. Lee, Lu, and Ng test the Block of Combinations (BOC) segmentation method, which uses a 60,000-word dictionary with the 2,000 most frequently used words grammatically tagged. A user dictionary is available for adding words not predefined, and otherwise unidentified words are stored in a temporary file. The 200 most frequent single-character words are accepted, but others are suspected to be errors and presented to the user for clarification with similar words as suggestions. Since the number of possible segmentations increases rapidly as the number of characters grows, a sliding five-word window is used rather than a complete sentence. The procedure is more accurate than another method which takes about the same computational time.
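
    The enumerate-and-select idea behind dictionary-based segmentation can be illustrated with a minimal Python sketch. It is not the authors' BOC implementation: the toy dictionary, the set of accepted single characters, and the function names (segmentations, suspect_words, best_segmentation) are assumptions made for illustration, and the scoring rule (fewest suspect single characters, then fewest words) is only a stand-in for whatever ranking BOC actually applies.

```python
from typing import Iterator, List, Set


def segmentations(text: str, dictionary: Set[str], max_len: int = 4) -> Iterator[List[str]]:
    """Yield every split of `text` into dictionary words or single characters."""
    if not text:
        yield []
        return
    for size in range(1, min(max_len, len(text)) + 1):
        head = text[:size]
        # A single character is always a candidate; longer pieces must be in the dictionary.
        if size == 1 or head in dictionary:
            for rest in segmentations(text[size:], dictionary, max_len):
                yield [head] + rest


def suspect_words(words: List[str], frequent_singles: Set[str]) -> List[str]:
    """Single characters outside the accepted list are treated as possible errors."""
    return [w for w in words if len(w) == 1 and w not in frequent_singles]


def best_segmentation(text: str, dictionary: Set[str], frequent_singles: Set[str]) -> List[str]:
    """Assumed scoring: fewest suspect characters first, then fewest words."""
    return min(
        segmentations(text, dictionary),
        key=lambda words: (len(suspect_words(words, frequent_singles)), len(words)),
    )


# Toy data (hypothetical, far smaller than a 60,000-word dictionary).
DICTIONARY = {"我们", "喜欢", "北京"}
FREQUENT_SINGLES = {"我", "的"}

clean = best_segmentation("我们喜欢北京", DICTIONARY, FREQUENT_SINGLES)
typo = best_segmentation("我们喜换北京", DICTIONARY, FREQUENT_SINGLES)
print(clean, suspect_words(clean, FREQUENT_SINGLES))
print(typo, suspect_words(typo, FREQUENT_SINGLES))
```

    The input string here plays the role of one short window of text. Because the number of candidate splits grows quickly with length, the exhaustive enumeration above is only practical for short spans, which is the motivation the abstract gives for working within a sliding window.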

    A Fuzzy Genetic Algorithm Approach to an Adaptive Information Retrieval Agent
    Maria J. Martin-Bautista, Maria-Amparo Vila, and Henrik Legind Larsen

    The Genetic Information Retrieval Agent Filter (GIRAF) is a software agent, tested here by Martin-Bautista, Vila, and Larsen, that can work offline to filter and rank retrieved documents from an Internet search engine. Query terms are extracted from evaluated initial search documents or from ideal documents provided by a user, and each of these terms is given a weight that is the average of its occurrence frequency in all documents analyzed. One of four gene types is assigned randomly to each term to form a triple with its weight, and chromosomes (strings of these gene triples) are randomly formed. Type one genes use as their weight the number of occurrences of the term in a document that will give complete satisfaction (a higher number reducing satisfaction). Type two genes are satisfied completely by documents that have no occurrences of the term. Type three genes use the weight as a traditional threshold, with complete satisfaction achieved if the number is reached or exceeded. Type four divides each document into three parts and will be satisfied by any of the parts. Chromosomes are ranked by their similarity to relevant documents and modified by choosing two parents, the first at random and the second ranked higher than the first, and performing a gene crossover. Mutation also occurs randomly. New chromosomes cause those at the bottom of the list to be removed. Tests with virtual users indicate that type three genes work best, except that when user profiles are permitted to change during the process, type one gains some advantage.
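
    The four gene types can be read as four satisfaction functions over a term's occurrence count. The Python sketch below is one possible reading, not the GIRAF code: the abstract fixes only where satisfaction is complete, so the shapes of partial satisfaction (linear falloff for type one, 1/(1+n) decay for type two, n/weight below the threshold for types three and four), the equal three-way split, and the averaging in chromosome_score are all assumptions, as are the names Gene, satisfaction, and chromosome_score.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    term: str      # query term
    kind: int      # gene type, 1 to 4
    weight: float  # interpreted according to the gene type


def _threshold(count: int, weight: float) -> float:
    """Full satisfaction once `count` reaches `weight`; assumed linear below it."""
    if weight <= 0:
        return 1.0
    return min(1.0, count / weight)


def satisfaction(gene: Gene, tokens: List[str]) -> float:
    """Degree in [0, 1] to which a tokenized document satisfies one gene."""
    n = tokens.count(gene.term)
    if gene.kind == 1:
        # Complete satisfaction at exactly `weight` occurrences (assumed linear falloff).
        if gene.weight <= 0:
            return 1.0 if n == 0 else 0.0
        return max(0.0, 1.0 - abs(n - gene.weight) / gene.weight)
    if gene.kind == 2:
        # Complete satisfaction when the term is absent (assumed decay otherwise).
        return 1.0 / (1 + n)
    if gene.kind == 3:
        # Traditional threshold on the whole document.
        return _threshold(n, gene.weight)
    if gene.kind == 4:
        # Threshold applied to each third of the document; the best part counts.
        size = len(tokens)
        parts = [tokens[size * i // 3: size * (i + 1) // 3] for i in range(3)]
        return max(_threshold(part.count(gene.term), gene.weight) for part in parts)
    raise ValueError(f"unknown gene type: {gene.kind}")


def chromosome_score(genes: List[Gene], tokens: List[str]) -> float:
    """Assumed aggregate: mean satisfaction over the chromosome's genes."""
    if not genes:
        return 0.0
    return sum(satisfaction(g, tokens) for g in genes) / len(genes)


doc = "a b a c a d".split()
print(satisfaction(Gene("a", 1, 3.0), doc))  # exactly the ideal count
print(satisfaction(Gene("a", 3, 6.0), doc))  # half way to the threshold
print(chromosome_score([Gene("a", 3, 6.0), Gene("z", 2, 0.0)], doc))
```

    Ranking, crossover, and mutation would operate on lists of such genes; they are left out here because the abstract does not give enough detail to sketch them without guessing.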

    A Distance and Angle Similarity Measure Method
    Jin Zhang and Robert R. Korfhage

    The typical angularity measure (cosine) identifies documents whose index term distributions are similar. Despite this similarity, such documents may be far apart in the document space if they discuss their topics at different levels of detail. Document similarity with a distance measure depends upon the length of the hypersphere radius from the reference point to the document. Zhang and Korfhage present a similarity measure that combines distance-based and angularity-based measures. This measure "s" is the product of two terms built on the parameters "a" and "c": "a" has the negative radius as an exponent (a typical distance measure), and "c" (between 0 and 1) has as its exponent the angle divided by the maximum value of the angle. The maximum value of "s" is equal to the distance-based measure, and the minimum is smaller but in the same position as the cosine measure. Varying "a" and "c" will reflect a user's emphasis on distance or angularity.
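
    Read literally, the description above gives s = a^(-d) * c^(theta / theta_max), where d is the distance from the reference point to the document and theta is the angle between them. The Python sketch below implements that reading under stated assumptions: Euclidean distance, an angle measured at the origin between the two vectors, a default maximum angle of pi/2 (appropriate for non-negative term weights), a > 1, and 0 < c <= 1. The function name and the default parameter values are illustrative, not taken from the paper.

```python
import math
from typing import Sequence


def distance_angle_similarity(
    ref: Sequence[float],
    doc: Sequence[float],
    a: float = 2.0,
    c: float = 0.5,
    max_angle: float = math.pi / 2,
) -> float:
    """Combined measure s = a**(-d) * c**(theta / max_angle)."""
    if len(ref) != len(doc):
        raise ValueError("vectors must have the same dimension")
    if a <= 1.0 or not 0.0 < c <= 1.0:
        raise ValueError("this sketch assumes a > 1 and 0 < c <= 1")

    # Distance component: Euclidean distance from the reference point (assumed).
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(ref, doc)))

    # Angle component: angle at the origin between the two vectors (assumed).
    norm_ref = math.sqrt(sum(x * x for x in ref))
    norm_doc = math.sqrt(sum(y * y for y in doc))
    if norm_ref == 0.0 or norm_doc == 0.0:
        raise ValueError("angle is undefined for a zero vector")
    cosine = sum(x * y for x, y in zip(ref, doc)) / (norm_ref * norm_doc)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    angle = math.acos(cosine)

    return a ** (-dist) * c ** (angle / max_angle)


# Same direction, distance 2: the angular factor is 1, so s = a**(-2).
print(distance_angle_similarity([1.0, 0.0], [3.0, 0.0]))
# Orthogonal vectors: the angular factor shrinks to c.
print(distance_angle_similarity([1.0, 0.0], [0.0, 1.0]))
```

    When the angle is zero the angular factor is 1 and s equals the distance-based term a^(-d); at the maximum angle it falls to c times that term. Raising a makes distance dominate, while lowering c makes the measure more sensitive to angle.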

    DARE: Distance and Angle Retrieval Environment: A Tale of the Two Measures
    Jin Zhang and Robert R. Korfhage

    In a second paper, Zhang and Korfhage present a visualization model which can display both distance and angle measures simultaneously and handle both conjunction and intersection. Changing the slope and position of a straight line in the visualization space results in a modification of the threshold-defined contour in the document vector space and thus expands or contracts the scope and emphasis of the retrieved set.

  • Perspectives Issue on...Visual Information Retrieval Interfaces
  • Introduction and Overview: Visualization, Retrieval, and Knowledge
    Mark Rorvig and Lois F. Lunin

    This Perspectives issue is assembled to provide an historical background to visualization in information retrieval. It is a review of the assumptions and technology configurations by which the current literature may be interpreted. The techniques of the authors of this issue differ, but all treat their techniques as manuals of description flowing from a history of common mathematical and technical influences. All technologies have histories of development. The historical forces of visualization frame the current efforts and comprise the field in which new problem dimensions are addressed. No field of scientific inquiry emerges without a background. This issue adds to the depth necessary for the study of visualization by new students and new scholars.

    The NASA Image Collection Visual Thesaurus
    M. E. Rorvig, C. H. Turner, and J. Moncada

    The first visual interface to a collection was designed and implemented at the Johnson Space Center of NASA in the years 1988-1992. In this interface, described in the article entitled "The NASA Visual Thesaurus," Rorvig et al. assumed that the task of inferring images from terms and terms from images would introduce invariance in image indexing. The system remained in use for two years, but eventually failed because no automatic method to assign terms to images could be discovered, and the manual cost of such term assignment was too great to be supported.

    Rorvig et al. attempted to use image descriptions clustered by cosine vector methods to identify a unique image for every thesaurus term. The candidate images suggested by this method were often heartbreakingly close to the mark. But close was not good enough. These developments were described in detail by Seloff (1990). Although the Seloff article has been widely cited, the initial article that specified the design parameters for the system he reported on has remained unpublished. It appears in this issue in the form originally presented at the ASIS mid-year conference of 1988. The article is significant because it represents the first identification of the components of a visual interface.

    Visualizing Science by Citation Mapping
    Henry Small

    The article by Henry Small of the Institute for Scientific Information addresses the two-decade-long history of visualization techniques for calculating the relationships among scientific fields by their patterns of co-citation. Small begins with the simplest of algorithms, as conceived within the computational limitations of the 1970s, and ends with the most ambitious ones presently available through Sandia National Laboratories (SNL). In this article, students and scholars will find algorithms applicable to many different aspects of the co-citation problem. Small frankly describes the research paths that were successful and led to further enhancements, as well as the ones that were eventually discarded, either because of their inefficiency in computation or because of their failure to yield truthful insights validated by earlier techniques. Many of these algorithms may be transplanted to address similar problems with data that may be encountered by researchers who require some intermediate processing alternatives.

    The Ecological Approach to Text Visualization
    James A. Wise

    The article by Jim Wise of Integral Visuals Corporation details the technical advances of researchers at the Pacific Northwest National Laboratories (PNNL) over an intense five-year history of development. Wise's "The Ecological Approach to Text Visualization" offers a rich archive of techniques. This descriptive tour de force begins with the most brutal techniques (e.g., vectors of length 200K analyzed through Multidimensional Scaling (MDS)) and, in completely clear and intelligible detail, describes the shortcut methods developed for computational efficiency. These efforts have resulted in the presently available commercial product "ThemeMedia," offered through a subsidiary of the Smaby Group of investors (which purchased rights to further develop the technology). Among the highlights of this article is a description of the discovery that single and multiple link cluster centroids can be used to approximate the full text collections originally required for visual display. Additionally, the transformations of the 2-D dot displays to the present terrain models simultaneously developed at PNNL and SNL are described in sufficient detail for engineers to reproduce the same progression of results.

    Interactive Graphical Queries for Bibliographic Search
    Martin Brooks and Jennifer Campbell

    Brooks and Campbell describe the translation of interactive Boolean interfaces with data to a visual display. The "islands" interface which they illustrate harnesses the power of visualization to the process of commercial text retrieval. Students are still mystified by these processes. One need only while away a few minutes on any college campus to realize that most people still have no clear idea of the meaning of term conjunction and its impact on search results. Anyone who has ever performed a Boolean search will be able to examine the effect of visualization on this process, and this article is offered to permit a broad view of the changes in practice that can be expected in future systems. The "islands" interface may not be the ideal one, but something like it, to paraphrase an advertising slogan, will be "...coming soon to a library near you."

    A Collection of Visual Thesauri for Browsing Large Collections of Geographic Images
    Marshall C. Ramsey, Hsinchun Chen, Bin Zhu, and Bruce R. Schatz

    In Ramsey et al., earth-observing images are parsed as texts are parsed. These authors use Gabor filters to combine like terrains. No clearer description of this process is available in the present literature. A Gabor filter yields textures. By segmenting images into component texture boundaries, search classes may be derived without resorting to textual description. This technology thus succeeds where the NASA effort by Rorvig et al. failed. The results reported in this article are concrete and verifiable; indeed, anyone who has ever traveled over Arizona highways can authenticate these data. The authors acknowledge the contributions of the Alexandria Digital Libraries Project, particularly the work of Manjunath and Ma (1996), but claim their own extensions to this work as well.

    Conference Notes--1996: Foundations of Advanced Information Visualization for Visual Information (Retrieval) Systems
    Mark Rorvig and Matthias Hemmje

    One of the landmark developments in visual retrieval occurred at a workshop held in Zurich in the summer of 1996 in conjunction with the Association for Computing Machinery's Special Interest Group on Information Retrieval Annual Meeting. For the first time, both European and North American interests were represented in the development of criteria for evaluation of visual information retrieval. Among the Europeans, the newly formed FADIVA (Foundations of Advanced Information Visualization) group played the dominant role. The workshop report reproduced in this issue has been widely circulated, but never before published. This conference led to the first visualization of native TREC/Tipster data as a prelude to formal visual information retrieval evaluation strategies (Rorvig and Fitzpatrick, 1998; Rorvig, 1998).

    [Robert Korfhage, one of the intellectual fathers of the visual information retrieval effort in both Europe and North America, has contributed his bibliography on this issue. The bibliography is comprehensive for all work in this field c. 1997. Such documents are of interest in determining the scope of future advances. In 1997, this was the known world view of this area of scholarly effort. For practical use in permitting users to copy this bibliography, it is available through the ASIS SIGVIZ website where future editions may be conveniently updated.]

  • Book Reviews
  • Foundations of Library and Information Science, by Richard E. Rubin
    Boyd P. Holmes

    Into the Future: The Foundation of Library and Information Services in the Post-Industrial Era, by Michael Harris, Stan A. Hannah, and Pamela C. Harris
    Ebrahim Afshar

    Newspapers of Record in a Digital Age: From Hot Type to Hot Link, by Shannon E. Martin and Kathleen A. Hansen
    Amy E. Sanidas

