Journal of the American Society for Information Science (JASIS) -- Table of Contents

Contributed by
Richard Hill
American Society for Information Science
Silver Spring, Maryland, USA

VOLUME 50, NUMBER 6 (May 1 1999)

CONTENTS

Editorial

In This Issue
Bert R. Boyce

JASIS Standards
Donald H. Kraft

Research

Condorcet Query Engine: A Query Engine for Coordinated Index Terms
Paul E. van der Vet and Nicolaas J. I. Mars

Van der Vet and Mars revive attempts to incorporate predicate relationships between assigned index terms into the searching process, to control for the difference between, for example, "aspirin as a cause of, and aspirin as a cure for, headache." Such relationships, and syntaxes for their use, are defined for each indexing language and are applied to terms from specified hierarchies of indexing concepts to create coordinated concepts. An ANY operator makes it possible to specify all terms narrower than a chosen concept. Thus for a query one may choose a single or ANY predicate relationship, a single or ANY term as the first element of syntax, and a single or ANY term as the second. Such indexing concepts can then be combined with Boolean operators. Concept expansion will require considerable time or space resources. The problem of syntax assignment at the time of indexing is not addressed.

Derivative Bibliographic Relationships: The Work Relationship in a Global Bibliographic Database
Richard P. Smiraglia and Gregory H. Leazer

Using OCLC Online Computer Library Center's WorldCat, the proportions of works in families whose members consist of multiple editions, translations, amplifications, extractions, adaptions, accompanying material, and performances, were investigated by Smiraglia and Leazer. From a random sample of 1,000 records a final sample of 477 progenitor records was culled, and then WorldCat was searched for derivative records. Derivative works, and thus families greater than one, existed for one-third of the sample. Family size ranged from 2 to 45 with a mean of 3.54, or 1.77 if single-member families are included. Two-thirds of the observed derivations are controlled with collocating headings. Discipline, form, and genre do not affect derivation. Families seem to reach full size soon after publication of the progenitor, although older ancestors have large families.

Cyberbrowsing: Information Customization on the Web
Hal Berghel, Daniel Berleant, Thomas Foy, and Marcus McGuire

Customization for Berghel et al. is the personalization of information-bearing items by extraction, interaction, and nonprescriptive nonlinear traversal on a client's machine. Cyberbrowser is a browser add-on which allows the analysis of retrieved items by performing frequency count-based keyword selection, displaying the keywords with frequencies, and selecting sentences with chosen keywords present.

Hierarchical Concept Indexing of Full-Text Documents in the Unified Medical Language System® Information Sources Map
Lawrence W. Wright, Holly K. Grossetta Nardini, Alan R. Aronson, and Thomas C. Rindflesch

Using Health Services/Technology Assessment Text (HSTAT) as a database, Wright et al. extracted four HSTAT files with material on breast cancer and containing 66 distinct documents. By using the available SGML tags, chapter and section headings were located and used to divide the documents into parts while retaining their hierarchical structure. MetaMap, which translates medical text to UMLS Metathesaurus terms and ranks these by occurrence, specificity, and position, is applied to the document fragments to choose terms which are less accurate than human indexing but superior to purely extracted terms. Since both the whole document and its sections are represented, the resulting index is hierarchical in nature. Of the MetaMap-generated MeSH terms, 60% were not in the current indexing of HSTAT, and MMI produced results similar to those of the HSTAT search facility, except that MMI could bring in larger sections or whole documents, rather than fine sections alone.

Stemming Methodologies Over Individual Query Words for an Arabic Information Retrieval System
Hani Abu-Salem, Mahmoud Al-Omari, and Martha W. Evens

A stem in Arabic is a root verb form combined with derivational morphemes but with affixes removed. Abu-Salem et al. choose to use a word, a stem, or a root for a query term based upon which form has the highest average inverse document frequency value, a method necessitating the creation of a three-field term record. Using 120 documents and 32 queries provided by users who also provided relevance judgements, this mixed stemming method was compared to the individual forms alone using binary weighting and inverse document frequency weighting. The root form with inverse document frequency weighting performed best. The mixed stemming improved binary weighting search results in all cases but did not increase performance over weighted stems or roots.

An Experiment on Node Size in a Hypermedia System
Su Hee Kim and Caroline M. Eastman

For Kim and Eastman, a node is the material that can be viewed by scrolling in a hypermedia system without following a link. Node size can be viewed as required storage, window size for viewing, or logical size, i.e., the number of characters, words, lines, or other items presented to the viewer. To determine if window size and text length affect retrieval time, groups of 10 students searched 20 queries on each of the four possible versions of a file prepared in two card sizes and two text lengths. ANOVA shows no significant interaction between card size and text length, and no significant difference between the two card sizes. A significant difference occurs between text lengths, where longer text provides quicker results.

Faculty Perceptions of Electronic Journals as Scholarly Communication: A Question of Prestige and Legitimacy
Cheri Speier, Jonathan Palmer, Daniel Wren, and Susan Hahn

Speier et al. surveyed a random sample of the business school faculty at 47 ARL universities for demographics, perceptions of promotion and tenure, and familiarity with electronic publishing. A 22% return rate yielded 300 usable surveys. Only 16% had read electronic journals and only 7% had submitted a paper to one. Youth and a high level of publishing are associated with awareness of electronic journals. Finance and MIS faculty are significantly more aware of electronic journals than are management, marketing, or operations management faculty. The view toward the value of e-journals appears to be negative, or at best neutral, when compared to paper journals.

Activity of Understanding a Problem during Interaction with an "Enabling" Information Retrieval System: Modeling Information Flow
Charles Cole

Cole derives a new model of communication by combining Shannon's model with the three-world model of Popper. He stresses the two-way feedback operations that recur as conjectures and refutations continue toward a conclusion governed by the cognitive state of the user, who serves repeatedly as both destination and source in Shannon's sense.

VOLUME 50, NUMBER 7 (May 15 1999)

CONTENTS

Editorial

In This Issue
Bert R. Boyce

Research

H.G. Wells's Idea of a World Brain: A Critical Reassessment
W. Boyd Rayward

To begin this issue, Rayward examines Wells's concept of the "World Brain," provoking questions on the nature of today's global information systems. Wells advocated a New Republic, a world government growing from a world organization of scientific work and communication, an encyclopedia providing a systematic ordering of human thought and acting as a sort of superuniversity. As a physical organization it would involve collecting, summarizing, updating, and publishing the flow of new knowledge using microfilm and the information technologies of the day.

Wells also believed in man's evolution toward a "conscious unification of the human species," by way of the superhuman apparatus of public knowledge toward social rather than individualistic goals. This involves eugenics, the breeding out of the less intelligent, the movement of scientists into the political process, and the deportation of criminal elements. The encyclopedia organization would speed this process by dominating worldwide education to create a common interpretation of reality. The Wellsian World Brain would function not only as a repository of scientific knowledge but as a database of the populace recording their characteristics and movements.

Literature-Based Discovery by Lexical Statistics
Robert K. Lindsay and Michael D. Gordon

Lindsay and Gordon explore a word count approach to Swanson's literature-based knowledge discovery strategy using complete MEDLINE records, two- and three-word phrases, and the identification of intermediary topics by high-occurrence frequency. Swanson's study linking migraine and dietary magnesium was duplicated. Ten of the 12 intermediate literatures previously found were identified.

Jumpstarting the Information Design for a Community Network
Misha W. Vaughan and Nancy Schwartz

Vaughan and Schwartz provide an example of a community service web site design based to a large degree on iterative user study information. Focus group sessions with paper prototypes and card sorts were used to solicit user opinion on how the site could differentiate itself from newspapers, libraries, and city government as a source, whether organization and labeling maximize meaningfulness to the user, and whether multiple categories might be reduced to a simplified hierarchy. Groups reduced the main categories to 10 to fit screen requirements, then shared and discussed their results. All were asked to suggest services, different from those already available in the community, that might be provided. Discussion led to consensus on a structure, which was used to build a web site that was tested by eight participants, each given 21 tasks to perform. A path followed by at least six was assumed to be appropriate. This resulted in the shifting, renaming, and cross-linking of several headings, and the move of a strongly community-oriented heading from an alphabetical display to the lead position.

Searching Scientific Information on the Internet: A Dutch Academic User Survey
Henk J. Voorbij

A random 1,000-person sample of the academic community of the Netherlands was surveyed by Voorbij's questionnaire. Of the 50% responding, 71% were Internet users. Students and faculty do not differ appreciably in levels of use. E-mail use is high, e-journal use is low. More traditional subject information sources rate above the Internet but it is heavily used to access factual and ephemeral material. Meta search and advanced search options are considered important but seldom used. Low precision, lack of quality sources, and response speed are seen as problems, but 68% believe results justify time invested. Lack of skill and of access are major reasons for non-use, but a significant number of non-users cited sufficient information elsewhere, and lack of knowledge as to what might be available in their disciplines. A focus group of 11 experienced faculty indicated a very positive attitude toward the World Wide Web, e-mail, and discussion groups. None were disposed to publishing on the Web.

SENTINEL: A Multiple Engine Information Retrieval and Visualization System
Kevin L. Fox, Ophir Frieder, Margaret M. Knepper, and Eric J. Snowberg

Fox et al. describe SENTINEL, a retrieval system using both an n-gram filter and a modification of the vector space model with vectors of documents judged relevant resubmitted in a feedback process, and documents ranked by combining their scores in both systems. Words with similar use are clustered together using a neural network training algorithm forming axes used for underlying positioning. The ranked list output with feedback capabilities is supplemented with a three-dimensional map of document and query positions based on the training set axes.

Brief Communication

Systematic Weighting and Ranking: Cutting the Gordian Knot
Davis and McKim describe the weighting and ranking algorithm of SWEAR™, which uses powers of two to assign weights to query terms entered. If N terms are entered, the first entered term is given a 2^N weight, the second a 2^(N-1) weight, and so forth until the last receives a 2^0 weight. Terms to be negated are given a negative weight equal to the sum of the positive weights, and possible terms a weight of 1 (2^0). A threshold is set at 2, and an accumulator for each document sums the weights of occurrences of query terms to generate a retrieval status value that provides a weak order of documents containing query terms. The searcher can change the weighting scheme, and thus the output ranking, by changing the order of entry.
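
The power-of-two scheme summarized above can be illustrated with a short Python sketch. It is not the SWEAR implementation. The function names are hypothetical, and two assumptions are made that the summary does not settle: required terms are weighted from 2^N down to 2^1 (so that the weight of 1 stays reserved for possible terms and the threshold of 2 is meaningful), and a term is counted once per document rather than once per occurrence.

```python
def swear_weights(required, possible=(), negated=()):
    """Assign weights by order of entry (hypothetical helper).

    Assumption: N required terms receive 2^N, 2^(N-1), ..., 2^1.
    Possible terms receive 1. Negated terms receive minus the sum
    of all positive weights.
    """
    n = len(required)
    weights = {}
    for position, term in enumerate(required):
        weights[term] = 2 ** (n - position)
    for term in possible:
        weights[term] = 1
    penalty = -sum(weights.values())
    for term in negated:
        weights[term] = penalty
    return weights


def retrieval_status_value(doc_terms, weights):
    """Sum the weights of query terms present in the document.

    Assumption: presence is binary; repeated occurrences are not
    counted again.
    """
    present = set(doc_terms)
    return sum(w for term, w in weights.items() if term in present)


def rank(docs, weights, threshold=2):
    """Return (score, doc_id) pairs at or above the threshold, best first."""
    scored = [(retrieval_status_value(terms, weights), doc_id)
              for doc_id, terms in docs.items()]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    w = swear_weights(["a", "b", "c"], possible=["d"], negated=["x"])
    docs = {
        "d1": ["a", "b"],
        "d2": ["c", "d"],
        "d3": ["d"],
        "d4": ["a", "b", "c", "d", "x"],
    }
    for score, doc_id in rank(docs, w):
        print(doc_id, score)
```

Because each required weight exceeds the sum of all lower weights, a document containing an earlier-entered term always outranks one containing only later-entered terms, which is why reordering the query changes the ranking.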

Book Reviews

Ink into Bits: A Web of Converging Media, by Charles T. Meadow
Jeff White

Technology and Privacy: The New Landscape, edited by Philip E. Agre and Marc Rotenberg
Marc Lampson

VOLUME 50, NUMBER 8 (June 1 1999)

CONTENTS

Editorial

In This Issue
Bert R. Boyce

Research

Images of Similarity: A Visual Exploration of Optimal Similarity Metrics and Scaling Properties of TREC Topic-Document Sets
Mark Rorvig

In our first two papers, Rorvig takes a visual look at the TREC data. In "Images of Similarity," five different similarity measures used on five TREC document sets are scaled and plotted using multidimensional scaling with ordinal, interval, and maximum likelihood assumptions. Cosine, and surprisingly overlap, provide the desired bull's-eye pattern under maximum likelihood assumptions and tighten as assumptions move from ordinal to maximum likelihood. Ordinal assumptions with MDS are not adequate for a visual information retrieval interface. A regularity in the pattern of relevant documents would seem to indicate a consistency in human relevance assignments not indicated in previous work.

A Visual Exploration of the Orderliness of TREC Relevance Judgments
Mark Rorvig

In the second paper, multidimensional scaling of topical sets from the TREC database indicates that Shaw's criticism of clustering techniques does not extend to similarity data transformed to spatial proximities since the isomorphic relations between topic distances do appear. Only two of 200 randomly introduced documents are found in the center of the dense area of relevant documents, suggesting that while the TREC evaluation methods do exclude relevant documents, the problem may not be as severe as Harter has proposed. The semantic relevance of others from the 200 found close to the dense area is unclear and will require investigation.

Automatic Indexing of Documents from Journal Descriptors: A Preliminary Investigation
Susanne M. Humphrey

Humphrey outlines a technique for associating the journal descriptors (JDs) in NLM's serials authority file SERLINE with words commonly occurring in the titles and abstracts of papers found in journals that have been assigned these descriptors. A MEDLINE training set will produce a table of text words associated with particular journals, and the descriptors assigned. The test process involved text indexing of titles and abstracts of 3,995 training set documents covering 1,466 journals to extract terms occurring 13 or more times in the set. A measure can be based upon the number of occurrences of a term in association with a particular journal descriptor divided by its total occurrences, or on the number of papers containing the term for each descriptor divided by the total number of papers containing the word in the training set. This produces a ranked list of descriptors for each word extracted from a paper. The average rankings over all of a document's terms are then used for the document's JD ranking. Tests of papers outside the training set return the JDs of these papers' journals and other JDs as well. The inverse of the citation count for a JD is shown to be a likely normalization factor for JDs with high citation counts.
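
A minimal Python sketch of the first measure described above (occurrences of a word with a descriptor divided by the word's total occurrences) follows. The function names and the toy training data are hypothetical. The sketch assumes that "average rankings" means averaging the per-word scores for each descriptor, and it omits the minimum-frequency cutoff and the citation-count normalization.

```python
from collections import Counter, defaultdict


def train_word_jd_scores(training_docs):
    """Build word -> {descriptor: score} from (words, descriptors) pairs.

    score = occurrences of the word in documents carrying the
    descriptor / total occurrences of the word in the training set.
    """
    word_total = Counter()
    word_jd = defaultdict(Counter)
    for words, descriptors in training_docs:
        for word in words:
            word_total[word] += 1
            for jd in descriptors:
                word_jd[word][jd] += 1
    return {
        word: {jd: count / word_total[word] for jd, count in jd_counts.items()}
        for word, jd_counts in word_jd.items()
    }


def rank_descriptors(words, scores):
    """Average each descriptor's score over the document's known words.

    Assumption: a descriptor missing from a word's table scores 0
    for that word. Returns (descriptor, score) pairs, best first.
    """
    known = [word for word in words if word in scores]
    if not known:
        return []
    totals = defaultdict(float)
    for word in known:
        for jd, score in scores[word].items():
            totals[jd] += score
    averaged = {jd: total / len(known) for jd, total in totals.items()}
    return sorted(averaged.items(), key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    training = [
        (["heart", "valve"], ["Cardiology"]),
        (["heart", "tumor"], ["Cardiology", "Oncology"]),
        (["tumor", "cell"], ["Oncology"]),
    ]
    table = train_word_jd_scores(training)
    print(rank_descriptors(["heart", "valve"], table))
```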

Bibliometric Overview of Library and Information Science Research in Spain
V. Cano

The 345 papers that constitute the 17-year output of two leading Spanish library and information science journals were analyzed by Cano to collect author affiliation, number of authors per paper, country of the first author, number of authors publishing in both journals, and number of authors publishing in other journals indexed by LISA. Sixty-eight percent of the papers have single authors, and only seven authors published in both journals. Empirical and descriptive methods dominate.

User Reactions as Access Mechanism: An Exploration Based on Captions for Images
Brian C. O'Connor, Mary K. O'Connor, and June M. Abbas

O'Connor et al. believe verbal user reactions to images may be collected and used to represent the knowledge state of the reactor to the image, with the assumption that future users may want an image that evokes a similar state. A World Wide Web site of 300 images was exposed to 120 respondents, who were asked to provide words to describe the image, words to describe the feelings evoked by the image, and a caption. The 82 images receiving 10 or more responses were then characterized by counts of total responses, adjectives describing the image as a whole, narrative responses, and captions with narrative responses, as well as the percentage of responses with adjectival descriptors, the percentage of narrative responses, and the percentage of captions with narrative responses. There is a tendency for different respondents to assign diametrically opposed adjectives.

Medical Students' Confidence Judgments Using a Factual Database and Personal Memory: A Comparison
Karen M. O'Keefe, Barbara M. Wildemuth, and Charles P. Friedman

Measuring need fulfillment by their subjects' confidence in the accuracy of their answers, O'Keefe et al. examine medical students' ability to recognize the meeting of an information need from memory and from using a factual database. Twelve of 43 students, randomly selected and tested three times, completed a sufficient number of questions with confidence rankings to be analyzed. Two passes were recorded each time: first, short answers from memory; then with the aid of a database search. An ANOVA shows no significant effect of memory versus database support on Brier scores (the sum of the squared differences between the confidence rating and the score for each question, divided by the number of questions). Confidence, the difference between the average of the confidence probabilities for a set of questions and the proportion answered correctly, increased with experience over the three repetitions.
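
The two measures defined above can be written out as a short Python sketch. The function names are hypothetical. The sketch assumes confidence ratings are expressed as probabilities between 0 and 1 and each question is scored 1 if correct and 0 otherwise.

```python
def brier_score(confidences, outcomes):
    """Mean squared difference between confidence and outcome.

    confidences: probabilities in [0, 1], one per question.
    outcomes: 1 for a correct answer, 0 for an incorrect one.
    Lower is better; 0 means perfectly calibrated certainty.
    """
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    squared = [(c - o) ** 2 for c, o in zip(confidences, outcomes)]
    return sum(squared) / len(squared)


def confidence_measure(confidences, outcomes):
    """Mean confidence minus proportion of questions answered correctly.

    Positive values indicate overconfidence, negative values
    underconfidence.
    """
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    n = len(confidences)
    return sum(confidences) / n - sum(outcomes) / n


if __name__ == "__main__":
    ratings = [1.0, 0.5, 0.0]
    correct = [1, 0, 0]
    print(brier_score(ratings, correct))
    print(confidence_measure(ratings, correct))
```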

Employing Multiple Representations for Chinese Information Retrieval
K. L. Kwok

Kwok finds that difficult Chinese word segmentation can be avoided if bigrams (instances of two consecutive characters) are extracted and used, despite the fact that this method leads to an index space three times as large as word extraction. Bigrams extract the two-thirds of Chinese words which are two characters in length, but while meaningless combinations of very high and low occurrence may be removed, many meaningless bigrams will remain. Single character words, which make up about 9% of the language, would also not be represented.

Using a dictionary of 2,175 common one-, two-, and three-character words, strings are processed left to right, with useful terms retained when found. The remaining strings are segmented using rules. A probabilistic feedback model is then used to generate RSVs indicating matches between queries and documents, or document segments. Using the TREC 5 and 6 collections, Kwok finds that mixing single character with bigram or with short-word indexing improves average precision in four of five cases. Short-word and character is most efficient and gives the best results. Combining the results of short-word and character with bigram and character yields an additional 5% improvement at substantial overhead cost.
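
Bigram extraction, and the mixing of single characters with bigrams, can be sketched in a few lines of Python. This is an illustration only, not Kwok's system. The function names are hypothetical, and the sketch assumes the input is a run of Chinese characters with punctuation already removed.

```python
def char_bigrams(text):
    """Return every pair of consecutive characters, overlapping.

    Whitespace is dropped first. No word segmentation is needed,
    which is the appeal of the approach.
    """
    chars = [ch for ch in text if not ch.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def character_and_bigram_terms(text):
    """Index terms for the mixed representation: single characters
    followed by overlapping bigrams."""
    chars = [ch for ch in text if not ch.isspace()]
    return chars + char_bigrams(text)


if __name__ == "__main__":
    print(char_bigrams("信息检索"))
    print(character_and_bigram_terms("信息"))
```

A string of n characters yields n - 1 bigrams, many of which cross word boundaries, which is consistent with the larger index space and the meaningless bigrams noted above.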

Book Reviews

Deep Information: The Role of Information Policy in Environmental Sustainability, by John Felleman
Mike Steckel

Electronic Databases and Publishing, edited by Albert Henderson
Marianne Afifi

Localist Connectionist Approaches to Human Cognition, edited by Jonathan Grainger and Arthur M. Jacobs
Chaomei Chen

Ethics, Information and Technology: Readings, edited by Richard N. Stichler and Robert Hauptman
Thomas A. Peters

Indexing and Abstracting in Theory and Practice, by F. W. Lancaster
Jens-Erik Mai

Remediation: Understanding New Media, by Jay David Bolter and Richard Grusin
Ronald Day

Call for Papers

(On 4/30/03, a formatting correction was made to the numeric entries in the Brief Communication by Davis and McKim.)
