VOLUME 52, NUMBER 6
[Note: below the contents of Bert Boyce's "In This Issue" has been cut into the
Table of Contents.]
CONTENTS
Editorial
In this issue
Bert R. Boyce
Page 443
Research
 Assessment of the Effects of User Characteristics on Mental Models of
Information Retrieval Systems
Xiangmin Zhang and Mark Chignell
Page 445, Published online 15 February 2001
In this issue we begin with Zhang and Chignell, who use the Repertory Grid
Technique (RGT) to extract users' mental models of information retrieval
systems in order to study the effects on these models of four
characteristics: educational and professional status, first language,
academic discipline, and computer experience. Each of 64 subjects rated
nine retrieval system concepts as to three attributes (form/process,
targeted/not targeted, and specific to IR system/applicable to all
information systems), yielding 27 variables for analysis. A factor analysis
yielded nine factors with an eigenvalue greater than one, which accounted
for 68% of the variation from the original ratings. The first factor
appeared to be concerned with the purposefulness of querying; the second,
applicability of data organization; the third, the function of querying;
the fourth, applicability of querying; the fifth, applicability of
browsing; the sixth, function of data structure; the seventh,
purposefulness of browsing; the eighth, function of the document; and the
ninth factor, the purposefulness of data structure. Analysis of variance
and Tukey tests were applied to the subjects' factor scores. Educational
and professional background, discipline, and computer experience all had
significant effects on the factor scores representing the mental models;
language did not. Student and information professional scores differed
widely on factors 1 and 3. Graduates differ from other students on factors
2 and 6. The user's discipline shows significant differences on factors 1,
2, 3, and 7, and computer experience has differences on 1, 2, and 7.
Overall, information professionals and students have strikingly different
models. Science students see browsing as a targeted activity but humanities
students do not. Language does not seem to affect mental models of
information retrieval systems.
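The rule of keeping factors "with an eigenvalue greater than one" can be
sketched in a few lines. This is only an illustration under stated
assumptions: the function name kaiser_retained and the eigenvalues are made
up for the example and are not taken from the study, which reports only
that nine factors were kept and that they accounted for 68% of the
variation.

```python
def kaiser_retained(eigenvalues):
    """Return (retained eigenvalues, share of total variance they explain)."""
    # Keep only the factors whose eigenvalue exceeds one.
    retained = [ev for ev in eigenvalues if ev > 1.0]
    # For a correlation matrix the eigenvalues sum to the number of
    # variables, so this total equals the variable count.
    share = sum(retained) / sum(eigenvalues)
    return retained, share


# Made-up eigenvalues for a 6-variable example (NOT the study's values).
example = [2.4, 1.5, 1.1, 0.5, 0.3, 0.2]
retained, share = kaiser_retained(example)
print(len(retained), round(share, 3))
```

With 27 variables, as in the study, the same calculation would divide the
sum of the retained eigenvalues by 27.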
 Modeling the Retrieval Process for an Information Retrieval System Using
an Ordinal Fuzzy Linguistic Approach
E. Herrera-Viedma
Page 460, Published online 15 February 2001
Herrera-Viedma believes that quantitative weights computed from term
occurrence are appropriate for the characterization of documents, but not
for queries or the estimated relevance levels for ranking of retrieved
documents, where human understanding argues for qualitative expression.
Terms for queries are ranked in seven symmetric ordinal classes by
searchers, or by an importance weight or by a weight indicating how many
documents should be returned for that term. A retrieval status value (RSV)
is computed for each document for each ordered representation of the query.
These are then aggregated by the search system for final evaluation of
documents. The aggregation is carried out by linguistic implication
functions which provide varied definitions of disjunction and conjunction
depending upon the relative importance of the logical subexpressions of the
query. Users will need to determine which, or how many, of the ordering
schemes to use.
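A minimal sketch of a seven-class symmetric ordinal scale may help make
the idea concrete. It makes several assumptions: the label names are
invented for illustration, and plain minimum and maximum stand in for
conjunction and disjunction. The article's own linguistic implication
functions, which vary with the importance of the subexpressions, are not
reproduced here.

```python
# Seven ordered labels, indexed 0..6. The names are hypothetical.
LABELS = ["none", "very low", "low", "medium", "high", "very high", "total"]
TOP = len(LABELS) - 1


def neg(level):
    """Negation on a symmetric scale: mirror the label around the middle."""
    return TOP - level


def conj(*levels):
    """Conjunction stand-in: the weakest of the operand levels."""
    return min(levels)


def disj(*levels):
    """Disjunction stand-in: the strongest of the operand levels."""
    return max(levels)


def evaluate(doc_levels):
    """Ordinal RSV for the example query (t1 AND t2) OR NOT t3."""
    return disj(conj(doc_levels["t1"], doc_levels["t2"]),
                neg(doc_levels["t3"]))


rsv = evaluate({"t1": 5, "t2": 3, "t3": 4})
print(LABELS[rsv])
```

Because every operation returns an index into the same label list, the
result can always be reported back to the searcher as a word rather than a
number.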
 Discovering Term Occurrence Structure in Text
Abraham Bookstein and T. Raita
Page 476, Published online 15 February 2001
Bookstein and Raita observe that term occurrences tend to clump in texts.
That is to say, if a term's occurrence is observed in adjacent text
segments, the expected number of random clumps will be exceeded. Strongly
clumped terms have retrieval value, and if text is partitioned to minimize
clumping strength, such stretches of text are likely to be content
homogeneous. Linear clumping strength is measured by the ratio of the
expected value of clumps formed to the observed value. The standard
deviation will express the degree of nonrandomness or clumping.
Condensation clumping views the problem as a distribution of terms (balls)
into text segments (urns) and takes the ratio of the expected number of
segments containing the term to the observed number as the clumping
measure. The common retrieval measure, inverse document frequency, can be
rewritten in these terms with little difference between the two when the
probability that a segment contains the term is small. The standard
deviation of the condensation clumping measure will allow an expression of
the degree of nonrandomness, but is complex to compute. The use of an
approximate value at least as large as the standard deviation simplifies
the process. The two measures diverge as segments are merged together, with
linear clumping decreasing and condensation clumping increasing.
Using the same general model, a measure is constructed using the gaps
between segments with term occurrence, where the text is considered to be
wrapped in a circular fashion. More generality is achieved, but it appears
that performance is very similar to the previous measures.
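The condensation measure described above can be sketched as follows. The
sketch assumes that occurrences fall independently and uniformly into
equal-sized segments, in which case the expected number of occupied
segments for n occurrences in N segments is N(1 - (1 - 1/N)^n). The
function names are illustrative, and the authors' exact formulation may
differ.

```python
def expected_occupied(n_occurrences, n_segments):
    """Expected number of non-empty urns for uniformly thrown balls."""
    return n_segments * (1 - (1 - 1 / n_segments) ** n_occurrences)


def condensation_clumping(segment_counts):
    """Ratio of expected to observed segments containing the term.

    segment_counts[i] is the number of occurrences of the term in
    segment i. Values above 1 suggest the term is more concentrated
    than chance would predict.
    """
    n_segments = len(segment_counts)
    n_occurrences = sum(segment_counts)
    observed = sum(1 for count in segment_counts if count > 0)
    if observed == 0:
        raise ValueError("the term does not occur in any segment")
    return expected_occupied(n_occurrences, n_segments) / observed


clumped = condensation_clumping([4, 0, 0, 0])  # all in one segment
spread = condensation_clumping([1, 1, 1, 1])   # one per segment
print(clumped, spread)
```

A term packed into a single segment scores above one, while a term spread
evenly over every segment scores below one.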
 Optimal Query Expansion (QE) Processing Methods with Semantically Encoded
Structured Thesauri Terminology
Jane Greenberg
Page 487, Published online 22 February 2001
Greenberg looks at the automatic expansion of queries using thesaurus terms
in varying relationships with entry terms, based on a binary relevance
evaluation of the initial return by end users, as opposed to interactive
expansion where the system provides a list of possibilities based on the
initial return and the user chooses expansion terms. Using ten queries
collected from MBA students, the ProQuest Controlled Vocabulary, and the
ABI/Inform database on DIALOG, she mapped each query to the thesaurus terms
as a base, and created four expansions: synonyms, narrower terms, related
terms, and broader terms. Relevance judgements were made on the basis of
topical matching (aboutness) by the contributors of the queries reviewing
the union set of the responses to the query forms, where each retrieved
list was limited to 15 or fewer citations. The automatic expansions
separately took all synonyms, all narrower terms, all broader terms, and
all related terms. For interactive expansion, users chose from an
alphabetized union list of the terms in thesaurus records for query terms.
These selections were then incorporated in the query expansion by the
searcher. Users chose from all groups but took over half of the suggested
synonyms and broader terms, and over a quarter of the narrower and related
terms. Synonyms and narrower terms augmented recall without a significant
loss in precision in both automated and interactive searching, which argues
for their use in automated expansion since less effort is required. Broader
and related terms improved recall the most but would not be useful in
automatic expansion if high precision is a goal. However, they, and
particularly related terms, are seen as excellent candidates for use in
interactive expansion.
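Automatic expansion by relationship type, and the precision and recall
trade-off discussed above, can be sketched as below. Everything in the
example is hypothetical: the thesaurus record, the document identifiers,
and the function names are invented for illustration and do not come from
the ProQuest Controlled Vocabulary or from the study's data.

```python
# A toy thesaurus record keyed by entry term (hypothetical content).
THESAURUS = {
    "layoffs": {
        "synonyms": ["downsizing"],
        "narrower": ["plant closings"],
        "broader": ["personnel management"],
        "related": ["unemployment"],
    },
}


def expand(terms, thesaurus, relations):
    """Append every thesaurus term in the chosen relations to the query."""
    expanded = list(terms)
    for term in terms:
        record = thesaurus.get(term, {})
        for relation in relations:
            for candidate in record.get(relation, []):
                if candidate not in expanded:
                    expanded.append(candidate)
    return expanded


def precision_recall(retrieved, relevant):
    """Precision and recall for binary relevance judgements."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


query = expand(["layoffs"], THESAURUS, ["synonyms", "narrower"])
precision, recall = precision_recall(["d1", "d2", "d3", "d4"],
                                     ["d1", "d2", "d5"])
print(query, precision, recall)
```

Passing a different list of relations, such as broader and related terms,
gives the other expansion types compared in the study.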
 Evaluating Internet Resources: Identity, Affiliation, and Cognitive
Authority in a Networked World
John W. Fritch and Robert L. Cromwell
Page 499, Published online 8 March 2001
The filters in print media that provide authority are not available on the
Internet, so authorship and thus accountability are uncertain. Determining
true authorship and affiliation is likely to be the most significant need
in establishing the cognitive authority of a site. Fritch and Cromwell
suggest the assessment of documents, authors, institutions, and
affiliations separately, followed by integration of the results, while
indicating confidence in decisions on a separate scale. In their example,
confirming the connection of the domain name to the assumed sponsor via a
Whois search is a first step. Looking for author statements and
affiliations to other sites is the second. The identification of overt and
covert links may disclose bias.
