HTC Performance Test: A Three-Bin Sort

HTC classifies documents into Positive, Uncertain and Negative Bins, which form the rows of the contingency table. In this performance test, collection jx2099 (76 relevant and 24 irrelevant documents) was classified with Bin cutoffs selected for 90% True Positives and 90% True Negatives. Seventy-five documents were sorted to the Positive Bin, with 96% True Positives, and 24 to the Negative Bin, with 87% True Negatives; the one remaining document fell in the Uncertain Bin.
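The three-bin sort can be illustrated with a minimal sketch. It assumes that each document already has a numeric relevance score and that two cutoffs have been chosen. HTC's actual scoring and cutoff-selection procedure is not described here, so the names `three_bin_sort`, `bin_purity`, `upper_cutoff` and `lower_cutoff` are hypothetical. The sketch tallies a contingency table whose rows are the bins and whose columns are relevant/irrelevant. It then computes the share of True Positives in the Positive Bin and of True Negatives in the Negative Bin.

```python
from collections import Counter

# Hypothetical sketch: each document is a (score, is_relevant) pair.
# The scoring function and the way the cutoffs are chosen are assumptions,
# not HTC's documented method.
def three_bin_sort(docs, upper_cutoff, lower_cutoff):
    """Assign each document to a bin and tally a contingency table
    keyed by (bin_name, is_relevant)."""
    table = Counter()
    for score, is_relevant in docs:
        if score >= upper_cutoff:
            bin_name = "Positive"
        elif score <= lower_cutoff:
            bin_name = "Negative"
        else:
            bin_name = "Uncertain"
        table[(bin_name, is_relevant)] += 1
    return table

def bin_purity(table, bin_name, want_relevant):
    """Fraction of the documents in bin_name whose relevance equals
    want_relevant (True Positives for the Positive Bin with want_relevant=True,
    True Negatives for the Negative Bin with want_relevant=False)."""
    total = table[(bin_name, True)] + table[(bin_name, False)]
    return table[(bin_name, want_relevant)] / total if total else 0.0

# Illustrative, made-up scores (not the jx2099 data).
example = [(0.95, True), (0.90, True), (0.85, False), (0.50, True),
           (0.20, True), (0.10, False), (0.05, False)]
table = three_bin_sort(example, upper_cutoff=0.8, lower_cutoff=0.3)
print(bin_purity(table, "Positive", True))    # 2 of the 3 Positive-bin docs are relevant
print(bin_purity(table, "Negative", False))   # 2 of the 3 Negative-bin docs are irrelevant
```

Raising `upper_cutoff` or lowering `lower_cutoff` would typically make the Positive and Negative Bins purer at the cost of sending more documents to the Uncertain Bin, which is the trade-off that target percentages such as 90% are meant to control.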

Double-clicking a cell in the table opens a list of the documents assigned to that cell, and double-clicking a document name opens that document for review in the lower window. A document from the False Positive cell (in white) is shown, with features highlighted in yellow, green or red according to whether they appear only in relevant documents, only in irrelevant documents, or in both, respectively. The user can then consider modifying the query manually to reweight the desired features.
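The highlighting rule can be expressed as a simple set-membership test. The minimal sketch below assumes that features are individual tokens and that two sets, one collected from the relevant documents and one from the irrelevant documents, are already available. The function names and the token-level view of "features" are hypothetical; HTC's own feature extraction may differ.

```python
# Hypothetical sketch of the caption's color scheme:
#   yellow = feature appears only in relevant documents
#   green  = feature appears only in irrelevant documents
#   red    = feature appears in both
def highlight_color(feature, relevant_features, irrelevant_features):
    in_relevant = feature in relevant_features
    in_irrelevant = feature in irrelevant_features
    if in_relevant and in_irrelevant:
        return "red"
    if in_relevant:
        return "yellow"
    if in_irrelevant:
        return "green"
    return None  # feature not seen in either set: left unhighlighted

def highlight_document(tokens, relevant_features, irrelevant_features):
    """Pair each token of a document with its highlight color (or None)."""
    return [(t, highlight_color(t, relevant_features, irrelevant_features))
            for t in tokens]

# Illustrative, made-up feature sets.
relevant = {"pneumonia", "fever", "cough"}
irrelevant = {"fracture", "fever"}
print(highlight_document(["fever", "with", "cough"], relevant, irrelevant))
```

Red features are the ambiguous ones: because they occur on both sides, they are natural candidates for the manual reweighting step described above.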


Ad-Hoc Classification of Electronic Clinical Documents

Aronow & Feng

D-Lib Magazine, January 1997