Modern Information Retrieval Chapter 10: User Interfaces and Visualization
There exist today many large online text collections to which category
labels have been assigned. Traditional online bibliographic systems
have for decades assigned subject headings to books and other
documents [#!sven!#]. MEDLINE, a large collection of
biomedical articles, has associated with it Medical Subject Headings
(MeSH) consisting of approximately 18,000 categories [#!lowe94!#].
The Association for Computing Machinery (ACM) has developed a
hierarchy of approximately 1200 category (keyword)
labels.
Yahoo! [#!yahoo!#], one of the most popular search sites on
the World Wide Web, organizes Web pages into a hierarchy consisting of
thousands of category labels.
The popularity of Yahoo! and other Web directories suggests that hierarchically structured categories are useful starting points for users seeking information on the Web. This popularity may reflect a preference to begin at a logical starting point, such as the home page for a set of information, or it may reflect a desire to avoid having to guess which words will retrieve the desired information. (It may also reflect the fact that directory services attempt to cull low-quality Web sites.)
The meanings of category labels differ somewhat among collections. Most are designed to help organize the documents and to aid in query specification. Unfortunately, users of online bibliographic catalogs rarely use the available subject headings [#!hancock-beaulieu92b!#,#!drabenstott96!#]. Hancock-Beaulieu and Drabenstott and Weller, among others, put much of the blame on poor (command-line-based) user interfaces, which provide little aid for selecting subject labels and require users to scroll through long alphabetic lists. Even with graphical Web interfaces, finding the appropriate place within a category hierarchy can be a time-consuming task, and once a collection has been found using such a representation, an alternative means is required for searching within the site itself.
Most interfaces that depict category hierarchies graphically do so by associating a document directly with the node of the category hierarchy to which it has been assigned. For example, clicking on a category link in Yahoo! brings up a list of documents that have been assigned that category label. Conceptually, the document is stored within the category label. When navigating the results of a search in Yahoo!, the user must look through a list of category labels and guess which one is most likely to contain references to the topic of interest. A wrong path requires backing up and trying again, and remembering which pages contain which information. If the desired information is deep in the hierarchy, or not available at all, this can be a time-consuming and frustrating process. Because documents are conceptually stored `inside' categories, users cannot create queries based on combinations of categories using this interface.
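The contrast between looking inside a single category and querying on a combination of categories can be made concrete with a small sketch. It is only an illustration under stated assumptions: the document identifiers, the category labels, and the function names `build_index`, `browse`, and `query_all` are all hypothetical and are not taken from Yahoo! or any other system discussed here.

```python
from collections import defaultdict

# Hypothetical toy data: document id -> set of assigned category labels.
DOC_LABELS = {
    "doc1": {"Health", "Recreation"},
    "doc2": {"Health", "Science"},
    "doc3": {"Recreation"},
    "doc4": {"Health", "Recreation", "Science"},
}


def build_index(doc_labels):
    """Invert the assignment: category label -> set of document ids."""
    index = defaultdict(set)
    for doc_id, labels in doc_labels.items():
        for label in labels:
            index[label].add(doc_id)
    return index


def browse(index, label):
    """Directory-style access: the documents listed under one label."""
    return sorted(index.get(label, set()))


def query_all(index, labels):
    """Combination query: documents assigned every one of the labels."""
    sets = [index.get(label, set()) for label in labels]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


INDEX = build_index(DOC_LABELS)
print(browse(INDEX, "Health"))                     # ['doc1', 'doc2', 'doc4']
print(query_all(INDEX, ["Health", "Recreation"]))  # ['doc1', 'doc4']
```

A directory interface of the kind described above exposes only the `browse` operation. Treating labels as attributes of documents, rather than as containers for them, is what makes the `query_all` style of request possible.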
It is difficult to design a good interface to integrate category
selection into query specification, in part because display of
category hierarchies takes up large amounts of screen space. For
example, Internet Grateful Med is
a Web-based service that allows an integration of search with display
and selection of MeSH category labels. After the user types in the
name of a potential category label, a long list of choices is shown in
a page. To see more information about a given label, the user selects
a link (e.g., Radiation Injuries). This causes the context of the
query to disappear because a new Web page appears showing the
ancestors of the term and its immediate descendants. If the user
attempts to see the siblings of the parent term (Wounds and Injuries)
then a new page appears that changes the context again. Radiation
Injuries appears as one of many siblings and its children can no longer
be seen. Going back to the query makes the illustration of the category
hierarchy disappear.
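The three views that the pages above show one at a time (ancestors, immediate descendants, and siblings of a term) can all be computed from a single parent table, which is one reason an interface might hope to present them together. The following is a minimal sketch under stated assumptions: the hierarchy is a toy, only the two labels quoted in the text come from the source, and every other label and all function names are hypothetical placeholders rather than actual MeSH content.

```python
# Hypothetical toy hierarchy: label -> parent label (None marks the root).
# Only "Wounds and Injuries" and "Radiation Injuries" come from the text;
# the remaining labels are placeholders, not real MeSH terms.
PARENT = {
    "Top category": None,
    "Wounds and Injuries": "Top category",
    "Radiation Injuries": "Wounds and Injuries",
    "Sibling term A": "Wounds and Injuries",
    "Sibling term B": "Wounds and Injuries",
    "Child term X": "Radiation Injuries",
    "Child term Y": "Radiation Injuries",
}


def ancestors(parent, term):
    """Labels from the root down to the term's parent."""
    path = []
    node = parent[term]
    while node is not None:
        path.append(node)
        node = parent[node]
    return list(reversed(path))


def children(parent, term):
    """Immediate descendants of the term, in alphabetical order."""
    return sorted(c for c, p in parent.items() if p == term)


def siblings(parent, term):
    """Other labels that share the term's parent."""
    p = parent[term]
    if p is None:
        return []
    return [s for s in children(parent, p) if s != term]


def neighborhood(parent, term):
    """Everything needed to draw the term's local context in one view."""
    return {
        "ancestors": ancestors(parent, term),
        "siblings": siblings(parent, term),
        "children": children(parent, term),
    }


print(neighborhood(PARENT, "Radiation Injuries"))
```

The sketch says nothing about layout, which is the hard part noted above: the data for all three views is cheap to obtain, but showing it alongside the query still costs screen space.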
The MeSHBrowse system [#!korn95!#] allows users to interactively
browse a subset of semantically associated links in the MeSH
hierarchy. From a given starting point, clicking on a category causes
the associated categories to be displayed in a two-dimensional tree representation.
Thus only the relevant subset of the hierarchy is shown at one time,
making browsing of this very large hierarchy a more tractable
endeavor. The interface has the space limitations inherent in a two-dimensional
hierarchy display and does not provide mechanisms for search over an
underlying document collection. See Figure .
The HiBrowse system [#!pollitt97!#] represents category metadata more
efficiently by allowing users to display several different subsets of
category metadata simultaneously. The user first selects which
attribute type (or facet, as attributes are called in this system) to
display. For example, the user may first choose the `physical
disease' value for the Disease facet. The categories that appear one
level below this are shown along with the number of documents that
contain each category. The user can then select other attribute
types, such as Therapy and Groups (by age). The number of documents
that contain attributes from all three types is then shown. If the user
now selects a refinement of one of the categories, such as the
`child' value for the Groups attribute, then the number of documents
that contain all three selected facet values is shown. At the same
time, the numbers of documents containing the subcategories found
below `physical disease' and `therapy (general)' are updated to
reflect this more restricted specification. See Figure
. A problem with the HiBrowse system is that it
requires users to navigate through the category hierarchy, rather than
specify queries directly. In other words, query specification is not
tightly coupled with display of category metadata.
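The counting behavior described for HiBrowse can be sketched as follows. This is an illustration under stated assumptions, not the system's actual implementation: the records are invented, category paths are modeled as root-to-leaf tuples, labels other than those quoted in the text (for example `infection`, `surgery`, `adult`) are hypothetical, and the function names are our own.

```python
from collections import Counter

# Hypothetical toy records: per facet, the category paths assigned to a
# document, written as root-to-leaf tuples. Labels not quoted in the text
# are invented for illustration.
DOCS = [
    {"Disease": [("physical disease", "infection")],
     "Therapy": [("therapy (general)", "drug therapy")],
     "Groups": [("child",)]},
    {"Disease": [("physical disease", "infection")],
     "Therapy": [("therapy (general)", "surgery")],
     "Groups": [("adult",)]},
    {"Disease": [("physical disease", "injury")],
     "Therapy": [("therapy (general)", "drug therapy")],
     "Groups": [("child",)]},
    {"Disease": [("mental disease",)],
     "Therapy": [("therapy (general)", "drug therapy")],
     "Groups": [("child",)]},
]


def under(path, prefix):
    """True if the path lies at or below the category named by prefix."""
    return path[:len(prefix)] == prefix


def matches(doc, selection):
    """True if the document satisfies the selection in every chosen facet."""
    return all(
        any(under(path, prefix) for path in doc.get(facet, []))
        for facet, prefix in selection.items()
    )


def facet_counts(docs, selection):
    """Return the number of matching documents and, for each selected
    facet, document counts for the subcategories one level below the
    selected category, restricted to the matching documents."""
    hits = [doc for doc in docs if matches(doc, selection)]
    counts = {}
    for facet, prefix in selection.items():
        tally = Counter()
        for doc in hits:
            # A set ensures each document is counted once per subcategory.
            subs = {path[len(prefix)] for path in doc.get(facet, [])
                    if under(path, prefix) and len(path) > len(prefix)}
            tally.update(subs)
        counts[facet] = dict(tally)
    return len(hits), counts


selection = {"Disease": ("physical disease",),
             "Therapy": ("therapy (general)",)}
print(facet_counts(DOCS, selection))

# Refining a third facet shrinks the counts shown for the other two.
selection["Groups"] = ("child",)
print(facet_counts(DOCS, selection))
```

With the toy data, adding the `child` restriction lowers the total from three documents to two and removes `surgery` from the Therapy counts, mirroring the update behavior described above.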
As a solution to some of these problems, the Cat-a-Cone interface
[#!hearst97b!#] will be described in section
.