Modern Information Retrieval
Cystic Fibrosis Reference Collection


Contents

The Cystic Fibrosis Database (CF) consists of 1,239 documents published from 1974 to 1979 that discuss aspects of cystic fibrosis, together with a set of 100 queries, each with its relevant documents as answers.

The original collection is available in a single gzipped tar file of 1.47 MB, containing 6 document files and 1 query file.

The collection is also available in XML format, in a single gzipped tar file of 1.54 MB that includes the Document Type Definition (DTD) for the collection and for each query/answer.

Document Files

Each document includes 11 fields as follows:

Paper Number
The first two digits give the year of publication; the remaining three digits range from 1 to the number of documents published that year.
Record Number
Serial ID number ranging from 1 to 1,239.
MEDLINE Accession Number
CF is a subset of the MEDLINE database.
Author(s)
Title
Source
Bibliographic citation of source.
Major Subjects
The Medical Subject Headings (MeSH) and subheadings representing the major subjects of the document. The Medical Subject Headings are shown in capital letters and have been assigned by expert indexers. The two-letter symbols are subject subheadings, also assigned manually from a controlled vocabulary (see the MeSH vocabulary published by the National Library of Medicine).
Minor Subjects
The Medical Subject Headings (MeSH) and subheadings representing the minor subjects of the document. The Medical Subject Headings are shown in capital letters and have been assigned by expert indexers. The two-letter symbols are subject subheadings, also assigned manually from a controlled vocabulary (see the MeSH vocabulary published by the National Library of Medicine).
Abstract/Extract
The abstract of the document or, for a document with no published abstract, an extract from the text.
References
The complete list of references appearing in the document, excluding private communications and unpublished documents.
Citations
A comprehensive list of citations to the document, as indexed in the SCISEARCH/DIALOG files.
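
The Paper Number layout described above (two year digits followed by a three-digit sequence) can be decoded mechanically. The following is only a minimal sketch: it assumes the number is available as a five-digit string, uses the made-up value 74001 for illustration, and assumes every year falls in the 1900s, which holds for the 1974 to 1979 range of this collection. The function name is hypothetical and not part of the collection.

```python
from typing import Tuple


def parse_paper_number(paper_number: str) -> Tuple[int, int]:
    """Split a five-digit Paper Number into (publication year, sequence in year).

    Assumption: the input is exactly five digits, e.g. "74001".
    """
    digits = paper_number.strip()
    if len(digits) != 5 or not digits.isdigit():
        raise ValueError(f"expected five digits, got {paper_number!r}")
    year = 1900 + int(digits[:2])   # first two digits: year of publication
    sequence = int(digits[2:])      # last three digits: position within that year
    return year, sequence


print(parse_paper_number("74001"))  # (1974, 1)
```

Returning the two parts as integers makes it easy to group or sort documents by year without re-parsing the string.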

Query Files

Each query includes a query number and text, the record number of each relevant document in the answer, and relevance scores.

The relevance scores come from 4 different sources: REW (one of the authors), faculty colleagues of REW, a post-doctorate associate of REW, and JBW (the other author and a medical bibliographer).

The relevance scores vary from 0 to 2 with the following meaning:

   2   HIGHLY relevant
   1   MARGINALLY relevant
   0   NOT relevant

Example of a document answer: 513   0010

   Doc number: 513
   Relevance score by REW: NOT relevant.
   Relevance score by REW colleagues: NOT relevant.
   Relevance score by REW post-doctorates: MARGINALLY relevant.
   Relevance score by JBW: NOT relevant.
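
The decoding shown in the example can be sketched in Python. This is an illustration under stated assumptions, not the collection's official tooling: it assumes each answer entry is a document number and a four-digit score string separated by whitespace, with the digits in the source order listed above. The names `ASSESSORS`, `LABELS`, and `parse_answer` are hypothetical.

```python
from typing import Dict, Tuple

# Order of the four score digits, following the list of sources above.
ASSESSORS = ("REW", "REW colleagues", "REW post-doctorates", "JBW")
LABELS = {0: "NOT relevant", 1: "MARGINALLY relevant", 2: "HIGHLY relevant"}


def parse_answer(entry: str) -> Tuple[int, Dict[str, int]]:
    """Parse an entry such as '513   0010' into (doc number, score per assessor)."""
    parts = entry.split()
    if len(parts) != 2:
        raise ValueError(f"expected '<doc> <scores>', got {entry!r}")
    doc_field, score_field = parts
    if len(score_field) != 4 or any(ch not in "012" for ch in score_field):
        raise ValueError(f"expected four digits in 0-2, got {score_field!r}")
    scores = {name: int(ch) for name, ch in zip(ASSESSORS, score_field)}
    return int(doc_field), scores


doc_number, scores = parse_answer("513   0010")
print("Doc number:", doc_number)
for assessor, score in scores.items():
    print(f"Relevance score by {assessor}: {LABELS[score]}.")
```

Keeping the four scores separate, rather than collapsing them into a single number, leaves the choice of how to combine assessors to whoever runs the evaluation.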

Copyright

This collection is available thanks to the original authors from the School of Information and Library Science, University of North Carolina, Chapel Hill, NC 27599-3360, USA. They have the copyright (1989) and the reference to their work is:

The citations in the CF document collection represent a small subset of MEDLINE data and should not be used to search for current references on the subject of cystic fibrosis.