Besides precision & recall: exploring alternative approaches to evaluating an automatic indexing tool for MEDLINE.
Objective: This paper explores alternative approaches for the evaluation of an automatic indexing tool for MEDLINE, complementing the traditional precision and recall method.
Methods: The performance of MTI, the Medical Text Indexer used at NLM to produce MeSH recommendations for biomedical journal articles, is evaluated on a random set of MEDLINE citations. The evaluation examines semantic similarity at the term level (indexing terms). In addition, the documents retrieved by queries built from the MTI index terms for a given document are compared to the PubMed related citations for this document.
Results: Semantic similarity scores between sets of index terms are higher than the corresponding Dice similarity scores. Overall, 75% of the original documents and 58% of the top ten related citations are retrieved by queries based on the automatic indexing.
Conclusions: The alternative measures studied in this paper confirm previous findings and may be used to select particular documents from the test set for a more thorough analysis.
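Illustration: The set-based measures named above can be sketched in a few lines of Python. This is a minimal sketch, not the evaluation code used in the study: the function names, the sample MeSH terms, and the sample citation identifiers are hypothetical, and the semantic similarity measure is not reproduced here because the abstract does not specify it.

```python
def precision_recall_dice(recommended, reference):
    """Exact-match scores between two sets of index terms.

    `recommended` stands for automatically produced terms and `reference`
    for the manual indexing; both names are illustrative assumptions.
    """
    recommended, reference = set(recommended), set(reference)
    overlap = len(recommended & reference)
    precision = overlap / len(recommended) if recommended else 0.0
    recall = overlap / len(reference) if reference else 0.0
    total = len(recommended) + len(reference)
    # Dice coefficient: twice the overlap divided by the sum of the set sizes.
    dice = 2 * overlap / total if total else 0.0
    return precision, recall, dice


def fraction_retrieved(retrieved_ids, related_ids):
    """Share of the related citations that also appear in the query results."""
    related = list(related_ids)
    if not related:
        return 0.0
    retrieved = set(retrieved_ids)
    return sum(1 for doc_id in related if doc_id in retrieved) / len(related)


# Hypothetical example data, for illustration only.
mti_terms = {"Humans", "MEDLINE", "Abstracting and Indexing"}
manual_terms = {"Humans", "MEDLINE", "Medical Subject Headings",
                "Natural Language Processing"}
precision, recall, dice = precision_recall_dice(mti_terms, manual_terms)

query_results = {"doc1", "doc2", "doc3"}
related_citations = ["doc1", "doc2", "doc4", "doc5"]
coverage = fraction_retrieved(query_results, related_citations)
```

Because these scores count only exact matches, a recommended term that is closely related to a reference term earns no credit, which is consistent with semantic similarity scores being higher than the corresponding Dice scores.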