TY - GEN
T1 - Automatic document indexing in large medical collections
AU - Hliaoutakis, Angelos
AU - Zervanou, Kalliopi
AU - Petrakis, Euripides G.M.
AU - Milios, Evangelos E.
PY - 2006
Y1 - 2006
N2 - Term extraction relates to extracting the most characteristic or important terms (words or phrases) in a document. This information is commonly used for improving the accuracy of document indexing and retrieval in large text collections. It also allows for faster and better understanding of the contents of a document collection without first browsing through the contents of its documents. This paper presents AMTEx, an automatic term extraction method specifically designed for the automatic indexing of documents in large medical collections such as MEDLINE, the premier bibliographic database of the U.S. National Library of Medicine (NLM). AMTEx combines MeSH, the terminological thesaurus resource of NLM, with a well-established method for extraction of domain terms, the C/NC-value method. The performance of various AMTEx configurations in the indexing task is evaluated against the current state of the art, the MMTx method. The experimental results on a subset of MEDLINE documents demonstrate that AMTEx achieves better precision and recall than MMTx.
AB - Term extraction relates to extracting the most characteristic or important terms (words or phrases) in a document. This information is commonly used for improving the accuracy of document indexing and retrieval in large text collections. It also allows for faster and better understanding of the contents of a document collection without first browsing through the contents of its documents. This paper presents AMTEx, an automatic term extraction method specifically designed for the automatic indexing of documents in large medical collections such as MEDLINE, the premier bibliographic database of the U.S. National Library of Medicine (NLM). AMTEx combines MeSH, the terminological thesaurus resource of NLM, with a well-established method for extraction of domain terms, the C/NC-value method. The performance of various AMTEx configurations in the indexing task is evaluated against the current state of the art, the MMTx method. The experimental results on a subset of MEDLINE documents demonstrate that AMTEx achieves better precision and recall than MMTx.
KW - Document indexing
KW - Medical document retrieval
KW - Term extraction
UR - http://www.scopus.com/inward/record.url?scp=34547678017&partnerID=8YFLogxK
U2 - 10.1145/1183568.1183570
DO - 10.1145/1183568.1183570
M3 - Conference contribution
AN - SCOPUS:34547678017
SN - 1595935282
SN - 9781595935281
T3 - Proceedings of HIKM 2006: International Workshop on Healthcare Information and Knowledge Management
SP - 1
EP - 8
BT - CIKM 2006 Workshop - Proceedings of HIKM 2006
T2 - HIKM 2006: International Workshop on Healthcare Information and Knowledge Management, held in conjunction with the ACM 15th Conference on Information and Knowledge Management, CIKM 2006
Y2 - 11 November 2006 through 11 November 2006
ER -