Abstract
The written notes in Electronic Health Records contain a vast amount of information about patients. Automated approaches to text classification need to be interpretable, and topic models can serve this goal because they indicate which topics in a text are relevant to a decision. We propose a new topic modeling algorithm, FLSA-E, and compare it with another state-of-the-art algorithm, FLSA-W. In FLSA-E, topics are found by fuzzy clustering in a word embedding space. Since word embeddings are the basis for our clustering, we extend our evaluation with word-embedding-based evaluation metrics. We find that different evaluation metrics favour different algorithms. The results provide evidence that FLSA-E has fewer outliers in its topics, a desirable property given that the words within a topic need to be semantically related.
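To make the idea of "fuzzy clustering in a word embedding space" concrete, the following is a minimal sketch and not the authors' FLSA-E implementation. It assumes plain fuzzy c-means as the clustering method, and it uses a hypothetical six-word vocabulary with hand-made 2-D vectors in place of trained embeddings. The function name `fuzzy_c_means` and all parameter values are illustrative assumptions.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means (an assumed stand-in for the paper's clustering step).

    Returns (centers, U), where U[i, k] is the membership of point i in cluster k.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalised to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m  # fuzzified memberships
        # Cluster centres are membership-weighted means of the points.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every point to every centre (floored to avoid division by zero).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        # Standard membership update: closer centres get higher membership.
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Hypothetical toy vocabulary and 2-D "embeddings" (real embeddings would be learned).
vocab = ["sleep", "insomnia", "tired", "dose", "tablet", "medication"]
embeddings = np.array([
    [0.9, 0.1], [1.0, 0.2], [0.8, 0.0],
    [0.1, 0.9], [0.0, 1.0], [0.2, 0.8],
])

centers, U = fuzzy_c_means(embeddings, n_clusters=2)

# Treat each cluster as a "topic" and list its highest-membership words.
for k in range(U.shape[1]):
    top = np.argsort(-U[:, k])[:3]
    print(f"topic {k}:", [vocab[i] for i in top])
```

Because memberships are soft, every word belongs to every topic to some degree; ranking words by membership per cluster is one simple way to read topics off such a clustering.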
Original language | English |
---|---|
Title of host publication | IEEE International Conference on Systems, Man, and Cybernetics |
Publisher | IEEE |
Pages | 2669-2674 |
Number of pages | 6 |
ISBN (Electronic) | 978-1-6654-5258-8 |
Publication status | Published - 2022 |
Event | 2022 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2022 - Prague, Czech Republic. Duration: 9 Oct 2022 → 12 Oct 2022 |
Publication series
Name | Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics |
---|---|
Volume | 2022-October |
ISSN (Print) | 1062-922X |
Conference
Conference | 2022 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2022 |
---|---|
Country/Territory | Czech Republic |
City | Prague |
Period | 9/10/22 → 12/10/22 |
Bibliographical note
Publisher Copyright: © 2022 IEEE.
Keywords
- Electronic Health Records
- Fuzzy Clustering
- Fuzzy Methods
- Natural Language Processing
- Neural Network methods
- Psychiatry
- Topic Modeling
- Word Embeddings