Exploring Embedding Spaces for more Coherent Topic Modeling in Electronic Health Records

Emil Rijcken, Kalliopi Zervanou, Marco Spruit, Pablo Mosteiro Romero, F.E. Scheepers, Uzay Kaymak

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

    Abstract

    The written notes in Electronic Health Records contain a vast amount of information about patients. Automated approaches to text classification in this setting must be interpretable, and topic models can serve this goal because they indicate which topics in a text are relevant to a decision. We propose a new topic modeling algorithm, FLSA-E, and compare it with another state-of-the-art algorithm, FLSA-W. In FLSA-E, topics are found by fuzzy clustering in a word embedding space. Because word embeddings are the basis of our clustering, we extend our evaluation with word-embedding-based evaluation metrics. We find that different evaluation metrics favour different algorithms. The results provide evidence that FLSA-E has fewer outliers in its topics, a desirable property given that words within a topic should be semantically related.
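    The idea of finding topics by fuzzy clustering in an embedding space, and scoring them with an embedding-based metric, can be illustrated with a small sketch. This is not the authors' FLSA-E implementation: the abstract does not name the clustering algorithm or the metrics, so the sketch assumes standard fuzzy c-means as the fuzzy clustering step and uses mean pairwise cosine similarity of a topic's top words as an illustrative embedding-based coherence score. The vocabulary and 2-D vectors are hypothetical toy data, not real word embeddings or EHR text, and the helper names (fuzzy_c_means, top_words, embedding_coherence) are introduced only for illustration.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means (assumed here as the fuzzy clustering step).

    X: array of shape (n_words, dim) holding word vectors.
    Returns (centers, U), where U[i, k] is the membership of word i in topic k
    and every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)  # random fuzzy partition to start
    for _ in range(max_iter):
        Um = U ** m
        # Cluster centers: membership-weighted means of the word vectors.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Euclidean distance from every word to every center, shape (n, c).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return centers, U

def top_words(vocab, U, k, n_top):
    """Hypothetical helper: the n_top words with the highest membership in topic k."""
    order = np.argsort(-U[:, k])[:n_top]
    return [vocab[i] for i in order]

def embedding_coherence(X, idx):
    """Illustrative metric: mean pairwise cosine similarity of the given word vectors."""
    V = X[idx]
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    S = V @ V.T
    n = len(idx)
    return (S.sum() - np.trace(S)) / (n * (n - 1))  # exclude self-similarities

# Hypothetical toy "embeddings": two loose semantic groups in 2-D.
vocab = ["anxiety", "panic", "worry", "insomnia", "sleep", "fatigue"]
X = np.array([[1.0, 0.1], [0.9, 0.2], [1.1, 0.0],
              [0.1, 1.0], [0.2, 0.9], [0.0, 1.1]])

centers, U = fuzzy_c_means(X, n_clusters=2)
topics = [top_words(vocab, U, k, n_top=3) for k in range(2)]
for k, words in enumerate(topics):
    idx = [vocab.index(w) for w in words]
    print(k, words, round(embedding_coherence(X, idx), 3))
```

    Because every word keeps a graded membership in every topic, a word that sits far from all cluster centers ends up with low membership everywhere rather than being forced into one topic; under these assumptions, that is one way a clustering in embedding space can limit outliers among a topic's top words.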
    Original language: English
    Title of host publication: IEEE International Conference on Systems, Man, and Cybernetics
    Publisher: IEEE
    Pages: 2669-2674
    Number of pages: 6
    ISBN (Electronic): 978-1-6654-5258-8
    Publication status: Published - 2022
    Event: 2022 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2022 - Prague, Czech Republic
    Duration: 9 Oct 2022 to 12 Oct 2022

    Publication series

    Name: Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
    Volume: 2022-October
    ISSN (Print): 1062-922X

    Conference

    Conference: 2022 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2022
    Country/Territory: Czech Republic
    City: Prague
    Period: 9/10/22 to 12/10/22

    Bibliographical note

    Publisher Copyright:
    © 2022 IEEE.

    Keywords

    • Electronic Health Records
    • Fuzzy Clustering
    • Fuzzy Methods
    • Natural Language Processing
    • Neural Network methods
    • Psychiatry
    • Topic Modeling
    • Word Embeddings
