Effective and Efficient Classification of Topically-Enriched Domain-Specific Text Snippets: The TETSC Method

Marco R. Spruit, Bas Vlug

    Research output: Contribution to journal › Article › Academic › peer-review

    Abstract

    Text snippets have grown explosively in number over the past few years, and their text is sparse. As a result, organizations cannot classify them effectively and efficiently, and they miss out on business opportunities. This paper presents TETSC: the Topically-Enriched Text Snippet Classification method. TETSC aims to solve the classification problem for text snippets in any domain. TETSC recognizes that there are different types of text snippets. It therefore lets stop word removal, named-entity recognition, and topical enrichment be configured separately for each type of text snippet. TETSC has been implemented in the production systems of a personal finance organization, where it reduced the classification error by over 21%.

    Highlights: the authors create the TETSC method for classifying topically-enriched text snippets; the authors differentiate between different types of text snippets; the authors show a successful application of Named-Entity Recognition to text snippets; using multiple enrichment strategies appears to reduce effectiveness.
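    The idea of switching preprocessing steps on or off per snippet type can be illustrated with a small feature-extraction sketch. This is not the authors' implementation. Every name in it (SnippetConfig, extract_features, STOP_WORDS, ENTITY_GAZETTEER, TOPIC_MAP, and the "bank_transaction" and "free_text_note" snippet types) is hypothetical. A dictionary (gazetteer) lookup stands in for a trained named-entity recognizer, and a keyword-to-topic map stands in for whatever topical enrichment source TETSC actually uses. The resulting feature lists would then be passed to a classifier of choice, which this sketch leaves out.

```python
import re
from dataclasses import dataclass

# Hypothetical resources for illustration only; a real system would load a
# domain-specific stop word list, a trained NER model, and a topic source.
STOP_WORDS = {"the", "a", "an", "at", "for", "of", "to", "in", "on", "and"}
# Gazetteer lookup used as a simple stand-in for named-entity recognition.
ENTITY_GAZETTEER = {"albert heijn": "ORG", "shell": "ORG", "amsterdam": "LOC"}
# Keyword-to-topic map used as a simple stand-in for topical enrichment.
TOPIC_MAP = {"shell": "transport", "fuel": "transport",
             "heijn": "groceries", "bread": "groceries"}


@dataclass(frozen=True)
class SnippetConfig:
    """Per-snippet-type switches for the three preprocessing steps."""
    remove_stop_words: bool = True
    recognize_entities: bool = True
    enrich_topics: bool = True


# Hypothetical snippet types: each type gets its own combination of steps.
CONFIGS = {
    "bank_transaction": SnippetConfig(remove_stop_words=False),
    "free_text_note": SnippetConfig(),
}


def tokenize(text: str) -> list[str]:
    """Lowercase the snippet and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_features(text: str, snippet_type: str) -> list[str]:
    """Turn one snippet into a feature list according to its type's config."""
    cfg = CONFIGS[snippet_type]
    tokens = tokenize(text)
    if cfg.remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    features = list(tokens)
    if cfg.recognize_entities:
        # Match gazetteer names (possibly multi-word) on word boundaries.
        joined = " ".join(tokenize(text))
        for name, label in ENTITY_GAZETTEER.items():
            if re.search(rf"\b{re.escape(name)}\b", joined):
                features.append(f"ENT={label}")
    if cfg.enrich_topics:
        # Add each topic at most once, in a deterministic (sorted) order.
        topics = {TOPIC_MAP[t] for t in tokens if t in TOPIC_MAP}
        features.extend(f"TOPIC={topic}" for topic in sorted(topics))
    return features


print(extract_features("Fuel at Shell Amsterdam", "free_text_note"))
# ['fuel', 'shell', 'amsterdam', 'ENT=ORG', 'ENT=LOC', 'TOPIC=transport']
```

    Keeping the switches in a per-type configuration object means a new snippet type only needs a new entry in CONFIGS, not a new pipeline.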
    Original language: English
    Pages (from-to): 1-17
    Number of pages: 17
    Journal: IJSDS
    Volume: 6
    Issue number: 3
    Publication status: Published - 2015
