Insight in Information: from Abstract to Anomaly

R. Bertens

    Research output: Thesis, Doctoral thesis 1 (Research UU / Graduation UU)

    Abstract

    As a result of cheap data storage, the question nowadays is not whether a company or institution collects data, but rather how much it collects. Transforming data into information and gaining insight into this information is perhaps the most important problem in our data-rich society: collecting data alone serves no purpose, and data becomes valuable only when insight can be gained from it.
    Data mining is the subfield of computer science concerned with transforming large amounts of data into information in the form of patterns. The idea is that the identified patterns yield new insights by exposing interesting structure or behaviour in the data. Defining what exactly is interesting is, unsurprisingly, one of the key challenges.
    One of the main applications of data mining, and a focus of this thesis, is exploratory data analysis. Here, summaries and characterisations of a dataset are used to gain insight: by inspecting and exploring the patterns that make up these models, we can extract important information from the data. We employ the Minimum Description Length (MDL) principle to find such models, which we call summaries. That is, the best summary is the set of patterns that gives the best compression of the data.
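    Under MDL, "best compression" is commonly made concrete as a two-part code: choose the model M that minimises L(M) + L(D | M), the bits needed to describe the patterns plus the bits needed to describe the data using them. The Python sketch below illustrates that criterion on a single univariate symbol sequence. It is a toy and not the algorithm of the thesis, which targets multivariate event sequences: the contiguous gap-free patterns, the greedy left-to-right cover, the uniform encoding of patterns, the greedy candidate search with pruning, and the names `cover`, `model_cost`, `data_cost`, `total_cost` and `summarise` are all assumptions made for illustration.

```python
import math
from collections import Counter

def cover(seq, patterns):
    """Greedily cover seq left to right, trying longer patterns first.
    Returns how often each pattern is used (its usage)."""
    ordered = sorted(patterns, key=len, reverse=True)
    usage, i = Counter(), 0
    while i < len(seq):
        for p in ordered:  # singletons guarantee that something matches
            if seq[i:i + len(p)] == p:
                usage[p] += 1
                i += len(p)
                break
    return usage

def model_cost(patterns, alphabet_size):
    """L(M): each pattern spelled out symbol by symbol plus an end marker,
    using a uniform code over the alphabet and the marker (an assumption)."""
    bits = math.log2(alphabet_size + 1)
    return sum((len(p) + 1) * bits for p in patterns)

def data_cost(usage):
    """L(D | M): Shannon-optimal code lengths -log2(usage / total usage)."""
    total = sum(usage.values())
    return sum(u * -math.log2(u / total) for u in usage.values())

def total_cost(seq, patterns, alphabet_size):
    return model_cost(patterns, alphabet_size) + data_cost(cover(seq, patterns))

def summarise(seq, max_len=3):
    """Greedy MDL search: add a candidate pattern only if it lowers
    L(M) + L(D | M), then prune patterns that no longer pay off."""
    seq = tuple(seq)
    n = len(set(seq))
    singletons = {(a,) for a in set(seq)}
    patterns = set(singletons)
    best = total_cost(seq, patterns, n)
    counts = Counter(seq[i:i + k] for k in range(2, max_len + 1)
                     for i in range(len(seq) - k + 1))
    for cand, freq in counts.most_common():
        if freq < 2:
            break  # most_common is sorted, so all remaining candidates are rarer
        trial = patterns | {cand}
        cost = total_cost(seq, trial, n)
        if cost < best:
            patterns, best = trial, cost
            for p in sorted(patterns - singletons, key=len):
                pruned = patterns - {p}
                pruned_cost = total_cost(seq, pruned, n)
                if pruned_cost < best:
                    patterns, best = pruned, pruned_cost
    return patterns, best

if __name__ == "__main__":
    patterns, bits = summarise("abc" * 20)
    print(sorted(patterns, key=len), round(bits, 2))
```

    On a sequence that repeats "abc", this search keeps ("a", "b", "c") as a pattern and drops the intermediate ("a", "b"): once the longer pattern covers the data, the shorter one only adds model cost.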
    These summaries can also be used for other data mining tasks, such as identifying irregular or abnormal data points. Such deviations from what is expected are called anomalies. Anomaly detection is the second focus of this thesis; its goal is to gain more insight into the information we already have.
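    A common way to use such a summary for anomaly detection, assumed here rather than taken from the thesis, is to score each data point by how poorly the summary compresses it: points that need many bits per symbol deviate from the patterns the summary captures. The sketch below fixes a code table, derives Shannon-optimal code lengths from pattern usage on normal training sequences, and reports the average encoded length per symbol. The function names and the flat `unseen_bits` penalty for patterns or symbols never used in training are hypothetical choices.

```python
import math
from collections import Counter

def cover(seq, patterns):
    """Greedy left-to-right cover preferring longer patterns (toy assumption).
    Symbols missing from the code table are emitted as singletons."""
    ordered = sorted(patterns, key=len, reverse=True)
    used, i = [], 0
    while i < len(seq):
        for p in ordered:
            if tuple(seq[i:i + len(p)]) == p:
                used.append(p)
                i += len(p)
                break
        else:  # no pattern matched, e.g. an unseen symbol
            used.append((seq[i],))
            i += 1
    return used

def fit_code_lengths(train, patterns):
    """Code length in bits for each pattern, from its usage on normal data."""
    usage = Counter(p for s in train for p in cover(s, patterns))
    total = sum(usage.values())
    return {p: -math.log2(u / total) for p, u in usage.items()}

def anomaly_score(seq, patterns, code_len, unseen_bits=16.0):
    """Average encoded length per symbol; higher means more anomalous.
    unseen_bits is a hypothetical flat penalty for codes absent from training."""
    bits = sum(code_len.get(p, unseen_bits) for p in cover(seq, patterns))
    return bits / max(len(seq), 1)

if __name__ == "__main__":
    pats = {("a",), ("b",), ("c",), ("a", "b", "c")}
    code_len = fit_code_lengths(["abcabcabc", "abcabc", "abcab"], pats)
    for s in ["abcabc", "abxabc"]:
        print(s, round(anomaly_score(s, pats, code_len), 3))
```

    Normalising by sequence length keeps long and short points comparable; the sequence containing the unexpected "x" breaks the frequent pattern and therefore scores higher than the regular one.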
    Finally, we conclude that the MDL principle can be successfully employed for multivariate sequential data. For both summarisation and anomaly detection we introduce successful algorithms, which are evaluated on a variety of synthetic and real-world datasets.
    Original language: English
    Awarding Institution
    • Utrecht University
    Supervisors/Advisors
    • Siebes, Arno, Primary supervisor
    • Vreeken, J., Co-supervisor
    Award date: 17 May 2017
    Print ISBN: 978-90-393-6721-6
    Publication status: Published - 17 May 2017

    Bibliographical note

    SIKS Dissertation Series; 2017-07

    Keywords

    • Sequence Mining
    • MDL
    • Multivariate Event Sequences
    • Summarisation
    • Anomaly Detection
