Abstract
Nowadays, relational databases have become the de facto standard for storing large quantities of data. As manual analysis of such quantities of data is practically impossible, the field of data mining provides methods that attempt to automatically acquire insight into the data. One cornerstone technique is pattern mining: finding interesting regularities in data.
Despite all good efforts, pattern mining still has a major Achilles' heel: the ease with which patterns can be found. Many of the patterns found are slight variations on the same underlying theme, yet many of them are still designated as interesting. In practice, a user gets swamped by too many similar patterns that do not contribute any new insight into the database.
In this thesis we therefore propose a different approach. In contrast to selecting patterns on an individual basis, we propose the selection of pattern sets. In particular, we focus on a selection scheme based on a compression technique called the Minimum Description Length (MDL) principle. The selected pattern set, our model of the data, is used to compress the complete database. According to the MDL principle, the model that compresses the database best is also the one that describes it best.
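In its common two-part form, the MDL principle can be written as below. The notation is an assumption added here for illustration and is not taken from the abstract: $\mathcal{M}$ is the set of candidate models (pattern sets), $L(M)$ is the length in bits of the description of model $M$, and $L(D \mid M)$ is the length in bits of database $D$ when encoded with $M$.

```latex
M^{*} = \operatorname*{arg\,min}_{M \in \mathcal{M}} \bigl( L(M) + L(D \mid M) \bigr)
```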
As finding the optimal model of a database is simply too complex, we use a practical, heuristic approach named Krimp. Based on this, we designed a toolbox of algorithms that derive models for different interpretations of the data. We discuss structured data types such as sequences and trees, the join of the database, and relational databases as a whole. These last models also turn out to yield good classifiers.
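To give a rough idea of what a compression-driven heuristic for selecting a pattern set might look like, here is a minimal Python sketch. It is a simplified illustration in the spirit of the approach described above, not the Krimp algorithm as defined in the thesis: the function names (`cover`, `total_size`, `select_patterns`), the candidate ordering, and the encoding details are all assumptions made for this example.

```python
from math import log2


def support(itemset, db):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in db if itemset <= t)


def cover(transaction, code_table):
    """Cover a transaction with non-overlapping itemsets, in code-table order."""
    remaining = set(transaction)
    used = []
    for itemset in code_table:
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
    return used


def total_size(db, code_table, item_cost):
    """Two-part encoded size in bits: model cost plus data-given-model cost."""
    usage = {s: 0 for s in code_table}
    for t in db:
        for s in cover(t, code_table):
            usage[s] += 1
    total = sum(usage.values())
    size = 0.0
    for s, u in usage.items():
        if u == 0:
            continue  # unused itemsets receive no code and cost nothing
        code_len = -log2(u / total)
        size += u * code_len                             # data given model
        size += code_len + sum(item_cost[i] for i in s)  # model description
    return size


def select_patterns(db, candidates):
    """Greedily keep each candidate that strictly lowers the total encoded size."""
    counts = {}
    for t in db:
        for i in t:
            counts[i] = counts.get(i, 0) + 1
    n_items = sum(counts.values())
    # Cost of spelling out one item using singleton codes only.
    item_cost = {i: -log2(c / n_items) for i, c in counts.items()}

    def cover_order(s):
        # Longer itemsets first, then more frequent ones, then alphabetical.
        return (-len(s), -support(s, db), sorted(s))

    # Singletons are always present, so every transaction can be covered.
    code_table = sorted((frozenset([i]) for i in counts), key=cover_order)
    best = total_size(db, code_table, item_cost)

    # Assumed candidate order: most frequent first, then longest.
    for cand in sorted(candidates, key=lambda s: (-support(s, db), -len(s), sorted(s))):
        trial = sorted(code_table + [cand], key=cover_order)
        size = total_size(db, trial, item_cost)
        if size < best:
            code_table, best = trial, size
    return [s for s in code_table if len(s) > 1], best


if __name__ == "__main__":
    db = [frozenset("abc")] * 5 + [frozenset("ab")] * 3 + [frozenset("c")] * 2
    candidates = [frozenset("ab"), frozenset("abc"), frozenset("bc")]
    patterns, size = select_patterns(db, candidates)
    print(sorted(sorted(p) for p in patterns), round(size, 2))
```

The sketch accepts a candidate only when adding it makes the combined description of model and data strictly shorter, which is one simple way to let compression decide which patterns are worth keeping.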
We back up the claims in this thesis with experimental evaluation. For many of the databases used, the initial number of patterns is huge. However, we show that from this huge collection of patterns we select a compact and good set of characteristic relational patterns.
Original language | Undefined/Unknown |
---|---|
Qualification | Doctor of Philosophy |
Awarding Institution | |
Supervisors/Advisors | |
Award date | 31 May 2010 |
Publisher | |
Print ISBNs | 978-90-393-5320-2 |
Publication status | Published - 31 May 2010 |