Abstract
Finding subsets of a dataset that somehow deviate from the norm, i.e. where
something interesting is going on, is a classical Data Mining task. In traditional local
pattern mining methods, such deviations are measured in terms of a relatively high
occurrence (frequent itemset mining), or an unusual distribution for one designated
target attribute (common use of subgroup discovery). These, however, do not encompass
all forms of “interesting”. To capture a more general notion of interestingness in
subsets of a dataset, we develop Exceptional Model Mining (EMM). This is a supervised
local pattern mining framework, where several target attributes are selected,
and a model over these targets is chosen to be the target concept. Then, we strive
to find subgroups: subsets of the dataset that can be described by a few conditions
on single attributes. Such subgroups are deemed interesting when the model over the
targets on the subgroup is substantially different from the model on the whole dataset.
For instance, we can find subgroups where two target attributes have an unusual correlation,
a classifier has a deviating predictive performance, or a Bayesian network
fitted on several target attributes has an exceptional structure. We give an algorithmic
solution for the EMM framework, and analyze its computational complexity. We also
discuss some illustrative applications of EMM instances, including using the Bayesian
network model to identify meteorological conditions under which food chains are displaced,
and using a regression model to find the subset of households in the Chinese
province of Hunan that do not follow the general economic law of demand.
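
As a rough illustration of the correlation instance mentioned above, the following minimal sketch scores each single-condition subgroup by how far the correlation between two targets on the subgroup deviates from the correlation on the whole dataset. It is not the algorithm from the paper: the exhaustive search over single conditions, the absolute-difference quality measure, the function names `pearson` and `best_subgroup`, the `min_size` parameter, and the toy data are all assumptions made for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def best_subgroup(rows, descriptors, target_x, target_y, min_size=3):
    """Return (quality, attribute, value) for the single-condition subgroup whose
    target correlation differs most from the correlation on the whole dataset.

    Hypothetical quality measure: |r_subgroup - r_dataset|.
    """
    global_r = pearson([r[target_x] for r in rows], [r[target_y] for r in rows])
    best = None
    for attr in descriptors:
        for value in sorted({r[attr] for r in rows}):
            # A subgroup is described by one condition: attr == value.
            sub = [r for r in rows if r[attr] == value]
            if len(sub) < min_size:
                continue  # skip subgroups too small to fit the model on
            sub_r = pearson([r[target_x] for r in sub], [r[target_y] for r in sub])
            quality = abs(sub_r - global_r)
            if best is None or quality > best[0]:
                best = (quality, attr, value)
    return best


# Toy data (invented): in region "a" the targets rise together,
# in region "b" they move in opposite directions.
rows = []
for i, x in enumerate([1, 2, 3, 4]):
    season = "dry" if i % 2 == 0 else "wet"
    rows.append({"region": "a", "season": season, "x": x, "y": x})
    rows.append({"region": "b", "season": season, "x": x, "y": 10 - 2 * x})

best = best_subgroup(rows, ["region", "season"], "x", "y")
print(best[1], "=", best[2])
```

On this toy data the condition on `region` separates the two behaviours, while `season` does not, so a `region` subgroup is reported as most exceptional. Descriptions with several conditions and other model classes (classifiers, Bayesian networks, regression) would need a richer search and quality measure than this sketch provides.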
Original language | English |
---|---|
Pages (from-to) | 1-52 |
Number of pages | 52 |
Journal | Data Mining and Knowledge Discovery |
Publication status | Published - 4 Feb 2015 |
Keywords
- Exceptional Model Mining
- Subgroup Discovery
- Supervised Local Pattern Mining
- Regression
- Bayesian Networks