Abstract
This thesis describes a number of new data mining algorithms that resulted from our research into enforcing monotonicity constraints when learning (mostly non-parametric) models from data.
Judicious use of domain knowledge can improve the predictive accuracy of data mining algorithms. Crucially, models that are consistent with the knowledge of domain experts are also accepted and adopted much sooner than models that are not. Unfortunately, the domain knowledge that is available is often informal and poorly structured, which makes it difficult to use in practice.
Knowledge of an increasing or decreasing relationship between a predictor variable and the variable to be predicted is a notable exception. In many applications, domain experts can specify such monotonic relationships with relative ease and reliability, based on their knowledge and experience. It is known, for instance, that smoking and being overweight increase the risk of cardiovascular disease (an increasing relationship); conversely, a higher income is likely to reduce the probability of default on a loan (a decreasing relationship).
The experiments described in this thesis show that the predictive power of our new data mining algorithms is comparable to, and sometimes better than, that of their non-monotonic counterparts. This is achieved at limited additional computational cost. All in all, we conclude that enforcing available monotonicity constraints is practically feasible and improves the quality of the resulting models.
Original language | English |
---|---|
Qualification | Doctor of Philosophy |
Awarding Institution | |
Supervisors/Advisors | |
Award date | 10 Feb 2014 |
Place of Publication | [Utrecht] |
Publisher | |
Print ISBNs | 978-90-393-7093-3 |
Publication status | Published - 10 Feb 2014 |
Keywords
- Mathematics and Computer Science (Wiskunde en Informatica, WIIN)