Multi-Paradigm Learning of Declarative Models

Author: Cèsar Ferri
Title: Multi-Paradigm Learning of Declarative Models
Language of presentation: Spanish
Promotor: Prof. José Hernández-Orallo and Prof. María José Ramírez
Date of defense: May 2003
Institution granting degree: Universidad Politécnica de Valencia, Spain

Abstract

Many approaches to obtaining systems with intelligent behaviour are based on components that learn automatically from previous experience. The development of these learning techniques is the objective of the research area known as machine learning. During the last decade, researchers have produced numerous outstanding advances in this area, boosted by the successful application of machine learning techniques to knowledge discovery in databases, whose principal stage is referred to as data mining. Many techniques used in this field have very limited applicability because they rely on representation languages that are too restrictive or incomprehensible to express rich and understandable knowledge, although this restriction significantly simplifies the inherent complexity of the learning process. For this reason, several methods based on enriching the expressiveness of the representation language have recently been defined. In particular, the incorporation of first-order logic into the learning process has constituted a very important advance; it is the common basis of the induction algorithms in the field known as Inductive Logic Programming (ILP). However, the first-order languages employed by ILP techniques lack some features that are useful for learning: types, functions and higher-order constructs. Another way to increase the expressiveness of the representation language is to generate a committee hypothesis by combining a set of models. These approaches, usually referred to as ensemble methods, have important disadvantages, however, such as the loss of comprehensibility of the combined model and a high consumption of computational resources.

In this thesis, we present some contributions to the field of machine learning that partially solve some of these problems. Specifically, we introduce learning methods that increase the expressiveness of the representation language, improve the accuracy of the learned models, and reduce the inherent costs of the learning process and the application of the learned model.

The first contribution is the definition of an induction framework based on multi-paradigm languages. These languages combine the best features of the two main families of declarative paradigms: functional programming and logic programming. Using languages of this kind for learning provides features such as types and functions. Our new learning framework has been implemented in a machine learning system, and we report several experiments and results obtained with it.

Learning with very expressive representation languages makes it possible to deal with problems that have a complex structure. However, it considerably enlarges the search space of the learning algorithms, thus reducing their efficiency. Therefore, problems with a huge number of examples or attributes, typical of data mining, are better addressed by approaches based on simple representation languages, such as propositional languages. In this context, it would be useful to design methods that obtain models that are more comprehensible and more accurate according to the computational resources granted to the algorithm. To this end, this thesis introduces decision multi-trees and presents several applications of this structure. First, we define an anytime algorithm that uses the multi-tree structure to generate decision trees which gradually improve in accuracy and comprehensibility with respect to the classifiers constructed by classic decision-tree learning algorithms. The multi-tree structure is also employed as an ensemble method by combining the hypotheses it contains. The major advantage of this method over other ensemble algorithms is a substantial saving in computational resources, since the hypotheses embedded in a multi-tree share their common parts. In addition, we include a novel method for generating a single comprehensible hypothesis that is semantically similar to the combined model and requires even fewer resources.

The real usefulness of a learning method depends not only on obtaining expressive, comprehensible and accurate models, but also on reducing the costs of the context in which these models are going to be applied. Examples of such costs are misclassification costs and the costs of applying the attribute tests needed to compute the predictions of a classifier (test costs). In this thesis, we introduce a technique based on the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) to efficiently evaluate the quality of a rule-based binary classifier independently of the cost context. We also exploit the efficiency of this measure to derive a new splitting criterion that constructs decision trees based on ROC analysis, and we extend this criterion to more than two classes by using approximations of the AUC for multi-class problems. Finally, we investigate several techniques to reduce the costs associated with the tests of classifiers, and we study the integration of all the above techniques.
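The flavour of an ROC-based split quality measure can be illustrated with a small sketch. This is an assumption-laden simplification, not the thesis's exact criterion: for a two-class problem, the children of a candidate split are ranked by their local proportion of positives, and the area under the resulting ROC curve rewards splits whose children separate the classes well.

```python
# Hedged sketch of an AUC-style split quality measure for two classes.
# Each child of a candidate split is summarised by its (positives, negatives)
# counts; children are ranked by local positive rate and the ROC curve they
# induce is integrated with the trapezoid rule.

def auc_of_split(leaves):
    """leaves: list of (n_pos, n_neg) counts, one pair per child node."""
    total_pos = sum(p for p, _ in leaves)
    total_neg = sum(n for _, n in leaves)
    # Visit children from highest to lowest proportion of positives,
    # as an optimal labelling of the leaves would do on the ROC plane.
    ordered = sorted(leaves, key=lambda c: c[0] / (c[0] + c[1]), reverse=True)
    auc, tpr, fpr = 0.0, 0.0, 0.0
    for p, n in ordered:
        new_tpr = tpr + p / total_pos
        new_fpr = fpr + n / total_neg
        auc += (new_fpr - fpr) * (tpr + new_tpr) / 2   # trapezoid area
        tpr, fpr = new_tpr, new_fpr
    return auc

# Two candidate splits of 10 positives and 10 negatives:
print(auc_of_split([(8, 2), (2, 8)]))  # purer split, higher AUC (0.8)
print(auc_of_split([(6, 4), (4, 6)]))  # more mixed split, lower AUC (0.6)
```

Because the measure depends only on how the split ranks the examples, not on a fixed decision threshold, it evaluates the split independently of the particular misclassification-cost context, which is the motivation for using ROC analysis here.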

In summary, this thesis contributes new learning methods for generating more expressive, comprehensible and accurate models, which are able to adapt the use of computational resources, the quality of the classifiers and the costs of application to variable contexts.