Learning novel and interesting concepts and relations from relational databases is an important problem with many applications in database systems and machine learning. Relational learning algorithms generally leverage the properties of the database schema to find a definition of the target concept in terms of the existing relations in the database. Nevertheless, it is well established that the same data set may be represented under different schemas for various reasons, such as efficiency, data quality, and usability. Unfortunately, both the accuracy and the efficiency of many current learning algorithms vary substantially with the choice of schema, which complicates their off-the-shelf application. In this paper, we formalize the property of schema independence of relational learning algorithms, and study both the theoretical and empirical dependence of existing algorithms on the common class of vertical (de)composition schema transformations. We study both sample-based learning algorithms, which learn from sets of labeled examples, and query-based algorithms, which learn by posing queries to a user. For sample-based algorithms we consider the two main algorithm classes: top-down and bottom-up. We prove that practical top-down algorithms are generally not schema independent, while, in contrast, two bottom-up algorithms, Golem and ProGolem, are schema independent with some modifications. For query-based learning algorithms we show that vertical (de)composition transformations influence their learning efficiency. We support the theoretical results with an empirical study that demonstrates the schema dependence or independence of several algorithms on existing benchmark data sets under natural vertical (de)compositions.
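To make the transformation studied in the paper concrete, here is a minimal sketch (not taken from the paper; the relation and attribute names are hypothetical) of a vertical decomposition: a relation's columns are split across two relations that share a key, and a natural join on that key recovers the original data.

```python
# Hypothetical relation student(id, name, advisor), represented as a set of tuples.
student = {
    (1, "Ada", "Turing"),
    (2, "Kurt", "Hilbert"),
}

# Vertical decomposition: project onto the key plus a subset of the attributes.
student_name = {(sid, name) for sid, name, _ in student}
student_advisor = {(sid, adv) for sid, _, adv in student}

# Recomposition: natural join on the shared key `id`.
recomposed = {
    (sid, name, adv)
    for sid, name in student_name
    for sid2, adv in student_advisor
    if sid == sid2
}

# The decomposition is lossless: joining the projections gives back the original.
assert recomposed == student
```

A schema-independent learner should, in principle, induce equivalent definitions of a target concept whether it is given the single relation or its two decomposed projections.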
from cs.AI updates on arXiv.org http://ift.tt/1MyxRdr