Random Forests for Premise Selection
Michael Färber and Cezary Kaliszyk. Proceedings of the 10th International Symposium on Frontiers of Combining Systems (FroCoS 2015), Lecture Notes in Artificial Intelligence 9322, pp. 325–340, 2015.
Abstract
The success rates of automated theorem provers in large theories depend strongly on the choice of given facts. Premise selection is the task of choosing a subset of given facts that is most likely to lead to a successful automated deduction proof of a given conjecture. Premise selection can be viewed as a multi-label classification problem, for which machine learning from related proofs is currently the most successful method. Random forests are a machine learning technique known to perform especially well on large datasets. In this paper, we evaluate random forest algorithms for premise selection. To deal with the specifics of automated reasoning, we propose a number of extensions to random forests, such as incremental learning, multi-path querying, depth weighting, feature IDF (inverse document frequency), and integration of secondary classifiers in the tree leaves. With these extensions, we improve on the k-nearest neighbour algorithm both in terms of prediction quality and ATP performance.
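As a rough illustration of one of the extensions mentioned in the abstract, feature IDF weighting assigns higher weight to features (e.g. symbols) that occur in few facts, so that rare symbols influence premise similarity more strongly. The sketch below uses invented toy data and names; it is not the paper's implementation.

```python
import math

# Each fact (premise) is characterized by a set of symbolic features.
# Toy data for illustration only, not from the paper.
facts = {
    "add_comm":  {"plus", "nat", "eq"},
    "mul_comm":  {"times", "nat", "eq"},
    "add_assoc": {"plus", "nat", "eq"},
}

# Inverse document frequency: idf(f) = log(N / df(f)), where df(f) is
# the number of facts containing feature f. Ubiquitous features such
# as "eq" get weight 0; rare features such as "times" get the most.
n = len(facts)
df = {}
for feats in facts.values():
    for f in feats:
        df[f] = df.get(f, 0) + 1
idf = {f: math.log(n / c) for f, c in df.items()}

print(sorted(idf.items()))
```

Here "eq" and "nat" occur in all three facts and receive weight log(3/3) = 0, while "times" occurs in a single fact and receives the maximal weight log(3).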
BibTeX
@inproceedings{MFCK-FROCOS2015,
  author    = "Michael F{\"a}rber and Cezary Kaliszyk",
  title     = "Random Forests for Premise Selection",
  booktitle = "Proceedings of the 10th International Symposium on Frontiers of Combining Systems (FroCoS 2015)",
  editor    = "Carsten Lutz and Silvio Ranise",
  series    = "Lecture Notes in Artificial Intelligence",
  volume    = 9322,
  pages     = "325--340",
  year      = 2015,
  doi       = "10.1007/978-3-319-24246-0_20"
}