SAT-encodings; Maximum Satisfiability; Local Improvement; Graph Decompositions; Bayesian Networks; Prediction Models; Heuristic Algorithms; Structure Learning; Computational Experiments
Abstract:
Bayesian Networks (BNs) play a critical role wherever trustworthy predictive models are needed in the real world. For instance, BNs are widely used in medicine for diagnosing illnesses, determining causes, and estimating the effectiveness of potential remedies. Unlike many other popular models, trustworthiness is built into BNs by design, which adds to their appeal. A BN consists of Random Variables (RVs) representing real-world events, connected by arcs that represent their interdependencies; in addition, each RV is equipped with a conditional probability distribution (CPD) table. Hence, learning a BN entails learning both the parameters (the CPD tables) and the structure (the directed acyclic graph (DAG)). In this thesis we focus on the problem of BN structure learning (BNSL), which is known to be notoriously hard. Our proposed methods address three important properties of BNs, namely scalability, tractability, and quality, and optimize them simultaneously. Motivated by real-world settings, our heuristic methods scale to learning BNs with thousands of RVs. To ensure efficient querying of a BN, we need tractability, which we achieve by bounding structural parameters such as treewidth. Further, we ensure that the quality of the BN does not suffer in the process and that the input data is well represented. We use the SAT-based Local Improvement Method (SLIM) framework to develop several heuristic algorithms that combine the efficiency of state-of-the-art heuristics with the rigor of a SAT solver, thereby outperforming previous heuristic methods. In keeping with the overarching goal of practicality, we implemented and tested our algorithms and show that they compare favorably to previous work.
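To make the two learning objects concrete: a BN pairs a DAG over the RVs with one CPD table per RV, and the joint distribution factorizes into the product of each RV's CPD given its parents. The following Python sketch uses illustrative RVs and probabilities (not taken from the thesis) to show this representation and the factorized joint probability.

```python
# Minimal sketch of a Bayesian Network: a DAG (parent lists) plus one CPD table
# per random variable. All names and numbers are illustrative, not from the thesis.

# Structure: each RV maps to its list of parents (arcs point parent -> child).
parents = {
    "Flu": [],
    "Fever": ["Flu"],
    "Cough": ["Flu"],
}

# Parameters: CPD tables, indexed by the RV's value and the tuple of parent values.
# cpd[rv][(value, parent_values)] = P(rv = value | parents = parent_values)
cpd = {
    "Flu":   {(1, ()): 0.1, (0, ()): 0.9},
    "Fever": {(1, (1,)): 0.8, (0, (1,)): 0.2, (1, (0,)): 0.05, (0, (0,)): 0.95},
    "Cough": {(1, (1,)): 0.7, (0, (1,)): 0.3, (1, (0,)): 0.1, (0, (0,)): 0.9},
}

def joint_probability(assignment):
    """P(assignment) via the BN factorization: product over RVs of P(rv | parents)."""
    prob = 1.0
    for rv, pars in parents.items():
        parent_values = tuple(assignment[p] for p in pars)
        prob *= cpd[rv][(assignment[rv], parent_values)]
    return prob

# Example query: probability of flu with fever but no cough.
print(joint_probability({"Flu": 1, "Fever": 1, "Cough": 0}))  # 0.1 * 0.8 * 0.3 = 0.024
```

BNSL asks for the parent lists (the DAG) that best fit the observed data; the CPD tables can then be estimated from the data, and bounding parameters such as treewidth keeps queries like the one above tractable on larger networks.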