Dinev, G. M. (2022). Combining decision modelling and machine learning: an investigation in the insurance sector [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2022.82983
Over the last decade, the use of the BPMN (Business Process Model and Notation) standard for modeling business processes has risen rapidly. However, BPMN may be impractical due to its complexity and weak interoperability between business process tools. Recently, the OMG (Object Management Group) introduced the Decision Model and Notation (DMN) standard, which can simplify the latter standard for decision modeling and/or multi-criteria decision-making. The purpose of DMN is to be readable and adjustable for people from both business and IT. Advances in technology and innovation have led to the emergence of big data analytics and new computational methods. Machine Learning (ML) tools are essential for making maximum use of the information available to decision makers. Data-driven technologies and BPMN both provide powerful tools; however, according to the state of the art, there is no solution for coupling them in a synergistic manner. In addition, automating modeling with the DMN standard and applying Machine Learning tools in this domain remain a challenge, as modeling in the DMN standard requires manual steps and ML tools are not natively supported by it. Therefore, this thesis proposes a Toolchain for tackling the above-mentioned issues. The Thesis presents the design steps of the proposed solution. The input of the Toolchain can be either raw field data or, alternatively, a test case set generated from a DMN model. The proposed Toolchain implements the following three consecutive automated levels: Statistical Analysis with data preprocessing, a modeling step with three distinct modeling strategies, and lastly an Evaluation stage. The statistical analysis covers correlation analysis, identification of the distribution of the variables, etc. The modeling stage includes fitting linear models, standard Machine Learning CART models, and ensemble-type XGBoost models. 
These models are capable of handling the various levels of relationships between variables, from linear to highly non-linear, which may compensate for the deficiencies of the original DMN model, since the latter is rather intuitive and may contain several overlapping or inefficient decision rules due to the manual creation of decision boundaries. The output of the Toolchain is a human-readable result package, including the statistical analysis, the model performance evaluation and other partial results. The results obtained from experiments on a big-data and a smaller insurance dataset confirm the applicability and validity of the proposed method. The results also indicate that the XGBoost model, due to its outstanding performance, is a suitable candidate for use within the DMN standard instead of, e.g., a decision table. Furthermore, ML-based decision models would provide more flexibility and adaptivity, which may result in easier automation of the decision process. Benchmarking of execution and training times is also performed, with special regard to model complexity. The designed Toolchain aims to bridge the gap between ML and the DMN standard. In addition, the Thesis may provide valuable insights to domain experts to better understand their models and empower decision makers with a different view on modeling.
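
The three levels described above (statistical analysis, modeling, evaluation) can be illustrated with a minimal, standard-library-only sketch. It is not the Toolchain from the thesis. The decision table `dmn_decision`, the variables `age` and `claims`, and all thresholds are hypothetical. The modeling step is reduced to a single Gini-based split (a decision stump, the simplest CART-style tree) and stands in for the linear, CART and XGBoost models named in the abstract.

```python
import random

def dmn_decision(age, claims):
    """Hypothetical first-hit decision table (an assumption, not from the thesis)."""
    if claims >= 3:
        return "high"
    if age < 25 and claims >= 1:
        return "high"
    return "low"

def generate_test_cases(n, seed=0):
    """Input option 2 of the abstract: test cases generated from a decision model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age, claims = rng.randint(18, 80), rng.randint(0, 5)
        rows.append((age, claims, dmn_decision(age, claims)))
    return rows

def pearson(xs, ys):
    """Level 1 (statistical analysis): Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def gini(labels):
    """Gini impurity for a binary label list."""
    if not labels:
        return 0.0
    p = labels.count("high") / len(labels)
    return 2 * p * (1 - p)

def majority(labels):
    """Most frequent label; ties resolve to 'high' so the result is deterministic."""
    return "high" if 2 * labels.count("high") >= len(labels) else "low"

def fit_stump(rows):
    """Level 2 (modeling): pick the single split with the lowest weighted Gini impurity."""
    best = None
    for feature in (0, 1):  # 0 = age, 1 = claims
        for threshold in sorted({r[feature] for r in rows}):
            left = [r[2] for r in rows if r[feature] < threshold]
            right = [r[2] for r in rows if r[feature] >= threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, feature, threshold, majority(left), majority(right))
    return best[1:]

def predict(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] < threshold else right_label

rows = generate_test_cases(500)
train, test = rows[:400], rows[400:]

# Level 1: how strongly does the claim count correlate with a "high" decision?
is_high = [1 if r[2] == "high" else 0 for r in train]
claims_corr = pearson([r[1] for r in train], is_high)

# Level 2: fit the stump on the training split.
stump = fit_stump(train)

# Level 3 (evaluation): accuracy on the held-out split.
accuracy = sum(predict(stump, r) == r[2] for r in test) / len(test)

print("correlation(claims, high):", round(claims_corr, 3))
print("learned split (feature, threshold, left, right):", stump)
print("hold-out accuracy:", round(accuracy, 3))
```

A single split cannot reproduce the second rule of the toy table, which combines `age` and `claims`, so some hold-out cases are misclassified. Deeper trees or ensembles, such as the CART and XGBoost models the thesis evaluates, are the kind of model that can capture such interactions.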