Körner, A., Pasterk, D., Stadler, F., & Zeh, C. (2025). Simulation-based generation of heuristics for decision-making in stochastic environments. INFOR: Information Systems and Operational Research. https://doi.org/10.1080/03155986.2025.2592355
Decision-making in stochastic environments often requires a trade-off between performance and interpretability. Reinforcement Learning (RL) excels at producing adaptive policies, but the resulting solutions are opaque; heuristics, conversely, offer transparency but often sacrifice optimality and adaptability. In this work, we present a general framework that combines the strengths of both approaches. First, we use RL to train a policy on a Markov Decision Process (MDP). We then extract transparent heuristics in the form of decision trees via imitation-based policy extraction (VIPER). Finally, we prune the tree to simplify its structure and improve the generalisation of the resulting rule set. We demonstrate the approach in a logistics case study with significant variability in production and demand. The resulting heuristics outperform expert-designed rules and match the performance of the original RL policy while offering transparency and robustness. The method enables data-driven, explainable decision-making without requiring domain-specific expertise.
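For readers who want a concrete picture of the train-distill-prune pipeline the abstract describes, the sketch below mirrors its three steps on a toy problem. It is a minimal illustration under stated assumptions, not the paper's implementation: CartPole-v1 stands in for the logistics MDP, stable-baselines3's PPO for the RL step, and a plain DAgger-style imitation loop with scikit-learn's cost-complexity pruning (ccp_alpha) approximates VIPER's tree extraction (the full VIPER algorithm additionally reweights samples by Q-value gaps, omitted here).

```python
# Sketch of the pipeline: (1) train an opaque RL policy, (2) distill it into
# a decision tree by imitation, (3) prune the tree into a simpler rule set.
# CartPole-v1, PPO, and the loop constants are illustrative assumptions.
import gymnasium as gym
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")

# Step 1: train an opaque RL policy on the MDP.
teacher = PPO("MlpPolicy", env, verbose=0)
teacher.learn(total_timesteps=20_000)

# Step 2: DAgger-style distillation -- roll out the current tree (student),
# but label every visited state with the teacher's action.
states, labels = [], []
tree = None
for _ in range(5):  # a handful of imitation rounds
    obs, _ = env.reset()
    for _ in range(1000):
        states.append(obs)
        teacher_action, _ = teacher.predict(obs, deterministic=True)
        labels.append(int(teacher_action))
        # Act with the student once it exists, otherwise with the teacher.
        act = int(tree.predict([obs])[0]) if tree is not None else int(teacher_action)
        obs, _, terminated, truncated, _ = env.step(act)
        if terminated or truncated:
            obs, _ = env.reset()
    tree = DecisionTreeClassifier(max_depth=6).fit(np.array(states), labels)

# Step 3: refit with cost-complexity pruning for a smaller,
# better-generalising rule set (sklearn prunes at fit time via ccp_alpha).
pruned = DecisionTreeClassifier(max_depth=6, ccp_alpha=1e-3)
pruned.fit(np.array(states), labels)
```

The distilled tree can then be rendered as an explicit if-then rule list with sklearn.tree.export_text(pruned), which is the kind of transparent, inspectable heuristic the abstract contrasts with the original RL policy.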