Körner, A., Pasterk, D., Stadler, F., & Zeh, C. (2025). Simulation-based generation of heuristics for decision-making in stochastic environments. INFOR: Information Systems and Operational Research. https://doi.org/10.1080/03155986.2025.2592355
Decision-making in stochastic environments often requires a trade-off between performance and interpretability. Reinforcement Learning (RL) excels at producing adaptive policies, but the resulting solutions are opaque; heuristics, conversely, offer transparency but often sacrifice optimality and adaptability. In this work, we present a general framework that combines the strengths of both approaches. First, we use RL to train a policy on a Markov Decision Process (MDP). We then extract transparent heuristics in the form of decision trees via imitation-based policy extraction (VIPER). Finally, we prune the tree to simplify its structure and improve the generalisation of the resulting rule set. We demonstrate the approach in a logistics case study with significant variability in production and demand. The resulting heuristics outperform expert-designed rules and match the performance of the original RL policy while offering transparency and robustness. The method enables data-driven, explainable decision-making without requiring domain-specific expertise.
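For readers who want a concrete picture of the train-distill-prune pipeline the abstract describes, the sketch below mirrors its three steps on a toy problem. It is a minimal illustration under stated assumptions, not the paper's implementation: CartPole-v1 stands in for the logistics MDP, stable-baselines3's PPO for the RL step, and a plain DAgger-style imitation loop with scikit-learn's cost-complexity pruning (ccp_alpha) approximates VIPER's tree extraction (the full VIPER algorithm additionally reweights samples by Q-value gaps, omitted here).

```python
# Sketch of the pipeline: (1) train an opaque RL policy, (2) distill it into
# a decision tree by imitation, (3) prune the tree into a simpler rule set.
# CartPole-v1, PPO, and the loop constants are illustrative assumptions.
import gymnasium as gym
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")

# Step 1: train an opaque RL policy on the MDP.
teacher = PPO("MlpPolicy", env, verbose=0)
teacher.learn(total_timesteps=20_000)

# Step 2: DAgger-style distillation -- roll out the current tree (student),
# but label every visited state with the teacher's action.
states, labels = [], []
tree = None
for _ in range(5):  # a handful of imitation rounds
    obs, _ = env.reset()
    for _ in range(1000):
        states.append(obs)
        teacher_action, _ = teacher.predict(obs, deterministic=True)
        labels.append(int(teacher_action))
        # Act with the student once it exists, otherwise with the teacher.
        act = int(tree.predict([obs])[0]) if tree is not None else int(teacher_action)
        obs, _, terminated, truncated, _ = env.step(act)
        if terminated or truncated:
            obs, _ = env.reset()
    tree = DecisionTreeClassifier(max_depth=6).fit(np.array(states), labels)

# Step 3: refit with cost-complexity pruning for a smaller,
# better-generalising rule set (sklearn prunes at fit time via ccp_alpha).
pruned = DecisionTreeClassifier(max_depth=6, ccp_alpha=1e-3)
pruned.fit(np.array(states), labels)
```

The distilled tree can then be rendered as an explicit if-then rule list with sklearn.tree.export_text(pruned), which is the kind of transparent, inspectable heuristic the abstract contrasts with the original RL policy.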