Rule Extraction and Feature Attribution for Explainable Reinforcement Learning

Stieger, Alexander

doi:10.34726/hss.2025.120146

Record link:

https://doi.org/10.34726/hss.2025.120146
http://hdl.handle.net/20.500.12708/215652

Title:

Rule Extraction and Feature Attribution for Explainable Reinforcement Learning

Citation:

Stieger, A. (2025). Rule Extraction and Feature Attribution for Explainable Reinforcement Learning [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.120146

reposiTUm DOI:

10.34726/hss.2025.120146

CatalogPlus:

AC17527776

Publication Type:

Thesis - Diplomarbeit

Language:

English

Authors:

Stieger, Alexander

Advisor:

Sauter, Thilo

Co-advisor:

Stippel, Christian

Organisational Unit:

E384 - Institut für Computertechnik

Date (published):

2025

Number of Pages:

Keywords:

Reinforcement Learning; Rule Extraction; Feature Attribution; HVAC; Feature Selection; Control System

Abstract:

Reinforcement learning (RL) has shown significant potential for optimizing Heating, Ventilation, and Air Conditioning (HVAC) control systems by improving energy efficiency and adaptability. However, RL models are often regarded as black-box systems, which limits their applicability in safety-critical and heavily regulated environments. This thesis addresses explainability in RL-based control through two approaches. First, we build surrogate models that replace the black-box policy model from the RL training process with a white-box decision tree. To improve the accuracy of these trees, we integrate certainty levels and feature attribution into the standard Classification and Regression Trees (CART) algorithm. Second, we apply rule extraction and feature attribution to identify the most relevant features, which enables feature selection. Experimental results indicate that, while certainty-based and feature attribution-based weighting schemes influence decision tree behavior, in most cases they do not improve performance over standard CART. However, when only very shallow decision trees are extracted, the attribution- and uncertainty-weighted surrogate models solve the environment more consistently than decision trees extracted with CART. Feature selection experiments highlight the limitations of single-method approaches: decision tree-based and feature attribution-based selection each perform well in different environments. A combined approach that leverages both methods is more robust and performs better across multiple environments.

Applying feature attribution and rule extraction to a pre-existing HVAC control system reveals CO2 levels as the dominant predictor for control decisions, consistent with their role in occupancy inference. Without access to the environment, we were able to reduce the number of required sensors by over 70%. The reduced model nevertheless behaves very similarly to the original model, even without access to the full sensor set.
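The abstract does not state exactly how certainty levels enter the CART algorithm or how the two feature rankings are combined. The following is a minimal sketch under stated assumptions, not the thesis's implementation. It assumes that each training sample (state, teacher action) is weighted by the black-box policy's confidence, so that the Gini criterion counts confident decisions more heavily, and that tree importances and mean absolute attribution scores are merged by averaging max-normalised values. The function names weighted_gini, best_weighted_split and combined_feature_ranking, as well as the toy data, are hypothetical.

```python
"""Hypothetical sketch: confidence-weighted CART split and combined feature ranking.

Assumptions (not taken from the thesis): each sample is weighted by the black-box
policy's confidence in its action, and tree importances and mean |attribution|
scores are merged by averaging max-normalised values.
"""
from collections import defaultdict


def weighted_gini(labels, weights):
    """Gini impurity in which every sample contributes its weight instead of 1."""
    total = sum(weights)
    if total == 0:
        return 0.0
    mass = defaultdict(float)
    for label, w in zip(labels, weights):
        mass[label] += w
    return 1.0 - sum((m / total) ** 2 for m in mass.values())


def best_weighted_split(X, y, w):
    """Exhaustive CART-style search for the best axis-aligned split.

    Returns (feature index, threshold, weighted child impurity). With uniform
    weights this reduces to the standard Gini criterion.
    """
    best = (None, None, float("inf"))
    total = sum(w)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # midpoint between neighbouring feature values
            left = [i for i, row in enumerate(X) if row[f] <= thr]
            right = [i for i, row in enumerate(X) if row[f] > thr]
            score = 0.0
            for side in (left, right):
                side_w = [w[i] for i in side]
                score += sum(side_w) * weighted_gini([y[i] for i in side], side_w)
            score /= total
            if score < best[2]:  # strict comparison: ties keep the first candidate
                best = (f, thr, score)
    return best


def combined_feature_ranking(tree_importance, mean_abs_attribution, k):
    """Return the indices of the k features with the highest averaged score."""
    def normalise(scores):
        top = max(scores)
        return [s / top for s in scores] if top > 0 else [0.0] * len(scores)

    t = normalise(tree_importance)
    a = normalise(mean_abs_attribution)
    combined = [(ti + ai) / 2 for ti, ai in zip(t, a)]
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)[:k]


# Toy surrogate data: one state feature and the teacher policy's actions.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [0, 1, 0, 1]
uniform = [1.0, 1.0, 1.0, 1.0]     # plain CART: every sample counts once
confidence = [0.1, 1.0, 1.0, 1.0]  # hypothetical: teacher unsure about state 0

print(best_weighted_split(X, y, uniform)[1])     # 1.5
print(best_weighted_split(X, y, confidence)[1])  # 3.5

# Toy per-feature scores from a tree and from an attribution method.
print(combined_feature_ranking([0.7, 0.3, 0.0], [0.2, 0.5, 0.1], k=2))  # [1, 0]
```

In the toy run, plain CART picks the threshold 1.5 (tied with 3.5 and resolved in favour of the first candidate), while down-weighting the uncertain first state moves the split to 3.5. Averaging normalised scores is only one way to merge two rankings; rank averaging or taking the union of per-method top-k sets are equally plausible alternatives.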

Additional information:

Thesis not yet received by the library - data not verified
Variant title as translated by the author

License:

In Copyright

Appears in Collections:

Thesis