<div class="csl-bib-body">
<div class="csl-entry">Strümpf, K. (2025). <i>Supporting domain experts develop data exploration and modelling workflows: a ML-based approach</i> [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.101362</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2025.101362
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/220411
-
dc.description
Thesis not yet received by the library - data not verified
-
dc.description
Alternative title as translated by the author
-
dc.description.abstract
Domain experts in fields such as healthcare, marketing, or manufacturing are increasingly expected to engage with data analysis tasks. However, existing tools either require programming knowledge or limit users to predefined operations in graphical interfaces, creating barriers for non-technical users seeking to build meaningful data workflows.
This thesis explores how Data Analysis Workflows (DAWs) can be made more accessible by automating their construction through two complementary approaches: a structured graph-based system and a prompt-driven system based on Large Language Models (LLMs). Both systems are designed to support domain experts in generating DAWs without requiring programming expertise.
The graph-based system represents workflows as Directed Acyclic Graphs (DAGs), where nodes correspond to datasets and operations. It incorporates a Monte-Carlo Tree Search (MCTS) strategy for generating synthetic training data and supervised learning models that predict valid pipeline structures. In contrast, the LLM-based system relies on prompt-based interactions to generate executable Python code directly from natural language input, offering greater flexibility and reducing the engineering effort needed to define task-specific logic.
Both systems were implemented as web applications and evaluated across several dimensions, including predictive performance, engineering complexity, and user-facing flexibility. The graph-based system offers higher reproducibility and more transparent pipeline construction, while the LLM-based approach offers rapid prototyping and lower development overhead at the cost of increased uncertainty and reduced control.
This comparative analysis reveals key trade-offs in the design of systems for domain-expert-centric DAW generation. It also highlights the potential for combining the strengths of both approaches in future research.
Recommendations are provided for improving robustness, expanding system capabilities, and incorporating user feedback to further lower the barriers to accessible and effective data analysis.
en
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
data analysis workflows
en
dc.subject
low-code systems
en
dc.subject
graph-based pipeline generation
en
dc.subject
large language models
en
dc.subject
AutoML
en
dc.subject
Monte Carlo Tree Search
en
dc.subject
data pipeline automation
en
dc.subject
human-in-the-loop machine learning
en
dc.title
Supporting domain experts develop data exploration and modelling workflows: a ML-based approach
en
dc.title.alternative
Unterstützung von Fachexperten bei der Datenexploration und Modellierung: ein ML-basierter Ansatz
de
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2025.101362
-
dc.contributor.affiliation
TU Wien, Österreich
-
dc.rights.holder
Konstantin Strümpf
-
dc.publisher.place
Wien
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
dc.contributor.assistant
Morichetta, Andrea
-
tuw.publication.orgunit
E194 - Institut für Information Systems Engineering
-
dc.type.qualificationlevel
Diploma
-
dc.identifier.libraryid
AC17681814
-
dc.description.numberOfPages
162
-
dc.thesistype
Diplomarbeit
de
dc.thesistype
Diploma Thesis
en
dc.rights.identifier
In Copyright
en
dc.rights.identifier
Urheberrechtsschutz
de
tuw.advisor.staffStatus
staff
-
tuw.assistant.staffStatus
staff
-
tuw.advisor.orcid
0000-0001-6872-8821
-
tuw.assistant.orcid
0000-0003-3765-3067
-
item.fulltext
with Fulltext
-
item.openaccessfulltext
Open Access
-
item.languageiso639-1
en
-
item.openairecristype
http://purl.org/coar/resource_type/c_bdcc
-
item.cerifentitytype
Publications
-
item.grantfulltext
open
-
item.openairetype
master thesis
-
item.mimetype
application/pdf
-
crisitem.author.dept
E194 - Institut für Information Systems Engineering