Systematic extension of CRISP-DM by structured mapping of emerging regulatory requirements on bias in AI

Eberle, Fabian

doi:10.34726/hss.2023.111902

Record link:

https://doi.org/10.34726/hss.2023.111902
http://hdl.handle.net/20.500.12708/189389

Title:

Systematic extension of CRISP-DM by structured mapping of emerging regulatory requirements on bias in AI

Citation:

Eberle, F. (2023). Systematic extension of CRISP-DM by structured mapping of emerging regulatory requirements on bias in AI [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2023.111902

reposiTUm DOI:

10.34726/hss.2023.111902

CatalogPlus:

AC16981822

Publication Type:

Thesis - Diplomarbeit

Language:

English

Authors:

Eberle, Fabian

Advisor:

Rauber, Andreas

Organisational Unit:

E194 - Institut für Information Systems Engineering

Date (published):

2023

Number of Pages:

122

Keywords:

data mining; artificial intelligence; bias; fairness in machine learning; process models; CRISP-DM; responsible data science; regulation; AI-Act

Abstract:

With the widespread use of Artificial Intelligence (AI), cases that went wrong with respect to bias, fairness and discrimination have attracted public attention and cast AI systems in a bad light. One well-known example is COMPAS, which aimed to predict the probability of criminals reoffending and turned out to discriminate (unwittingly) on the basis of skin colour. Although COMPAS was applied in the U.S., similar incidents in Europe drew the attention of European Union (EU) legislators, motivating them to regulate the market for AI systems; they presented a proposal to do so in April 2021. The regulation particularly aims to prevent harmful outcomes for humans, such as discrimination, which is closely linked to bias. Examining existing process models such as the Cross Industry Standard Process Model for Data Mining (CRISP-DM) for their fitness for emerging regulatory obligations revealed a lack of guidance on detecting and treating bias during the development process, which was identified as a gap.

In this work, we identify a broad variety of bias types in the literature by performing a mapping study that explains the different bias types, where they emerge and how they can be detected. The identified bias types form the basis for the core of this work: a mapping of the identified biases to the associated tasks in CRISP-DM. Moreover, the mapping is compared with requirements from the European Union Artificial Intelligence Act (AI-Act) and enriched with dedicated steps to propose an approach to reaching compliance with the regulation.

With the widespread application of Artificial Intelligence (AI), examples that have gone wrong with respect to bias, fairness and discrimination are attracting public attention and casting AI systems in a bad light. A well-known example is COMPAS, which was meant to predict the likelihood of offenders reoffending and in doing so (unwittingly) discriminated on the basis of skin colour. Although it was used in the USA, similar incidents in Europe are attracting the attention of EU legislators, who intend to regulate the market for AI systems and presented a corresponding proposal in April 2021. The regulation aims above all to prevent potential harm to people, such as discrimination or threats to life and limb, and is therefore closely connected to bias. In examining existing process models such as the Cross Industry Standard Process Model for Data Mining (CRISP-DM) for their suitability for emerging regulation, we identified a lack of guidelines and instructions for detecting and treating bias during the development process as a gap.

In this work, a wide variety of bias types is identified in the literature by conducting a broad literature review, classified by their cause of origin and by how they can be detected. These identified bias types form the basis for the core of this work, the creation of a systematic mapping of the identified bias types to the associated tasks in CRISP-DM. In addition, this mapping is compared with the mandatory requirements of the European Union Artificial Intelligence Act (AI-Act) and extended with dedicated steps in order to propose an approach to conformity with the regulation.

License:

In Copyright

Appears in Collections:

Thesis