Outlier detection for mixed-attribute data

Priselac, Sanja

doi:10.34726/hss.2022.99623

DC Field

Value

Language

dc.contributor.advisor

Filzmoser, Peter

dc.contributor.author

Priselac, Sanja

dc.date.accessioned

2022-06-03T08:49:43Z

dc.date.issued

2022

dc.date.submitted

2022-05

dc.identifier.citation

Priselac, S. (2022). Outlier detection for mixed-attribute data [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2022.99623

dc.identifier.uri

https://doi.org/10.34726/hss.2022.99623

dc.identifier.uri

http://hdl.handle.net/20.500.12708/20317

dc.description.abstract

Outlier detection is a data mining technique for identifying atypical observations in data, which are called outliers or anomalies. Applications of outlier detection include removing noise from data, which leads to more accurate machine learning models, and identifying interesting observations that may arise from different data generation mechanisms. Although many data sets contain both continuous and categorical attributes, outlier detection techniques for such mixed-attribute data have so far not been widely used in practice. This thesis examines a selection of available outlier detection techniques for mixed-attribute data and their properties in terms of effectiveness and efficiency. The analysis is limited to unsupervised scoring techniques, in which the true status of the observations is unknown and the method outputs outlier scores rather than just a binary label. A review of the scientific literature led to eight methods selected for analysis, designated by the acronyms POD, ABOD, FAMDAD, SECODA, ZDisc, KMeans, PCAmix, and MIX. Their properties are determined through extensive simulation experiments and through evaluation on real data sets. The performance of the methods on different data structures is investigated by varying the outlier proportion, severity, and type, the correlation between attributes, and the data size. The analysis shows that assessing outlyingness in mixed-attribute data appears to be more complex than in data with a single attribute type and therefore requires more careful consideration. The methods perform differently depending on whether observations are outlying only in the continuous attribute space, only in the categorical attribute space, or in the entire attribute space. In addition, the efficiency of the methods is strongly influenced by the proportion of continuous to categorical attributes and by the total number of attributes.
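To make the idea of an unsupervised outlier score for mixed-attribute data concrete, the following minimal sketch computes a naive baseline score. It is not one of the eight methods studied in the thesis. It assumes that all attributes are mutually independent and scores each row by its negative log-likelihood up to a constant: 0.5 * z^2 per continuous attribute, where z is a robust z-score based on the median and the MAD, plus -log of the relative frequency of each categorical value. The function names, the toy data, and the use of ROC AUC as the evaluation measure in a simulation with known outlier status are illustrative assumptions.

```python
import math
from collections import Counter
from statistics import median


def robust_z(column):
    """Signed robust z-scores (x - median) / (1.4826 * MAD) for one continuous column."""
    med = median(column)
    mad = median(abs(x - med) for x in column)
    # 1.4826 makes the MAD consistent for the standard deviation under normality;
    # fall back to a scale of 1.0 if the MAD is zero (e.g. more than half the values are identical).
    scale = 1.4826 * mad if mad > 0 else 1.0
    return [(x - med) / scale for x in column]


def rarity(column):
    """-log of the relative frequency of each value within one categorical column."""
    counts = Counter(column)
    n = len(column)
    return [-math.log(counts[v] / n) for v in column]


def independence_nll_scores(num_rows, cat_rows):
    """Hypothetical baseline score: a higher score means a more outlying row.

    Sums, per row, 0.5 * z^2 over continuous attributes (Gaussian negative
    log-likelihood without its constant) and -log frequency over categorical
    attributes, i.e. it assumes all attributes are mutually independent.
    """
    z_cols = [robust_z(col) for col in zip(*num_rows)]
    r_cols = [rarity(col) for col in zip(*cat_rows)]
    scores = []
    for i in range(len(num_rows)):
        continuous = sum(0.5 * z[i] ** 2 for z in z_cols)
        categorical = sum(r[i] for r in r_cols)
        scores.append(continuous + categorical)
    return scores


def auc(scores, is_outlier):
    """ROC AUC via the Mann-Whitney statistic; ties between an outlier and an inlier count 1/2."""
    pos = [s for s, y in zip(scores, is_outlier) if y]
    neg = [s for s, y in zip(scores, is_outlier) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data: row 4 is outlying in the continuous space only, row 5 in the categorical space only.
num = [[1.0, 10.0], [1.2, 10.5], [0.9, 9.8], [1.1, 10.2], [8.0, 30.0], [1.0, 10.1]]
cat = [["a", "x"], ["a", "x"], ["a", "x"], ["a", "x"], ["a", "x"], ["b", "z"]]
is_outlier = [False, False, False, False, True, True]

scores = independence_nll_scores(num, cat)
print([round(s, 2) for s in scores])
print("AUC:", auc(scores, is_outlier))
```

Because the sketch treats attributes as independent, it cannot flag an observation whose values are common individually but rare in combination, which is one reason why the correlation between attributes matters when comparing methods for mixed-attribute data.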

dc.language

English

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.subject

Robustness

dc.subject

Clustering

dc.subject

Outliers

dc.title

Outlier detection for mixed-attribute data

dc.type

Thesis

dc.rights.license

In Copyright

dc.identifier.doi

10.34726/hss.2022.99623

dc.contributor.affiliation

TU Wien, Austria

dc.rights.holder

Sanja Priselac

dc.publisher.place

Vienna

tuw.version

vor

tuw.thesisinformation

Technische Universität Wien

tuw.publication.orgunit

E180 - Faculty of Informatics

dc.type.qualificationlevel

Diploma

dc.identifier.libraryid

AC16538959

dc.description.numberOfPages

dc.thesistype

Diploma Thesis

dc.rights.identifier

In Copyright

tuw.advisor.staffStatus

staff

tuw.advisor.orcid

0000-0002-8014-4682

item.languageiso639-1

item.openairetype

master thesis

item.grantfulltext

open

item.fulltext

with Fulltext

item.cerifentitytype

Publications

item.mimetype

application/pdf

item.openairecristype

http://purl.org/coar/resource_type/c_bdcc

item.openaccessfulltext

Open Access

crisitem.author.dept

E105 - Institute of Statistics and Mathematical Methods in Economics

crisitem.author.parentorg

E100 - Faculty of Mathematics and Geoinformation

Appears in Collections:

Thesis

Fulltext (Version of Record, published version): Adobe PDF, 1.32 MB, In Copyright