<div class="csl-bib-body">
<div class="csl-entry">Priselac, S. (2022). <i>Outlier detection for mixed-attribute data</i> [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2022.99623</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2022.99623
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/20317
-
dc.description.abstract
Outlier detection is a data mining technique for identifying atypical observations in data, which are called outliers or anomalies. Applications of outlier detection include removing noise from data, which leads to more accurate machine learning models, and identifying interesting observations that may arise from different data-generating mechanisms. Although many data sets contain both continuous and categorical attributes, outlier detection techniques for such mixed-attribute data have so far seen little use in practice. This thesis examines a selection of available outlier detection techniques for mixed-attribute data and their properties in terms of effectiveness and efficiency. The analysis is limited to unsupervised scoring techniques, in which the true status of the observations is unknown and the method outputs outlier scores rather than just binary labels. A review of the scientific literature yielded eight methods for analysis, designated by the acronyms POD, ABOD, FAMDAD, SECODA, ZDisc, KMeans, PCAmix, and MIX. Their properties are determined through extensive simulation experiments and evaluation on real data sets. The performance of the methods for different data structures is investigated by varying the outlier proportion, severity, and type, the correlation between attributes, and the data size. The analysis shows that assessing outlyingness in mixed-attribute data appears to be more complex than in homogeneous data and therefore requires more careful consideration. The methods perform differently depending on whether observations are outlying only in the continuous attribute space, only in the categorical attribute space, or in the entire attribute space. In addition, the efficiency of the methods is strongly influenced by the proportion of each attribute type and by the total number of attributes.
en
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
Robustheit
de
dc.subject
Clustering
de
dc.subject
Ausreißer
de
dc.subject
Robustness
en
dc.subject
Clustering
en
dc.subject
Outliers
en
dc.title
Outlier detection for mixed-attribute data
en
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2022.99623
-
dc.contributor.affiliation
TU Wien, Austria
-
dc.rights.holder
Sanja Priselac
-
dc.publisher.place
Vienna
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
tuw.publication.orgunit
E180 - Fakultät für Informatik
-
dc.type.qualificationlevel
Diploma
-
dc.identifier.libraryid
AC16538959
-
dc.description.numberOfPages
70
-
dc.thesistype
Diplomarbeit
de
dc.thesistype
Diploma Thesis
en
dc.rights.identifier
In Copyright
en
dc.rights.identifier
Urheberrechtsschutz
de
tuw.advisor.staffStatus
staff
-
tuw.advisor.orcid
0000-0002-8014-4682
-
item.languageiso639-1
en
-
item.fulltext
with Fulltext
-
item.openaccessfulltext
Open Access
-
item.mimetype
application/pdf
-
item.openairetype
master thesis
-
item.grantfulltext
open
-
item.openairecristype
http://purl.org/coar/resource_type/c_bdcc
-
item.cerifentitytype
Publications
-
crisitem.author.dept
E105 - Institut für Stochastik und Wirtschaftsmathematik