Evaluation of robust outlier detection methods for zero-inflated complex data

Templ, M.; Gussenbauer, J.; Filzmoser, P.

doi:10.1080/02664763.2019.1671961

DC Field

Value

Language

dc.contributor.author

Templ, M.

dc.contributor.author

Gussenbauer, J.

dc.contributor.author

Filzmoser, P.

dc.date.accessioned

2022-06-01T13:06:44Z

dc.date.available

2022-06-01T13:06:44Z

dc.date.issued

2020

dc.identifier.citation

<div class="csl-bib-body"> <div class="csl-entry">Templ, M., Gussenbauer, J., & Filzmoser, P. (2020). Evaluation of robust outlier detection methods for zero-inflated complex data. <i>Journal of Applied Statistics</i>, <i>47</i>(7), 1144–1167. https://doi.org/10.1080/02664763.2019.1671961</div> </div>

dc.identifier.issn

0266-4763

dc.identifier.uri

http://hdl.handle.net/20.500.12708/20292

dc.description.abstract

Outlier detection can be seen as a pre-processing step for locating data points in a data sample that do not conform to the majority of observations. Various techniques and methods for outlier detection can be found in the literature dealing with different types of data. However, many data sets are inflated by true zeros and, in addition, some components/variables might be of compositional nature. Important examples of such data sets are the Structural Earnings Survey, the Structural Business Statistics, the European Statistics on Income and Living Conditions, tax data or, as in this contribution, household expenditure data, which are used, for example, to estimate the Purchasing Power Parity of a country. In this work, robust univariate and multivariate outlier detection methods are compared in a complex simulation study that considers various challenges present in such data sets, namely structural (true) zeros, missing values, and compositional variables. These circumstances make it difficult or impossible to flag true outliers and influential observations with well-known outlier detection methods. Our aim is to assess the performance of outlier detection methods in terms of their effectiveness in identifying outliers when applied to challenging data sets such as the household expenditure data collected in surveys all over the world. Moreover, different methods are evaluated through a close-to-reality simulation study. Differences in the performance of univariate and multivariate robust techniques for outlier detection, and their shortcomings, are reported. We found that robust multivariate methods outperform robust univariate methods. The best performing methods in finding the outliers and in providing a low false discovery rate were found to be the generalized S estimators (GSE), the BACON-EEM algorithm and a compositional method (CoDa-Cov). In addition, these methods also performed best when the outliers are imputed based on the corresponding outlier detection method and indicators are estimated from the data sets.

dc.language.iso

dc.publisher

TAYLOR & FRANCIS LTD

dc.relation.ispartof

Journal of Applied Statistics

dc.rights.uri

http://creativecommons.org/licenses/by-nc-nd/4.0/

dc.subject

household expenditures

dc.subject

outlier detection

dc.subject

robust methods

dc.subject

zeros

dc.title

Evaluation of robust outlier detection methods for zero-inflated complex data

dc.type

Article

dc.type

Artikel

dc.rights.license

Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International

dc.rights.license

Creative Commons Namensnennung - Nicht kommerziell - Keine Bearbeitungen 4.0 International

dc.identifier.scopus

2-s2.0-85073935729

dc.identifier.url

https://api.elsevier.com/content/abstract/scopus_id/85073935729

dc.contributor.affiliation

Statistics Austria, Austria

dc.description.startpage

1144

dc.description.endpage

1167

dc.rights.holder

dc.type.category

Original Research Article

tuw.container.volume

tuw.container.issue

tuw.journal.peerreviewed

true

tuw.peerreviewed

true

dcterms.isPartOf.title

Journal of Applied Statistics

tuw.publication.orgunit

E105 - Institut für Stochastik und Wirtschaftsmathematik

tuw.publisher.doi

10.1080/02664763.2019.1671961

dc.date.onlinefirst

2019-09-27

dc.identifier.eissn

1360-0532

dc.identifier.libraryid

AC17204823

dc.description.numberOfPages

tuw.author.orcid

0000-0002-8014-4682

dc.rights.identifier

CC BY-NC-ND 4.0

wb.sci

true

item.openairecristype

http://purl.org/coar/resource_type/c_2df8fbb1

item.mimetype

application/pdf

item.fulltext

with Fulltext

item.cerifentitytype

Publications

item.languageiso639-1

item.openairetype

research article

item.grantfulltext

open

item.openaccessfulltext

Open Access

crisitem.author.dept

E105 - Institut für Stochastik und Wirtschaftsmathematik

crisitem.author.dept

TU Wien

crisitem.author.dept

E105-06 - Forschungsbereich Computational Statistics

crisitem.author.orcid

0000-0002-8014-4682

crisitem.author.parentorg

E100 - Fakultät für Mathematik und Geoinformation

crisitem.author.parentorg

E105 - Institut für Stochastik und Wirtschaftsmathematik

Appears in Collections:

Article

Fulltext (Version of Record (published version))

Adobe PDF

(2.57 MB)

CC BY-NC-ND 4.0

Page view(s)

318

checked on Dec 1, 2023
