Disclosure risk estimation for survey microdata

Totter, Marius

doi:10.34726/hss.2014.26125

Record link:

https://doi.org/10.34726/hss.2014.26125
http://hdl.handle.net/20.500.12708/5213

Title:

Disclosure risk estimation for survey microdata

Citation:

Totter, M. (2014). Disclosure risk estimation for survey microdata [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2014.26125

reposiTUm DOI:

10.34726/hss.2014.26125

CatalogPlus:

AC12232788

Publication Type:

Thesis - Diploma Thesis

Language:

English

Authors:

Totter, Marius

Advisor:

Templ, Matthias

Organisational Unit:

E105 - Institut für Statistik und Wahrscheinlichkeitstheorie

Date (published):

2014

Keywords:

Statistical Disclosure Control; Microdata; Disclosure Risk; Simulation

Abstract:

This diploma thesis deals with estimating the re-identification risk for sample data. Published confidential data must carry a very low identification risk so that data protection laws and guidelines are not violated. The goal of data anonymisation is to minimise information loss while maximising data security. The thesis presents various anonymisation methods and the re-identification risk. The main focus is on estimating two risk measures by means of log-linear models. The log-linear models are tested in simulations in which the samples are drawn under different sampling designs. The true risk measures can be compared with the estimated risk because a synthetic population is generated for testing purposes, from which the samples are drawn. All log-linear models are additionally implemented in a software package.

The main focus of this master's thesis is the estimation of the re-identification risk of individuals in survey microdata. Released confidential data must carry a very low risk of identifying individuals; otherwise, laws on data privacy are violated. Many different anonymisation methods exist, and they aim both to reduce the disclosure risk and to minimize information loss. The disclosure risk itself is described mathematically, and the corresponding methods are implemented in software. One approach for estimating disclosure risk measures for categorical variables is based on log-linear models, which are used for modeling frequency counts. Because the truth is known when samples are drawn from synthetic population data, four log-linear models are tested on four different sampling designs and three different categorical variable scenarios in order to evaluate the performance of the methods. A simulation study examines the influence of different sampling designs on the disclosure risk estimation methods.
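
The simulation idea summarised in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes that the two risk measures are the number of sample uniques that are also population unique (tau1) and the sum of 1/F_k over sample uniques (tau2), that cell counts follow a Poisson model, and that the simplest log-linear model (independence, main effects only) is fitted. All function names, the synthetic population, and the simple random sampling design are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def simulate_population(size, levels, rng):
    """Hypothetical synthetic population of categorical key variables.

    Each variable gets skewed category weights so that rare cells occur.
    """
    columns = [
        rng.choices(range(m), weights=[1.0 / (j + 1) for j in range(m)], k=size)
        for m in levels
    ]
    return list(zip(*columns))


def true_risk(population, sample):
    """True risk measures, computable because the population is known."""
    F = Counter(population)  # population frequency per cell
    f = Counter(sample)      # sample frequency per cell
    uniques = [cell for cell, count in f.items() if count == 1]
    tau1 = sum(1 for cell in uniques if F[cell] == 1)
    tau2 = sum(1.0 / F[cell] for cell in uniques)
    return len(uniques), tau1, tau2


def independence_estimate(sample, pi):
    """Estimate tau1 and tau2 under an independence log-linear model.

    Assumes Poisson cell counts and a sampling fraction pi < 1. Fitted
    sample means are the product of the one-way margins, which is the
    maximum-likelihood fit of the main-effects model.
    """
    n = len(sample)
    f = Counter(sample)
    n_vars = len(sample[0])
    margins = [Counter(rec[v] for rec in sample) for v in range(n_vars)]
    tau1 = tau2 = 0.0
    for cell, count in f.items():
        if count != 1:
            continue  # only sample uniques contribute
        mu = n * math.prod(margins[v][cell[v]] / n for v in range(n_vars))
        rate = mu * (1.0 - pi) / pi     # mean of the unsampled count in the cell
        tau1 += math.exp(-rate)         # P(F_k = 1 | f_k = 1)
        tau2 += -math.expm1(-rate) / rate  # E(1 / F_k | f_k = 1)
    return tau1, tau2


rng = random.Random(2014)
population = simulate_population(20000, (6, 5, 8, 4), rng)
sample = rng.sample(population, 2000)  # simple random sample without replacement
pi = len(sample) / len(population)

n_su, tau1_true, tau2_true = true_risk(population, sample)
tau1_hat, tau2_hat = independence_estimate(sample, pi)
print(f"sample uniques: {n_su}")
print(f"tau1 true={tau1_true}, estimated={tau1_hat:.2f}")
print(f"tau2 true={tau2_true:.2f}, estimated={tau2_hat:.2f}")
```

Repeating the last block over many samples, other sampling designs, and richer log-linear models (for example with two-way interactions) would give a comparison of estimated against true risk in the spirit of the simulation study described above.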

Additional information:

Title differs according to the author's translation

License:

In Copyright

Appears in Collections:

Thesis