Notice
This item was automatically migrated from a legacy system. Its data has not been checked and might not meet the quality criteria of the present system.
E105 - Institut für Stochastik und Wirtschaftsmathematik
-
Date (published):
2021
-
Event name:
BernR User Group
-
Event date:
11-Nov-2021
-
Event place:
Bern, Switzerland
-
Keywords:
Synthetic Data; Data Anonymization; SDC
-
Abstract:
New technologies and research in machine learning and deep learning, together with new ways of accessing, integrating and analyzing sensitive personal data, increase the demand for solutions that comply with laws on data privacy and confidentiality. Fields of application include official statistics and the social sciences, financial transactions, social network activity, location trajectories, CRM, insurance data and medical records.
New data protection regulations, which notably include high penalties for privacy violations, put statistical disclosure control in focus.
There are many different approaches to data privacy. To name a few: privacy-preserving computation, remote execution, remote access, synthetic data, and statistical disclosure control. We focus on the last of these. Statistical disclosure control comprises measuring the re-identification risk of persons in a data set, anonymizing the data, and measuring the information loss after anonymization. After anonymization the data include no link to persons, and thus the rules on privacy no longer apply.
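The three steps named above (risk measurement, anonymization, information loss) can be illustrated with a minimal base R sketch. This is not the method presented in the talk: the toy data, the helper `group_size()`, the age bands and the loss measure are all assumptions chosen for illustration. Risk is approximated by counting how many records share each combination of key variables (k-anonymity), anonymization is done by recoding age into bands, and information loss is taken as the mean absolute distance between the original age and the band midpoint.

```r
# Toy microdata: hypothetical values, made up for this illustration only
dat <- data.frame(
  age    = c(23, 24, 25, 31, 33, 35, 36, 52, 58, 59),
  sex    = c("f", "f", "f", "m", "m", "m", "m", "f", "f", "f"),
  region = c("A", "A", "A", "B", "B", "B", "B", "A", "A", "A"),
  income = c(21, 25, 24, 40, 38, 45, 41, 60, 52, 57),
  stringsAsFactors = FALSE
)
keys <- c("age", "sex", "region")

# Hypothetical helper: for each record, the number of records that share
# its combination of key variables (the sample frequency count)
group_size <- function(d, keys) {
  key <- do.call(paste, c(d[keys], sep = "|"))
  as.vector(table(key)[key])
}

# 1) Risk measurement: records that are unique on the key variables
fk_before <- group_size(dat, keys)
n_unique_before <- sum(fk_before == 1)

# 2) Anonymization: global recoding of age into broad bands (assumed breaks)
breaks <- c(20, 30, 40, 60)
band   <- cut(dat$age, breaks = breaks)
anon   <- dat
anon$age <- band
fk_after <- group_size(anon, keys)
k_achieved <- min(fk_after)  # smallest group size after recoding

# 3) Information loss: mean absolute distance between the original age
#    and the midpoint of the band that replaced it
midpoints <- (head(breaks, -1) + tail(breaks, -1)) / 2
loss <- mean(abs(dat$age - midpoints[as.integer(band)]))

print(c(unique_before = n_unique_before, k_after = k_achieved, age_loss = loss))
```

Coarser bands would raise the achieved k further but also increase the loss, which is the trade-off that statistical disclosure control has to balance.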
Based on an application using the European Union Statistics on Income and Living Conditions (EU-SILC) survey, we show the S4-class-based R package sdcMicro in action. Note that each anonymization should be tailored to the data and use case at hand, and that different kinds of data (e.g. mobility data, event history data, time series, longitudinal data) need different solutions.
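As a rough idea of what such a workflow can look like, the following sketch uses sdcMicro on the `testdata` set that ships with the package. It is a stand-in, not the EU-SILC application from the talk: the choice of key variables, numeric variables, `k = 3` and the MDAV microaggregation settings are assumptions made for illustration only.

```r
library(sdcMicro)

# Example data bundled with sdcMicro (a stand-in for EU-SILC, which is
# not used here)
data(testdata)

# Define the disclosure scenario: categorical key variables, continuous
# key variables and the sampling weight (assumed choices)
sdc <- createSdcObj(
  dat       = testdata,
  keyVars   = c("urbrur", "roof", "walls", "water", "electcon", "relat", "sex"),
  numVars   = c("expend", "income", "savings"),
  weightVar = "sampling_weight"
)

# Risk measurement before anonymization
print(sdc, "risk")

# Anonymization: local suppression towards 3-anonymity on the categorical
# key variables, then microaggregation of the continuous variables
sdc <- localSuppression(sdc, k = 3)
sdc <- microaggregation(sdc, method = "mdav", aggr = 3)

# Printing the object again reports the updated risk and utility summaries
print(sdc)

# Extract the anonymized data set for release
anon <- extractManipData(sdc)
```

Which variables count as key variables, and which methods and parameters are appropriate, depends on the data and the intended use of the released file.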
The audience is expected to have a working knowledge of base R.
Further readings, resources and documents:
Publications (selection):
Publication: Journal of Statistical Software: sdcMicro
Publication about the sdc app and an online test version
Book on SDC in Springer
Resources:
sdcMicro development on GitHub
sdcMicro stable CRAN version
See also:
International Household Survey Network
World Bank Group