<div class="csl-bib-body">
<div class="csl-entry">Martinez Duarte, D. (2024). <i>Federated generation of synthetic tabular data</i> [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2024.112561</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2024.112561
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/202727
-
dc.description.abstract
Machine learning (ML) models have been demonstrated to be beneficial in various domains. However, their application remains severely limited due to concerns about (1) using personal data for training ML models and (2) exchanging data between different organizations, like hospitals and banks. Both cases might lead to privacy breaches and disclosure of sensitive information. In this work, we tackle both problems simultaneously by generating synthetic data in a federated learning manner. Previous work in this field primarily addresses image data generation, while we focus on tabular data, which is more relevant for sensitive data domains. In particular, we propose adapting two centralized tabular data generation methods, Bayesian Networks and Variational Autoencoders, to the federated setting, with a novel aggregation approach applied specifically to Bayesian Networks. We perform an exhaustive evaluation of the generated synthetic data on three datasets in terms of fidelity, utility, and privacy. Further, we demonstrate how the performance of the synthetic data changes depending on how the data is partitioned among the clients participating in federated learning, and how the number of clients impacts the results. Our results suggest that, in many cases, the proposed methods in federated settings perform similarly to those in centralized settings and outperform local data generation. However, imbalance among clients significantly affects the synthetic data generated by Variational Autoencoders.
en
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
Privacy-preserving Techniques
en
dc.subject
Synthetic Data
en
dc.subject
Federated Learning
en
dc.subject
Bayesian Networks
en
dc.subject
Variational Autoencoders
en
dc.title
Federated generation of synthetic tabular data
en
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2024.112561
-
dc.contributor.affiliation
TU Wien, Österreich
-
dc.rights.holder
Daniela Martinez Duarte
-
dc.publisher.place
Wien
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
dc.contributor.assistant
Mayer, Rudolf
-
tuw.publication.orgunit
E194 - Institut für Information Systems Engineering
-
dc.type.qualificationlevel
Diploma
-
dc.identifier.libraryid
AC17336983
-
dc.description.numberOfPages
117
-
dc.thesistype
Diplomarbeit
de
dc.thesistype
Diploma Thesis
en
dc.rights.identifier
In Copyright
en
dc.rights.identifier
Urheberrechtsschutz
de
tuw.advisor.staffStatus
staff
-
tuw.assistant.staffStatus
staff
-
tuw.assistant.orcid
0000-0003-0424-5999
-
item.languageiso639-1
en
-
item.openairetype
master thesis
-
item.openairecristype
http://purl.org/coar/resource_type/c_bdcc
-
item.grantfulltext
open
-
item.cerifentitytype
Publications
-
item.fulltext
with Fulltext
-
item.mimetype
application/pdf
-
item.openaccessfulltext
Open Access
-
crisitem.author.dept
E194-04 - Forschungsbereich Data Science
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering