<div class="csl-bib-body">
<div class="csl-entry">Kapenekakis, A., Dell’Aglio, D., Bøgsted, M., Garofalakis, M., & Hose, K. (2025). Pasteur: Scaling Privacy-Aware Data Synthesis. In <i>Advances in Databases and Information Systems : 29th European Conference, ADBIS 2025, Tampere, Finland, September 23–26, 2025, Proceedings</i> (pp. 164–180). Springer Cham. https://doi.org/10.1007/978-3-032-05281-0_11</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/221036
-
dc.description.abstract
Privacy-aware data synthesis is a field that aims to open up data access by generating synthetic data that mirrors the original without exposing private information. State-of-the-art algorithms for structured data perform well on datasets with tables of a few million rows but incur prohibitive runtimes when scaling to hundreds of millions of rows. In addition, due to the sensitive nature of the data, practitioners are often limited to a single-server environment. In this paper, we present the framework Pasteur, which aims to scale privacy-aware data synthesis linearly in a single-server environment. Pasteur achieves this through a parallelization approach tailored for synthesis, optimized memory representations, and an accelerated marginal calculation algorithm (marginal calculation being the bottleneck in a class of privacy-aware algorithms). We show Pasteur performing pre-processing, synthesis, and evaluation of a tabular dataset with 1 billion rows (200 GB) in 1 h on a 16-core CPU server.
en
dc.language.iso
en
-
dc.relation.ispartofseries
Lecture Notes in Computer Science
-
dc.subject
Privacy-aware data synthesis
en
dc.subject
synthetic data generation
en
dc.title
Pasteur: Scaling Privacy-Aware Data Synthesis
en
dc.type
Inproceedings
en
dc.type
Konferenzbeitrag
de
dc.contributor.affiliation
Aalborg University, Denmark
-
dc.contributor.affiliation
Aalborg University, Denmark
-
dc.contributor.affiliation
Aalborg University, Denmark
-
dc.contributor.affiliation
Athena Research and Innovation Center In Information Communication & Knowledge Technologies, Greece
-
dc.relation.isbn
978-3-032-05281-0
-
dc.relation.doi
10.1007/978-3-032-05281-0
-
dc.relation.issn
0302-9743
-
dc.description.startpage
164
-
dc.description.endpage
180
-
dc.type.category
Full-Paper Contribution
-
dc.relation.eissn
1611-3349
-
tuw.booktitle
Advances in Databases and Information Systems : 29th European Conference, ADBIS 2025, Tampere, Finland, September 23–26, 2025, Proceedings
-
tuw.container.volume
16043
-
tuw.peerreviewed
true
-
tuw.relation.publisher
Springer Cham
-
tuw.researchTopic.id
I1
-
tuw.researchTopic.id
I4
-
tuw.researchTopic.name
Logic and Computation
-
tuw.researchTopic.name
Information Systems Engineering
-
tuw.researchTopic.value
20
-
tuw.researchTopic.value
80
-
tuw.publication.orgunit
E192-02 - Forschungsbereich Databases and Artificial Intelligence
-
tuw.publication.orgunit
E056-23 - Fachbereich Innovative Combinations and Applications of AI and ML (iCAIML)
-
tuw.publisher.doi
10.1007/978-3-032-05281-0_11
-
dc.description.numberOfPages
17
-
tuw.author.orcid
0000-0003-0924-7811
-
tuw.author.orcid
0000-0003-4904-2511
-
tuw.author.orcid
0000-0001-9192-1814
-
tuw.author.orcid
0000-0003-0285-3907
-
tuw.author.orcid
0000-0001-7025-8099
-
tuw.event.name
29th European Conference on Advances in Databases and Information Systems (ADBIS 2025)
en
tuw.event.startdate
23-09-2025
-
tuw.event.enddate
26-09-2025
-
tuw.event.online
On Site
-
tuw.event.type
Event for scientific audience
-
tuw.event.place
Tampere
-
tuw.event.country
FI
-
tuw.event.presenter
Kapenekakis, Antheas
-
wb.sciencebranch
Computer Sciences
-
wb.sciencebranch
Mathematics
-
wb.sciencebranch.oefos
1020
-
wb.sciencebranch.oefos
1010
-
wb.sciencebranch.value
80
-
wb.sciencebranch.value
20
-
item.openairecristype
http://purl.org/coar/resource_type/c_5794
-
item.cerifentitytype
Publications
-
item.openairetype
conference paper
-
item.fulltext
no Fulltext
-
item.languageiso639-1
en
-
item.grantfulltext
none
-
crisitem.author.dept
Aalborg University, Denmark
-
crisitem.author.dept
Aalborg University, Denmark
-
crisitem.author.dept
Aalborg University, Denmark
-
crisitem.author.dept
Athena Research and Innovation Center In Information Communication & Knowledge Technologies, Greece
-
crisitem.author.dept
E192-02 - Forschungsbereich Databases and Artificial Intelligence