<div class="csl-bib-body">
<div class="csl-entry">Phan, T.-L., Weinbauer, K., Gärtner, T., Merkle, D., Andersen, J., Fagerberg, R., & Stadler, P. F. (2024). Reaction rebalancing: a novel approach to curating reaction databases. <i>Journal of Cheminformatics</i>, <i>16</i>(1), Article 82. https://doi.org/10.1186/s13321-024-00875-4</div>
</div>
-
dc.identifier.issn
1758-2946
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/204922
-
dc.description.abstract
Purpose. Reaction databases are a key resource for a wide variety of applications in computational chemistry and biochemistry, including Computer-aided Synthesis Planning (CASP) and the large-scale analysis of metabolic networks. The full potential of these resources can only be realized if datasets are accurate and complete. Missing co-reactants and co-products, i.e., unbalanced reactions, however, are the rule rather than the exception. The curation and correction of such incomplete entries is thus an urgent need.
Methods. The SynRBL framework addresses this issue with a dual strategy: a rule-based method for non-carbon compounds, which uses atomic symbols and counts for prediction, alongside a Maximum Common Subgraph (MCS)-based technique for carbon compounds, which aligns reactants and products to infer missing entities.
Results. The rule-based method exceeded 99% accuracy, while MCS-based accuracy varied from 81.19 to 99.33%, depending on reaction properties. Furthermore, an applicability domain and a machine learning scoring function were devised to quantify prediction confidence. The overall efficacy of this framework was delineated through its success rate and accuracy metrics, which spanned from 89.83 to 99.75% and 90.85 to 99.05%, respectively.
Conclusion. The SynRBL framework offers a novel solution for recalibrating chemical reactions, significantly enhancing reaction completeness. With rigorous validation, it achieved groundbreaking accuracy in reaction rebalancing. This sets the stage for future improvements, in particular to atom-atom mapping techniques and to downstream tasks such as automated synthesis planning.
Scientific Contribution. SynRBL features a novel computational approach to correcting unbalanced entries in chemical reaction databases. By combining heuristic rules for inferring non-carbon compounds and common subgraph searches to address carbon unbalance, SynRBL successfully addresses most instances of this problem, which affects the majority of data in most large-scale resources. Compared to alternative solutions, SynRBL achieves a dramatic increase in both success rate and accuracy, and provides the first freely available open-source solution for this problem.
en
dc.description.sponsorship
European Commission
-
dc.description.sponsorship
WWTF Wiener Wissenschafts-, Forschungs- und Technologiefonds
-
dc.language.iso
en
-
dc.publisher
BMC
-
dc.relation.ispartof
Journal of Cheminformatics
-
dc.subject
Data curation
en
dc.subject
Maximum-common-subgraph
en
dc.subject
Reaction databases
en
dc.subject
Rules
en
dc.subject
SynRBL
en
dc.subject
Unbalanced reactions
en
dc.title
Reaction rebalancing: a novel approach to curating reaction databases
en
dc.type
Article
en
dc.type
Artikel
de
dc.identifier.scopus
2-s2.0-85199075955
-
dc.identifier.url
http://dx.doi.org/10.1186/s13321-024-00875-4
-
dc.contributor.affiliation
University of Southern Denmark, Denmark
-
dc.contributor.affiliation
University of Southern Denmark, Denmark
-
dc.contributor.affiliation
Max Planck Institute for Mathematics in the Sciences, Germany
-
dc.relation.grantno
Proposal number: 101072930
-
dc.relation.grantno
ICT22-059
-
dc.type.category
Original Research Article
-
tuw.container.volume
16
-
tuw.container.issue
1
-
tuw.journal.peerreviewed
true
-
tuw.peerreviewed
true
-
wb.publication.intCoWork
International Co-publication
-
tuw.project.title
Training Alliance for Computational Systems chemistry
-
tuw.project.title
Structured Data Learning with Generalized Similarities
-
tuw.researchTopic.id
I1
-
tuw.researchTopic.id
I4
-
tuw.researchTopic.name
Logic and Computation
-
tuw.researchTopic.name
Information Systems Engineering
-
tuw.researchTopic.value
40
-
tuw.researchTopic.value
60
-
dcterms.isPartOf.title
Journal of Cheminformatics
-
tuw.publication.orgunit
E194-06 - Forschungsbereich Machine Learning
-
tuw.publisher.doi
10.1186/s13321-024-00875-4
-
dc.date.onlinefirst
2024
-
dc.identifier.articleid
82
-
dc.identifier.eissn
1758-2946
-
tuw.author.orcid
0000-0002-3532-2064
-
tuw.author.orcid
0000-0001-5985-9213
-
tuw.author.orcid
0000-0001-7792-375X
-
tuw.author.orcid
0000-0002-4165-3732
-
tuw.author.orcid
0000-0003-1004-3314
-
tuw.author.orcid
0000-0002-5016-5191
-
wb.sci
true
-
wb.sciencebranch
Chemie
-
wb.sciencebranch
Informatik
-
wb.sciencebranch
Mathematik
-
wb.sciencebranch.oefos
1040
-
wb.sciencebranch.oefos
1020
-
wb.sciencebranch.oefos
1010
-
wb.sciencebranch.value
30
-
wb.sciencebranch.value
60
-
wb.sciencebranch.value
10
-
item.grantfulltext
none
-
item.languageiso639-1
en
-
item.openairetype
research article
-
item.cerifentitytype
Publications
-
item.fulltext
no Fulltext
-
item.openairecristype
http://purl.org/coar/resource_type/c_2df8fbb1
-
crisitem.project.funder
European Commission
-
crisitem.project.funder
WWTF Wiener Wissenschafts-, Forschungs- und Technologiefonds
-
crisitem.project.grantno
Proposal number: 101072930
-
crisitem.project.grantno
ICT22-059
-
crisitem.author.dept
University of Southern Denmark
-
crisitem.author.dept
E194-06 - Forschungsbereich Machine Learning
-
crisitem.author.dept
E194-06 - Forschungsbereich Machine Learning
-
crisitem.author.dept
University of Southern Denmark
-
crisitem.author.dept
Max Planck Institute for Mathematics in the Sciences
-
crisitem.author.orcid
0000-0002-3532-2064
-
crisitem.author.orcid
0000-0002-3349-9157
-
crisitem.author.orcid
0000-0001-5985-9213
-
crisitem.author.orcid
0000-0001-7792-375X
-
crisitem.author.orcid
0000-0002-4165-3732
-
crisitem.author.orcid
0000-0003-1004-3314
-
crisitem.author.orcid
0000-0002-5016-5191
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering