<div class="csl-bib-body">
<div class="csl-entry">Lasy, I., Knees, P., & Woltran, S. (2025). Understanding Verbatim Memorization in LLMs Through Circuit Discovery. In R. Jia, E. Wallace, Y. Huang, T. Pimentel, P. Maini, V. Dankers, J. T.-Z. Wei, & P. Lesci (Eds.), <i>Proceedings of the First Workshop on Large Language Model Memorization (L2M2)</i> (pp. 83–94). Association for Computational Linguistics. https://doi.org/10.18653/v1/2025.l2m2-1.7</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/223679
-
dc.description.abstract
The underlying mechanisms of memorization in LLMs, the verbatim reproduction of training data, remain poorly understood. Which exact part of the network decides to retrieve a token that we would consider the start of a memorized sequence? How exactly does the model's behaviour differ when producing a memorized sentence versus a non-memorized one? In this work we approach these questions from a mechanistic interpretability standpoint by utilizing transformer circuits: the minimal computational subgraphs that perform specific functions within the model. Through carefully constructed contrastive datasets, we identify points where model generation diverges from memorized content and isolate the specific circuits responsible for two distinct aspects of memorization. We find that circuits that initiate memorization can also maintain it once started, while circuits that only maintain memorization cannot trigger its initiation. Intriguingly, memorization prevention mechanisms transfer robustly across different text domains, while memorization induction appears more context-dependent.
en
dc.description.sponsorship
FWF - Österr. Wissenschaftsfonds
-
dc.language.iso
en
-
dc.subject
Large Language Models (LLMs)
en
dc.subject
Memorization
en
dc.subject
Circuit Discovery
en
dc.title
Understanding Verbatim Memorization in LLMs Through Circuit Discovery
en
dc.type
Inproceedings
en
dc.type
Konferenzbeitrag
de
dc.contributor.editoraffiliation
Computer Science - University of Southern California (Los Angeles, US)
-
dc.contributor.editoraffiliation
OpenAI
-
dc.contributor.editoraffiliation
Google DeepMind
-
dc.contributor.editoraffiliation
ETH Zurich
-
dc.contributor.editoraffiliation
Carnegie Mellon University
-
dc.contributor.editoraffiliation
Thomas Lord Department of Computer Science - University of Southern California (Los Angeles, US)
-
dc.contributor.editoraffiliation
University of Cambridge
-
dc.relation.isbn
979-8-89176-278-7
-
dc.relation.doi
10.18653/v1/2025.l2m2-1
-
dc.description.startpage
83
-
dc.description.endpage
94
-
dc.relation.grantno
COE 12
-
dc.type.category
Full-Paper Contribution
-
tuw.booktitle
Proceedings of the First Workshop on Large Language Model Memorization (L2M2)
-
tuw.peerreviewed
true
-
tuw.relation.publisher
Association for Computational Linguistics
-
tuw.project.title
Bilateral Artificial Intelligence
-
tuw.researchinfrastructure
Vienna Scientific Cluster
-
tuw.researchTopic.id
I1
-
tuw.researchTopic.id
I4
-
tuw.researchTopic.name
Logic and Computation
-
tuw.researchTopic.name
Information Systems Engineering
-
tuw.researchTopic.value
50
-
tuw.researchTopic.value
50
-
tuw.publication.orgunit
E056-23 - Fachbereich Innovative Combinations and Applications of AI and ML (iCAIML)
-
tuw.publication.orgunit
E194-04 - Forschungsbereich Data Science
-
tuw.publication.orgunit
E056-27 - Fachbereich Digital Humanism
-
tuw.publisher.doi
10.18653/v1/2025.l2m2-1.7
-
dc.description.numberOfPages
12
-
tuw.author.orcid
0000-0003-3906-1292
-
tuw.author.orcid
0000-0003-1594-8972
-
tuw.editor.orcid
0009-0002-8123-7132
-
tuw.editor.orcid
0000-0001-5955-6896
-
tuw.editor.orcid
0009-0001-9406-419X
-
tuw.event.name
The First Workshop on Large Language Model Memorization
en
tuw.event.startdate
01-08-2025
-
tuw.event.enddate
01-08-2025
-
tuw.event.online
On Site
-
tuw.event.type
Event for scientific audience
-
tuw.event.place
Vienna
-
tuw.event.country
AT
-
tuw.event.presenter
Lasy, Ilya
-
tuw.event.track
Single Track
-
wb.sciencebranch
Informatik
-
wb.sciencebranch.oefos
1020
-
wb.sciencebranch.value
100
-
item.openairetype
conference paper
-
item.openairecristype
http://purl.org/coar/resource_type/c_5794
-
item.cerifentitytype
Publications
-
item.languageiso639-1
en
-
item.grantfulltext
none
-
item.fulltext
no Fulltext
-
crisitem.author.dept
E192-02 - Forschungsbereich Databases and Artificial Intelligence
-
crisitem.author.dept
E194 - Institut für Information Systems Engineering
-
crisitem.author.dept
E192-02 - Forschungsbereich Databases and Artificial Intelligence