Optimizing Agent Planning for Security and Autonomy

Kolluri, Aashish; Sharma, Rishi; Costa, Manuel; Köpf, Boris; Nießen, Tobias; Russinovich, Mark; Tople, Shruti; Zanella Béguelin, Santiago

DC Field

Value

Language

dc.contributor.author

Kolluri, Aashish

dc.contributor.author

Sharma, Rishi

dc.contributor.author

Costa, Manuel

dc.contributor.author

Köpf, Boris

dc.contributor.author

Nießen, Tobias

dc.contributor.author

Russinovich, Mark

dc.contributor.author

Tople, Shruti

dc.contributor.author

Zanella Béguelin, Santiago

dc.date.accessioned

2026-05-28T08:29:17Z

dc.date.available

2026-05-28T08:29:17Z

dc.date.issued

2026

dc.identifier.citation

Kolluri, A., Sharma, R., Costa, M., Köpf, B., Nießen, T., Russinovich, M., Tople, S., & Zanella Béguelin, S. (2026). Optimizing Agent Planning for Security and Autonomy. In The Fourteenth International Conference on Learning Representations. The Fourteenth International Conference on Learning Representations, Rio de Janeiro, Brazil.

dc.identifier.uri

http://hdl.handle.net/20.500.12708/228304

dc.description.abstract

Indirect prompt injection attacks threaten AI agents that execute consequential actions, motivating deterministic system-level defenses. Such defenses can provably block unsafe actions by enforcing confidentiality and integrity policies, but currently appear costly: they reduce task completion rates and increase token usage compared to probabilistic defenses. We argue that existing evaluations miss a key benefit of system-level defenses: reduced reliance on human oversight. We introduce autonomy metrics to quantify this benefit: the fraction of consequential actions an agent can execute without human-in-the-loop (HITL) approval while preserving security. To increase autonomy, we design a security-aware agent that (i) introduces richer HITL interactions, and (ii) explicitly plans for both task progress and policy compliance. We implement this agent design atop an existing information-flow control defense against prompt injection and evaluate it on the AgentDojo and WASP benchmarks. Experiments show that this approach yields higher autonomy without sacrificing utility (task completion).

dc.description.sponsorship

European Commission

dc.language.iso

dc.subject

AI Agents

dc.subject

Security

dc.subject

Prompt Injection Attacks

dc.subject

Information Flow Control

dc.subject

Autonomy

dc.title

Optimizing Agent Planning for Security and Autonomy

dc.type

Inproceedings

dc.type

Conference contribution

dc.relation.grantno

101034440

dc.type.category

Full-Paper Contribution

tuw.booktitle

The Fourteenth International Conference on Learning Representations

tuw.peerreviewed

true

tuw.project.title

Logics for Computer Science Program at TU Wien

tuw.researchTopic.id

tuw.researchTopic.name

Logic and Computation

tuw.researchTopic.name

Information Systems Engineering

tuw.researchTopic.value

tuw.publication.orgunit

E192-04 - Forschungsbereich Formal Methods in Systems Engineering

tuw.publication.orgunit

E056-13 - Fachbereich LogiCS

dc.description.numberOfPages

tuw.author.orcid

0000-0003-1792-4448

tuw.author.orcid

0000-0002-1928-1549

tuw.author.orcid

0009-0005-8004-0743

tuw.author.orcid

0000-0002-7712-0006

tuw.author.orcid

0009-0009-8306-0933

tuw.author.orcid

0000-0003-0479-9967

tuw.event.name

The Fourteenth International Conference on Learning Representations

tuw.event.startdate

23-04-2026

tuw.event.enddate

27-04-2026

tuw.event.online

On Site

tuw.event.type

Event for scientific audience

tuw.event.place

Rio de Janeiro

tuw.event.country

tuw.event.presenter

Sharma, Rishi

wb.sciencebranch

Computer Science

wb.sciencebranch

Mathematics

wb.sciencebranch.oefos

1020

wb.sciencebranch.oefos

1010

wb.sciencebranch.value

item.languageiso639-1

item.openairetype

conference paper

item.grantfulltext

none

item.openairecristype

http://purl.org/coar/resource_type/c_5794

item.cerifentitytype

Publications

item.fulltext

no Fulltext

crisitem.author.dept

E192-04 - Forschungsbereich Formal Methods in Systems Engineering

crisitem.author.orcid

0000-0003-1792-4448

crisitem.author.orcid

0009-0004-2905-9011

crisitem.author.orcid

0009-0005-8004-0743

crisitem.author.orcid

0000-0002-7712-0006

crisitem.author.orcid

0009-0009-8306-0933

crisitem.author.orcid

0000-0002-7733-0548

crisitem.author.orcid

0000-0003-0479-9967

crisitem.author.parentorg

E192 - Institut für Logic and Computation

crisitem.project.funder

European Commission

crisitem.project.grantno

101034440

Appears in Collections:

Conference Paper
