Lazy Testing of Machine-Learning Models

Isychev, Anastasia; Wüstholz, Valentin; Christakis, Maria

doi:10.24963/ijcai.2025/826

Record link:

http://hdl.handle.net/20.500.12708/219886

Citation:

Isychev, A., Wüstholz, V., & Christakis, M. (2025). Lazy Testing of Machine-Learning Models. In Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence (pp. 7428–7436). https://doi.org/10.24963/ijcai.2025/826

Publisher DOI:

10.24963/ijcai.2025/826

Publication Type:

Inproceedings - Full-Paper Contribution

Language:

English

Authors:

Isychev, Anastasia
Wüstholz, Valentin
Christakis, Maria

Organisational Unit:

E194-01 - Forschungsbereich Software Engineering

Published in:

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence

ISBN:

978-1-956792-06-5

Date (published):

2025

Event name:

Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2025)

Event date:

16-Aug-2025 - 22-Aug-2025

Event place:

Montreal, Canada

Peer reviewed:

Yes

Keywords:

machine learning; testing; static analysis

Abstract:

Checking the reliability of machine-learning models is a crucial but challenging task. Nomos is an existing, automated framework for testing general, user-provided functional properties of models, including so-called hyperproperties expressed over more than one model execution. Nomos aims to find model inputs that expose "bugs", that is, property violations. However, performing thousands of model invocations during testing is costly both in terms of time and money (for metered APIs, such as OpenAI's). We present LaZ (pronounced "lazy"), an extension of Nomos that automatically minimizes the number of model invocations to boost the test throughput and thereby find bugs more efficiently. During test execution, LaZ automatically identifies redundant invocations (invocations where the model output does not affect the final test outcome) and skips them, much like lazy evaluation in certain programming languages. This optimization enables a second one that dynamically reorders model invocations to skip the more expensive ones. As a result, LaZ finds the same number of bugs as Nomos, but does so 33% faster in the median and up to 60% faster at best.
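The two optimizations described in the abstract can be illustrated with a minimal sketch. This is not the LaZ or Nomos implementation; it only shows the general idea of deferring model calls and ordering them by cost. The names `toy_model`, `LazyCall`, and `holds_any`, the disjunctive property, and the use of input length as a cost estimate are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

invocation_log: List[str] = []  # records every real model call


def toy_model(text: str) -> str:
    """Hypothetical stand-in for an expensive (e.g. metered) model call."""
    invocation_log.append(text)
    return "pos" if "good" in text else "neg"


class LazyCall:
    """A deferred, memoized model invocation (hypothetical helper)."""

    def __init__(self, model: Callable[[str], str], x: str, cost: float) -> None:
        self.model, self.x, self.cost = model, x, cost
        self._done = False
        self._value: Optional[str] = None

    def force(self) -> str:
        # The model is invoked only when the output is actually needed.
        if not self._done:
            self._value = self.model(self.x)
            self._done = True
        return self._value  # type: ignore[return-value]


def holds_any(disjuncts: List[Tuple[LazyCall, Callable[[str], bool]]]) -> bool:
    """Evaluate a disjunction over model outputs, cheapest call first.

    Once one disjunct holds, the remaining outputs cannot change the
    test outcome, so their invocations are skipped.
    """
    for call, predicate in sorted(disjuncts, key=lambda d: d[0].cost):
        if predicate(call.force()):
            return True
    return False


# Assumed property over two executions: model(x1) == "neg" or model(x2) == "pos".
long_input = "bad " * 100
expensive = LazyCall(toy_model, long_input, cost=len(long_input))
cheap = LazyCall(toy_model, "good", cost=len("good"))

passed = holds_any([
    (expensive, lambda out: out == "neg"),
    (cheap, lambda out: out == "pos"),
])
print(passed, len(invocation_log))
```

In this sketch the cheaper invocation is forced first and already decides the test, so the expensive one never runs. An eager test harness would have invoked the model twice for the same verdict.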

Project title:

Structured Doctoral Program on Automated Reasoning: DOC1345324 (FWF - Österr. Wissenschaftsfonds)

Research Areas:

Information Systems Engineering: 100%

Science Branch:

1020 - Computer Sciences: 100%

Appears in Collections:

Conference Paper
