Weijler, L. M., Mirza, J. M., Sick, L., Ekkazan, C., & Hermosilla, P. (2025). TTT-KD: Test-Time Training for 3D Semantic Segmentation Through Knowledge Distillation From Foundation Models. In 2025 International Conference on 3D Vision (3DV) (pp. 1264–1274). IEEE. https://doi.org/10.1109/3DV66043.2025.00120
domain adaptation; point clouds; semantic segmentation; test-time training
en
Abstract:
Test-Time Training (TTT) adapts a pretrained network to changing data distributions on the fly. In this work, we propose the first TTT method for 3D semantic segmentation, TTT-KD, which models Knowledge Distillation (KD) from foundation models (e.g., DINOv2) as a self-supervised objective for adaptation to distribution shifts at test time. Given access to paired image-point-cloud (2D-3D) data, we first optimize a 3D segmentation backbone for the main task of semantic segmentation using the point clouds and for the task of 2D → 3D KD using an off-the-shelf pre-trained 2D foundation model. At test time, TTT-KD updates the 3D segmentation backbone for each test sample using the self-supervised knowledge-distillation task before making the final prediction. Extensive evaluations on multiple indoor and outdoor 3D segmentation benchmarks show the utility of TTT-KD, as it improves performance on both in-distribution (ID) and out-of-distribution (OOD) test datasets. We achieve gains of up to 13% mIoU (7% on average) when the train and test distributions are similar, and up to 45% (20% on average) when adapting to OOD test samples. The code is available in the following repository.
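
As a rough illustration of the adaptation loop described in the abstract (not the authors' released implementation), the sketch below shows a minimal PyTorch-style test-time training step: for each test sample, a copy of the trained 3D backbone is updated by matching its predicted features to 2D foundation-model features projected onto the point cloud, and only then is the segmentation prediction made. The module names, shapes, loss choice (cosine distance), and number of adaptation steps are illustrative assumptions.

```python
# Minimal, hypothetical sketch of test-time training via 2D->3D knowledge
# distillation. All module names, shapes, and hyperparameters are assumptions
# for illustration; they do not reproduce the paper's implementation.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class Toy3DBackbone(nn.Module):
    """Stand-in for a 3D segmentation backbone producing per-point features."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, feat_dim))

    def forward(self, points):            # points: (N, 3)
        return self.mlp(points)           # (N, feat_dim)


def tt_adapt_and_predict(backbone, seg_head, distill_head, points, point_2d_feats,
                         steps=10, lr=1e-4):
    """Adapt a copy of the backbone on one test sample with the KD objective,
    then predict per-point semantic labels.

    points:         (N, 3) test point cloud
    point_2d_feats: (N, D) 2D foundation-model features projected onto the points
    """
    # Work on copies so every test sample starts from the trained weights.
    backbone = copy.deepcopy(backbone)
    distill_head = copy.deepcopy(distill_head)
    params = list(backbone.parameters()) + list(distill_head.parameters())
    opt = torch.optim.Adam(params, lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        feats = backbone(points)                       # per-point 3D features
        pred_2d = distill_head(feats)                  # predicted 2D features
        # Self-supervised distillation loss: match the projected foundation features.
        loss = 1.0 - F.cosine_similarity(pred_2d, point_2d_feats, dim=-1).mean()
        loss.backward()
        opt.step()

    with torch.no_grad():
        logits = seg_head(backbone(points))            # (N, num_classes)
    return logits.argmax(dim=-1)


if __name__ == "__main__":
    N, D, C = 2048, 32, 20
    backbone = Toy3DBackbone()
    seg_head = nn.Linear(64, C)
    distill_head = nn.Linear(64, D)
    points = torch.rand(N, 3)
    point_2d_feats = torch.rand(N, D)   # placeholder for projected (e.g., DINOv2) features
    labels = tt_adapt_and_predict(backbone, seg_head, distill_head, points, point_2d_feats)
    print(labels.shape)                 # torch.Size([2048])
```

In this sketch the segmentation head is left frozen and only the backbone (plus the distillation head) is updated per sample, reflecting the abstract's statement that TTT-KD updates the 3D segmentation backbone before the final prediction.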
Research Areas:
Visual Computing and Human-Centered Technology: 100%