SGRec3D: Self-Supervised 3D Scene Graph Learning via Object-Level Scene Reconstruction

Koch, Sebastian; Hermosilla, Pedro; Vaskevicius, Narunas; Colosi, Mirco; Ropinski, Timo

doi:10.1109/WACV57701.2024.00337

Record link:

http://hdl.handle.net/20.500.12708/203931

Title:

SGRec3D: Self-Supervised 3D Scene Graph Learning via Object-Level Scene Reconstruction

Citation:

Koch, S., Hermosilla, P., Vaskevicius, N., Colosi, M., & Ropinski, T. (2024). SGRec3D: Self-Supervised 3D Scene Graph Learning via Object-Level Scene Reconstruction. In 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (pp. 3392–3402). https://doi.org/10.1109/WACV57701.2024.00337

Publisher DOI:

10.1109/WACV57701.2024.00337

Publication Type:

Inproceedings - Full-Paper Contribution

Language:

English

Authors:

Koch, Sebastian
Hermosilla, Pedro
Vaskevicius, Narunas
Colosi, Mirco
Ropinski, Timo

Organisational Unit:

E193-01 - Forschungsbereich Computer Vision

Published in:

2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

ISBN:

9798350318920

Date (published):

2024

Event name:

2024 IEEE/CVF Winter Conference on Applications of Computer Vision - WACV 2024

Event date:

4-Jan-2024 - 8-Jan-2024

Event place:

Waikoloa, United States of America

Number of Pages:

11

Peer reviewed:

Yes

Keywords:

3D computer vision; Algorithms; Machine learning architectures, formulations, and algorithms

Abstract:

In the field of 3D scene understanding, 3D scene graphs have emerged as a new scene representation that combines geometric and semantic information about objects and their relationships. However, learning semantic 3D scene graphs in a fully supervised manner is inherently difficult, as it requires not only object-level annotations but also relationship labels. While pre-training approaches have boosted the performance of many methods in various fields, pre-training for 3D scene graph prediction has received little attention. Furthermore, we find in this paper that classical contrastive point cloud-based pre-training approaches are ineffective for 3D scene graph learning. To address this, we present SGRec3D, a novel self-supervised pre-training method for 3D scene graph prediction. We propose to reconstruct the 3D input scene from a graph bottleneck as a pretext task. Pre-training SGRec3D does not require object relationship labels, making it possible to exploit large-scale 3D scene understanding datasets that were previously off-limits for 3D scene graph learning. Our experiments demonstrate that, in contrast to recent point cloud-based pre-training approaches, our proposed pre-training improves 3D scene graph prediction considerably, which results in state-of-the-art (SOTA) performance, outperforming other 3D scene graph models by +10% on object prediction and +4% on relationship prediction. Additionally, we show that using only a small subset of 10% of the labeled data during fine-tuning is sufficient to outperform the same model without pre-training.

Research Areas:

Visual Computing and Human-Centered Technology: 100%

Science Branch:

1020 - Computer Sciences: 90%
1010 - Mathematics: 10%

Appears in Collections:

Conference Paper
