SGRec3D: Self-Supervised 3D Scene Graph Learning via Object-Level Scene Reconstruction

Koch, Sebastian; Hermosilla, Pedro; Vaskevicius, Narunas; Colosi, Mirco; Ropinski, Timo

doi:10.1109/WACV57701.2024.00337

DC Element

Value

Language

dc.contributor.author

Koch, Sebastian

dc.contributor.author

Hermosilla, Pedro

dc.contributor.author

Vaskevicius, Narunas

dc.contributor.author

Colosi, Mirco

dc.contributor.author

Ropinski, Timo

dc.date.accessioned

2024-11-05T08:26:10Z

dc.date.available

2024-11-05T08:26:10Z

dc.date.issued

2024

dc.identifier.citation

Koch, S., Hermosilla, P., Vaskevicius, N., Colosi, M., & Ropinski, T. (2024). SGRec3D: Self-Supervised 3D Scene Graph Learning via Object-Level Scene Reconstruction. In 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (pp. 3392–3402). https://doi.org/10.1109/WACV57701.2024.00337

dc.identifier.uri

http://hdl.handle.net/20.500.12708/203931

dc.description.abstract

In the field of 3D scene understanding, 3D scene graphs have emerged as a new scene representation that combines geometric and semantic information about objects and their relationships. However, learning semantic 3D scene graphs in a fully supervised manner is inherently difficult as it requires not only object-level annotations but also relationship labels. While pre-training approaches have helped to boost the performance of many methods in various fields, pre-training for 3D scene graph prediction has received little attention. Furthermore, we find in this paper that classical contrastive point cloud-based pre-training approaches are ineffective for 3D scene graph learning. To this end, we present SGRec3D, a novel self-supervised pre-training method for 3D scene graph prediction. We propose to reconstruct the 3D input scene from a graph bottleneck as a pretext task. Pre-training SGRec3D does not require object relationship labels, making it possible to exploit large-scale 3D scene understanding datasets, which were off-limits for 3D scene graph learning before. Our experiments demonstrate that in contrast to recent point cloud-based pre-training approaches, our proposed pre-training improves the 3D scene graph prediction considerably, which results in SOTA performance, outperforming other 3D scene graph models by +10% on object prediction and +4% on relationship prediction. Additionally, we show that only using a small subset of 10% labeled data during fine-tuning is sufficient to outperform the same model without pre-training.

dc.language.iso

dc.subject

3D computer vision

dc.subject

Algorithms

dc.subject

Machine learning architectures, formulations, and algorithms

dc.title

SGRec3D: Self-Supervised 3D Scene Graph Learning via Object-Level Scene Reconstruction

dc.type

Inproceedings

dc.type

Conference contribution

dc.contributor.affiliation

Universität Ulm, Germany

dc.contributor.affiliation

Robert Bosch (Germany), Germany

dc.contributor.affiliation

Robert Bosch (Germany), Germany

dc.contributor.affiliation

Universität Ulm, Germany

dc.relation.isbn

9798350318920

dc.description.startpage

3392

dc.description.endpage

3402

dc.type.category

Full-Paper Contribution

tuw.booktitle

2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

tuw.peerreviewed

true

tuw.researchTopic.id

tuw.researchTopic.name

Visual Computing and Human-Centered Technology

tuw.researchTopic.value

100

tuw.publication.orgunit

E193-01 - Forschungsbereich Computer Vision

tuw.publisher.doi

10.1109/WACV57701.2024.00337

dc.description.numberOfPages

tuw.author.orcid

0009-0007-5777-3206

tuw.author.orcid

0000-0002-1409-5114

tuw.author.orcid

0000-0001-8141-2725

tuw.author.orcid

0000-0002-7857-5512

tuw.event.name

2024 IEEE/CVF Winter Conference on Applications of Computer Vision - WACV 2024

tuw.event.startdate

2024-01-04

tuw.event.enddate

2024-01-08

tuw.event.online

On Site

tuw.event.type

Event for scientific audience

tuw.event.place

Waikoloa

tuw.event.country

tuw.event.presenter

Koch, Sebastian

wb.sciencebranch

Computer Science

wb.sciencebranch

Mathematics

wb.sciencebranch.oefos

1020

wb.sciencebranch.oefos

1010

wb.sciencebranch.value

item.languageiso639-1

item.openairetype

conference paper

item.grantfulltext

restricted

item.fulltext

no Fulltext

item.cerifentitytype

Publications

item.openairecristype

http://purl.org/coar/resource_type/c_5794

crisitem.author.dept

Universität Ulm

crisitem.author.dept

E193-01 - Forschungsbereich Computer Vision

crisitem.author.dept

Robert Bosch (Germany)

crisitem.author.dept

Robert Bosch (Germany)

crisitem.author.dept

Robert Bosch (Germany)

crisitem.author.orcid

0009-0007-5777-3206

crisitem.author.orcid

0000-0002-1409-5114

crisitem.author.orcid

0000-0001-8141-2725

crisitem.author.orcid

0000-0002-7857-5512

crisitem.author.parentorg

E193 - Institut für Visual Computing and Human-Centered Technology

Appears in Collections:

Conference Paper
