Smart 3D geometry understanding within a dynamic large triangulated point cloud

Höller, Benjamin

doi:10.34726/hss.2019.53448

DC Field

Value

Language

dc.contributor.advisor

Kaufmann, Hannes

dc.contributor.author

Höller, Benjamin

dc.date.accessioned

2020-06-28T21:14:15Z

dc.date.issued

2019

dc.date.submitted

2019-11

dc.identifier.citation

Höller, B. (2019). Smart 3D geometry understanding within a dynamic large triangulated point cloud [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2019.53448

dc.identifier.uri

https://doi.org/10.34726/hss.2019.53448

dc.identifier.uri

http://hdl.handle.net/20.500.12708/4509

dc.description.abstract

This thesis investigates the use of different neural networks for object detection in 3D-scanned scenes. To this end, an existing distributed 3D reconstruction system is adapted and extended with a universal interface. Automatic object detection and segmentation enables interaction with scene objects and supports users in search tasks. On the server side, an environment is scanned with an RGB-D camera to produce a volumetric 3D model. On the client side, this information is triangulated so that it can be explored in virtual reality with the Unreal Engine. The camera's RGB images are additionally interpreted by a neural network. The objects this network detects are marked accordingly in the 3D surface reconstruction. Fundamental structural changes, extensive data filtering, and a dedicated voting algorithm optimize the cumulative real-time segmentation of the scene objects. Numerical filters influence the overall detection rate; visual filters determine the spatial delineation of scene objects. Finally, scene objects detected in the 3D reconstruction are enclosed by a three-dimensional frame. To enable efficient interaction with these objects, their coarse geometry is replicated with automatically generated collider boxes. The capabilities of the developed system are tested with two different neural networks: SSD_Mobile_Net, which frames detected objects with a 2D bounding box, and Mask-RCNN, which additionally provides a pixel-based segmentation mask. Each parameter of the filter pipeline was analyzed to optimize the overall detection rate as well as the spatial segmentation of the objects. Guidelines were defined for the integration of new neural networks.

dc.description.abstract

Recent developments in machine learning algorithms have produced outstanding results in many fields of applied computer science. The superior object detection performance of convolutional neural networks has led to many different neural network types and architectures. This thesis explores the use of state-of-the-art object detection networks to achieve real-time semantic annotation within a reconstructed 3D scene. An existing reconstruction framework is extended with a universal interface for different neural network types. This allows the neural network in use to be exchanged easily and enables fast integration of future developments. With object detection, the geometric reconstruction is extended towards semantic scene understanding. The automatic annotation and segmentation of scene objects can be used to assist the user with exploration tasks and enables interaction with scene objects. The existing framework allows remote live exploration of a scanned environment in virtual reality. It is based on InfiniTAM and consists of three main modules. On the server side, an environment is scanned with an RGB-D camera to generate a reconstruction of the scene. This 3D representation is transmitted to the client side, where it is triangulated into a mesh. Finally, this mesh can be explored in virtual reality using the Unreal Engine. The RGB images of the camera stream are used as input for a convolutional neural network. The object detection results, represented as 2D bounding boxes or segmentation masks, are projected onto the 3D surface reconstruction. Fundamental changes to the processing pipeline allow the use of fully convolutional segmentation networks with long processing times while keeping the live reconstruction and streaming capabilities of the framework. An extensive filtering pipeline and a novel voting algorithm optimize the segmentation of the scene objects.
Finally, annotated three-dimensional bounding boxes enclose detected scene objects in the reconstruction. In addition, automatically generated colliders represent their coarse geometry. This enables efficient interaction with scene objects, increasing the user's immersion. The SSD_Mobile_Net box detection network and the Mask-RCNN segmentation network are implemented to test the reconstruction framework against a ground truth. Each parameter of the filter pipeline is evaluated to optimize the performance of the developed framework. Numerical filters influence the overall detection rate; visual filters determine the spatial segmentation of scene objects. The fusion of 2D bounding boxes shows a better overall result than the projection of segmentation results. Guidelines provide advice for the integration of new neural networks.
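
The idea of projecting 2D detections onto the reconstruction and letting repeated observations vote on a label can be illustrated with a minimal sketch. This is not the thesis's filtering pipeline or voting algorithm. It assumes a pinhole camera, points left in the camera frame (a real system would first apply the tracked camera pose), and a plain per-voxel majority vote. All names (`LabelVotes`, `add_detection`, `bounding_box`) and the voxel size and vote threshold are hypothetical and do not come from the thesis or from InfiniTAM.

```python
from collections import Counter, defaultdict


def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with metric depth into camera space."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def voxel_key(point, voxel_size):
    """Quantize a 3D point to an integer voxel index (floor division per axis)."""
    return tuple(int(c // voxel_size) for c in point)


class LabelVotes:
    """Accumulates per-voxel label votes over many frames (illustrative only)."""

    def __init__(self, voxel_size=0.05):
        self.voxel_size = voxel_size
        self.votes = defaultdict(Counter)  # voxel index -> Counter of labels

    def add_detection(self, label, box, depth_image, intrinsics):
        """Cast one vote for `label` at every valid-depth pixel inside the 2D box.

        `box` is (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
        `intrinsics` is (fx, fy, cx, cy).
        """
        x0, y0, x1, y1 = box
        for v in range(y0, y1):
            for u in range(x0, x1):
                d = depth_image[v][u]
                if d <= 0:  # skip missing depth readings
                    continue
                p = backproject(u, v, d, *intrinsics)
                self.votes[voxel_key(p, self.voxel_size)][label] += 1

    def bounding_box(self, label, min_votes=2):
        """Axis-aligned 3D box around voxels whose majority label is `label`.

        Returns (min_corner, max_corner) in metres, or None if no voxel qualifies.
        """
        keys = []
        for key, counter in self.votes.items():
            top_label, top_count = counter.most_common(1)[0]
            if top_label == label and top_count >= min_votes:
                keys.append(key)
        if not keys:
            return None
        lo = tuple(min(k[i] for k in keys) * self.voxel_size for i in range(3))
        hi = tuple((max(k[i] for k in keys) + 1) * self.voxel_size for i in range(3))
        return lo, hi
```

In this sketch, the vote threshold plays the role of a numerical filter (it decides whether an object is reported at all), while the choice of which pixels vote, a whole box or only a segmentation mask, affects the spatial extent of the resulting 3D box.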

dc.language

English

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.subject

Virtual Reality, Neural Networks, Object Detection, 3D Surface Reconstruction, SLAM

dc.title

Smart 3D geometry understanding within a dynamic large triangulated point cloud

dc.type

Thesis

dc.type

Hochschulschrift

dc.rights.license

In Copyright

dc.rights.license

Urheberrechtsschutz

dc.identifier.doi

10.34726/hss.2019.53448

dc.contributor.affiliation

TU Wien, Österreich

dc.rights.holder

Benjamin Höller

dc.publisher.place

Wien

tuw.version

vor

tuw.thesisinformation

Technische Universität Wien

dc.contributor.assistant

Mossel, Annette

tuw.publication.orgunit

E193 - Institut für Visual Computing and Human-Centered Technology

dc.type.qualificationlevel

Diploma

dc.identifier.libraryid

AC15509703

dc.description.numberOfPages

109

dc.identifier.urn

urn:nbn:at:at-ubtuw:1-131023

dc.thesistype

Diplomarbeit

dc.thesistype

Diploma Thesis

dc.rights.identifier

In Copyright

dc.rights.identifier

Urheberrechtsschutz

tuw.advisor.staffStatus

staff

tuw.assistant.staffStatus

staff

tuw.advisor.orcid

0000-0002-0322-9869

item.languageiso639-1

item.openairetype

master thesis

item.grantfulltext

open

item.fulltext

with Fulltext

item.cerifentitytype

Publications

item.mimetype

application/pdf

item.openairecristype

http://purl.org/coar/resource_type/c_bdcc

item.openaccessfulltext

Open Access

crisitem.author.dept

TU Wien

Appears in Collections:

Thesis

Fulltext (Version of Record (published version))

Adobe PDF

(6.32 MB)

In Copyright

Page view(s)

423

checked on Nov 23, 2023

Download(s)

138

checked on Nov 23, 2023
