<div class="csl-bib-body">
<div class="csl-entry">Hochhauser, P. (2024). <i>Deep learning-based light source estimation from face images</i> [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2024.120596</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2024.120596
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/202485
-
dc.description.abstract
This thesis proposes a novel method to estimate realistic-looking environment images from an input face image. Correct light information is crucial for a variety of virtual and mixed reality applications, but training deep neural networks to compute this information requires large datasets, which are not easily obtainable for pairs of face images and corresponding environment maps. We address this problem by creating a synthetic dataset using digital human characters from the MetaHuman framework. These characters are illuminated by environment maps obtained from different sources and rendered in Unreal Engine. Through parameter augmentation, we achieve a diverse dataset of over 150,000 face images with high-quality light information. Using this dataset, we trained a CNN to estimate the brightness of a scene from a single face image. The network identifies the most dominant light directions for most indoor and outdoor scenes, but sometimes fails to generate output that topologically matches the layout of equirectangular environment images. For unseen real-life examples of outdoor scenes, it correctly identified the position of the sun. To enable generating realistic-looking images from text input, we finetuned a pretrained diffusion network on environment images. The text prompts are generated from face images using existing image-to-text models. By adding the estimated brightness images from our CNN, we can guide the model to follow the layout of the original scenes. Our final proposed pipeline is therefore a sequential combination of multiple neural networks, starting from a single face image. First, the brightness of the surrounding scene is estimated from the face image with a CNN. Using the same face image, a text prompt that describes the surrounding scene is generated with a pretrained image-to-text model. The text prompt is then fed to a finetuned diffusion network, which is additionally conditioned on the estimated brightness image. This yields a modular system for estimating the surrounding environment from a single image of a human face.
en
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
Illumination Estimation
en
dc.subject
Deep Learning
en
dc.subject
Generation of Synthetic Dataset
en
dc.title
Deep learning-based light source estimation from face images
en
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2024.120596
-
dc.contributor.affiliation
TU Wien, Österreich
-
dc.rights.holder
Philipp Hochhauser
-
dc.publisher.place
Wien
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
tuw.publication.orgunit
E193 - Institut für Visual Computing and Human-Centered Technology