<div class="csl-bib-body">
<div class="csl-entry">Li, B., Torr, P. H. S., & Lukasiewicz, T. (2022). Clustering Generative Adversarial Networks for Story Visualization. In <i>MM ’22: Proceedings of the 30th ACM International Conference on Multimedia</i> (pp. 769–778). Association for Computing Machinery. https://doi.org/10.1145/3503161.3548034</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/192623
-
dc.description.abstract
Story visualization aims to generate a series of images that semantically match a given sequence of sentences, one image per sentence, while keeping the output images within a story consistent with each other. Current methods generate story images using a heavy architecture with two generative adversarial networks (GANs), one for image quality and one for story consistency, and also rely on additional segmentation masks or auxiliary captioning networks. In this paper, we aim to build a concise, single-GAN-based network that depends on neither additional semantic information nor captioning networks. To achieve this, we propose a contrastive-learning- and clustering-learning-based approach for story visualization. Our network uses contrastive losses between language and visual information to maximize the mutual information between them, and further extends this with clustering learning during training to capture semantic similarity across modalities. As a result, the discriminator in our approach provides comprehensive feedback to the generator on both image quality and story consistency at the same time, allowing a single-GAN-based network to produce high-quality synthetic results. Extensive experiments on two datasets demonstrate that our single-GAN-based network has fewer total parameters yet achieves a major step up from previous methods, improving FID from 78.64 to 39.17 and FSD from 94.53 to 41.18 on Pororo-SV, and establishing a strong benchmark with an FID of 76.51 and an FSD of 19.74 on Abstract Scenes.
en
dc.language.iso
en
-
dc.subject
clustering learning
en
dc.subject
contrastive learning
en
dc.subject
GANs
en
dc.subject
story visualization
en
dc.title
Clustering Generative Adversarial Networks for Story Visualization
en
dc.type
Inproceedings
en
dc.type
Konferenzbeitrag
de
dc.contributor.affiliation
University of Oxford, United Kingdom of Great Britain and Northern Ireland
-
dc.relation.isbn
9781450392037
-
dc.description.startpage
769
-
dc.description.endpage
778
-
dc.type.category
Full-Paper Contribution
-
tuw.booktitle
MM '22: Proceedings of the 30th ACM International Conference on Multimedia