Clustering Generative Adversarial Networks for Story Visualization

Li, Bowen; Torr, Philip H. S.; Lukasiewicz, Thomas

doi:10.1145/3503161.3548034

DC Field

Value

Language

dc.contributor.author

Li, Bowen

dc.contributor.author

Torr, Philip H. S.

dc.contributor.author

Lukasiewicz, Thomas

dc.date.accessioned

2024-01-23T15:42:52Z

dc.date.available

2024-01-23T15:42:52Z

dc.date.issued

2022-10

dc.identifier.citation

Li, B., Torr, P. H. S., & Lukasiewicz, T. (2022). Clustering Generative Adversarial Networks for Story Visualization. In MM ’22: Proceedings of the 30th ACM International Conference on Multimedia (pp. 769–778). Association for Computing Machinery. https://doi.org/10.1145/3503161.3548034

dc.identifier.uri

http://hdl.handle.net/20.500.12708/192623

dc.description.abstract

Story visualization aims to generate a series of images that semantically match a given sequence of sentences, one image per sentence, such that the output images within a story are consistent with each other. Current methods generate story images using a heavy architecture with two generative adversarial networks (GANs), one for image quality and one for story consistency, and also rely on additional segmentation masks or auxiliary captioning networks. In this paper, we aim to build a concise, single-GAN-based network that depends on neither additional semantic information nor captioning networks. To achieve this, we propose an approach to story visualization based on contrastive learning and clustering learning. Our network uses contrastive losses between language and visual information to maximize the mutual information between them, and extends them with clustering learning during training to capture semantic similarity across modalities. As a result, the discriminator in our approach gives the generator comprehensive feedback on image quality and story consistency at the same time, which allows a single-GAN-based network to produce high-quality synthetic results. Extensive experiments on two datasets demonstrate that our single-GAN-based network has fewer total parameters yet achieves a major step up from previous methods: it improves FID from 78.64 to 39.17 and FSD from 94.53 to 41.18 on Pororo-SV, and establishes a strong benchmark with an FID of 76.51 and an FSD of 19.74 on Abstract Scenes.
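
The abstract mentions contrastive losses between language and visual information but this record does not give their exact form. As a rough illustration only, the sketch below uses a generic InfoNCE-style loss, a common choice for maximizing a lower bound on mutual information between paired embeddings. It is not the authors' formulation: the function names (`info_nce`, `normalize`, `dot`), the temperature value, and the toy two-dimensional embeddings are all assumptions, and the clustering component is not covered.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length (assumes v is not all zeros)."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(text_embs, image_embs, temperature=0.1):
    """Generic text-to-image InfoNCE loss (hypothetical helper, not from the paper).

    text_embs[i] and image_embs[i] are treated as a matching pair; every
    other image in the batch acts as a negative for sentence i.
    """
    text = [normalize(t) for t in text_embs]
    image = [normalize(i) for i in image_embs]
    losses = []
    for i, t in enumerate(text):
        # Cosine similarity of sentence i to every image, scaled by temperature.
        logits = [dot(t, im) / temperature for im in image]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the matching image.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: two sentences and two images in a shared 2-D space.
sentences = [[1.0, 0.0], [0.0, 1.0]]
aligned_images = [[1.0, 0.0], [0.0, 1.0]]
swapped_images = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = info_nce(sentences, aligned_images)
loss_swapped = info_nce(sentences, swapped_images)
```

In this toy setup the loss is small when each sentence embedding lies close to its own image embedding and large when the pairs are swapped, which is the kind of signal a discriminator could use to judge whether an image matches its sentence.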

dc.language.iso

dc.subject

clustering learning

dc.subject

contrastive learning

dc.subject

GANs

dc.subject

story visualization

dc.title

Clustering Generative Adversarial Networks for Story Visualization

dc.type

Inproceedings

dc.type

Conference contribution

dc.contributor.affiliation

University of Oxford, United Kingdom of Great Britain and Northern Ireland

dc.relation.isbn

9781450392037

dc.description.startpage

769

dc.description.endpage

778

dc.type.category

Full-Paper Contribution

tuw.booktitle

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

tuw.relation.publisher

Association for Computing Machinery

tuw.relation.publisherplace

New York

tuw.researchTopic.id

tuw.researchTopic.name

Information Systems Engineering

tuw.researchTopic.value

100

tuw.publication.orgunit

E192-07 - Forschungsbereich Artificial Intelligence Techniques

tuw.publication.orgunit

E192-03 - Forschungsbereich Knowledge Based Systems

tuw.publisher.doi

10.1145/3503161.3548034

dc.description.numberOfPages

tuw.author.orcid

0000-0002-8440-543X

tuw.event.name

30th ACM International Conference on Multimedia

tuw.event.startdate

10-10-2022

tuw.event.enddate

14-10-2022

tuw.event.online

On Site

tuw.event.type

Event for scientific audience

tuw.event.place

Lisboa

tuw.event.country

tuw.event.presenter

Li, Bowen

wb.sciencebranch

Computer Science

wb.sciencebranch

Mathematics

wb.sciencebranch.oefos

1020

wb.sciencebranch.oefos

1010

wb.sciencebranch.value

item.openairetype

conference paper

item.cerifentitytype

Publications

item.grantfulltext

none

item.languageiso639-1

item.openairecristype

http://purl.org/coar/resource_type/c_5794

item.fulltext

no Fulltext

crisitem.author.dept

E230-03 - Forschungsbereich Straßenwesen

crisitem.author.dept

University of Oxford

crisitem.author.dept

E192-07 - Forschungsbereich Artificial Intelligence Techniques

crisitem.author.orcid

0000-0002-8440-543X

crisitem.author.parentorg

E230 - Institut für Verkehrswissenschaften

crisitem.author.parentorg

E192 - Institut für Logic and Computation

Appears in Collections:

Conference Paper
