Rippberger Fonseca, Maria De Los Angeles Gwendolyn Aglae
-
dc.contributor.author
Neidhardt, Julia
-
dc.date.accessioned
2026-02-13T08:26:53Z
-
dc.date.available
2026-02-13T08:26:53Z
-
dc.date.issued
2025-09-22
-
dc.identifier.citation
<div class="csl-bib-body">
<div class="csl-entry">Rippberger Fonseca, M. D. L. A. G. A., & Neidhardt, J. (2025, September 22). <i>Comparative Analysis of Fashion Captioning for Multimodal Fashion Recommendation</i> [Conference Presentation]. 19th ACM Conference on Recommender Systems (RecSys ’25), Prague, Czechia. http://hdl.handle.net/20.500.12708/226348</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/226348
-
dc.description.abstract
Multimodal information provides new opportunities for recommender systems, especially in the fashion domain, where both visual and textual information can be used to build a comprehensive understanding of the product. In this work, we focus on the task of fashion captioning, a specialized form of image captioning for fashion items. We fine-tune pretrained vision-language models on two distinct fashion datasets to evaluate how effectively they capture dataset-specific ground truths. The fine-tuned models achieve results competitive with models trained specifically for this task. The resulting captioning models are applied in two key scenarios: (1) as components for generating richer multimodal embeddings in recommender systems, and (2) for modality imputation, where automatically generated descriptions are used to fill in missing textual data. We show that which modality works best depends on the size of the dataset and the length of the recommendation list, but that none outperforms traditional item-based collaborative filtering on a real-life dataset with over 1M users and 31M transactions. Additionally, we present a detailed analysis of the two fashion datasets, highlighting critical aspects such as item presentation and textual style, which are often overlooked yet essential for effective modeling.
en
dc.language.iso
en
-
dc.subject
Multimodal Recommendation
en
dc.subject
Fashion Captioning
en
dc.subject
NLP
en
dc.subject
Generative AI
en
dc.title
Comparative Analysis of Fashion Captioning for Multimodal Fashion Recommendation