<div class="csl-bib-body">
<div class="csl-entry">Rippberger Fonseca, G. (2025). <i>Comparative Analysis of Fashion Captioning and Multimodal Fashion Recommendation</i> [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.115007</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2025.115007
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/215651
-
dc.description
Thesis not yet received by the library - data not verified
-
dc.description
Title differs according to the author's translation
-
dc.description.abstract
This thesis explores two main tasks: (1) fine-tuning image captioning models on fashion datasets and (2) evaluating different feature spaces for personalized fashion recommendations. We fine-tune two state-of-the-art vision-language models, BLIP-2 and LLaVA, on two fashion datasets, H&M and FACAD, to generate product descriptions. Our quantitative and qualitative analyses show that fine-tuning can achieve performance comparable to fully training a model (SRFC) specifically for generating "fashion captions".
In our qualitative analysis of the captioning results, we examine the models' limitations and identify what works well and what does not. We find that datasets in which words have clearly identifiable visual cues, e.g., front pocket, can improve the fine-tuning process. The models struggled with non-visual attributes (e.g., material composition, designer names), with distinguishing fine-grained differences (e.g., satin vs. velvet), and with handling partial or ambiguous product images. These limitations highlight the need for dataset curation that emphasizes visible attributes.
For recommendations, we extract visual, textual, and combined (multimodal) features and evaluate them using the VBPR recommendation algorithm on the H&M dataset. Besides established embedding models such as ResNet50 (visual features) and SentenceBERT (textual features), we use our BLIP-2 model fine-tuned on the H&M dataset to extract additional features, which we hypothesized would work better. Surprisingly, textual embeddings performed better than visual and multimodal features with VBPR, suggesting that, in this setup, text-based attributes provide better signals for recommendations than image features. However, overall performance across the different feature spaces remains similar, and ItemKNN outperforms VBPR.
Our findings demonstrate that fine-tuning is an effective and simpler alternative to complex reward-based training. Additionally, despite fashion being a visual domain, textual descriptions resulted in the best recommendation performance. Future work should focus on exploring the performance of already available models on fashion datasets and on refining datasets for better performance.
en
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
Fashion Recommender Systems
de
dc.subject
Fashion Captioning
de
dc.subject
Multimodal Recommendations
de
dc.subject
Fashion Recommender Systems
en
dc.subject
Fashion Captioning
en
dc.subject
Multimodal Recommendations
en
dc.title
Comparative Analysis of Fashion Captioning and Multimodal Fashion Recommendation
en
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2025.115007
-
dc.contributor.affiliation
TU Wien, Österreich
-
dc.rights.holder
Gwendolyn Rippberger Fonseca
-
dc.publisher.place
Wien
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
tuw.publication.orgunit
E194 - Institut für Information Systems Engineering
-
dc.type.qualificationlevel
Diploma
-
dc.identifier.libraryid
AC17527743
-
dc.description.numberOfPages
86
-
dc.thesistype
Diplomarbeit
de
dc.thesistype
Diploma Thesis
en
dc.rights.identifier
In Copyright
en
dc.rights.identifier
Urheberrechtsschutz
de
tuw.advisor.staffStatus
staff
-
tuw.advisor.orcid
0000-0001-7184-1841
-
item.openaccessfulltext
Open Access
-
item.grantfulltext
open
-
item.openairetype
master thesis
-
item.openairecristype
http://purl.org/coar/resource_type/c_bdcc
-
item.cerifentitytype
Publications
-
item.languageiso639-1
en
-
item.fulltext
with Fulltext
-
crisitem.author.dept
E194-04 - Forschungsbereich Data Science
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering