<div class="csl-bib-body">
<div class="csl-entry">Wang, R., Xu, Z., Wang, X., Liu, W., & Lukasiewicz, T. (2026). C2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network. <i>Information Fusion</i>, <i>125</i>, Article 103442. https://doi.org/10.1016/j.inffus.2025.103442</div>
</div>
-
dc.identifier.issn
1566-2535
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/223697
-
dc.description.abstract
Objectives: In clinical practice, multiple medical images from different views provide valuable complementary information for diagnosis. However, existing medical report generation methods struggle to fully integrate multi-view data, and their reliance on multi-view input during inference limits practical applicability. Moreover, conventional word-level optimization often neglects the semantic alignment between images and reports, leading to inconsistencies and reduced diagnostic reliability. This paper aims to address these limitations and improve the performance and efficiency of medical report generation. Methods: We propose C2M-DoT, a cross-modal consistent multi-view medical report generation method with a domain transfer network. C2M-DoT (i) uses semantic-based contrastive learning to fuse multi-view information and enhance lesion representation, (ii) uses a domain transfer network to bridge the gap in inference performance across views, and (iii) uses a cross-modal consistency loss to promote personalized alignment of modalities and achieve end-to-end joint optimization. Novelty and Findings: C2M-DoT pioneers the use of multi-view contrastive learning at the high semantic level of report decoding, employs a domain transfer network to overcome the data dependency of multi-view models, and enhances the semantic matching of images and reports through cross-modal consistency optimization. Extensive experiments show that C2M-DoT outperforms state-of-the-art baselines, achieving a BLEU-4 of 0.159 and a ROUGE-L of 0.380 on the IU X-ray dataset, and a BLEU-4 of 0.193 and a ROUGE-L of 0.385 on the MIMIC-CXR dataset.
en
-
dc.language.iso
en
-
dc.publisher
Elsevier
-
dc.relation.ispartof
Information Fusion
-
dc.subject
Cross-modal consistency
en
-
dc.subject
Domain transfer
en
-
dc.subject
Medical report generation
en
-
dc.subject
Multi-view contrastive learning
en
-
dc.title
C2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network