Wang, R., Xu, Z., Wang, X., Liu, W., & Lukasiewicz, T. (2026). C2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network. Information Fusion, 125, Article 103442. https://doi.org/10.1016/j.inffus.2025.103442
E192-07 - Research Unit Artificial Intelligence Techniques
E192-03 - Research Unit Knowledge Based Systems
-
Journal:
Information Fusion
-
ISSN:
1566-2535
-
Date (published):
Jan-2026
-
Number of Pages:
14
-
Publisher:
Elsevier
-
Peer reviewed:
Yes
-
Keywords:
Cross-modal consistency; Domain transfer; Medical report generation; Multi-view contrastive learning
Abstract:
Objectives: In clinical practice, medical images taken from multiple views provide valuable complementary information for diagnosis. However, existing medical report generation methods struggle to fully integrate multi-view data, and their reliance on multi-view input during inference limits their practical applicability. Moreover, conventional word-level optimization often neglects the semantic alignment between images and reports, leading to inconsistencies and reduced diagnostic reliability. This paper aims to address these limitations and improve the performance and efficiency of medical report generation.

Methods: We propose C2M-DoT, a cross-modal consistent multi-view medical report generation method with a domain transfer network. C2M-DoT (i) fuses multi-view information through semantic-based contrastive learning to enhance lesion representations, (ii) uses a domain transfer network to bridge the inference performance gap across views, and (iii) applies a cross-modal consistency loss to promote personalized alignment between modalities and enable end-to-end joint optimization.

Novelty and Findings: C2M-DoT is the first to apply multi-view contrastive learning at the high semantic level of report decoding; it uses a domain transfer network to overcome the multi-view data dependency of existing models and strengthens the semantic matching between images and reports through cross-modal consistency optimization. Extensive experiments show that C2M-DoT outperforms state-of-the-art baselines, achieving a BLEU-4 of 0.159 and a ROUGE-L of 0.380 on the IU X-ray dataset, and a BLEU-4 of 0.193 and a ROUGE-L of 0.385 on the MIMIC-CXR dataset.
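To make the two loss ideas named in the abstract concrete, the snippet below is a minimal, illustrative PyTorch sketch, not the authors' implementation: an InfoNCE-style multi-view contrastive term that pulls embeddings of different views of the same study together, plus a cosine cross-modal consistency term between image and report embeddings. The tensor shapes, temperature, weighting factor, and variable names (frontal, lateral, report) are assumptions for illustration only.

# Illustrative sketch only; shapes, temperature, and weights are assumptions,
# not details taken from the C2M-DoT paper.
import torch
import torch.nn.functional as F


def multi_view_contrastive_loss(view_a: torch.Tensor,
                                view_b: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE over a batch: view_a[i] and view_b[i] come from the same study."""
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature                  # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    # Symmetric cross-entropy: each view should retrieve its paired counterpart.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


def cross_modal_consistency_loss(image_emb: torch.Tensor,
                                 report_emb: torch.Tensor) -> torch.Tensor:
    """Encourage each image embedding to match its own report embedding."""
    return (1.0 - F.cosine_similarity(image_emb, report_emb, dim=-1)).mean()


if __name__ == "__main__":
    B, D = 8, 256                       # assumed batch size and embedding dim
    frontal = torch.randn(B, D)         # stand-in for frontal-view features
    lateral = torch.randn(B, D)         # stand-in for lateral-view features
    report = torch.randn(B, D)          # stand-in for report text features
    loss = (multi_view_contrastive_loss(frontal, lateral)
            + 0.5 * cross_modal_consistency_loss(frontal, report))
    print(loss.item())

In this sketch the contrastive term operates on study-level embeddings rather than generated words, mirroring the abstract's claim that the multi-view fusion happens at a semantic level; how the actual report decoder and domain transfer network are wired in is specific to the paper and not reproduced here.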