Wang, R., Xu, Z., Wang, X., Liu, W., & Lukasiewicz, T. (2026). C2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network. Information Fusion, 125, Article 103442. https://doi.org/10.1016/j.inffus.2025.103442
E192-07 - Research Unit Artificial Intelligence Techniques
E192-03 - Research Unit Knowledge Based Systems
-
Journal:
Information Fusion
-
ISSN:
1566-2535
-
Date (published):
Jan-2026
-
Number of Pages:
14
-
Publisher:
Elsevier
-
Peer reviewed:
Yes
-
Keywords:
Cross-modal consistency; Domain transfer; Medical report generation; Multi-view contrastive learning
Abstract:
Objectives: In clinical practice, medical images taken from multiple views provide valuable complementary information for diagnosis. However, existing medical report generation methods struggle to fully integrate multi-view data, and their reliance on multi-view input during inference limits their practical applicability. Moreover, conventional word-level optimization often neglects the semantic alignment between images and reports, leading to inconsistencies and reduced diagnostic reliability. This paper aims to address these limitations and improve the performance and efficiency of medical report generation.

Methods: We propose C2M-DoT, a cross-modal consistent multi-view medical report generation method with a domain transfer network. C2M-DoT (i) fuses multi-view information through semantic-based contrastive learning to enhance lesion representations, (ii) uses a domain transfer network to bridge the inference performance gap across views, and (iii) applies a cross-modal consistency loss to promote personalized alignment between modalities and enable end-to-end joint optimization.

Novelty and Findings: C2M-DoT is the first to apply multi-view contrastive learning at the high semantic level of report decoding; it uses a domain transfer network to overcome the multi-view data dependency of existing models and strengthens the semantic matching between images and reports through cross-modal consistency optimization. Extensive experiments show that C2M-DoT outperforms state-of-the-art baselines, achieving a BLEU-4 of 0.159 and a ROUGE-L of 0.380 on the IU X-ray dataset, and a BLEU-4 of 0.193 and a ROUGE-L of 0.385 on the MIMIC-CXR dataset.
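To make the two loss ideas named in the abstract concrete, the snippet below is a minimal, illustrative PyTorch sketch, not the authors' implementation: an InfoNCE-style multi-view contrastive term that pulls embeddings of different views of the same study together, plus a cosine cross-modal consistency term between image and report embeddings. The tensor shapes, temperature, weighting factor, and variable names (frontal, lateral, report) are assumptions for illustration only.

# Illustrative sketch only; shapes, temperature, and weights are assumptions,
# not details taken from the C2M-DoT paper.
import torch
import torch.nn.functional as F


def multi_view_contrastive_loss(view_a: torch.Tensor,
                                view_b: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE over a batch: view_a[i] and view_b[i] come from the same study."""
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature                  # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    # Symmetric cross-entropy: each view should retrieve its paired counterpart.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


def cross_modal_consistency_loss(image_emb: torch.Tensor,
                                 report_emb: torch.Tensor) -> torch.Tensor:
    """Encourage each image embedding to match its own report embedding."""
    return (1.0 - F.cosine_similarity(image_emb, report_emb, dim=-1)).mean()


if __name__ == "__main__":
    B, D = 8, 256                       # assumed batch size and embedding dim
    frontal = torch.randn(B, D)         # stand-in for frontal-view features
    lateral = torch.randn(B, D)         # stand-in for lateral-view features
    report = torch.randn(B, D)          # stand-in for report text features
    loss = (multi_view_contrastive_loss(frontal, lateral)
            + 0.5 * cross_modal_consistency_loss(frontal, report))
    print(loss.item())

In this sketch the contrastive term operates on study-level embeddings rather than generated words, mirroring the abstract's claim that the multi-view fusion happens at a semantic level; how the actual report decoder and domain transfer network are wired in is specific to the paper and not reproduced here.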