Ejection fraction prediction with pre-trained masked autoencoders for echocardiography

Grausenburger, Marie-Luise

doi:10.34726/hss.2026.135694

Record link:

https://doi.org/10.34726/hss.2026.135694
http://hdl.handle.net/20.500.12708/227850

Title:

Ejection fraction prediction with pre-trained masked autoencoders for echocardiography

Citation:

Grausenburger, M.-L. (2026). Ejection fraction prediction with pre-trained masked autoencoders for echocardiography [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2026.135694

reposiTUm DOI:

10.34726/hss.2026.135694

CatalogPlus:

AC17850915

Publication Type:

Thesis - Diplomarbeit

Language:

English

Authors:

Grausenburger, Marie-Luise

Advisor:

Reiter, Michael

Organisational Unit:

E193 - Institut für Visual Computing and Human-Centered Technology

Date (published):

2026

Number of Pages:

104

Keywords:

Echocardiographie; Ejektionsfraktion; Selbstüberwachtes Lernen; Masked Autoencoder; Semantische Segmentierung; Medizinische Bildanalyse

Echocardiography; Ejection Fraction; Self-Supervised Learning; Masked Autoencoders (MAE); Semantic Segmentation; Medical Image Analysis

Abstract:

Herz-Kreislauf-Erkrankungen sind weltweit die häufigste Todesursache. Daher ist eine schnelle und präzise Beurteilung diagnostischer Parameter von Echokardiographie-Videos unerlässlich. Ein wichtiger klinischer Parameter ist dabei die Ejektionsfraktion des linken Ventrikels (LVEF). In klinischen Arbeitsabläufen ist dies nach wie vor eine manuelle und zeitaufwändige Aufgabe. Diese Arbeit präsentiert eine zweistufige Deep-Learning-Architektur, mit der die LVEF automatisch prognostiziert werden kann. In der ersten Phase wird ein Encoder mittels Self-Supervised Learning mit VideoMAE vortrainiert. Für das sogenannte „tube masking“ wurde im Rahmen der Arbeit ein Ansatz evaluiert, bei dem irrelevante Hintergrundinformationen immer maskiert wurden, um das Modell auf anatomisch relevante Herzstrukturen zu fokussieren. In der zweiten Phase wird der vortrainierte Encoder mit zwei Segmentierungsköpfen erweitert und überwacht trainiert. Die Segmentierung des linken Ventrikels zum Zeitpunkt des Systolen- und Diastolenendes (ES/ED) sowie die darauf aufbauende LVEF werden vorhergesagt. Diese Arbeit zeigt, dass die Leistung des Encoders deutlich erhöht wird, wenn nicht von Grund auf mit Echokardiografie-Videos vortrainiert wird, sondern ein Encoder, der auf allgemeinen Videodaten vortrainiert wurde, mit Echokardiografie-Videos weitertrainiert wird. Das beste Modell mit dieser vorgeschlagenen Architektur (echo-segmentation) erreichte auf dem Testdatensatz einen Dice-Koeffizienten von DiceES = 93.35% und DiceED = 90.93% für die Segmentierung des linken Ventrikels in ED und ES. Die LVEF konnte mit einem mittleren absoluten Fehler (MAErr) von 4,27 und einem R2 von 0,73 vorhergesagt werden. 
Der Code ist verfügbar unter: https://github.com/mar1lle/echo-segmentation/ Im Gegensatz zu anderen Segmentierungsansätzen arbeitet echo-segmentation mit Echokardiografie-Videos und identifiziert implizit die klinisch relevanten Frames für die LVEF, ohne dass zuvor eine Frame-Auswahl getroffen werden muss. Die Ergebnisse der LVEF-Vorhersage sind mit anderen State-of-the-Art-Methoden vergleichbar. Im Vergleich dazu bietet echo-segmentation zusätzlich Segmentierungsmasken des linken Ventrikels zum Zeitpunkt des Systolen- und Diastolenendes, was mit dem klinischen Arbeitsablauf übereinstimmt. Dies verbessert die Interpretierbarkeit und potenziell das klinische Vertrauen.

Cardiovascular diseases are the leading cause of death worldwide, so fast and accurate assessment of diagnostic parameters from echocardiography videos, such as the left ventricular ejection fraction (LVEF), is needed. In current clinical workflows, measuring the LVEF remains a time-consuming and observer-dependent task. This thesis addresses these challenges by developing a two-stage deep learning architecture for the automated prediction of LVEF from echocardiography videos. In the first stage, a VideoMAE encoder is pretrained with self-supervised learning on apical four-chamber echocardiography videos using tube masking. An approach was tested in which irrelevant background information was always masked to force the model to focus on reconstructing anatomically relevant cardiac structures. In the second stage, the pretrained encoder is fine-tuned in a supervised manner to perform semantic segmentation of the left ventricle at end-diastole (ED) and end-systole (ES), and the resulting segmentations are used to predict the LVEF. This thesis demonstrates that initializing the VideoMAE model with weights pretrained on general videos and continuing with domain-specific pretraining on echocardiographic videos yields the strongest performance. The best fine-tuned model with the proposed architecture (echo-segmentation) achieved Dice Similarity Coefficients of DiceED = 93.35% and DiceES = 90.93% for left ventricular segmentation at end-diastole and end-systole, respectively. The frame prediction for end-diastole and end-systole reached mean absolute errors of MAErr = 0.78 and MAErr = 0.81, respectively. For predicting the LVEF, the model achieved an MAErr of 4.27 and an R2 score of 0.73. In contrast to other segmentation approaches, echo-segmentation operates directly on the full echocardiographic video and implicitly identifies the clinically relevant frames for the LVEF without prior frame selection.
While the LVEF performance is comparable to state-of-the-art methods that estimate the LVEF directly, the proposed echo-segmentation model additionally offers visual outputs that align with the standard clinical workflow. This improves interpretability and potentially clinical trust. The approach was trained and evaluated on the EchoNet-Dynamic dataset, which contains 10,030 labeled ultrasound videos. The full code is available at: https://github.com/mar1lle/echo-segmentation
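
The metrics named in the abstract (Dice coefficient, LVEF, mean absolute error, R2) can be illustrated with a minimal NumPy sketch. It is not taken from the echo-segmentation repository: all function names are hypothetical, the LVEF uses the standard definition (EDV - ESV) / EDV from end-diastolic and end-systolic volumes, and how the thesis derives those volumes from the ED/ES segmentation masks is not stated here, so the volumes are simply passed in as numbers.

```python
import numpy as np


def dice_coefficient(pred_mask, true_mask):
    """Dice = 2 * |A & B| / (|A| + |B|) for two binary masks of equal shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    denom = pred.sum() + true.sum()
    if denom == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(pred, true).sum() / denom)


def ejection_fraction(edv, esv):
    """LVEF in percent from end-diastolic (edv) and end-systolic (esv) volume."""
    if edv <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return 100.0 * (edv - esv) / edv


def mean_absolute_error(y_true, y_pred):
    """Mean absolute difference between reference and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    # Toy 2x2 masks and made-up volumes, for illustration only.
    print(dice_coefficient([[1, 1], [0, 0]], [[1, 0], [0, 0]]))
    print(ejection_fraction(edv=120.0, esv=50.0))
    print(mean_absolute_error([60.0, 40.0], [55.0, 45.0]))
    print(r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

In this sketch the Dice values would be computed separately for the ED and ES masks, while MAErr and R2 compare predicted LVEF values against the reference labels across the test set.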

Additional information:

Thesis not yet received at the library - data not verified
Title differs from the original, as translated by the author

License:

In Copyright

Appears in Collections:

Thesis