Graph neural network based classification of biological network structured systems

Lux, Laurin

doi:10.34726/hss.2023.106283

Record link:

https://doi.org/10.34726/hss.2023.106283
http://hdl.handle.net/20.500.12708/177158

Title:

Graph neural network based classification of biological network structured systems

Citation:

Lux, L. (2023). Graph neural network based classification of biological network structured systems [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2023.106283

reposiTUm DOI:

10.34726/hss.2023.106283

CatalogPlus:

AC16853413

Publication Type:

Thesis - Diploma Thesis

Language:

English

Authors:

Lux, Laurin

Advisor:

Filzmoser, Peter

Organisational Unit:

E105 - Institut für Stochastik und Wirtschaftsmathematik

Date (published):

2023

Number of Pages:

Keywords:

Graph Neural Networks; GNN; Light-Sheet Fluorescence Microscopy; LSFM; Geometric Deep Learning; Node-Classification; Machine Learning; Deep Learning; Biomedical Imaging; Biological Networks

Abstract:

Graph Neural Networks (GNNs) have gained significant attention in recent years. They constitute a flexible and powerful deep learning concept for the multimodal analysis of graph-structured data. The abundance of such data enables a wide range of applications across vastly different domains. However, the potential of GNNs for analyzing complex, biological, network-structured systems has not yet been comprehensively explored. In this work, GNNs are applied to node-level classification of graphs that model the lymphatic vessel system and the enteric nervous system of the gut. Light-Sheet Fluorescence Microscopy (LSFM), in combination with immunolabeling and tissue clearing, allows such structures to be imaged in their full complexity throughout the entire mouse body. However, LSFM is limited to three distinguishable fluorophores in the visible light range. This limitation motivates solving the multiplexing problem for the gut data. To this end, GNNs are employed to classify the extracted gut graphs after imaging. Using multi-channel imaging data, in which lymphatic vessels and nerves are stained with different fluorophores, it was possible to extract a graph representation with known vessel labels. By training on this ground truth graph, a model was created that can be applied to graphs extracted from single-channel imaging data, where labeling of the nervous and lymphatic systems is otherwise only possible manually by a trained biologist. A method based on the SAGE GNN model was developed that reaches a balanced accuracy of 75.9% and an accuracy of 77.0%, outperforming baseline algorithms such as Random Forest (RF) (balanced accuracy of 71.8%). Beyond pure classification performance, the analysis of within-class connectivity revealed superior behavior of the GNN classifiers.
The Jaccard index (JI) of the largest connected component in the ground truth with the combined two largest connected components in the SAGE prediction yielded values of 0.52 and 0.53 for the lymph and nerve networks, respectively. In contrast, the RF algorithm reached a JI of only 0.43 and 0.18. This result shows that, despite acceptable accuracy, the RF predictions preserve connectivity poorly, and connectivity is crucial for further analysis. Finally, different approaches to feature extraction from raw images were investigated in order to improve task-specific classification performance. A GNN model with an attached learnable LSTM feature extractor operating on the vessel centerlines achieved a balanced accuracy of 76.9% on the test partition of the multi-channel graph, while the baseline RF algorithm achieved 70.6%.
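
The connectivity measure described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the parameter `k`, and the toy graph are hypothetical, and it assumes that connected components are computed on the subgraph induced by all nodes carrying a given class label.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Components (as sets) of the subgraph induced by `nodes`, largest first."""
    nodes = set(nodes)
    adj = defaultdict(set)
    for u, v in edges:
        # Keep only edges whose two endpoints both belong to the class.
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], {start}
        seen.add(start)
        while stack:  # iterative depth-first search
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(comp)
    return sorted(comps, key=len, reverse=True)


def connectivity_jaccard(edges, true_labels, pred_labels, cls, k=2):
    """Jaccard index between the largest ground-truth component of class `cls`
    and the union of the k largest predicted components of that class
    (assumed reading of the abstract, with k = 2)."""
    true_nodes = [n for n, c in true_labels.items() if c == cls]
    pred_nodes = [n for n, c in pred_labels.items() if c == cls]
    true_comps = connected_components(true_nodes, edges)
    pred_comps = connected_components(pred_nodes, edges)
    if not true_comps:
        return 0.0
    gt = true_comps[0]
    pred = set().union(*pred_comps[:k])
    union = gt | pred
    return len(gt & pred) / len(union) if union else 0.0


# Hypothetical toy graph: a path 0-1-2-3-4-5 with one misclassified node (2).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
true_labels = {0: "lymph", 1: "lymph", 2: "lymph", 3: "lymph", 4: "nerve", 5: "nerve"}
pred_labels = {0: "lymph", 1: "lymph", 2: "nerve", 3: "lymph", 4: "nerve", 5: "nerve"}

ji_lymph = connectivity_jaccard(edges, true_labels, pred_labels, "lymph")
ji_nerve = connectivity_jaccard(edges, true_labels, pred_labels, "nerve")
```

In this toy case a single misclassified node splits the predicted lymph network into two components, which is the kind of fragmentation that node-wise accuracy alone does not reveal.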

License:

In Copyright

Appears in Collections:

Thesis