Purpose: Radiation oncology relies on accurate delineation of organs at risk (OARs) and target structures for dose calculation and treatment planning. Inaccurate structure localization has a more pronounced effect in particle therapy than in photon therapy because of the steeper dose gradients. Manual segmentation remains the gold standard, even though inter-observer variability and temporal anatomical changes influence subsequent dose calculations. Manual segmentation is, however, a time-consuming task, which makes it unfeasible for adaptive radiotherapy (ART); the increased workload caused by multiple replannings could therefore benefit from automated segmentation. Deep learning methods in particular have already been used with great success in segmentation tasks and are currently being investigated for medical applications. Discriminative classifiers are one such method typically used for image segmentation; however, they tend to overfit on small data sets, and their typical loss functions do not guarantee spatial consistency. Generative adversarial networks (GANs), on the other hand, have shown promise in tackling both issues. This thesis aimed to compare GANs with convolutional neural networks (CNNs) based on the popular U-net architecture, as a type of discriminative classifier, on training data sets of different sizes, in order to determine a potential benefit of using GANs for segmentation in radiation oncology. Both network architectures were trained, validated, and tested on segmentations of a subset of OARs in prostate cancer patients.

Methods and Materials: Data of 360 patients who were treated for prostate cancer at the Department of Radiation Oncology of the Medical University of Vienna were investigated. The OARs were the bladder, the rectum, and the femoral heads. Both the CNN and the GAN architecture were trained on training data sets consisting of 1, 6, 11, 16, 21, 26, and 100 patients.
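The motivation above can be illustrated with a minimal sketch of the generator objective in adversarial segmentation: a per-pixel segmentation loss (which scores voxels independently) is combined with an adversarial term that judges the predicted mask as a whole, which is what may encourage spatial consistency. This is an illustrative sketch only; the function names, the binary cross-entropy choice, and the weighting factor `lam` are assumptions, not the specific losses used in the thesis.

```python
import numpy as np

def segmentation_loss(pred, target, eps=1e-7):
    """Pixel-wise binary cross-entropy, a typical CNN segmentation loss.
    Each voxel is scored independently, so this term alone cannot
    enforce spatial consistency of the mask."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def adversarial_loss(discriminator_score, eps=1e-7):
    """Generator-side adversarial term: the discriminator emits a single
    probability in (0, 1) that the whole predicted mask is 'real'
    (i.e. a manual delineation); the generator tries to maximize it."""
    return -np.log(np.clip(discriminator_score, eps, 1 - eps))

def generator_loss(pred, target, discriminator_score, lam=0.1):
    # Combined objective: per-voxel loss plus a global, whole-mask penalty.
    # lam is a hypothetical weighting hyperparameter.
    return segmentation_loss(pred, target) + lam * adversarial_loss(discriminator_score)
```

Setting `lam = 0` recovers the plain discriminative (CNN) objective, which makes the comparison between the two architectures a question of whether the adversarial term adds anything.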
An extensive hyperparameter search was performed to identify the best settings for all observed structures. The performance of the networks was evaluated using the Dice similarity coefficient (DSC), sensitivity, precision, Hausdorff distance (HD), mean squared error (MSE), and root mean squared error (RMSE).

Results: For small training data sets, no significant difference between GAN and CNN could be observed when measured with the DSC. For sensitivity and precision, the networks performed differently for certain data set sizes and OARs: the precision score of the GAN was higher for 6, 16, and 21 patients, while the CNN achieved higher sensitivity for 1 and 6 patients in the training data set; for 16 patients, the sensitivity score of the GAN was higher. As sensitivity and precision are associated with under- and over-segmentation, this could indicate a correcting influence of the discriminator; however, there is no discernible trend across all data set sizes. For 100 patients, no significant difference was observed between the two architectures, resulting in near-identical mean DSCs for bladder, rectum, and femoral heads of 0.89 ± 0.08, 0.84 ± 0.08, and 0.93 ± 0.06, respectively.

Conclusion: In this thesis, no significant difference in performance between GAN and CNN could be observed. Compared to other factors, such as network architecture and data preprocessing, the adversarial training may not be worth the additional effort for image segmentation.
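For reference, the overlap metrics named above can be computed directly from binary masks. The sketch below shows DSC, sensitivity, and precision from true/false positive and negative voxel counts, plus RMSE; function names are illustrative and the masks are assumed to be 0/1 arrays of equal shape.

```python
import numpy as np

def overlap_metrics(pred, gt):
    """Overlap metrics for binary segmentation masks.
    pred: predicted mask, gt: ground-truth (manual) delineation."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)    # correctly segmented voxels
    fp = np.sum(pred & ~gt)   # over-segmentation (false positives)
    fn = np.sum(~pred & gt)   # under-segmentation (false negatives)
    dsc = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 1.0  # a.k.a. recall
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    return float(dsc), float(sensitivity), float(precision)

def rmse(pred, gt):
    """Root mean squared error between masks (MSE is its square)."""
    return float(np.sqrt(np.mean((pred.astype(float) - gt.astype(float)) ** 2)))
```

The Hausdorff distance is conceptually different (a surface distance rather than an overlap ratio); in practice it can be computed with, e.g., `scipy.spatial.distance.directed_hausdorff` on the boundary point sets.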