Semantic segmentation of image sequences using a spatio-temporal U-Net

Danner, Manuel

doi:10.34726/hss.2020.66920

Record link:

https://doi.org/10.34726/hss.2020.66920
http://hdl.handle.net/20.500.12708/15636

Title:

Semantic segmentation of image sequences using a spatio-temporal U-Net

Citation:

Danner, M. (2020). Semantic segmentation of image sequences using a spatio-temporal U-Net [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2020.66920

reposiTUm DOI:

10.34726/hss.2020.66920

CatalogPlus:

AC15754267

Publication Type:

Thesis - Diploma thesis (Diplomarbeit)

Language:

English

Authors:

Danner, Manuel

Advisor:

Sablatnig, Robert

Organisational Unit:

E193 - Institut für Visual Computing and Human-Centered Technology

Date (published):

2020

Number of Pages:

Keywords:

Convolutional Neural Networks; Semantic Segmentation; U-Net; Spatio-Temporal Networks

Abstract:

In biomedical research, the detailed structure of tissues, cells, organelles and macromolecular complexes is investigated using electron microscopy (EM) images. Consequently, large numbers of high-resolution images of biological and clinical specimens exist, and there is a need for computer-assisted tools that provide a cost-effective solution for disease diagnostics. This thesis presents a novel elastic image transformation method called Elastic Gradient Transformation (EGT), which uses the image gradient to generate realistic-looking deformations of cell structures. The EGT method helps the neural network generalize on small cell datasets (such as the ISBI 2012 dataset) without overfitting. The U-Net architecture by O. Ronneberger, P. Fischer and T. Brox is adapted to contain an additional encoder path. The proposed network, called SiamU-Net, takes two sequential images t and t+1 as input and outputs a class probability map for image t+1. Importantly, the two encoder paths do not share weights and are fused in the latent space of the network. The single decoder path uses skip connections from the encoder path of image t+1 to improve up-sampling. To evaluate the adaptation, both U-Net and SiamU-Net use the EGT method and participate in the ISBI 2012 challenge. To highlight the impact of temporal image information, the two networks are also compared on a video dataset called DAVIS 2016. In the ISBI 2012 challenge, the proposed SiamU-Net with the EGT method ranks 31st and the original U-Net ranks 61st out of 223 participants. On the DAVIS 2016 challenge, SiamU-Net achieves a Jaccard index 0.0776 points higher than the U-Net architecture, which demonstrates the advantage of adapting the U-Net with an additional encoder path.
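
For readers unfamiliar with the Jaccard index used in the DAVIS 2016 comparison, the following is a minimal sketch of how it can be computed for a pair of binary segmentation masks (intersection divided by union). It is illustrative only: the function name `jaccard_index`, the toy masks, and the convention of returning 1.0 for two empty masks are assumptions, and the thesis's actual evaluation protocol (for example, how scores are averaged over frames or sequences) is not specified in this record.

```python
import numpy as np


def jaccard_index(pred, target):
    """Jaccard index (intersection over union) of two binary masks.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection) / float(union)


# Toy example: 3 foreground pixels in each mask, 2 of them shared.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
score = jaccard_index(pred, target)  # 2 shared pixels / 4 pixels in the union
```

Under this definition, a difference of 0.0776 between two networks corresponds to a gap of roughly 7.8 percentage points in overlap between predicted and ground-truth masks.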

License:

In Copyright

Appears in Collections:

Thesis