Zessner-Spitzenberg, A. (2022). To be or not to be : Detecting points of interest along route segments [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2022.90362
Navigation systems guide their users from a starting point to a destination. In general, the instructions are based on distances and street names. A potential improvement is to include Points of Interest (POIs) in navigation instructions. POIs, e.g., shops or restaurants, are often located inside buildings. However, the available datasets do not store from which side a POI is visible. Therefore, current navigation systems cannot verify whether the POI for the next instruction is visible from a specific location in the street and can be used for navigation. The aim of this thesis is to connect street segments with visible POIs after detecting them in street view images, either with a YOLOv3 architecture alone or with Visual Salience combined with a YOLOv3 architecture. The approach is divided into three steps: Pre-processing, Detection, and Allocation. Pre-processing covers the preparation of the street view images and of the geodata, i.e., the POIs, buildings, and street segments, which are retrieved from open data sources. In this step, the positions of the street view images are corrected, corner buildings are detected in the building dataset, and the images are pre-selected by checking whether buildings containing POIs lie in the field of view of the image. The second step, Detection, is responsible for identifying the POI in the image. The detector is applied once to the original images and once after Visual Salience has been applied to the image; in the latter case, the machine learning detector recognizes POIs only in the most salient parts of the image. The goal is to compare these two approaches, with and without Visual Salience. To identify POIs in street view images, a POI detector needs to be trained. The images for training the POI detector are gathered from various sources. Because not enough images are available, pre-trained weights must be produced for the POI detector.
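The pre-selection step can be illustrated with a minimal sketch. It assumes planar coordinates in metres (x east, y north), a known compass heading per image, and a single representative point per building. The names `bearing_deg`, `in_field_of_view`, and `preselect`, as well as the 90 degree field of view and the 30 m maximum distance, are hypothetical choices for illustration and are not taken from the thesis.

```python
import math


def bearing_deg(origin, target):
    """Compass bearing from origin to target in degrees (0 = north, clockwise)."""
    dx = target[0] - origin[0]
    dy = target[1] - origin[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def in_field_of_view(camera, heading_deg, fov_deg, target, max_dist):
    """True if target is within max_dist of the camera and inside its view cone."""
    dist = math.hypot(target[0] - camera[0], target[1] - camera[1])
    if dist > max_dist:
        return False
    # Signed angular difference, wrapped into [-180, 180).
    diff = (bearing_deg(camera, target) - heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0


def preselect(images, poi_buildings, fov_deg=90.0, max_dist=30.0):
    """Return the ids of images that see at least one building containing a POI.

    fov_deg and max_dist are assumed values, not thresholds from the thesis.
    """
    return [
        img["id"]
        for img in images
        if any(
            in_field_of_view(img["xy"], img["heading"], fov_deg, b, max_dist)
            for b in poi_buildings
        )
    ]


# Hypothetical data: two images taken at the same spot, facing north and south.
images = [
    {"id": "a", "xy": (0.0, 0.0), "heading": 0.0},
    {"id": "b", "xy": (0.0, 0.0), "heading": 180.0},
]
poi_buildings = [(5.0, 20.0)]  # one building with a POI, north-north-east of the camera
selected = preselect(images, poi_buildings)
```

Here only the north-facing image is kept, since the building lies roughly 14 degrees east of north and about 20.6 m away. A real pipeline would likely test building footprints rather than single points.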
After a POI has been detected in an image, it must be matched with the OSM data. This step is called Allocation. It includes determining the building and the facade on which the POI appears in the image. On the one hand, this information is essential for assigning the POI to a street segment. On the other hand, the information about the building is needed to find the corresponding POI stored in the geodata. The steps that belong to Pre-processing and Allocation were successfully implemented in the pipeline developed in this thesis. The quality of these steps depends strongly on the positional accuracy of the street view images and the POIs. Moreover, the result depends on how up to date the street view images and the POI information are. The results could be improved further by finding the optimal maximum distance between the street view images and a POI, or by tuning other thresholds. The main shortcoming of this work is the detection of POIs in street view images. Producing the pre-trained weights did not succeed: the training either did not converge or failed due to overfitting. Consequently, the POI detector itself could not be trained either. If this missing part of the algorithm can be implemented successfully, the developed pipeline will be able to allocate the POIs to street segments and use them in navigation instructions.
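As an illustration of the final assignment in the Allocation step, the following sketch links a facade to a street segment by choosing the segment closest to a point on that facade. The nearest-segment rule is an assumption made for this example, since the abstract does not state the actual criterion. The function names and street identifiers are hypothetical, and the coordinates are planar, in metres.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b (planar coordinates)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_segment(facade_point, segments):
    """Return the id of the street segment closest to the facade point."""
    return min(
        segments,
        key=lambda sid: point_segment_distance(facade_point, *segments[sid]),
    )


# Hypothetical corner situation: two street segments meeting at the origin.
segments = {
    "main_st": ((0.0, 0.0), (100.0, 0.0)),
    "side_st": ((0.0, 0.0), (0.0, 100.0)),
}
facade_point = (50.0, 8.0)  # midpoint of the facade on which the POI was detected
assigned = nearest_segment(facade_point, segments)
```

In this example the facade is 8 m from `main_st` and 50 m from `side_st`, so the POI is assigned to `main_st`. For corner buildings, which the pre-processing detects separately, the facade rather than the building centroid decides which of the adjoining streets receives the POI.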