Big data analysis has attracted enormous attention during the last decade. In this field, graph signal processing is widely used in order to analyze data on graphs using a signal processing perspective. One of the key problems in this area is clustering, i.e., partitioning the data in homogeneous subgroups (clusters, communities) corresponding to subsets of nodes in the graph. The nodes within a cluster usually tend to be connected by edges that represent similarity relations. The ultimate goal in the clustering problem is to keep the number of nodes that are attributed to a wrong cluster as low as possible. Semi-supervised clustering is a special case of this problem in which some of the cluster labels are assumed to be known in advance and this prior knowledge is exploited to label (cluster) the rest of the data. In this work, we explore the use of error correcting codes in the context of semi-supervised multi-class clustering problems. The key idea here is to employ error-correcting codes for the cluster (class) labels in connection with various clustering schemes and to use the decoder of the error correcting code in order to detect and correct miss-classified data. The proposed algorithms for clustering with error-correcting code labels are studied experimentally in several different scenarios (graph size and topology, number and size of clusters). The performance of these methods is compared to conventional semi-supervised spectral clustering with one-hot encoded labels. The results show that with some modifications of the codeword structure of one-hot encoding, our error correcting approach outperforms existing clustering methods.
en
Weitere Information:
Abweichender Titel nach Übersetzung der Verfasserin/des Verfassers