Rümmele, N. (2015). Temporal link prediction using graph pattern matching [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2015.30201
Many real-world systems are composed of individual components which are inter-linked in some way. To understand and predict the behavior of such a system, we can model it as a network where components correspond to nodes and connections are links between the nodes. Real-world networks are highly dynamic: new nodes and links appear permanently while the existing ones can vanish, and various attributes for nodes and links may change. One of these changes, and the focus of this thesis, is the creation of new links. In the field of network analysis the task of predicting the appearance of a link between two nodes in the network is known as link prediction. In the classical setting, we assume that there is no direct link between these nodes, and they already exist in the network. An established approach to build a network evolution model is to use an observed property of the global network structure. However, we follow another approach: by identifying the patterns of node connections, the so-called graph patterns, we investigate how these patterns that are observable at the current time can help us predict the future link formation in the network. So, temporal information is of great importance to us, hence, we refer to the problem at hand as temporal link prediction. The considered graph patterns can, first of all, vary in their size (either in terms of the number of considered nodes or links), and secondly, they can include various additional information, such as attributes of nodes and links, or different types of connections between nodes. Hence, we can capture on a very fine grained level the evolution of the network over time. Another advantage is that we can overcome the problem of predicting links for a node which does not yet exist in the network. Such problem is often referred to as a cold start problem. The goal of this thesis is to advance the state-of-the-art of graph pattern based temporal link prediction. More specifically, we focus on the following three important aspects. The first goal is to improve the quality of prediction. Secondly, we focus on the efficiency of the involved computations. The third goal is to widen the field of application domains where link prediction techniques have not been used before. The three main contributions of this thesis are: (i) We introduce a new feature which allows us to apply link prediction techniques to solve the citation count prediction problem. This is a known problem in the area of bibliometrics, but it has never been perceived and studied before as a link prediction problem. (ii) We build time and heterogeneity scores which we use to predict links in heterogeneous networks over time, a new promising domain in the link prediction community. (iii) We design a benchmark of database systems for graph pattern matching which provides us guidelines to choose the best tool to implement the prediction methods above.