Obweger, H. (2009). Similarity searching in complex business events and sequences thereof [Diploma Thesis, Technische Universität Wien]. reposiTUm. http://hdl.handle.net/20.500.12708/185664
This thesis contributes novel and formally well-founded methods for similarity searching to the field of complex event-data analysis, both at the level of single events and at the level of event sequences. Because event-based systems may produce highly diverse data sets, our main focus is on the highest possible flexibility. The approaches should also be intelligible to business analysts and generate meaningful, intuitive results.<br />Finally, the approaches should be conceptually independent of concrete Complex Event Processing solutions and instead build upon abstract, generally accepted definitions of events, event types, etc.<br />Our approach to single-event similarity builds upon geometric ideas of similarity, with event attribute values defining the relative positioning of two events in an n-dimensional space. The similarity between two events is then calculated from weighted attribute-level similarities.<br />The proposed approach to event-sequence similarity outperforms existing approaches by allowing analysts to consider event-level similarities, order, and relative and absolute temporal structures in a highly flexible manner. It builds upon an assignment-based understanding of sequence similarity, in which each unit of the pattern sequence is considered either represented by a certain event of the target sequence or missing from it. Our algorithm finds the best possible assignment of the target sequence using a Branch & Bound strategy. This assignment is then used to calculate the similarity between the given sequences.<br />We conclude this work with a practical evaluation, in which we apply the event-sequence similarity approach to real-world scenarios from three application domains. We found that the algorithm performs excellently for short and sharp-edged sequences in which a majority of events constitute clear and significant characteristics of the event sequence.
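The two ideas summarized in the abstract, weighted attribute-level similarity and an assignment found by Branch & Bound, can be illustrated with a minimal Python sketch. It is not the thesis's algorithm. All names (`event_similarity`, `best_assignment`, `weights`, `ranges`) are hypothetical, attributes are assumed to be numeric, attribute similarity is assumed to be one minus the range-normalized distance, and assignments are assumed to preserve order. Temporal structures, which the thesis also considers, are left out entirely.

```python
from typing import Dict, List, Optional, Tuple

Event = Dict[str, float]  # assumption: every attribute is numeric


def event_similarity(a: Event, b: Event,
                     weights: Dict[str, float],
                     ranges: Dict[str, float]) -> float:
    """Weighted mean of attribute-level similarities, each in [0, 1]."""
    score = 0.0
    for attr, w in weights.items():
        # Assumed attribute similarity: 1 minus the range-normalized distance.
        dist = abs(a[attr] - b[attr]) / ranges[attr]
        score += w * max(0.0, 1.0 - dist)
    return score / sum(weights.values())


def best_assignment(pattern: List[Event], target: List[Event],
                    weights: Dict[str, float],
                    ranges: Dict[str, float]
                    ) -> Tuple[float, List[Optional[int]]]:
    """Assign each pattern event to a target index or to None (missing).

    Returns the mean similarity of the best order-preserving assignment
    and the assignment itself. Missing units contribute a similarity of 0.
    """
    n = len(pattern)
    if n == 0:
        return 1.0, []
    best_score = -1.0
    best: List[Optional[int]] = []

    def branch(i: int, start: int, score: float,
               assigned: List[Optional[int]]) -> None:
        nonlocal best_score, best
        # Bound: each remaining pattern unit can add at most 1.0.
        if score + (n - i) <= best_score:
            return
        if i == n:
            best_score, best = score, list(assigned)
            return
        # Branch 1: represent pattern[i] by a later target event.
        for j in range(start, len(target)):
            s = event_similarity(pattern[i], target[j], weights, ranges)
            assigned.append(j)
            branch(i + 1, j + 1, score + s, assigned)
            assigned.pop()
        # Branch 2: treat pattern[i] as missing in the target.
        assigned.append(None)
        branch(i + 1, start, score, assigned)
        assigned.pop()

    branch(0, 0, 0.0, [])
    return best_score / n, best


weights = {"amount": 1.0}
ranges = {"amount": 100.0}
pattern = [{"amount": 10.0}, {"amount": 50.0}]
target = [{"amount": 10.0}, {"amount": 90.0}, {"amount": 50.0}]
similarity, assignment = best_assignment(pattern, target, weights, ranges)
```

In this example the pattern events are matched to the first and third target events, and the unrelated middle event is skipped. The bound used here is deliberately simple; a tighter bound would prune more branches, at the cost of extra computation per node.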