Djukanovic, M. (2021). Exact and heuristic approaches for solving string problems from bioinformatics [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2021.93203
This thesis provides several new algorithms for solving prominent string problems from the literature, most of these being variants of the well-known longest common subsequence (LCS) problem. Given a set of input strings, a longest common subsequence is a string of maximum length that can be obtained from each input string by removing letters, i.e., which is a common subsequence of all input strings. The problem is known to be NP–hard and challenging to solve in practice for the general case of an arbitrary set of input strings. Besides the basic LCS problem variant, we consider here the following important variants: the longest common palindromic subsequence problem, the arc-preserving LCS problem, the longest common square subsequence problem, the repetition-free LCS problem, and the constrained LCS problem. These problems provide a range of important measures which serve for detecting similarities between molecules of various structures. Concerning heuristic approaches, we propose a general beam search framework in which many previously described methods from literature can be expressed. In particular, new state-of-the-art results were obtained on various benchmark sets utilizing a novel heuristic guidance that approximates the expected solution length of three different string problems. For solving the longest common square subsequence problem, a hybrid of a Reduced Variable Neighborhood Search method and a Beam search technique has been proposed. Concerning exact techniques, two kinds of methods are proposed: (i) pure exact methods based on A∗ search and (ii) anytime algorithms that build upon the A∗ search framework. Experimental results indicate that this A∗ search is also able to outperform all previously published more specific exact algorithms for the longest common palindromic subsequence and the constrained longest common subsequence problems with two input strings. Concerning anytime algorithms, we first make use of the derived A∗ search framework such that classical A∗ iterations are interleaved with beam search runs. Later, another anytime algorithm variant is proposed in which the beam search part is replaced by a major iteration of the Anytime Column Search. New state-of-the-art results are produced and better final optimality gaps were obtained by the latter hybrid, in comparison to a several state-of-the-art anytime algorithms from the literature. As an alternative exact approach, we further consider the transformation of LCS problem instances into Maximum Clique (MC) problem instance on the basis of so-called conflict graphs. In this way, state-of-the-art MC solvers can be utilized for solving the LCS problem instances. Further, an effective conflict graph reduction based on suboptimality checks is proposed. In conjunction with the general-purpose mixed integer linear programming solver Cplex, new state-of-the-art results are obtained on a wide range of benchmark instances.