E104 - Institut für Diskrete Mathematik und Geometrie
-
Date (published):
2025
-
Number of Pages:
82
-
Keywords:
combinatorics; asymptotic enumeration; compression algorithms; information theory
en
Abstract:
In this thesis, we investigate different compression methods for plane tree data structures. Data compression is a critical component of efficient storage and processing of information. The three major lossless compression methods that are investigated include minimal directed acyclic graph (minimal DAG) compression, grammar-based tree compression, and entropy-based compression. The analysis of the compression quality achieved by minimal DAG compression relies on analytic combinatorics, discrete methods, and complex analysis. This first compression technique is studied in detail and includes an efficient linear-time algorithm for constructing the minimal DAG of a given tree. The optimization of grammar-based tree compression relies solely on profitable heuristics, since it is NP-hard to find the smallest tree grammar for a given tree. Grammar-based tree compression is an extension of classical information-theoretic compression for strings. Naturally, in this way, we can obtain universally optimal tree compression algorithms for a variety of tree classes. This extension is formalized through tree straight-line programs, which can perform exponentially better than minimal DAG compression in certain cases. In recent years, the concept of entropy has been extended from strings to trees. This enables the development of information-theoretically optimal compression schemes based on probabilistic models of tree generation. Two such algorithms are studied in detail: an arithmetic coding based algorithm for search trees and increasing trees, which achieves average code lengths within a constant number of bits from the entropy, and the universal and asymptotically optimal hypersuccinct trees, a recent method that partitions trees into micro-trees, compresses them according to their frequency, and supports constant-time navigation within optimal space bounds.
en
Additional information:
Arbeit an der Bibliothek noch nicht eingelangt - Daten nicht geprüft Abweichender Titel nach Übersetzung der Verfasserin/des Verfassers