Theoretical Computer Science; General Computer Science
Binary rewriting changes the semantics of a program without access to its source code. It serves diverse purposes, such as emulation (e.g., QEMU), optimization (e.g., DynInst), observation (e.g., Valgrind), and hardening (e.g., control-flow integrity enforcement). This survey gives detailed insight into the development and state of the art of binary rewriting by reviewing 67 publications from 1966 to 2018. Starting from these publications, we provide an in-depth investigation of the challenges of binary rewriting and their respective solutions. Based on our findings, we establish a thorough categorization of binary rewriting approaches with respect to their use case, analysis technique, code-transformation method, and code-generation technique. We contribute a comprehensive mapping between binary rewriting tools, applied techniques, and their domains of application. Our findings emphasize that, although much work has been done over the past decades, most of the effort went into improving the rewriting of general-purpose applications. Other challenges were ignored, such as altering throughput-oriented programs or software with real-time requirements, which are often used in the emerging field of the Internet of Things. To the best of our knowledge, our survey is the first comprehensive overview of the complete binary rewriting process.