The ever-increasing amount of data produced by the modern internet society poses challenges to the corporations and information systems that need to store and process it. Novel trends such as the Internet of Things suggest an even steeper growth in data volume in the future, further underscoring the relevance of big data. To overcome the gap between storage capacity and data access speed while keeping data processing economically feasible, the industry has created frameworks that allow data processing to scale horizontally across large clusters of commodity hardware. The plethora of technologies that have since been developed makes entering the field of big data processing increasingly difficult. This thesis therefore identifies the major types of big data processing along with the programming models that have been designed to cover them. For each processing type, it gives an introductory overview of the most important open source frameworks and technologies, together with practical examples of how they can be used. The thesis concludes by pointing out important extension projects to the presented base systems and by suggesting a performance-centric comparison of Apache Spark and Apache Hadoop, which could help establish a more profound understanding of the nature of these systems and identify potential novel research topics.