Heil, M. (2014). Introducing predictive analytics for decision support in the cultural domain [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2014.23800
E188 - Institut für Softwaretechnik und Interaktive Systeme
Date (published):
2014
Number of Pages:
141
Abstract:
Data mining is an effective means of extracting useful meaning from the vast amounts of data that have accumulated over the years. As the basis of the Big Data hype, it is increasingly common in companies, but it often fails due to its high complexity or excessive expectations. To counter this by reducing the expertise and intuition required for such projects, a methodology is developed by means of a feasibility study. The study is intended to answer several forecasting questions using sales data from a cultural establishment in the German-speaking area. These questions include predicting the occupancy rate of single events, determining "typical" sales pathway patterns, and predicting the general success as well as the daily sales figures of productions. First, following the CRISP-DM process model, the understanding of the data and of its business context is addressed. In addition, the data needs to be prepared before it can be used in the data mining context. The domain-related questions are answered sequentially, reusing insights gained and data structures prepared along the way. Further, various evaluations are conducted by comparing different solution approaches and configurations. However, a comparison with a conventional forecasting method can be drawn only for the occupancy rate prediction of events, as no such efforts have been made for the other questions. From a methodological point of view, different ways of solving the problems are revealed and aspects that need to be taken into consideration are pointed out. Ultimately, the work is intended to help readers get a feel for the working principles and eventually reproduce the process in a different environment, even outside the cultural context. The tool used for almost all tasks is Weka, which is open source and offers great flexibility and an appropriate range of functions.
The results of this feasibility study demonstrate that, on the one hand, data mining is suitable and, on the other hand, the available data is sufficient to yield useful results for most of the domain questions. The conventional approach to forecasting events is surpassed by a solution that is up to 11% more accurate on average, depending on the horizon. Further, simple and classic approaches to time series forecasting are outperformed by the ML-NARX approach, proposed specifically for projecting the sales figures of productions. Taken as a whole, many new insights are gained, but several deficiencies are also encountered. Most of them stem from deliberate interference through marketing measures, from sales to key accounts and partner companies, which lack granularity, and from the general scarcity of data. In addition, manifestations of a phenomenon called "concept drift" are observed.
en
Additional information:
Title differs according to the author's translation. Abstract in German. Bibliography: pp. 137-141