Method of predicting quality variables based on process parameters of a paper machine using industrial data science

Stockert, Anton-Alexander

doi:10.34726/hss.2018.55965

Record link:

https://doi.org/10.34726/hss.2018.55965
http://hdl.handle.net/20.500.12708/19617

Title:

Method of predicting quality variables based on process parameters of a paper machine using industrial data science

Citation:

Stockert, A.-A. (2018). Method of predicting quality variables based on process parameters of a paper machine using industrial data science [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2018.55965

reposiTUm DOI:

10.34726/hss.2018.55965

CatalogPlus:

AC15242935

Publication Type:

Thesis - Diplomarbeit

Language:

English

Authors:

Stockert, Anton-Alexander

Advisor:

Sihn, Wilfried

Organisational Unit:

E330 - Institut für Managementwissenschaften

Date (published):

2018

Number of Pages:

172

Keywords:

Papierindustrie; Industrial Data Science; Qualitätsprädiktion

Paper industry; industrial data science; quality prediction

Abstract:

Modern paper machines continuously record and store process and quality data. Based on the recorded process-parameter data and using an industrial data science methodology, this thesis develops prediction models for forecasting the quality of the produced paper. The reliability of such models is determined by statistical indicators (squared correlation R² and Root Mean Squared Error, RMSE), their plausibility and applicability are validated, and suggestions for improvement and extensive implementation are subsequently provided. The methodology proposed for this purpose is based on the Harvard data science approach and consists of two different modelling paths - regression modelling for individual quality aspects and classification modelling for overall quality. Furthermore, data sets of different dimensions (in terms of number of variables) underlying the modelling are compared. A distinction is made between complete, evolutionarily selected and manually selected data sets. Optimization possibilities are described and carried out both for the parameterization of the prediction models used and for the parameterization of the evolutionary selection algorithms. For the regression models, a 'Deep Learning' algorithm achieves the best results with regard to RMSE (ca. 2.5%) and squared correlation (ca. 0.7). On the basis of a calculated, summarizing quality indicator, the author furthermore creates a Random Forest classification model (accuracy 59.9%) to complement the more detailed regression models discussed for the individual quality aspects; however, this modelling path is only implemented as an example and is not subjected to further optimization.
The results suggest that close cooperation between process engineers and data scientists allows models that enable a machine to achieve quality goals in a resource-saving manner, to approach special requirements for quality aspects faster and more specifically, and to advance the automation of production. Furthermore, the results show that the more precise models are achieved by using condensed data sets. The results also show that the data sets selected by experts achieve similarly good, and in some cases even better, results than the evolutionarily selected data sets.
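
The two reliability indicators named in the abstract, RMSE and squared correlation, can be illustrated with a minimal sketch. It is not taken from the thesis: the function names and the sample values are hypothetical, and squared correlation is computed here as the square of the Pearson correlation between measured and predicted values, which is one common reading of the term. The abstract reports RMSE as a percentage; the normalization behind that figure is not stated in this record, so the sketch returns RMSE in the units of the quality variable.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the units of the quality variable."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def squared_correlation(actual, predicted):
    """Square of the Pearson correlation between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov ** 2 / (var_a * var_p)


# Hypothetical measured and predicted values of one quality variable
# (illustration only, not data from the thesis).
measured = [80.0, 82.0, 79.0, 85.0]
forecast = [81.0, 81.0, 80.0, 84.0]

error = rmse(measured, forecast)
r_squared = squared_correlation(measured, forecast)
print(f"RMSE = {error:.3f}, squared correlation = {r_squared:.3f}")
```

A lower RMSE and a squared correlation closer to 1 both indicate a better fit, which is how the two indicators are used to rank the regression models.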

In modernen Papiermaschinen werden laufend Prozess- und Qualitätsdaten aufgenommen und gespeichert. Basierend auf den generierten Daten der Prozessparameter und mithilfe einer Industrial Data Science Methodologie erstellt diese Diplomarbeit Prädiktionsmodelle, um die Qualität des produzierten Papieres vorherzusagen. Die Zuverlässigkeit solcher Modelle wird anhand statistischer Kennzahlen festgestellt (Bestimmtheitsmaß R² und Root Mean Squared Error RMSE), die Plausibilität bzw. Anwendbarkeit validiert und in weiterer Folge Vorschläge zur Verbesserung sowie einer umfangreichen Implementierung geliefert. Die für diesen Zweck vorgeschlagene Methodologie basiert auf dem Harvard Data Science Approach und besteht aus zwei verschiedenen Modellierungspfaden – einer Regressionsmodellierung für einzelne Qualitätsaspekte und einer Klassifikationsmodellierung für die Gesamtqualität. Des Weiteren werden verschiedene Umfänge von Datensätzen (in Bezug auf Variablenanzahl), welche der Modellierung zugrunde liegen, verglichen. Dabei wird zwischen vollständigen, evolutionär selektierten und manuell selektierten Datensätzen unterschieden. Sowohl für die Parametrisierung der verwendeten Prädiktionsmodelle als auch für die der evolutionären Selektionsalgorithmen werden Optimierungsmöglichkeiten beschrieben und durchgeführt. Bei den Regressionsmodellen erzielt ein ‚Deep Learning‘-Algorithmus die besten Ergebnisse in Hinsicht auf RMSE (ca. 2,5%) und Bestimmtheitsmaß (ca. 0,7). Mithilfe einer errechneten, zusammenfassenden Qualitätskennzahl erstellt der Autor zusätzlich ein Random-Forest-Klassifikationsmodell, um eine Ergänzung zu den ausführlicher besprochenen Regressionsmodellen der einzelnen Qualitätsaspekte aufzuzeigen (Genauigkeit 59,9%) – dieser Modellierungspfad wird allerdings nur exemplarisch umgesetzt und keiner weiteren Optimierung unterzogen.
Die Resultate lassen den Rückschluss zu, dass eine enge Kooperation zwischen Prozessingenieuren und Data-Scientists Modelle ermöglicht, welche Maschinen befähigen, möglichst ressourcenschonend Qualitätsziele zu erreichen, Spezialanforderungen an Qualitätsaspekte schneller und gezielter anzufahren sowie die Automatisierung der Produktion voranzutreiben. Darüber hinaus zeigen die Ergebnisse, dass die präziseren Modelle erzielt werden, indem gekürzte Datensätze verwendet werden. Im Ergebnis zeigt sich, dass die von Experten selektierten Daten ähnlich gute, teilweise sogar bessere Ergebnisse erzielen als die evolutionär selektierten Datensätze.

License:

In Copyright

Appears in Collections:

Thesis