Text-mining based incident identification in the domain of sustainability

Wieser, Florian

doi:10.34726/hss.2015.23298

Record link:

https://doi.org/10.34726/hss.2015.23298
http://hdl.handle.net/20.500.12708/4711

Title:

Text-mining based incident identification in the domain of sustainability

Citation:

Wieser, F. (2015). Text-mining based incident identification in the domain of sustainability [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2015.23298

reposiTUm DOI:

10.34726/hss.2015.23298

CatalogPlus:

AC12279269

Publication Type:

Thesis - Diploma Thesis

Language:

English

Authors:

Wieser, Florian

Advisor:

Tjoa, A Min

Organisational Unit:

E188 - Institut für Softwaretechnik und Interaktive Systeme

Date (published):

2015

Number of Pages:

Keywords:

Text-Mining

Abstract:

Sustainability has become an increasingly popular topic in recent years, certainly because society has developed a stronger awareness of problems in this area and because decision-makers in companies have begun to rethink the long-term consequences of business activities. Sustainability has also been redefined: today it also covers social and economic concerns and no longer focuses only on ecological topics. Neglecting sustainability topics can also pose a potential risk for companies. For this reason, risk management within companies deals, among other things, with the identification and assessment of potential environmental risk factors, which cannot be regarded as a trivial task. This thesis focuses on whether it is possible to identify environmental sustainability incidents within texts. The ever wider spread of the World Wide Web has made an ever larger amount of reported information available. Possible sources in which relevant information may be published are online news, blogs, and social media streams. The challenge is to recognize and extract the relevant information. To automate the identification of sustainability incidents, a solution was implemented with the help of data mining methods. This required finding a formal definition of environmental incidents and transferring it into a natural language processing (NLP) solution. The developed solution uses the state of the art in NLP and takes as its text sources the content of blog sites that publish in the environmental domain. The system is rule-based: it identifies whether an environmental context is present and detects possible grammatical relations to a company occurring in the sentence. To achieve better performance, advanced methods such as dependency detection and deep learning algorithms were used. Following an iterative knowledge engineering approach, various set-ups for possible solutions were formulated, then evaluated, and the results documented. Finally, the thesis closes with a critical review and ethical concerns regarding data mining. The problem of the credibility of online media is also addressed.

Sustainability topics have become increasingly popular in recent years, owing to a growing awareness of sustainability in society as well as a stronger awareness of the long-term consequences of business activities. Companies seem to care more about their corporate social responsibility, possibly due to a redefinition of sustainability, which may have induced more awareness. Nowadays, sustainability has a broader definition and concerns social and economic matters in addition to environmental ones. Sustainability issues can pose a risk to corporate success and, within companies, risk management departments are responsible for the identification and assessment of potential sustainability risks, which is not a trivial task. This thesis focuses on the problem of identifying environmental sustainability incidents within text documents. The widespread availability of the World Wide Web has led to enormous growth in accessible and reported information all around the world, and potentially relevant information may be published on online news portals, blogs, and social media streams. The challenge is to find and extract the important information from this enormous and daily growing amount of data. In order to automate the detection of sustainability incidents, a data mining approach was formulated. To enable the detection of environmental sustainability incidents, it was necessary to develop a formal definition of such incidents and map this definition to a natural language processing approach suitable for event identification within text. The system is developed using state-of-the-art natural language processing technologies. It gathers text sources from blog sites that publish in the environmental domain. The system follows a rule-based approach that identifies the presence of an environmental context and its possible relation to a company at the sentence level. Further, dependencies between words are examined and deep learning solutions are applied for sentiment classification. Several set-ups are evaluated and the results are presented in this work. Finally, the thesis concludes with a critical review of the ethical concerns in data mining and addresses credibility issues for online sources.
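
As a rough illustration of the sentence-level, rule-based idea described in the abstract, the following minimal sketch flags sentences in which an environmental term and a company name co-occur. It is not the thesis's implementation: the term list, the company list, and the function names are hypothetical, and plain co-occurrence stands in for the dependency analysis and sentiment classification that the abstract mentions.

```python
import re

# Hypothetical vocabularies for illustration only; the rules and lexicons
# actually used in the thesis are not given in this record.
ENVIRONMENT_TERMS = {"spill", "pollution", "contamination", "emissions", "deforestation"}
COMPANIES = {"Acme Oil", "Globex"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_incident_candidates(text):
    """Return sentences that mention both an environmental term and a company.

    This checks co-occurrence only; it does not verify a grammatical
    relation between the company and the environmental term.
    """
    candidates = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        # Match environmental terms as whole words.
        terms = sorted(
            t for t in ENVIRONMENT_TERMS
            if re.search(r"\b" + re.escape(t) + r"\b", lowered)
        )
        # Match company names as case-insensitive substrings.
        companies = sorted(c for c in COMPANIES if c.lower() in lowered)
        if terms and companies:
            candidates.append(
                {"sentence": sentence, "companies": companies, "terms": terms}
            )
    return candidates


if __name__ == "__main__":
    sample = (
        "Acme Oil reported a spill near the coast. "
        "The weather was fine. "
        "Globex opened a new office."
    )
    for candidate in find_incident_candidates(sample):
        print(candidate)
```

A co-occurrence rule like this would likely over-report, since a company can appear in a sentence without being responsible for the incident; that limitation is one plausible reason to examine word dependencies, as the abstract describes.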

Additional information:

Title differs according to the author's translation
Abstract in German

License:

In Copyright

Appears in Collections:

Thesis