Enhanced methods of job offers mining from the World Wide Web

Dar, Ehtesham-Ul-Haq

doi:10.34726/hss.2018.23028

Record link:

https://doi.org/10.34726/hss.2018.23028
http://hdl.handle.net/20.500.12708/6102

Citation:

Dar, E.-U.-H. (2018). Enhanced methods of job offers mining from the World Wide Web [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2018.23028

CatalogPlus:

AC15032690

Publication Type:

Thesis - Dissertation

Language:

English

Advisor:

Dorn, Jürgen

Organisational Unit:

E188 - Institut für Softwaretechnik und Interaktive Systeme

Date (published):

2018

Number of Pages:

116

Keywords:

Data Extraction; Ontology; Web Crawler

Abstract:

The significance of employment in the structure of a society is evident. Methods of job procurement are gradually shifting from conventional to digital, and the internet has become a prominent source of job offers. Online job offers have opened research opportunities to explore methods for automating the classification and retrieval of online jobs. Classifying web documents as job opportunities requires a mechanism from machine learning or another domain. Within machine learning, text classification is the only viable method for automating the retrieval of online job opportunities; semantic web mining is another possible solution for job offer classification. We studied methods for job offer classification from both machine learning and semantic web technologies. More than 5000 job offers were collected from multiple existing job offer websites for this study. From the machine learning discipline, we investigated eight text classifiers to study their effectiveness and their generalization performance on new data. The job offers dataset was preprocessed with several existing methods and a newly defined method, and arranged into five groups for classification. The classifiers were regularized to avoid high variance, and their effectiveness parameters and generalization errors were evaluated. All classifiers showed more than 90% accuracy, but their generalization errors varied. Two classifiers, Ridge Regression and Stochastic Gradient Descent, generalized well to new data for all groups. By contrast, Random Forest and Perceptron tended toward high variance. The remaining classifiers exhibited either behavior, depending on the group. From semantic web technology, we proposed a scalable ontology-based classifier; this enhanced classifier classifies generic as well as specific job offers.
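To illustrate the machine learning side, here is a minimal Python sketch of one kind of classifier compared in the study: a linear text classifier trained by stochastic gradient descent on the hinge loss with an L2 penalty, plus a generalization gap computed as training accuracy minus test accuracy. This is not the thesis's implementation. The bag-of-words features, the hyperparameters (`alpha`, `lr`, `epochs`), the helper names and the toy documents are all hypothetical, and examples are visited in a fixed order rather than shuffled so that runs are reproducible.

```python
from collections import Counter


def featurize(text):
    """Bag-of-words term counts (hypothetical stand-in for the thesis's preprocessing)."""
    return Counter(text.lower().split())


def score(w, b, x):
    """Linear decision function w.x + b over sparse dict features."""
    return sum(w.get(t, 0.0) * v for t, v in x.items()) + b


def train_sgd(docs, labels, alpha=0.01, lr=0.1, epochs=20):
    """SGD on the hinge loss with an L2 penalty (hypothetical hyperparameters).

    labels are +1 (job offer) or -1 (not a job offer). A larger alpha shrinks
    the weights harder, trading training fit for lower variance.
    Examples are visited in a fixed order so the run is reproducible.
    """
    w, b = {}, 0.0
    xs = [featurize(d) for d in docs]
    for _ in range(epochs):
        for x, y in zip(xs, labels):
            margin = y * score(w, b, x)
            # Gradient step on the L2 term: shrink every weight toward zero.
            for t in w:
                w[t] *= 1.0 - lr * alpha
            # Hinge-loss step: update only when the margin is below 1.
            if margin < 1.0:
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + lr * y * v
                b += lr * y
    return w, b


def predict(w, b, text):
    return 1 if score(w, b, featurize(text)) >= 0 else -1


def accuracy(w, b, docs, labels):
    hits = sum(predict(w, b, d) == y for d, y in zip(docs, labels))
    return hits / len(docs)


def generalization_gap(w, b, train, test):
    """Training accuracy minus test accuracy; a large gap signals high variance."""
    return accuracy(w, b, *train) - accuracy(w, b, *test)


# Hypothetical toy data, not taken from the thesis's dataset.
train = (
    [
        "we are hiring a software engineer apply now",
        "job opening for data analyst apply today",
        "senior developer position full time salary",
        "recipe for chocolate cake with butter",
        "weather forecast rain tomorrow",
        "football match results and highlights",
    ],
    [1, 1, 1, -1, -1, -1],
)
test = (
    ["apply now for a developer position", "weather forecast and match results"],
    [1, -1],
)

w, b = train_sgd(*train)
print("train accuracy:", accuracy(w, b, *train))
print("test accuracy:", accuracy(w, b, *test))
print("generalization gap:", generalization_gap(w, b, train, test))
```

Raising `alpha` is the regularization lever for avoiding high variance; in the sense used above, a classifier whose gap stays near zero across the data groups generalizes well.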
We used the ontology to extract concepts from the text descriptions of job offers, to find the minimum threshold for classification, and to develop a classification model. No machine learning algorithm was used to develop this classifier. We evaluated it following the machine learning evaluation procedure, with separate training and testing datasets. The classifier achieved more than 90% accuracy, precision, and recall on both the training and the testing dataset. With these promising results, the defined methods make it possible to automate the categorization and retrieval of job offers from the World Wide Web.
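The ontology-based approach can be pictured with a minimal Python sketch. Concepts are extracted by matching ontology terms in the text. A document counts as a job offer when it mentions at least a threshold number of distinct concepts. The threshold is chosen on training data and then checked on test data using accuracy, precision and recall. Several parts are hypothetical assumptions rather than the method from the thesis: the ontology (a plain dictionary standing in for a real ontology), the concept-counting rule, the threshold-selection rule (the smallest count that maximizes training accuracy) and the example texts.

```python
import re

# Hypothetical toy ontology: concept -> surface terms that signal it.
ONTOLOGY = {
    "Position": ["position", "vacancy", "job opening", "role"],
    "Employer": ["company", "employer", "we are hiring"],
    "Skill": ["python", "java", "sql", "communication skills"],
    "Application": ["apply", "send your cv", "application"],
    "Compensation": ["salary", "benefits"],
}


def extract_concepts(text, ontology=ONTOLOGY):
    """Return the set of ontology concepts whose terms occur in the text."""
    text = text.lower()
    found = set()
    for concept, terms in ontology.items():
        for term in terms:
            # Whole-word/phrase match, so "role" does not match "control".
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                found.add(concept)
                break
    return found


def classify(text, threshold):
    """Label a document as a job offer (1) if it mentions enough concepts."""
    return 1 if len(extract_concepts(text)) >= threshold else 0


def metrics(docs, labels, threshold):
    """Accuracy, precision and recall of the threshold rule on labeled docs."""
    tp = fp = fn = tn = 0
    for doc, y in zip(docs, labels):
        pred = classify(doc, threshold)
        if pred and y:
            tp += 1
        elif pred and not y:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(docs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def find_min_threshold(docs, labels):
    """Smallest concept count that maximizes training accuracy.

    One hypothetical reading of "minimum threshold"; the thesis may differ.
    """
    best_t, best_acc = 1, -1.0
    for t in range(1, len(ONTOLOGY) + 1):
        acc = metrics(docs, labels, t)[0]
        if acc > best_acc:  # strict '>' keeps the smallest t on ties
            best_t, best_acc = t, acc
    return best_t


# Hypothetical toy data (1 = job offer, 0 = other web document).
train_docs = [
    "We are hiring: Python developer position, competitive salary. Apply now.",
    "Vacancy for a data analyst role with SQL skills; send your CV to the company.",
    "Our company released a new Java library today.",
    "Python tips: how to write clean code.",
]
train_labels = [1, 1, 0, 0]
test_docs = [
    "Job opening: Java engineer role at our company, apply with your CV.",
    "The company cafeteria now serves Java coffee.",
]
test_labels = [1, 0]

threshold = find_min_threshold(train_docs, train_labels)
print("threshold:", threshold)
print("train (acc, prec, rec):", metrics(train_docs, train_labels, threshold))
print("test (acc, prec, rec):", metrics(test_docs, test_labels, threshold))
```

The same training/testing split used for the machine learning classifiers is applied here, even though nothing is learned beyond the single threshold.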

Additional information:

Title differs following translation by the author

License:

In Copyright

Appears in Collections:

Thesis