Pölz, D. (2008). Evaluation of selected machine learning methods for detecting phishing e-mail [Diploma Thesis, Technische Universität Wien]. reposiTUm. http://hdl.handle.net/20.500.12708/178390
This thesis addresses the problem of detecting phishing e-mails. In contrast to spam e-mails, phishing e-mails are designed to look very similar to legitimate e-mails, which makes them hard to detect. My main interest is to investigate whether phishing e-mails can be found with machine learning algorithms, based on a feature set that was determined in previous work.

First, the thesis gives an overview of the phishing situation and provides background information about the test data used. Then I review five important machine learning algorithms: decision trees, neural networks, naive Bayes, k-nearest neighbor, and support vector machines. For each of these algorithms, the thesis explains how it basically works and shows how it can be applied to the phishing detection problem.

For the experiments I apply a widely used data mining program. This process is documented in detail, and the most important parameter options are explained. Furthermore, I try to improve the results of the individual machine learning algorithms by applying combination techniques such as bagging, boosting, and stacking.

The final chapter compares the results achieved for the phishing detection problem with each of the five algorithms under various parameter settings. Moreover, it explains why the results differ and what the advantages and disadvantages of the five algorithms are with respect to available data, run time, and prediction accuracy.
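
To make the idea of a combination technique concrete, the following is a minimal sketch of bagging around a k-nearest-neighbor base classifier. It is not taken from the thesis and does not reproduce its experiments: the two features (`num_links`, `uses_ip_url`), the toy data, and the function names `knn_predict` and `bagging_predict` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

# Hypothetical toy data, NOT from the thesis:
# (num_links, uses_ip_url) -> label
TRAIN = [
    ((8.0, 1.0), "phishing"),
    ((6.0, 1.0), "phishing"),
    ((7.0, 0.0), "phishing"),
    ((1.0, 0.0), "legitimate"),
    ((2.0, 0.0), "legitimate"),
    ((0.0, 0.0), "legitimate"),
]


def knn_predict(train, x, k=1):
    """Classify x by majority vote among its k nearest training points."""
    # Sort training items by squared Euclidean distance to x.
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def bagging_predict(train, x, n_models=11, k=1, seed=0):
    """Bagging: vote over base classifiers built on bootstrap samples."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    votes = Counter()
    for _ in range(n_models):
        # Bootstrap sample: draw len(train) items with replacement.
        sample = [rng.choice(train) for _ in train]
        votes[knn_predict(sample, x, k)] += 1
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    query = (7.5, 1.0)  # hypothetical feature vector of an unseen e-mail
    print("single k-NN:", knn_predict(TRAIN, query))
    print("bagged k-NN:", bagging_predict(TRAIN, query))
```

Each base classifier sees a different bootstrap sample of the training data, and the final label is the majority vote. Boosting and stacking combine classifiers differently (by reweighting training examples and by training a meta-classifier, respectively) and are not shown here.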