Evaluation of selected machine learning methods for detecting phishing e-mail

Pölz, David

Record link:

http://hdl.handle.net/20.500.12708/178390

Title:

Evaluation of selected machine learning methods for detecting phishing e-mail

Citation:

Pölz, D. (2008). Evaluation of selected machine learning methods for detecting phishing e-mail [Diploma Thesis, Technische Universität Wien]. reposiTUm. http://hdl.handle.net/20.500.12708/178390

CatalogPlus:

AC05038141

Publication Type:

Thesis - Diploma Thesis

Language:

English

Authors:

Pölz, David

Advisor:

Gansterer, Wilfried

Date (published):

2008

Number of Pages:

Keywords:

phishing; e-mail; classification; machine learning; algorithms; evaluation

Abstract:

The thesis deals with the problem of detecting phishing e-mails. In contrast to spam e-mails, phishing e-mails are designed to look very similar to legitimate e-mails, which makes them hard to detect. My main interest is to investigate the idea of finding phishing e-mails with machine learning algorithms based on a feature set which was determined in previous work. First, the thesis gives an overview of the phishing situation and conveys background information about the test data used. Then I review five important machine learning algorithms: decision trees, neural networks, naive Bayes, k-nearest neighbor and support vector machines. For each of these algorithms the thesis explains how it basically works and then shows how it can be applied to the phishing detection problem. For the experiments I apply a widely used data mining program. This process is documented in detail and the most important parameter options are explained. Furthermore, I try to improve the results of the regular machine learning algorithms by applying combination techniques such as bagging, boosting and stacking. The final chapter compares the results achieved for the phishing detection problem with each of the five algorithms for various parameter settings. Moreover, it explains why the results differ and what the advantages and disadvantages of the five algorithms are with respect to available data, run time and prediction accuracy.

Appears in Collections:

Thesis
