Predicting bugs in source code : A machine learning approach for predicting faults by utilizing code and change metrics

Felder, Jodok

doi:10.34726/hss.2025.124198

DC Field

Value

Language

dc.contributor.advisor

Weippl, Edgar

dc.contributor.author

Felder, Jodok

dc.date.accessioned

2025-03-18T10:16:54Z

dc.date.issued

2025

dc.date.submitted

2025-02

dc.identifier.citation

Felder, J. (2025). Predicting bugs in source code : A machine learning approach for predicting faults by utilizing code and change metrics [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.124198

dc.identifier.uri

https://doi.org/10.34726/hss.2025.124198

dc.identifier.uri

http://hdl.handle.net/20.500.12708/213269

dc.description

Includes an abstract in German

dc.description.abstract

Bug detection plays a critical role in software engineering, offering significant time and cost savings for organizations and developers alike. With the exponential growth in code volume and the availability of data surrounding its development, bug prediction has become increasingly important. This thesis focuses on combining code metrics, especially ones based on code changes, and machine learning techniques to address the challenge of identifying buggy software files. The thesis leverages a dataset comprising 34 open-source projects and utilizes more than 37 code metrics, ranging from basic measures such as Lines of Code to advanced metrics rooted in object-oriented programming principles. A CatBoost classifier was employed to develop a predictive model capable of classifying files as buggy or non-buggy and assigning a corresponding Risk Score, a numerical indicator of the likelihood that a given file contains bugs. The model achieved an average accuracy of 84.1% and a recall rate of 83%, demonstrating its reliability and effectiveness in identifying buggy files. The analysis further examined the importance of individual code metrics in driving the model's predictions. Feature Importance Analysis identified complexity metrics and the Bus Factor as the most influential in predicting buggy files, offering valuable insights into key contributors to software quality. Additionally, a Logistic Regression-based approach, which achieved an accuracy of 61%, was evaluated to contrast its performance with advanced non-linear models like CatBoost, demonstrating the latter's superior performance for bug prediction. This work contributes to the field of software engineering by demonstrating the efficacy of combining machine learning with metric-driven approaches for bug prediction. The results provide a foundation for future research and practical applications aimed at enhancing software reliability and development efficiency.

dc.language

English

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.subject

Machine Learning

dc.subject

Bug Prediction

dc.subject

Code Metrics

dc.subject

Change Metrics

dc.subject

Gradient Boosting

dc.subject

CatBoost

dc.subject

Classification

dc.title

Predicting bugs in source code : A machine learning approach for predicting faults by utilizing code and change metrics

dc.type

Thesis

dc.type

Hochschulschrift

dc.rights.license

In Copyright

dc.rights.license

Urheberrechtsschutz

dc.identifier.doi

10.34726/hss.2025.124198

dc.contributor.affiliation

TU Wien, Österreich

dc.rights.holder

Jodok Felder

dc.publisher.place

Wien

tuw.version

vor

tuw.thesisinformation

Technische Universität Wien

dc.contributor.assistant

Schatten, Alexander

tuw.publication.orgunit

E194 - Institut für Information Systems Engineering

dc.type.qualificationlevel

Diploma

dc.identifier.libraryid

AC17467640

dc.description.numberOfPages

dc.thesistype

Diplomarbeit

dc.thesistype

Diploma Thesis

dc.rights.identifier

In Copyright

dc.rights.identifier

Urheberrechtsschutz

tuw.advisor.staffStatus

staff

tuw.assistant.staffStatus

staff

item.openairecristype

http://purl.org/coar/resource_type/c_bdcc

item.grantfulltext

open

item.cerifentitytype

Publications

item.openairetype

master thesis

item.mimetype

application/pdf

item.languageiso639-1

item.fulltext

with Fulltext

item.openaccessfulltext

Open Access

Appears in Collections:

Thesis

Fulltext (Version of Record (published version))

Adobe PDF

(1.27 MB)

In Copyright
