<div class="csl-bib-body">
<div class="csl-entry">Urbanke, P. (2022). <i>A framework for evaluating the readability of test code in the context of code maintainability: A family of empirical studies</i> [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2022.103606</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2022.103606
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/137096
-
dc.description
Differing title according to the author's own translation
-
dc.description.abstract
Context and Motivation: Software testing is a common practice in software development and serves many functions: it provides certain guarantees that the software works as expected across the life cycle of the system, it helps with finding and fixing erroneous behaviour, it acts as documentation, and it provides usage examples. Still, test code is often treated as an orphan, which leads to poor-quality tests, also with respect to readability. If a test is hard to read, activities that build on it, such as maintaining tests or drawing correct conclusions from test results, may be compromised. But what is readable test code? Since test code has a different purpose than production code and contains exclusive features like assertion methods, the factors influencing its readability may deviate from those of production code.

Objective: We propose a framework that can be used to evaluate the readability of test code. It also provides information on factors influencing readability and gives best-practice examples for improvements. Aside from this main goal, we give an overview of the academic literature in the field of test code readability and compare it to the opinions of practitioners. We investigate the impact of modifications related to widely discussed readability factors on the readability of test cases. Furthermore, we gather readability rating criteria from free-text answers, investigate the impact of developer experience on readability ratings, and evaluate the accuracy of a readability rating tool that is often used in other studies.

Methods: We collect extensive information on test code readability by combining a systematic mapping of academic literature with a systematic mapping of grey literature. We conduct a human-based experiment on test code readability with 77 mostly junior-level participants in an academic context to investigate various factors influencing readability. We categorise and group the free-text answers from the experiment's participants and compare the human readability ratings with tool-generated readability ratings. Finally, after constructing the readability assessment framework, which is based on the previous results, we evaluate it and compare the outcome to the results of the initial human-based experiment.

Results: The literature studies yield 16 relevant sources from the scientific community and 56 sources from practitioners. Both literature mappings show an ongoing interest in test code readability. Scientific sources focus on investigating automatically generated test code, which is often compared to manually written tests (88%). For capturing human readability, they primarily use surveys (44%), which contain Likert scales in almost all cases. The grey literature mostly consists of blogs in which practitioners share their opinions and experience on problems found in their daily work. There is a clear intersection of readability factors discussed in both communities, but some factors are exclusive to each community. In the human-based experiment, we found a statistically significant influence on the readability of test cases for five of the ten investigated modifications, which map to readability factors. We see little influence of experience on readability ratings, although previous research found that experience influences understanding and maintenance tasks.
Judging from the categorisation of around 2500 free-text answers, the participants rate readability based on Test naming, Structure, and Dependencies (i.e., does the test ensure only one behaviour?); these factors are illustrated in the sketch after the abstract. The ratings of the readability rating tool lie between the 0.25 and 0.75 quantiles of our human ratings in around 51% of the investigated test cases. We also found that invisible differences in formatting (i.e., spaces and tabulators) affect the tool's ratings by up to 0.25 on a scale from 0 to 1. The framework evaluation shows a decreased variation in the ratings across participants and an increased rating speed compared to the gut-feeling ratings from the initial experiment. Overall, the framework rates tests too optimistically. Nevertheless, the validity of this evaluation is very limited due to the small number of survey participants (5); it is therefore merely a proof of concept, which we pursue in future work.

Conclusion: From the literature mappings we found differing views on test case readability between practitioners and academia, which stem from the different contexts of the two communities. The ratings from the readability tool are not accurate enough to be trusted blindly; they still need to be complemented with human expertise. Our readability evaluation framework enables a more efficient assessment of readability. A large-scale evaluation is planned for future work.
en
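
Illustrative sketch (not from the thesis): the abstract names Test naming, Structure, and Dependencies as the criteria participants used when rating readability. The thesis does not prescribe a particular language or framework, so the following JUnit 5 test is only a hypothetical, minimal example of how those factors tend to appear in a single readable test case; the class under test here is the JDK's LocalDate, chosen purely to keep the example self-contained.

import static org.junit.jupiter.api.Assertions.assertEquals;

import java.time.LocalDate;
import org.junit.jupiter.api.Test;

class LocalDatePlusDaysTest {

    // "Test naming": the method name states the expected behaviour,
    // so the test reads as documentation.
    @Test
    void plusDaysRollsOverIntoTheNextMonth() {
        // "Dependencies": only the input this one behaviour needs is set up.
        LocalDate lastOfJanuary = LocalDate.of(2022, 1, 31);

        // "Structure": arrange, act, assert; the test exercises exactly
        // one behaviour and ends with a single matching assertion.
        LocalDate firstOfFebruary = lastOfJanuary.plusDays(1);

        assertEquals(LocalDate.of(2022, 2, 1), firstOfFebruary);
    }
}

Keeping one behaviour per test is what the abstract's question "does the test ensure only one behaviour?" points at: a reader can judge the test's intent from the name and the single assertion alone.
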
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
test
en
dc.subject
code
en
dc.subject
readability
en
dc.subject
evaluation
en
dc.subject
quality
en
dc.subject
maintainability
en
dc.subject
mapping study
en
dc.subject
grey literature
en
dc.title
A framework for evaluating the readability of test code in the context of code maintainability: A family of empirical studies
en
dc.title.alternative
Ein Framework für die Qualitätsbeurteilung von Test Code für die Wartung von Software Code: Eine Familie von empirischen Studien
de
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2022.103606
-
dc.contributor.affiliation
TU Wien, Österreich
-
dc.rights.holder
Pirmin Urbanke
-
dc.publisher.place
Wien
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
dc.contributor.assistant
Biffl, Stefan
-
tuw.publication.orgunit
E194 - Institut für Information Systems Engineering
-
dc.type.qualificationlevel
Diploma
-
dc.identifier.libraryid
AC16727234
-
dc.description.numberOfPages
113
-
dc.thesistype
Diplomarbeit
de
dc.thesistype
Diploma Thesis
en
dc.rights.identifier
In Copyright
en
dc.rights.identifier
Urheberrechtsschutz
de
tuw.advisor.staffStatus
staff
-
tuw.assistant.staffStatus
staff
-
tuw.advisor.orcid
0000-0002-4743-3124
-
tuw.assistant.orcid
0000-0002-3413-7780
-
item.grantfulltext
open
-
item.openairecristype
http://purl.org/coar/resource_type/c_bdcc
-
item.mimetype
application/pdf
-
item.openairetype
master thesis
-
item.openaccessfulltext
Open Access
-
item.languageiso639-1
en
-
item.cerifentitytype
Publications
-
item.fulltext
with Fulltext
-
crisitem.author.dept
E194 - Institut für Information Systems Engineering