E105 - Institut für Stochastik und Wirtschaftsmathematik
ROC curve; Classification; Multivariate marker; R package
Binary classification is a common problem whose objective is to determine correctly whether or not a subject has a characteristic of interest. On the basis of a gold standard, the aim is to discriminate between two populations (positive and negative, according to whether or not the subject has the characteristic of interest, respectively) by means of a variable, the so-called marker. In any binary categorization there are two types of error: classifying a negative subject as positive (false positive) and classifying a positive subject as negative (false negative). The probabilities of these errors are given by the complement of the specificity (the false-positive rate) and the complement of the sensitivity (the false-negative rate), respectively. The trade-off between the sensitivity (y-axis) and the complement of the specificity (x-axis) is reflected in the receiver operating characteristic (ROC) curve. This graphical statistical method is therefore used to measure and visualize the discrimination performance of the marker under study. The classification accuracy is frequently summarized by the area under the curve (AUC), but the underlying classification rules are rarely exhibited, since in the standard configuration the decision rules are immediately determined. However, the available information may not immediately discriminate between the two populations, in which case the decision criterion is not direct. In such cases, different dichotomization criteria should be explored, giving rise to a classification subset. The main goal of this dissertation is to revisit the definition of the ROC curve in order to graphically analyze the discriminatory capacity of a continuous marker when alternative rules for performing a binary classification are considered.
It covers different shapes for the classification regions (defined as those where a subject is classified as positive if their marker value lies inside), as well as flexibility in the nature of the marker under study. On this basis, classification accuracy for multivariate markers may be assessed directly. Graphical representations that reflect different types of classification rules and display the construction of the resulting ROC curve are studied, with the ultimate aim of elucidating the underlying decision rules and preserving their interpretability, where appropriate.
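As an illustration of the quantities described above, the following is a minimal Python sketch, not the method or the R package developed in the dissertation. It builds the empirical ROC curve under the standard rule (classify as positive when the marker exceeds a threshold c) and computes the AUC by the trapezoidal rule. The function names `empirical_roc`, `auc_trapezoid`, and `two_sided_roc` are hypothetical. The two-sided variant assumes one simple alternative classification region, namely marker values farther than c from a fixed, user-supplied `center`; the general regions studied in the dissertation are not restricted to this form.

```python
import numpy as np


def empirical_roc(neg, pos):
    """Empirical ROC for the standard rule: positive if marker > c."""
    neg = np.asarray(neg, dtype=float)
    pos = np.asarray(pos, dtype=float)
    # Candidate thresholds: every observed value, plus -inf so that
    # the curve reaches the point (1, 1).
    values = np.unique(np.concatenate((neg, pos)))
    thresholds = np.concatenate(([-np.inf], values))
    # False-positive rate = 1 - specificity; true-positive rate = sensitivity.
    fpr = np.array([(neg > c).mean() for c in thresholds])
    tpr = np.array([(pos > c).mean() for c in thresholds])
    # Sort points by fpr, breaking ties by tpr, so the curve runs
    # from (0, 0) to (1, 1).
    order = np.lexsort((tpr, fpr))
    return fpr[order], tpr[order]


def auc_trapezoid(fpr, tpr):
    """Area under the empirical ROC curve (trapezoidal rule)."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def two_sided_roc(neg, pos, center):
    """ROC for the assumed rule: positive if |marker - center| > c.

    Illustrative simplification: the classification region is the
    complement of a symmetric interval around a fixed center.
    """
    neg_t = np.abs(np.asarray(neg, dtype=float) - center)
    pos_t = np.abs(np.asarray(pos, dtype=float) - center)
    return empirical_roc(neg_t, pos_t)


if __name__ == "__main__":
    # Toy data (made up for illustration): positives lie in both tails.
    neg = [-1.0, 0.0, 1.0]
    pos = [-3.0, 3.0, 4.0]
    print("standard rule AUC:", auc_trapezoid(*empirical_roc(neg, pos)))
    print("two-sided rule AUC:", auc_trapezoid(*two_sided_roc(neg, pos, 0.0)))
```

On the toy data, the standard one-sided rule misses the positive subject in the lower tail, whereas the two-sided region separates the two groups. This is the kind of situation in which the choice of classification region changes the resulting ROC curve.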