Parameterization-free clustering with sparse data observers

Iglesias Vázquez, Félix; Zseby, Tanja; Zimek, Arthur

doi:10.1016/j.is.2025.102562

Record link:

http://hdl.handle.net/20.500.12708/216068

Title:

Parameterization-free clustering with sparse data observers

Citation:

Iglesias Vázquez, F., Zseby, T., & Zimek, A. (2025). Parameterization-free clustering with sparse data observers. Information Systems, 133, Article 102562. https://doi.org/10.1016/j.is.2025.102562

Publisher DOI:

10.1016/j.is.2025.102562

Publication Type:

Article - Original Research Article

Language:

English

Authors:

Iglesias Vázquez, Félix
Zseby, Tanja
Zimek, Arthur

Organisational Unit:

E389-01 - Forschungsbereich Networks

Journal:

Information Systems

ISSN:

0306-4379

Date (published):

Aug-2025

Number of Pages:

Publisher:

PERGAMON-ELSEVIER SCIENCE LTD

Peer reviewed:

Yes

Keywords:

Clustering; Sparse data observers; Unsupervised learning; Data analysis

Abstract:

Given a set of data points, clustering serves to discover groups based on pairwise similarities and the shapes drawn by the data in the feature space. In other words, it is a tool to describe data and reveal their intrinsic nature in terms of patterns or groups. In this paper, we review the methodology of clustering when used to explore a priori unknown data, i.e., we do not know how data spaces are manipulated, how algorithms are tuned, and how results are validated. Under this practical approach, we examine the advantages of SDOclust, a clustering method that stands out for its simplicity, lightness, no need for parameterization and not being subject to traditional clustering limitations. We test SDOclust and main established alternatives (HDBSCAN, k-means--, Fuzzy C-means, Hierarchical Clustering, CLASSIX, and N2D Deep Clustering) by extensive experimentation with more than 200 datasets, both real and synthetic, that have been collected from the literature on evaluation and represent different data analysis challenges. We submit only SDOclust to unfavorable testing conditions by denying it a parameter tuning phase. Nevertheless, its overall performance is excellent and positions it as one of the best general-purpose alternatives. With deep clustering as the consolidation of a new paradigm, trends in clustering consist mainly in projecting data into spaces that are easier to dissect. Therefore, in cases where the original space does not show clustering-friendly structures and when we can assume transformation costs, SDOclust easily adapts and is a most natural choice to perform the partitioning task.

Link (external):

https://doi.org/10.48436/rnf34-61z36
https://github.com/CN-TU/pysdoclust

Research Areas:

Mathematical and Algorithmic Foundations: 35%
Computer Engineering and Software-Intensive Systems: 20%
Information Systems Engineering: 45%

Science Branch:

1020 - Informatik: 20%
2020 - Elektrotechnik, Elektronik, Informationstechnik: 60%
1010 - Mathematik: 20%

Appears in Collections:

Article
