Abstract
Dimensionality reduction techniques, also called projections, are one of the main tools for visualizing high-dimensional data. To compare such techniques, several quality metrics have been proposed. However, such metrics may not capture the visual separation among groups/classes of samples in a projection, i.e., having groups of similar (same label) points far from other (distinct label) groups of points. To address this, we propose a pseudo-labeling mechanism to assess visual separation using the performance of a semi-supervised optimum-path forest classifier (OPFSemi), measured by Cohen’s Kappa. We argue that lower label propagation errors by OPFSemi in projections are related to higher data/visual separation. OPFSemi explores local and global information of data distribution when computing optimum connectivity between samples in a projection for label propagation. It is parameter-free, fast to compute, easy to implement, and generically handles any high-dimensional quantitative labeled dataset and projection technique. We compare our approach with four commonly used scalar metrics in the literature for 18 datasets and 39 projection techniques. Our results show that our proposed metric consistently scores values in line with the perceived visual separation, surpassing existing projection-quality metrics in this respect.
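The idea of scoring a projection by how well labels propagate in it can be illustrated with a minimal sketch. This is not OPFSemi: as a loudly labeled stand-in, the propagator below simply gives each projected point the label of its nearest labeled seed, and the function names (`cohen_kappa`, `propagate_nearest_seed`, `separation_score`) and the toy data are hypothetical. Only the final step, comparing propagated labels with true labels via Cohen's Kappa, follows the abstract directly.

```python
import math
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: agreement between two labelings, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both labelings use one single class
    return (observed - expected) / (1.0 - expected)


def propagate_nearest_seed(points, seeds):
    """Stand-in propagator (NOT OPFSemi): copy the label of the closest seed.

    points: list of 2D coordinates of the projection.
    seeds:  dict mapping a point index to its known label.
    """
    return [
        seeds[min(seeds, key=lambda s: math.dist(p, points[s]))]
        for p in points
    ]


def separation_score(points, y_true, seed_indices):
    """Propagate labels from a few seeds, then score agreement with Kappa."""
    seeds = {i: y_true[i] for i in seed_indices}
    pseudo_labels = propagate_nearest_seed(points, seeds)
    return cohen_kappa(y_true, pseudo_labels)


# Toy projection: two compact groups of three points each.
projection = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Case 1: labels follow the two groups (good visual separation).
separated = separation_score(projection, [0, 0, 0, 1, 1, 1], seed_indices=[0, 3])

# Case 2: labels are mixed inside each group (poor visual separation).
mixed = separation_score(projection, [0, 1, 0, 1, 0, 1], seed_indices=[0, 3])
```

In this toy setting the score for the well-separated labeling is higher than the score for the mixed one, which mirrors the argument that lower propagation error goes with higher visual separation. The seed choice and the nearest-seed rule are simplifications made for illustration only.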
Original language | English |
---|---|
Pages (from-to) | 287-297 |
Number of pages | 11 |
Journal | Computers & Graphics |
Volume | 116 |
Early online date | 19 Aug 2023 |
Publication status | Published - Nov 2023 |
Bibliographical note
Publisher Copyright: © 2023 Elsevier Ltd
Funding
The authors acknowledge CAPES grants with Finance Code 001, FAPESP grants #2014/12236-1, #2019/10705-8, #2022/12668-5, and CNPq grants #303808/2018-7. We also acknowledge Carlijne Govers, Utrecht University, for organizing the user study presented in Section 5.1.
Funders | Funder number |
---|---|
Fundação de Amparo à Pesquisa do Estado de São Paulo | 2019/10705-8, 2014/12236-1, 2022/12668-5 |
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior | |
Conselho Nacional de Desenvolvimento Científico e Tecnológico | 303808/2018-7 |
Keywords
- Quality of projections
- Labeled data
- Pseudo labeling