Abstract
Outlier detection plays a vital role in data cleaning, significantly impacting analysis and decision-making processes. Although extensive research has been conducted on numerical outlier detection, identifying outliers in relational data with categorical attributes presents unique challenges due to the complexity of defining an appropriate similarity measure. Existing methods often rely on converting categorical values to numerical ones, using frequency as an indicator of outlierness, and extracting predefined syntactic structures from the values.

In this paper, we introduce FIONA (FInding Outliers iN Attributes), a novel approach designed to detect outliers in relational data with categorical values in distributed scenarios. Usually, categorical values in relational datasets exhibit specific syntactic structures. FIONA establishes a similarity measure to uncover hidden patterns and identify dominant patterns within the data. Values that deviate from these dominant patterns are reported as outliers.

Compared to existing tools, FIONA excels in accurately identifying outliers and dominant patterns in distributed datasets and offers clear and concise explanations for each identified outlier. Our approach ensures that distributed systems can maintain data integrity and reliability, enhancing the effectiveness of data-driven decision-making in large-scale, distributed environments.
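To make the idea of dominant syntactic patterns concrete, the following is a minimal illustrative sketch and not FIONA's algorithm. It assumes a fixed, hand-picked abstraction (digits become `D`, letters become `A`, other characters are kept, and runs are collapsed) and a hypothetical `min_share` threshold, which makes it closer to the frequency-over-predefined-structures baseline the abstract mentions than to FIONA's similarity measure. All function names are invented for illustration.

```python
from collections import Counter
from itertools import groupby


def char_class(ch: str) -> str:
    """Map a character to a coarse class (an assumed, hand-picked abstraction)."""
    if ch.isdigit():
        return "D"
    if ch.isalpha():
        return "A"
    return ch  # keep punctuation and whitespace as-is


def syntactic_pattern(value: str) -> str:
    """Abstract a value into its pattern, collapsing runs of the same class."""
    classes = (char_class(ch) for ch in value)
    return "".join(key for key, _ in groupby(classes))


def find_outliers(values, min_share=0.5):
    """Return (dominant patterns, [(value, pattern), ...] for deviating values).

    min_share is a hypothetical threshold: a pattern counts as dominant when
    at least this fraction of the values in the attribute share it.
    """
    patterns = [syntactic_pattern(v) for v in values]
    counts = Counter(patterns)
    total = len(values)
    dominant = {p for p, c in counts.items() if c / total >= min_share}
    outliers = [(v, p) for v, p in zip(values, patterns) if p not in dominant]
    return dominant, outliers


dates = ["2024-01-15", "2023-12-01", "2025-07-20", "2022-03-09", "15 Jan 2024"]
dominant, outliers = find_outliers(dates)
print(dominant)   # the pattern shared by most values
print(outliers)   # values whose pattern is not dominant, with that pattern
```

Returning the deviating pattern alongside each value gives a crude form of explanation: the reader can see which structure the value has and which structure the column mostly follows.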
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2025 IEEE 45th International Conference on Distributed Computing Systems Workshops, ICDCSW 2025 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 333-338 |
| Number of pages | 6 |
| ISBN (Electronic) | 9798331517250 |
| Publication status | Published - 2025 |
| Event | 45th IEEE International Conference on Distributed Computing Systems Workshops, ICDCSW 2025 - Glasgow, United Kingdom; Duration: 20 Jul 2025 → 23 Jul 2025 |
Publication series
| Name | Proceedings - 2025 IEEE 45th International Conference on Distributed Computing Systems Workshops, ICDCSW 2025 |
|---|---|
Conference
| Conference | 45th IEEE International Conference on Distributed Computing Systems Workshops, ICDCSW 2025 |
|---|---|
| Country/Territory | United Kingdom |
| City | Glasgow |
| Period | 20/07/25 → 23/07/25 |
Bibliographical note
Publisher Copyright: © 2025 IEEE.
Keywords
- Categorical outliers
- generalization tree
- patterns
- similarity measures
- syntactic structure
Fingerprint
Dive into the research topics of 'FIONA: Detecting Outliers in Attributes of Relational Datasets with Categorical Values'. Together they form a unique fingerprint.