FIONA: Detecting Outliers in Attributes of Relational Datasets with Categorical Values

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Outlier detection plays a vital role in data cleaning, significantly impacting analysis and decision-making processes. Although extensive research has been conducted on numerical outlier detection, identifying outliers in relational data with categorical attributes presents unique challenges due to the complexity of defining an appropriate similarity measure. Existing methods often rely on converting categorical values to numerical ones, using frequency as an indicator of outlierness, and extracting predefined syntactic structures from the values.

In this paper, we introduce FIONA (FInding Outliers iN Attributes), a novel approach designed to detect outliers in relational data with categorical values in distributed scenarios. Usually, categorical values in relational datasets exhibit specific syntactic structures. FIONA establishes a similarity measure to uncover hidden patterns and identify dominant patterns within the data. Values that deviate from these dominant patterns are reported as outliers.

Compared to existing tools, FIONA excels in accurately identifying outliers and dominant patterns in distributed datasets and offers clear and concise explanations for each identified outlier. Our approach ensures that distributed systems can maintain data integrity and reliability, enhancing the effectiveness of data-driven decision-making in large-scale, distributed environments.
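
The abstract's core idea (values in an attribute tend to share a syntactic structure, and values that deviate from the dominant structures are reported as outliers) can be illustrated with a minimal sketch. This is not FIONA's actual algorithm or similarity measure, which the abstract does not specify. The pattern alphabet (`D` for digit runs, `L` for letter runs), the function names, and the `min_support` threshold are all assumptions made for illustration only.

```python
from collections import Counter


def syntactic_pattern(value: str) -> str:
    """Generalize a value to a coarse pattern (hypothetical scheme):
    runs of digits become 'D', runs of letters become 'L',
    and every other character is kept as-is."""
    out = []
    for ch in value:
        if ch.isdigit():
            sym = "D"
        elif ch.isalpha():
            sym = "L"
        else:
            sym = ch
        # Collapse consecutive characters of the same class into one symbol.
        if sym in ("D", "L") and out and out[-1] == sym:
            continue
        out.append(sym)
    return "".join(out)


def find_outliers(values, min_support=0.3):
    """Return (outliers, dominant_patterns).

    A pattern is treated as dominant when it covers at least
    `min_support` of the values (an assumed threshold); every value
    whose pattern is not dominant is reported with that pattern,
    which doubles as a short explanation.
    """
    patterns = [syntactic_pattern(v) for v in values]
    counts = Counter(patterns)
    total = len(values)
    dominant = {p for p, c in counts.items() if c / total >= min_support}
    outliers = [(v, p) for v, p in zip(values, patterns) if p not in dominant]
    return outliers, dominant


dates = ["2024-01-05", "2023-11-30", "2022-07-14", "2021-03-09", "05/01/2024"]
outliers, dominant = find_outliers(dates)
```

In this toy column, four values share the pattern `D-D-D`, so the slash-separated date is the only value flagged, and its pattern `D/D/D` indicates why it differs from the rest.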

Original language: English
Title of host publication: Proceedings - 2025 IEEE 45th International Conference on Distributed Computing Systems Workshops, ICDCSW 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 333-338
Number of pages: 6
ISBN (Electronic): 9798331517250
DOIs
Publication status: Published - 2025
Event: 45th IEEE International Conference on Distributed Computing Systems Workshops, ICDCSW 2025 - Glasgow, United Kingdom
Duration: 20 Jul 2025 to 23 Jul 2025

Publication series

Name: Proceedings - 2025 IEEE 45th International Conference on Distributed Computing Systems Workshops, ICDCSW 2025

Conference

Conference: 45th IEEE International Conference on Distributed Computing Systems Workshops, ICDCSW 2025
Country/Territory: United Kingdom
City: Glasgow
Period: 20/07/25 to 23/07/25

Bibliographical note

Publisher Copyright:
© 2025 IEEE.

Keywords

  • Categorical outliers
  • generalization tree
  • patterns
  • similarity measures
  • syntactic structure
