The F4U system for understanding the effects of data quality

Daniele Foroni, Matteo Lissandrini, Yannis Velegrakis

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-reviewed

    Abstract

    We demonstrate a system that enables a data-centric approach to understanding data quality. Instead of directly quantifying data quality, as is traditionally done, it disrupts the quality of the dataset and monitors the deviations in the output of the analytic task at hand. It computes the correlation factor between the disruption and the deviation and uses it as the quality metric. This allows users to understand not only the quality of their dataset but also the effect that present and future quality issues have on the intended analytic tasks. This novel data-centric approach is aimed at complementing existing solutions. Beyond the new information it provides, and in contrast to existing data quality techniques, it requires knowledge of neither the clean datasets nor the constraints with which the data should comply.
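
    The disrupt-and-monitor idea in the abstract can be illustrated with a minimal sketch. It is not the F4U implementation: the abstract does not specify the disruption operators, the analytic tasks, or how the correlation factor is computed. The sketch assumes a hypothetical error model (a random fraction of values is scaled by 10), uses the absolute change in the task output as the deviation, and uses the Pearson coefficient as a stand-in for the correlation factor. The names `disrupt`, `pearson`, and `sensitivity` are illustrative.

```python
import random
from math import sqrt


def disrupt(values, rate, rng):
    """Corrupt a random fraction `rate` of the values.

    Assumed error model (not from the paper): each chosen value is scaled by 10.
    """
    out = list(values)
    k = int(round(rate * len(out)))
    for i in rng.sample(range(len(out)), k):
        out[i] = out[i] * 10
    return out


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def sensitivity(values, rates, task, seed=0):
    """Correlate disruption level with deviation of the task output.

    Returns (correlation, deviations), with one deviation per disruption rate.
    """
    rng = random.Random(seed)
    baseline = task(values)
    deviations = [abs(task(disrupt(values, r, rng)) - baseline) for r in rates]
    return pearson(rates, deviations), deviations


def mean(values):
    return sum(values) / len(values)


data = [float(v) for v in range(1, 101)]
rates = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]

score_mean, dev_mean = sensitivity(data, rates, mean)   # task affected by the errors
score_count, dev_count = sensitivity(data, rates, len)  # task unaffected by the errors
print(f"mean:  correlation={score_mean:.3f}")
print(f"count: correlation={score_count:.3f}")
```

    Under these assumptions, a task such as the mean deviates more as the disruption grows, so its correlation is strongly positive, whereas a task that ignores the values (here, a count) shows no deviation and scores 0. Note that no clean reference dataset or integrity constraint is needed: the baseline is simply the task output on the data as given.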

    Original language: English
    Title of host publication: 2021 IEEE 37th International Conference on Data Engineering (ICDE)
    Publisher: IEEE
    Pages: 2717-2720
    Number of pages: 4
    ISBN (Electronic): 978-1-7281-9184-3
    ISBN (Print): 978-1-7281-9185-0
    Publication status: Published - 22 Jun 2021
    Event: 37th IEEE International Conference on Data Engineering, ICDE 2021 - Virtual, Chania, Greece
    Duration: 19 Apr 2021 to 22 Apr 2021

    Publication series

    Name: Proceedings - International Conference on Data Engineering
    Volume: 2021-April
    ISSN (Print): 1084-4627

    Conference

    Conference: 37th IEEE International Conference on Data Engineering, ICDE 2021
    Country/Territory: Greece
    City: Virtual, Chania
    Period: 19/04/21 to 22/04/21

    Bibliographical note

    Publisher Copyright:
    © 2021 IEEE.

    Keywords

    • Data Cleaning
    • Data Mining
    • Data Profiling
    • Data Quality
