Abstract
The General Data Protection Regulation (GDPR) grants all natural persons the right to access their personal data when
it is being processed by data controllers. Data controllers are obliged to share the data in an electronic format and often
provide it in a so-called Data Download Package (DDP). These DDPs contain all data collected by public and private
entities during the course of a citizen's digital life and form a treasure trove for social scientists. However, the data can be
deeply private. To protect the privacy of research participants while using their DDPs for scientific research, we developed a de-identification algorithm that is able to handle typical characteristics of DDPs. These include regularly changing file structures,
visual and textual content, differing file formats, differing file structures, and private information like usernames. We in
Original language | English |
---|---|
Pages | 101-120 |
Publication status | Published - 2021 |
Keywords
- de-identification
- anonymization
- pseudonymization
- Data Download Package