The General Data Protection Regulation (GDPR) became enforceable across the European Union on May 25, 2018, giving people new rights and protections concerning their personal data. It grants individuals in the EU several rights, including the right to access, the right to rectification, the right to be forgotten, the right to object, and the right to data portability. GDPR also assigns responsibilities to companies that collect and process personal data, such as seeking explicit consent before using personal data, notifying data breaches within 72 hours of discovery, and maintaining records of processing activities. Failing to comply with GDPR can result in hefty penalties. For instance, in January 2019, Google was fined €50M for lacking valid user consent for ads personalization, and in July 2019, the UK's data protection authority announced its intention to fine British Airways about £183M for failing to safeguard its customers' personal data.
Personal Data Ecosystem
GDPR recognizes four entities that interact with personal data. The data subject is the person whose personal data is collected. The controller is the entity that collects and uses personal data. The processor is the entity that processes personal data on behalf of a controller. Finally, the supervisory authority (at least one per EU member state) oversees compliance with the rights and responsibilities that GDPR establishes.
To illustrate this ecosystem, consider the music streaming company Spotify. It collects its customers' listening history and then uses Google Cloud's services to identify new recommendations for them. In this case, Spotify is the data controller and Google Cloud is the data processor. Spotify could also work with other data controllers, such as SoundCloud, to gather additional personal data about its customers. As the figure shows, any person can go to a controller to exercise their GDPR rights and can reach out to a supervisory authority to report violations.
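The roles in the Spotify example can be modeled as a small data structure. The following is a hypothetical sketch. The `Role` and `Entity` names, the `on_behalf_of` field, the `ROUTES` table and the `route` helper are illustrative assumptions, not part of GDPR or of any real system. The sketch encodes one constraint from the definitions above: a processor works on behalf of a controller. It also encodes the two paths open to a person. Rights requests go to a controller, and violation reports go to a supervisory authority.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    """The four GDPR entities that interact with personal data."""
    DATA_SUBJECT = "data subject"
    CONTROLLER = "controller"
    PROCESSOR = "processor"
    SUPERVISORY_AUTHORITY = "supervisory authority"


@dataclass(frozen=True)
class Entity:
    name: str
    role: Role
    # Hypothetical field: the controller a processor works for (None otherwise).
    on_behalf_of: Optional[Entity] = None

    def __post_init__(self) -> None:
        # A processor handles personal data only on behalf of a controller.
        if self.role is Role.PROCESSOR and (
            self.on_behalf_of is None or self.on_behalf_of.role is not Role.CONTROLLER
        ):
            raise ValueError(f"{self.name}: a processor must act for a controller")


# Hypothetical routing table: which role handles each kind of request.
ROUTES = {
    "exercise_right": Role.CONTROLLER,              # e.g. access or erasure
    "report_violation": Role.SUPERVISORY_AUTHORITY,
}


def route(request: str, ecosystem: list[Entity]) -> list[str]:
    """Names of the entities whose role handles `request` (KeyError if unknown)."""
    target = ROUTES[request]
    return [e.name for e in ecosystem if e.role is target]


# The Spotify example from the text.
spotify = Entity("Spotify", Role.CONTROLLER)
soundcloud = Entity("SoundCloud", Role.CONTROLLER)
google_cloud = Entity("Google Cloud", Role.PROCESSOR, on_behalf_of=spotify)
listener = Entity("listener", Role.DATA_SUBJECT)
authority = Entity("supervisory authority", Role.SUPERVISORY_AUTHORITY)

ecosystem = [spotify, soundcloud, google_cloud, listener, authority]
print(route("exercise_right", ecosystem))    # ['Spotify', 'SoundCloud']
print(route("report_violation", ecosystem))  # ['supervisory authority']
```

Because the constraint check runs in `__post_init__`, a processor with no controller cannot be constructed at all.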
Impact on Storage & Database Systems
Our focus on storage and database systems is motivated by the high proportion of GDPR articles that concern them. Of the 99 GDPR articles, 31 (roughly a third) govern the behavior of data storage systems. In contrast, only 11 describe requirements for compute and network infrastructure. This is not surprising, because GDPR focuses far more on the control-plane aspects of personal data than on the actual processing of it. Those aspects include collecting, securing, storing, moving, sharing, and deleting data. This allows us to transform the high-level organizational GDPR figure (above) into one that puts the database at the center. We then analyze GDPR from a database systems perspective, translating its legal articles into a set of capabilities and characteristics that compliant systems must support. Our research makes several key observations and findings:
Our findings indicate that GDPR significantly impacts the design and operation of modern database systems. Unfortunately, we lack systematic approaches to gauge the magnitude of changes required, and the associated performance impact. Toward solving these challenges, we design a new benchmark called GDPRbench.
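Several of the rights listed at the start can be mapped onto operations over stored records. These are access, rectification, erasure, objection, and portability. The following minimal in-memory sketch shows one way that translation from legal articles to system capabilities might look. Everything in it is hypothetical. The `PersonalDataStore` class, its method names, and the per-record `objected` flag are assumptions for illustration. They are not GDPRbench and not any real database API. The sketch also ignores responsibilities such as securing data and keeping records of processing activities.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class Record:
    subject_id: str          # the data subject this record is about
    key: str                 # unique record key
    value: str
    objected: bool = False   # hypothetical flag: subject objected to processing


class PersonalDataStore:
    """Hypothetical in-memory store that exposes GDPR rights as operations."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def put(self, subject_id: str, key: str, value: str) -> None:
        self._records[key] = Record(subject_id, key, value)

    def access(self, subject_id: str) -> list[Record]:
        # Right to access: every record about one subject. This is a full
        # scan, because records are keyed by record key, not by subject.
        return [r for r in self._records.values() if r.subject_id == subject_id]

    def rectify(self, subject_id: str, key: str, new_value: str) -> None:
        # Right to rectification: correct one of the subject's own records.
        record = self._records[key]
        if record.subject_id != subject_id:
            raise PermissionError("record belongs to a different subject")
        record.value = new_value

    def forget(self, subject_id: str) -> int:
        # Right to be forgotten: delete all of the subject's records and
        # return how many were removed.
        keys = [r.key for r in self.access(subject_id)]
        for key in keys:
            del self._records[key]
        return len(keys)

    def record_objection(self, subject_id: str) -> None:
        # Right to object: flag the subject's records so processing skips them.
        for r in self.access(subject_id):
            r.objected = True

    def processable(self) -> list[Record]:
        # Records a downstream job (e.g. recommendations) may still use.
        return [r for r in self._records.values() if not r.objected]

    def export(self, subject_id: str) -> str:
        # Right to data portability: a machine-readable copy, here as JSON.
        return json.dumps([asdict(r) for r in self.access(subject_id)], sort_keys=True)


store = PersonalDataStore()
store.put("alice", "alice/history", "song-a,song-b")
store.put("alice", "alice/email", "alice@example.com")
store.put("bob", "bob/history", "song-c")
store.rectify("alice", "alice/email", "alice@example.org")
store.record_objection("bob")
print([r.key for r in store.processable()])  # ['alice/history', 'alice/email']
print(store.forget("alice"))                 # 2
print(store.access("alice"))                 # []
```

In this sketch, every right is a query by data subject rather than by record key. Without an index on `subject_id`, each request therefore scans the whole store. This is one small example of how such rights can shape database design.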