GDPRbench is an open-source benchmark that models the functionality of a database system deployed by a company that collects and processes personal data. GDPR significantly affects the design and operation of database systems that hold personal data. Yet existing benchmarks such as TPC and YCSB do not recognize the abstraction of personal data, including its legal and interfacing requirements. We designed and implemented GDPRbench after carefully analyzing the GDPR articles and studying legal cases from the first year of the GDPR rollout.
Collectively, GDPR articles describe control- and data-path operations that a database system must support. We refer to this set as GDPR queries.
In contrast to traditional CRUD queries, GDPR queries are heavily skewed toward metadata-based operations (i.e., queries conditioned on purpose, time-to-live, objections, user ID, etc.). GDPR also restricts who may perform which operations under which conditions.
To allow controllers to insert a record containing personal data with its associated metadata (§ 24)
To allow customers to request erasure of a particular record (§ 17); to allow controllers to delete records corresponding to a completed purpose (§ 5.1b), to purge expired records (§ 5.1e), and to clean up all records of a particular customer.
To allow customers to rectify inaccuracies in personal data (§ 16)
To allow processors to access individual data items or those matching a given purpose (§ 28); to let customers extract all their data (§ 20); to allow processors to get data whose owners have not objected to a specific usage (§ 21.3) or to automated decision-making (§ 22)
To allow customers to change their objections (§ 18.1) or alter previous consents (§ 7.3); to allow processors to register the use of given personal data for automated decision making (§ 22.3); to enable controllers to update access lists and third-party sharing information for groups of data (§ 13.3)
To allow customers to find out the metadata associated with their data (§ 15); to help regulators perform user-specific investigations and investigations into third-party sharing (§ 13.1)
To enable regulators to investigate system logs based on time ranges (§ 33, 34), and to establish compliance of security features (§ 24, 25)
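The metadata-conditioned operations listed above can be illustrated with a minimal sketch. It assumes a toy in-memory store; the `Record` fields, the `ToyStore` class, and its method names are hypothetical and are not taken from GDPRbench. It only shows how purpose, objections, user ID, and expiry time can drive reads and deletes.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    key: str
    data: str            # the personal data item itself
    user_id: str         # the customer the data belongs to
    purposes: set        # purposes the data was collected for
    expires_at: float    # absolute expiry time derived from the time-to-live
    objections: set = field(default_factory=set)  # usages the customer objected to


class ToyStore:
    """In-memory stand-in for a database (hypothetical, not a GDPRbench component)."""

    def __init__(self):
        self.records = {}

    def create_record(self, record):
        self.records[record.key] = record

    def read_data_by_purpose(self, purpose):
        # Return data collected for `purpose`, skipping records whose owner objected to it.
        return [r.data for r in self.records.values()
                if purpose in r.purposes and purpose not in r.objections]

    def _delete_where(self, predicate):
        # Collect matching keys first so the dict is not mutated while iterating.
        doomed = [k for k, r in self.records.items() if predicate(r)]
        for k in doomed:
            del self.records[k]
        return len(doomed)

    def delete_by_user(self, user_id):
        # Clean up all records of one customer.
        return self._delete_where(lambda r: r.user_id == user_id)

    def purge_expired(self, now):
        # Remove records whose expiry time has passed.
        return self._delete_where(lambda r: r.expires_at <= now)
```

A real deployment would also have to check which entity is issuing each operation, which this sketch leaves out.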
Core Workloads & Metrics
Controller: management and administration of personal data
Customer: exercising GDPR rights
Processor: processing of personal data
Regulator: investigation and enforcement of GDPR laws
We define four workloads that correspond to the four core entities of GDPR: controller, customer, processor, and regulator. Each workload is composed of the GDPR queries outlined previously. We draw on legal cases and real-world usage patterns to set the default proportion of queries within each workload and the distribution of the records they act on. All of these defaults are configurable.
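As a rough illustration of a workload defined by query proportions, the sketch below draws a sequence of operations from a weighted mix. The operation names and weights in `CUSTOMER_MIX` are invented for the example and are not the GDPRbench defaults; `generate_operations` is likewise a hypothetical helper.

```python
import random

# Hypothetical mix for a customer-style workload.
# Names and weights are illustrative only, not the GDPRbench defaults.
CUSTOMER_MIX = {
    "read_own_data": 0.20,
    "read_own_metadata": 0.20,
    "update_own_data": 0.20,
    "update_own_metadata": 0.20,
    "delete_own_record": 0.20,
}


def generate_operations(mix, count, seed=None):
    """Draw `count` operation names, each chosen with probability proportional to its weight."""
    rng = random.Random(seed)          # a private generator keeps runs reproducible
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=count)


operations = generate_operations(CUSTOMER_MIX, count=1000, seed=42)
```

Changing the proportions then amounts to editing the weights, which is the kind of configurability the text describes.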
The benchmark then characterizes a database system's GDPR compliance using three metrics: correctness against GDPR workloads, time taken to respond to GDPR queries, and storage space overhead.
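One way to make the three metrics concrete is sketched below. The exact definitions are assumptions made for illustration: correctness is taken as the fraction of responses that match the expected ones, completion time as wall-clock time for the whole run, and space overhead as total storage divided by the size of the personal data alone. The function names are hypothetical.

```python
import time


def run_and_measure(operations, execute, expected):
    """Run every operation, then return (correctness, elapsed_seconds).

    Assumed definition: correctness is the fraction of responses equal to the expected ones.
    """
    start = time.perf_counter()
    responses = [execute(op) for op in operations]
    elapsed = time.perf_counter() - start
    matches = sum(1 for got, want in zip(responses, expected) if got == want)
    correctness = matches / len(operations) if operations else 1.0
    return correctness, elapsed


def space_overhead(total_bytes, personal_data_bytes):
    """Assumed definition: total storage used per byte of personal data stored."""
    return total_bytes / personal_data_bytes
```

Under these assumptions, a store that uses 350 bytes in total to hold 100 bytes of personal data has a space overhead of 3.5.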
Prepare your database system to be GDPR compliant. This typically requires enabling GDPR security features and implementing support for GDPR queries. For reference, check out how we introduced GDPR compliance in Redis and PostgreSQL.
Download and build GDPRbench. While GDPRbench fully supports its core workloads and queries, a client stub must be implemented for each new database system. The stub translates generic GDPR queries into the target-specific APIs of your selected database. Please consult the Redis and PostgreSQL clients that we have already built.
Configure GDPRbench default parameters to reflect your organization's personal data and metadata attributes. Run the four core workloads at different scale levels to determine your readiness for rolling out database systems that support compliance.
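The idea of a client stub from the steps above can be sketched as an abstract interface with one implementation per backend. Everything here is hypothetical: the `GDPRClient` interface, its method names, and the dictionary-backed `InMemoryClient` are illustrations in Python and do not mirror the interface of the GDPRbench clients for Redis or PostgreSQL.

```python
from abc import ABC, abstractmethod


class GDPRClient(ABC):
    """Generic GDPR queries that a benchmark driver would call (hypothetical interface)."""

    @abstractmethod
    def create_record(self, key, data, metadata): ...

    @abstractmethod
    def read_data_by_key(self, key): ...

    @abstractmethod
    def delete_record_by_user(self, user_id): ...


class InMemoryClient(GDPRClient):
    """Stand-in backend; a real stub would issue calls to the target database here."""

    def __init__(self):
        self._rows = {}

    def create_record(self, key, data, metadata):
        # Store a copy of the metadata so later caller-side changes do not leak in.
        self._rows[key] = (data, dict(metadata))

    def read_data_by_key(self, key):
        row = self._rows.get(key)
        return None if row is None else row[0]

    def delete_record_by_user(self, user_id):
        doomed = [k for k, (_, meta) in self._rows.items()
                  if meta.get("user_id") == user_id]
        for k in doomed:
            del self._rows[k]
        return len(doomed)
```

Keeping the driver coded against the abstract interface means that supporting a new database only requires a new subclass.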