Artificial Intelligence (AI) systems rely on data acquisition and heavy data processing. Large Language Models (LLMs) have powerful content-generation capabilities.
First, a few explanatory sentences to place all this in context. I'm not an activist, and I'm not an NGO. I'm a researcher, consultant, and expert. I have tracked the development of the GDPR since the early 2010s. I have never misused the GDPR mechanism or the complaint system. In fact, this is my first complaint. I believe that the current circumstances warrant taking action and raising some concerns.
In this post, I describe my GDPR complaint concerning OpenAI’s ChatGPT. The technical design questions arose from my use of the system and from the subsequent communication exchange as I tried to understand what was happening; they are informed by my knowledge and experience. This case delves into systemic issues within AI, particularly concerning (1) the processing of input and output data generated by LLM-type tools and (2) the provision of information related to such processing. Notably, these concerns align with the foundational rules of the EU AI Act, which at its core emphasizes the importance of explaining to users what is happening and underlines design considerations, including associated risks.
Designing, deploying, and maintaining such systems requires care in accounting for the risks. While this is a matter of data protection, it is valuable to think of it in a broader sense: human rights and human dignity. The GDPR is not only about data protection; it is about respect for fundamental rights and freedoms, as motivated by values.
The data protection aspects critical to the case are the principles of lawfulness, fairness, transparency, and data protection (privacy) by design and by default (technology design). The case has the potential to hold particular significance for the future, potentially emerging as one of the most influential GDPR cases related to AI, especially in addressing the nuances of Article 25, which pertains to data protection by design. This provision emphasizes the importance of integrating data protection considerations into the very fabric of technological design.
By now, everybody should be familiar with ChatGPT. I spotted that it was generating spurious data about me: incorrect, fake even. This certainly cannot simply be dismissed as a “hallucination”, the name some chose to use. In this particular instance, the generated false data, including fictitious works erroneously attributed to me, raises concerns. The implications extend beyond mere inaccuracies, as users may now implicitly place trust in the information provided by ChatGPT or similar tools. Noteworthy examples involve the fabrication of works I did not author, and profiling activities that seemingly infer demographic data through unclear means.
As a concerned but honest citizen, my initial course of action was a natural one: contacting OpenAI. I asked to exercise my rights under the GDPR (Articles 12, 14, and 15), seeking clarity. The correspondence continued over the course of three months. However, throughout the exchange it seemed as if it was for some reason impossible to explain to me what was happening, how the system worked, and how the data had been obtained or used. Ultimately, a comprehensible response was never given. Specifically: what data was held about me (regardless of the form), and how was it acquired and processed? In other words, OpenAI did not comply with those requests. At some point, there was even an attempt to “comply” by… blocking the generation of data about me. That was an obvious misunderstanding. Throughout our communication, a sense of ambiguity prevailed. Responses seemed, at times, misleading or contradictory. The ambiguity left me grappling with how to interpret these exchanges and draw meaningful conclusions.
Summary of issues as quoted from the complaint
“1) violation of the fundamental principle of data processing in Article 5(1)(a) GDPR, i.e. the principle of processing data lawfully, fairly, and transparently,
2) improper execution of the data subject's rights, including the right of access to data and the right to rectification,
3) failure to ensure a sufficient level of security of processed personal data and violation of the principle of data protection by design (privacy by design). It appears that OpenAI systemically ignores the provisions of the GDPR regarding the processing of data for the purposes of training models within Chat-GPT”.
OpenAI did not provide me with the information. The concern lies not only in the data accessed or obtained by OpenAI's vendors but also in the broad spectrum of ChatGPT users. The lack of transparency regarding who, among those various parties, is obtaining content created based on my personal data raises substantial concerns. This leaves a crucial question unanswered: who holds access to my information, and under what circumstances? One group is vendors offering services to OpenAI; but what about, naturally, all the individuals who query ChatGPT about me?
The complaint has been prepared by GP Partners, one of the best law boutiques in Europe, with direct input from me. Following an initial discussion, and given the issue's likely critical public significance and value, GP Partners kindly offered to handle this case on a pro bono basis, for which European citizens should be grateful.
GP Partners is a law firm whose primary areas include technology, data protection, dispute resolution, mergers & acquisitions, financial regulation, public procurement, and employment, as well as anti-money laundering, competition, compliance, corporate law, intellectual property, unfair competition, and media. In 2019, it was listed in the FT Innovative Lawyers Europe Awards. M. Gawronski authored a reference book on the GDPR.
Not lodged in Ireland
The complaint was filed with a DPA in August. Despite OpenAI's establishment in Ireland within the European Union, it's noteworthy that the case has not been lodged with the Irish Data Protection Commissioner (DPC). This is due to (1) timing and (2) the French méthode (the Irish-based entity “did not have decision-making powers” concerning the purposes and means of cross-border data processing), as explained in the complaint.
Since the issue concerns the design itself, many people, not just me, may be affected.
The GDPR principle of Data Protection by Design appears to be completely ignored, even though it is at the very center of the GDPR. One example: even if it were understandable that a capability to remove or rectify data in a trained model (machine unlearning) is unavailable today, are there any ongoing R&D efforts toward such capabilities, within OpenAI or elsewhere in the sector?
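To make concrete what a baseline "data remover" could mean in principle, here is a minimal sketch of exact unlearning: retraining without the removed record. The toy "model" (token frequency counts) and the function names are my own illustration, not anything OpenAI offers; for real LLMs, full retraining is prohibitively expensive, which is exactly why approximate machine unlearning is an active research area.

```python
# Illustrative sketch only: "exact" machine unlearning by retraining
# on the corpus minus the record to be removed. A toy stand-in for
# what a data remover/rectifier would have to guarantee.
from collections import Counter

def train(records):
    """A stand-in 'model': token frequency statistics over the corpus."""
    model = Counter()
    for text in records:
        model.update(text.lower().split())
    return model

def unlearn(records, record_to_remove):
    """Exact unlearning baseline: retrain on everything except the record."""
    remaining = [r for r in records if r != record_to_remove]
    return train(remaining)

corpus = [
    "alice wrote a book about privacy",
    "bob wrote a paper about security",
]
model = train(corpus)

# After unlearning the first record, no trace of "alice" remains,
# because the model was rebuilt from scratch without it.
model2 = unlearn(corpus, corpus[0])
```

The point of the sketch is the guarantee, not the mechanism: after removal, the model must be indistinguishable from one never trained on the record, which retraining trivially achieves and which approximate methods try to approach at far lower cost.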
Data processing is about input and output. Naturally, data protection, privacy, and human dignity are not the only concerns raised by AI processing, nor is mine the only action taken. I'm not living under a rock. I am aware of other proceedings concerning AI/LLM processing that rest on other concerns and other legal standing, such as copyright. For example, consider the cases brought by The New York Times or Getty Images. There are also concerns of authors, including nonfiction book authors (having authored papers, articles, and books, I can relate).
The case for human dignity
The issue at stake is the standing of established human rights in the face of Generative AI, which is, once again, about input (data collection) and output (data generation).
The case described concerns data protection principles challenged by AI development. For starters, I would be delighted to finally obtain proper information about the data processing. You, on the other hand, would learn whether providing such information is possible at all. Well, is it?
A word of advice to data controllers and processors: when responding to data subject requests, the responses should be factual, not merely crafted to give an impression of complying with the GDPR. Offering “whatever” in response does not properly check the box. As for the design itself, Data Protection by Design and by Default should be applied as its name suggests.