The European Union is on a path to simplify regulation. Some measures enacted in the past decade have been criticised as overbroad, but the current rush risks going too far, and incoherently. According to a leaked draft, the European Commission plans to modify the General Data Protection Regulation. This analysis relies on that leak and may change if the final text differs.
Some of the changes are good and may remove existing ambiguities or provisions that are simply useless, or at least unfortunate. Others are very far-reaching: they include altering the very definition of personal data, which has been the cornerstone of European data protection for decades. It appears the European Commission intends to weaken data protection within the GDPR’s core architecture. In this post I offer a technical, policy, and regulatory assessment. The full leak is here.
Change of PERSONAL DATA definition.
Article 4(1). Personal data definition becomes entity‑relative
“Information relating to a natural person is not necessarily personal data for every other person or entity, merely because another entity can identify that natural person. Information shall not be personal for a given entity where that entity cannot identify the natural person to whom the information relates, taking into account the means reasonably likely to be used by that entity. Such information does not become personal for that entity merely because a subsequent recipient has means reasonably likely to be used to identify the natural person to whom the information relates”
The intent is to clarify that, say, a small shop or a restaurant need not be paranoid about hypotheticals it cannot foresee or influence, for example events outside of its control. With “merely because another entity can identify that natural person” everyone agrees: just because a person is identifiable to some omnipotent entity does not mean that all information is personal data, and a data breach of OTHER systems does not automatically implicate separate data controllers. However, this is not the legal language of the Court of Justice of the European Union (CJEU) case law, which frames identifiability by reference to the means which may likely reasonably be used to identify the data subject (as case C‑582/14 shows, including “with the assistance of other persons”!), whether by the controller or by any other person.
The entity‑relative formula reduces scope by ignoring third‑party means. That narrows protection (1, 2). In other words, controllers could “argue” (let’s not use the word “pretend” here) that they cannot do things that they in fact could do.
Example.
An adtech platform receives email hashes that data brokers can trivially match.
Today those hashes are personal data if identification is reasonably likely using means available to the controller or to others. Under the leak, the platform could declare them non-personal because its own means are insufficient, even though matching is easy for partners, or even for the platform itself with reasonable effort. Even with unique salts, the hashes remain personal data if the platform can use a lookup table, or resubmit raw emails it holds elsewhere to generate the same hash and match. Under the leaked Art. 4 wording, the platform could argue the same hashes are not personal for it, so long as it cannot identify anyone using its own means and it ignores its partners’ mapping capability.
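To make the matching argument concrete, here is a minimal Python sketch (the addresses and the shared salt are invented for illustration) of how “non-personal” email hashes resolve back to people the moment a lookup table exists:

```python
import hashlib

# Hypothetical sketch: a platform that holds raw emails elsewhere can
# re-identify "anonymous" salted hashes by rebuilding the same hash.
def hash_email(email: str, salt: str = "") -> str:
    return hashlib.sha256((salt + email.lower().strip()).encode()).hexdigest()

# Hashes received from a partner (salt is known/shared for matching).
salt = "campaign-2025"  # assumed shared salt
received = {hash_email("alice@example.com", salt)}

# Emails the platform already holds in another system.
known_emails = ["bob@example.org", "alice@example.com"]

# Lookup table: hash -> identity. The "non-personal" hash matches instantly.
lookup = {hash_email(e, salt): e for e in known_emails}
matched = [lookup[h] for h in received if h in lookup]
print(matched)  # the hash resolves back to a person
```

The point: the hash itself carries no name, but any party holding the raw emails and the salt can rebuild the table instantly. That is exactly the “means reasonably likely to be used” that the entity‑relative wording would let the platform ignore.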
Currently, identifiability under GDPR/CJEU considers third-party means. The leak would remove that by making identifiability entity-relative. It's a big difference.
Health data narrowed to direct revelation (Article 4 point 15)
The gravity of the previous changes becomes stark when we realise that the proposal would introduce this: ‘data concerning health’ means personal data related to the physical or mental health of a natural person, including the provision of health care services, which directly reveal information specifically about his or her health status.’
Let’s contrast it with the CURRENT definition: “means personal data related to the physical or mental health of a natural person, including the provision of health care services, which reveal information about his or her health status”.
The difference IS significant. The leaked EU proposal rewrites health data as information that directly and specifically reveals health status. Controllers might then argue that low-granularity items no longer meet the new threshold because they do not specifically reveal a status, even though the CJEU held that “data concerning health” has a broad meaning: even low-detail context qualifies, and the revelation need not be direct and specific. It may get worse.
Imagine: repeated insulin purchases may imply a condition. Currently, this could be treated as special-category processing (if the inference is strong). Under the proposal, it would risk being explicitly excluded unless the record itself directly and specifically reveals the status.
For example, mentions of a hearing impairment, audiology appointments, use of hearing aids or a captioning interpreter, or medical leave linked to hearing are special-category data today. Any organisation handling such data needs a valid ground, and attempts to downgrade it by invoking the “directly reveals” wording may fail.
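A toy illustration of why proxy signals matter, with entirely made-up data and a hypothetical term list: no record here “directly and specifically” states a health status, yet the inference is trivial:

```python
# Illustrative sketch (hypothetical data): proxy signals in ordinary
# purchase logs can imply health status without any explicit field.
from collections import Counter

purchases = {
    "user-1": ["insulin pen", "glucose strips", "insulin pen", "bread"],
    "user-2": ["coffee", "notebook", "bread"],
}

# Hypothetical proxy list; a real adversary would use a far richer one.
HEALTH_PROXIES = {"insulin pen", "glucose strips", "hearing aid batteries"}

def implies_health_data(items, threshold=2):
    # Repeated health-related purchases strongly suggest a condition,
    # even though no single record "directly" states one.
    hits = Counter(i for i in items if i in HEALTH_PROXIES)
    return sum(hits.values()) >= threshold

print({u: implies_health_data(p) for u, p in purchases.items()})
```

Under the current definition, such inferred signals can be health data; under the “directly reveals” wording, a controller might argue the raw purchase log is out of scope.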
This would erode protection for proxy signals and inferences; in extreme cases, one could argue that it legalises such practices. Naturally, that would make room for new business opportunities...
Special categories narrowed to direct (Article 9 paragraph 1)
Proposed: “Processing of personal data that directly reveals in relation to a specific data subject racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, his or her health status (data concerning health) or sex life or sexual orientation and the processing of genetic data or of biometric data for the purpose of uniquely identifying a natural person shall be prohibited.’
This is a change from ‘revealing’ to ‘directly reveals’, and the concerns are similar to those above. But here the argument is simpler, because the top EU court has already ruled that inferences (indirect revelation) can make data sensitive even if the original field does not directly reveal the trait. The case is more or less closed under current EU case law: “the publications … of personal data that are liable to disclose indirectly the sexual orientation of a natural person constitutes processing of special categories of personal data”. The leaked wording could be read to make such processing lawful...
Training AI with special category data is fine, IF…
“For processing [AI purposes] appropriate organisational and technical measures shall be implemented to avoid to the greatest possible extent the collection and otherwise processing of special categories of personal data. Where despite the implementation of such measures the controller identifies special categories of personal data in the datasets used for training testing or validation or in the AI system or AI model the controller shall remove such data. If removal requires disproportionate effort the controller shall effectively protect such data from being used to produce outputs from being disclosed or otherwise made available to third parties.”
In effect, this would create a certain tolerance for residual special-category data in training sets, subject to avoidance, removal, or isolation where removal would require disproportionate effort. The goal is to allow such processing explicitly.
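A rough sketch, in Python with fabricated records and a hypothetical term list, of the avoid/remove/isolate logic the draft appears to describe:

```python
# Hypothetical sketch of the avoid/remove/isolate pipeline suggested by
# the leaked draft. The term list and records are made up for illustration.
import re

SENSITIVE = re.compile(r"\b(diabetes|hiv|union member|catholic)\b", re.I)

corpus = [
    "user likes cycling and coffee",
    "patient diagnosed with diabetes in 2019",
    "long-time union member and activist",
]

def avoid_remove_isolate(records):
    kept, quarantined = [], []
    for r in records:
        if SENSITIVE.search(r):
            # The draft says "remove such data"; if removal were a
            # disproportionate effort, it would instead require isolating
            # the data from outputs and from third-party disclosure.
            quarantined.append(r)
        else:
            kept.append(r)
    return kept, quarantined

kept, quarantined = avoid_remove_isolate(corpus)
print(len(kept), len(quarantined))  # 1 kept, 2 quarantined
```

Of course, real special-category detection is much harder than a regex, which is precisely why residual sensitive data in training sets is a realistic scenario.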
Biometric verification under sole control (Article 9 paragraph 2 point l)
“(l) processing of biometric data is necessary for the purpose of confirming the identity of a data subject verification where the biometric data or the means needed for the verification is under the sole control of the data subject.”
On‑device processing would be explicitly allowed. This is fine (and a good change).
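For illustration, a toy on-device check with made-up template numbers and an arbitrary threshold: the enrolled template and the comparison both stay under the data subject’s sole control, and only a yes/no result needs to leave the device:

```python
# Minimal sketch (invented numbers): biometric verification against a
# template that never leaves the user's device, i.e. sole control.
import math

stored_template = [0.12, 0.80, 0.33, 0.57]  # enrolled locally on-device

def verify(candidate, template=stored_template, threshold=0.25):
    # The similarity check runs on-device; only the boolean outcome
    # is ever shared with a relying service.
    dist = math.dist(candidate, template)
    return dist < threshold

print(verify([0.11, 0.79, 0.35, 0.55]))  # close enough: accepted
print(verify([0.90, 0.10, 0.70, 0.01]))  # far away: rejected
```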
Pseudonymisation is going to be useful (Article 41a), finally.
“Article 41a [Placeholder for mechanism to accompany the state of the art advancements for pseudonymisation technologies.]”
This is promising (great!). Currently, pseudonymisation in the GDPR is rather useless for the controller, as there’s no direct, clear incentive to apply it. This provision would allow the development and deployment of innovative privacy-preserving technologies, which are already mature, although the details must be clarified. It’s the best addition in this GDPR modification: it can allow processing in ways that protect user data and privacy, and it would also UNLOCK some creative uses of data done the right way.
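One mature technique such a provision could incentivise is keyed pseudonymisation. Here is a minimal HMAC sketch (key and data invented for illustration), where the secret key is held separately from the pseudonymised dataset:

```python
# Sketch of keyed pseudonymisation (HMAC), assuming the secret key is
# stored apart from the dataset (e.g. in a key vault). Names are made up.
import hmac, hashlib

SECRET_KEY = b"kept-in-a-separate-key-vault"  # hypothetical key

def pseudonymise(user_id: str) -> str:
    # Deterministic pseudonym: the same user always maps to the same
    # token, so analytics remain possible, but reversing the mapping
    # requires the separately held key.
    return hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:16]

events = [("alice@example.com", "login"), ("alice@example.com", "purchase")]
pseudonymised = [(pseudonymise(u), action) for u, action in events]
print(pseudonymised[0][0] == pseudonymised[1][0])  # True: linkable, not identifiable
```

Unlike the plain salted hashes discussed earlier, the security here rests on a secret key under organisational control, which is what makes the dataset pseudonymous rather than trivially matchable.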
AI processing done via legitimate interests
Careful here! On a literal reading of the EU Commission proposal (Article 88c), AI training may rely on legitimate interests. So: no consent, and no awareness of the person in question. I think it could be made to work, assuming that it’s really respected and enforced properly… But for special categories of data as well? That would be quite radical. Under the current GDPR, legitimate interests cannot justify processing special-category data.
Summary
Brussels says it is simplifying the GDPR. The leaked draft looks good in some places; in others, it goes well beyond tidying. Mistakes should be avoided.
Expect more from me in this area.
Comments, queries, or maybe contract offers? ;-) Contact me at me@lukaszolejnik.com. Let's talk!