When governments acknowledged that computer hacking was increasingly being done by criminals, the hacking techniques criminals used were simply banned. That solved the problem; since then we haven’t heard of any data theft, fraud and so on. Right.

In the European Union, the General Data Protection Regulation (GDPR) will apply from May 25th, 2018. Member State governments are struggling to implement the GDPR in their local law. The UK has recently unveiled its initiative, a Data Protection Bill. It contains a number of interesting points, but I will focus on just one.

The UK’s GDPR implementation may have visionary traits, in that it goes beyond merely transposing the GDPR into legislation. The UK will introduce new criminal offences, among them reidentification: reversing methods meant to protect private data, techniques known as deidentification, anonymisation or pseudonymisation.

Indeed, the draft Data Protection Bill contains this offence. Have a look here.

Very often such techniques are complex, but for the sake of clarity let’s take my e-mail address: lukasz.w3c@gmail.com. In many contexts, e-mail addresses are personal data, for example when used as identifiers. So a not-very-wise technologist could come up with the clever idea of protecting this data with a one-way hash function such as md5.
Now the identifier would look as follows: ff6d20403ff94e175800b970cfdf3f83.
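For concreteness, here is a minimal sketch of such a scheme in Python; the function name is mine and purely illustrative:

```python
import hashlib

def pseudonymise(email: str) -> str:
    # "Protect" the identifier by replacing it with its md5 digest.
    return hashlib.md5(email.encode("utf-8")).hexdigest()

print(pseudonymise("lukasz.w3c@gmail.com"))  # prints the identifier above
```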
Looks safe now, doesn’t it? Except that it’s not, and breaking this particular scheme would be very easy (see the sketch below). Some schemes or systems are much more complex (or just different), but those can often be defeated, too. Researchers regularly demonstrate the shortcomings of anonymisation. Some data simply cannot be easily anonymised; such “failures” of anonymisation are an interesting phenomenon in their own right.
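How easy? The md5 of a guessable value is a lookup table, not a lock: anyone holding a list of candidate addresses can recompute the hashes and compare. A minimal sketch of this dictionary attack, with a purely illustrative candidate list:

```python
import hashlib

# The "protected" identifier from the example above.
target = "ff6d20403ff94e175800b970cfdf3f83"

# Illustrative candidates; a real attacker would use a leaked or
# scraped address list, or enumerate plausible addresses.
candidates = ["alice@example.com", "lukasz.w3c@gmail.com", "bob@example.org"]

for email in candidates:
    if hashlib.md5(email.encode("utf-8")).hexdigest() == target:
        print("Reidentified:", email)
```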

Now, the simple solution pursued by the UK is to just ban this kind of practice. It follows the Australian lead, where only local laws apply (and take precedence over the laws of mathematics).

There are several issues with banning reidentification. First, it won’t work. Second, it will decrease security and privacy.

Effectiveness

Enforcing a reidentification ban is unlikely to be effective. If a malicious actor, say a company, were to reidentify people, how do you even detect that it’s being done in the first place? Reidentification doesn’t have to happen overtly. Its nature is fundamentally different from that of (often overt) security intrusions: it’s passive, so bans won’t catch it. I won’t deny that reidentification is a real problem, and one that will only grow. It’s good that it has been identified.

So at least we now know it’s difficult to find the real abusers. But hey, some people are doing it in public, so why not go after them?

Stifling research

Security and privacy researchers regularly demonstrate the risks of reidentification. You might have heard about reidentifying Netflix users from public datasets. Or reidentifying people based on information from genetic databases. Or identifying people from Australian census data.

This kind of work is public, overt, and done for the good of society. Yet researchers (such as myself) with pure intentions may find themselves at risk of falling victim to a reidentification ban. Such a law, if not enacted with utmost care, would nullify large chunks of privacy research, and in effect make us all less safe. Can you imagine anyone facing an unlimited fine (as in the UK’s proposal) wanting to take that risk?

The UK’s ICO (Information Commissioner’s Office) will furthermore find itself in a possibly inconvenient position, needing to judge which research is or isn’t appropriate.

Reidentification ban may weaken security and privacy

The problem is that in practice, many deidentification or anonymisation schemes are weak, or simply don’t work at all. Penetration testers, privacy engineers, researchers and designers break and build such systems on a daily basis. If reidentification, which often requires breaking the techniques in use, suddenly becomes banned, not only will we end up with weaker cybersecurity and privacy systems, but we will face another risk: companies will simply be less motivated to invest in their security and privacy designs. Why bother if you can simply rely on legal means?

Backdoors vs Reidentification

Those not deep in technology or research are bound to oversimplify things. Seeing encryption as an issue, laws requiring its reversal may come to mind. Likewise, seeing reidentification as an issue, the idea of banning the reversal of protection techniques, even lousy and careless ones, surfaces. I make this analogy because I suspect it actually exists in some circles, however wrong it is.

Summary

A reidentification ban is not an ultimate solution. It won’t magically address abuses, but it may make legitimate research conducted in good faith significantly riskier. As a result, the core issue will persist, while the overall level of security and privacy may suffer.

Let’s hope this regulation goes in a smart direction and avoids oversimplification, and that the technological aspects of identification, deidentification and anonymisation at the organisations concerned will be treated with the best privacy engineering standards available.

From the privacy engineer’s point of view, not all data can be easily anonymised. It depends on the data, on the context (e.g. is the data made public?) and on other factors; this needs to be carefully assessed. Privacy engineering can help make systems safer against reidentification and other challenges.