Welcome to the privacy analysis of Progressive Web Applications. With new features in steady supply, the web is changing in exciting ways. One of the more interesting trends is the concept of Progressive Web Applications (PWAs). PWAs use modern and powerful web features to further blur the boundaries between web and mobile apps. In practical terms, they let users “install” web applications to the smartphone home screen.

The concept of PWA puts security at the center by requiring that apps be served only via encrypted HTTPS connections, and that’s great. Privacy was considered in the design of the individual building blocks. Today I explain how the evolution of the technology, together with browser UI considerations, may give rise to privacy issues. Designing with privacy in mind is a vertical-horizontal exercise and is not always easy (the 2018 Chrome case is a well-known example: 1, 2). That’s why privacy research & development is so interesting, whether constructive or “offensive”.

Tracking Progressive Web Application

A neat trick combines this new format with how the web and browsers work, and with how PWAs are implemented in major browsers (Chrome, Firefox). It allows for persistent user tracking in a non-transparent manner, à la evercookie, including the respawning-cookies-after-removal bit. I describe how it’s done and show an example app that follows the Persistent Web Apprehension approach.

It works on Android (Chrome or Firefox) and iOS (partly).

Web App Manifest start_url

The Web Application Manifest is a JSON structure that describes the web application. It allows the browser to generate a “web apk”, i.e. install the webapp on a smartphone. The manifest describes things like the background color, the screen orientation, the icons, or the starting page. This starting page is defined in the manifest.json’s “start_url” field, which points to the page to be opened when the user launches the installed application. It’s (at least) this field that allows tracking. Many sites use start_url with special parameters (more about this later).

A tracking scheme can be built by simply defining start_url as a page such as “URL?tracking_id=XXXXXXXX” (the X’s being a unique, randomly generated tracking ID). Whenever the user opens the app, a request is made to a URL carrying a unique and persistent ID that identifies the user.

A side consequence is that removing cookies no longer does its job: a malicious PWA can respawn them. Let’s see how.
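
As a minimal sketch of this scheme, assume a hypothetical server that generates the manifest separately for each visitor. The function name build_manifest, the app name and the "/app" path are illustrative assumptions, not taken from the demo.

```python
import json
import secrets
from typing import Optional
from urllib.parse import urlencode


def build_manifest(base_url: str, tracking_id: Optional[str] = None) -> dict:
    """Return a manifest dict whose start_url embeds a per-install identifier."""
    if tracking_id is None:
        # Hypothetical choice: 16 random bytes, hex-encoded (32 characters).
        tracking_id = secrets.token_hex(16)
    query = urlencode({"tracking_id": tracking_id})
    return {
        "name": "Example PWA",  # illustrative value
        "display": "standalone",
        "start_url": f"{base_url}?{query}",
    }


if __name__ == "__main__":
    # Each call without an explicit ID yields a different start_url.
    print(json.dumps(build_manifest("/app"), indent=2))
```

Because the identifier lives in the installed app’s start_url rather than in a cookie, it is not touched when the user clears cookies.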

Persistent Web Apprehension

Persistent Web Apprehension demonstrates an offensive use of Progressive Web Applications (specifically the Web App Manifest) to facilitate tracking and cookie respawning, i.e. setting cookies back after their removal (as evercookie does), thanks to other tracking techniques used at the same time. This is not particularly transparent to users - it is almost certain that users will be unaware they might be tracked. What’s worse, they might be unaware that they are still tracked after removing the tracking identifiers. Taken together, this matters both on the hard technical layer and on the privacy awareness (transparency) layer.

How to test it

Go to the PWA site and follow the instructions (the demo has it all). Alternatively, first see below how to install:

On Chrome:

On Firefox:

The site is here: Persistent Web Apprehension.

I don’t want to test; just tell me what it does

The user installs a PWA. A unique identifier is generated and the user receives a codename. The user opens the PWA. Later, the user clears the cookies of the visited page but keeps using the PWA, which sets the cookies again to their previous values (hint: or to new/different ones linked to the old ones; functionally equivalent but more difficult to detect).

The demonstration uses some more advanced ways of recreation. For example, if the user removes the app but does not remove the cookie, it is possible to re-instantiate the original identifier. I am sure there are many more options for improvement. It’s a proof of concept demonstrating an issue.
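
The decision logic behind this flow can be sketched as a small server-side function. This is only an illustration under assumptions: the function name resolve_identifier and its parameters are hypothetical, and the demo’s actual implementation may differ.

```python
from typing import Optional, Tuple


def resolve_identifier(
    url_id: Optional[str], cookie_id: Optional[str], new_id: str
) -> Tuple[str, bool]:
    """Return (identifier to use, whether the cookie has to be set again).

    url_id    -- ID taken from the start_url query string, if present
    cookie_id -- ID taken from the cookie, if present
    new_id    -- freshly generated ID, used only for a first-time visitor
    """
    if url_id is not None:
        # Launched from the installed app: the ID baked into start_url wins.
        # If the cookie was cleared (or differs), it is respawned.
        return url_id, cookie_id != url_id
    if cookie_id is not None:
        # App removed but cookie kept: the original ID survives and can be
        # embedded in the next manifest that is served.
        return cookie_id, False
    # Neither is present: treat as a new visitor.
    return new_id, True


if __name__ == "__main__":
    # Cookie cleared, app still installed: the old ID comes back.
    print(resolve_identifier("abc", None, "fresh"))
```

The identifier is lost only when both the cookie and the installed app are removed, which is why clearing cookies alone is not enough.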

Cookie respawn

The demonstration also shows that PWAs can be used not only for stand-alone tracking, but also to aid cookie recreation (if the user decides to remove cookies). This effectively breaks the current privacy user interface, as users are unable to clear identifiers effectively.

The cookie respawning bit might not be unexpected, as PWAs are webpages, and webpages can set cookies. But I wanted to bring attention to this issue because the further decoupling and blurring of the web/app sphere might not be clear to ordinary users.

On the other hand, cookie respawning does not affect iPhone/iOS, while it does affect, for example, Chrome and Firefox on Android (I did not check Windows Phone). It is unclear whether this is a deliberate design decision, an accident/luck, or merely a result of the slower pace of shipping the functionality. The mechanism, however, seems clear: Safari browsing and PWAs added to the Home screen appear not to belong to the same application space (specifically, the PWAs are rendered by an app in a separate process, not Safari, which would explain why cookies are not shared). That said, the straightforward ID in start_url is still a problem on iOS.

Single-site tracking?

Such tracking works on a particular website or app. However, PWAs can link to others, so a more or less formal system of sharing IDs among the websites may be imagined. There is already a well-established concept of cookie matching, used by advertising networks.

What’s so special about it?

The Web Application Manifest sets a precedent. Users can now have many dedicated “starting pages” (website-defined and possibly outside the user’s control) - webpages/apps added to the Home screen. Should they contain tracking bits, tracking would happen unbeknownst to users, with unclear controls.

Why?

It’s doubtful that the use of a tracking identifier is apparent to the user. It is unclear whether expecting users to detect such creative abuses is realistic (leaving it to users is what the W3C specification seems to foresee). I’m skeptical. And even if it were realistic, current browsers do not make it easy anyway.

Anyone using it? Web privacy measurement

Trackers and abusers are often the early adopters of new technologies. I ran a web privacy measurement on the top 10,000 webpages to check whether anyone is using this.

  • 1672 pages include a manifest.json
  • 828 use a dedicated start_url
  • 274 use parameters
  • None appear to use randomly generated identifiers

While I did see apparent unique identifiers (for example, 51606102_9527_7259_7770), they did not appear to be randomly generated for each new user. This modest test allows for the cautious observation that this technically possible tracking appears not to be in use as of today.
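
The per-site classification behind such counts could look roughly like the sketch below. It is not the script used for the measurement. The function name classify_manifest is hypothetical, and treating any non-empty start_url as “dedicated” is an assumption made for illustration.

```python
import json
from urllib.parse import parse_qsl, urlsplit


def classify_manifest(manifest_text: str) -> dict:
    """Classify one fetched manifest.json by its start_url."""
    manifest = json.loads(manifest_text)
    start_url = manifest.get("start_url")
    # Assumption: any non-empty string counts as a dedicated start_url.
    has_start_url = isinstance(start_url, str) and start_url != ""
    params = []
    if has_start_url:
        # Extract query parameters such as ?source=homescreen.
        params = parse_qsl(urlsplit(start_url).query, keep_blank_values=True)
    return {
        "has_start_url": has_start_url,
        "has_parameters": bool(params),
        "parameters": dict(params),
    }


if __name__ == "__main__":
    print(classify_manifest('{"start_url": "/?utm_source=homescreen"}'))
```

Deciding whether a parameter value is a per-user random identifier needs more than one fetch of the same manifest, so a single-pass classifier like this can only report that parameters are present.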

Regulatory angle

Under the California Consumer Privacy Act of 2018, such identifiers are “unique personal identifiers” and warrant protection. The bill is not exactly actionable in the case of respawning, though.

Under the GDPR and the ePrivacy Directive, the identifiers would constitute personal data and be similar to fingerprints (in the PWA case) and/or cookies (when respawning happens). Using such an identifier would be legal (subject to many provisions). Cookie respawning would be quite problematic, though (for controllers and for UI providers, i.e. browsers). The proposed ePrivacy Regulation was supposed to be even clearer about this, but the bill is now stuck, and the chances of it proceeding in the foreseeable future appear relatively small.

Recommendations

Since the technique does not yet appear to be used to track users, it’s still possible to think about what to do about it. The most straightforward drop-in approach is removing the installed application when the user chooses to “clear site data”.

Forbidding the use of parameters in start_url (URL?x=....) no longer seems possible: parameters are already commonly used (in over 25% of cases, based on the measurement above). Leaving the problem to users is not a solution either (it’s an artificial privacy consideration).

Heuristic abuse detection might seem simple, but it is not clear whether web browsers would want to deploy it.
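
One possible heuristic, sketched here purely as an illustration, is to fetch the same manifest twice in clean sessions and flag query parameters whose values differ between the two fetches. The function name varying_parameters is hypothetical, and no browser is known to implement this check.

```python
from urllib.parse import parse_qsl, urlsplit


def varying_parameters(start_url_a: str, start_url_b: str) -> set:
    """Return names of query parameters whose values differ between two
    start_url values obtained from independent fetches of one manifest."""
    a = dict(parse_qsl(urlsplit(start_url_a).query, keep_blank_values=True))
    b = dict(parse_qsl(urlsplit(start_url_b).query, keep_blank_values=True))
    # A parameter that is missing on one side also counts as varying.
    return {name for name in a.keys() | b.keys() if a.get(name) != b.get(name)}


if __name__ == "__main__":
    # A static "source" parameter is ignored; a changing "id" is flagged.
    print(varying_parameters("/?source=pwa&id=aaa", "/?source=pwa&id=bbb"))
```

This only looks at the query string. An identifier hidden in the path, or one that is derived from an existing cookie, would not be caught, which is one reason such detection is less simple than it seems.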

Updated 9/4/19

Tracked by Firefox Bug 1542898.

Summary

Progressive Web Apps are great, but so far their privacy footprint has rarely been discussed and is little understood. The Web Application Manifest allows for the creation of persistent identifiers and facilitates cookie respawning. This in itself might not be surprising, but seen holistically, including the browser user interface, it might become a problem.

But in the end, privacy engineering, and this privacy analysis of Progressive Web Applications in particular, was fun.

Did you like the analysis? Do you have any feedback? Want help? Feel free to reach out: me@lukaszolejnik.com. I sometimes freelance.