On proposals affecting the privacy of web architecture
Apple and Google have each put forward interesting web standards proposals that would amend how some aspects of web architecture work. This marks a fairly unprecedented competition over web architecture. The battleground is web standardization, so it will happen in the open and involve the larger community.
Web advertisements are known to use privacy-invasive tracking techniques, which users increasingly loathe. As a result, users adopt content (ad) blockers. Some browsers (Firefox, Safari) have also started to block some tracking methods by default.
To some stakeholders, this is viewed as impacting the business model of the web. One particularly affected component is the ability to track ad conversions. A conversion happens when, following an ad click, the user later performs an action on the advertised site (e.g. buys something or registers at the site). Tracking conversions helps in seeing, for example, how well an ad campaign is performing.
In an ecosystem increasingly hostile to advertisement and tracking, key web players have this year proposed ways to keep ad conversion measurement working in a manner that is, in their view, “privacy-proofed”.
In this post I look at and compare the two latest competing proposals, from WebKit and Chrome, sparing some specific details. I also mention some preliminary privacy observations (if you prefer to skip the technical details, read only the sections marked as "Observations" and the summary).
Apple (WebKit)
The core premise behind the Apple proposal is:
Online ads and measurement of their effectiveness do not require Site A, where you clicked an ad, to learn that you purchased something on Site B. The only data needed for measurement is that someone who clicked an ad on Site A made a purchase on Site B. The combined data of an ad click and a conversion should not be attributable to a single user at web scale
To achieve this, WebKit proposed Ad Click Attribution. Among the design goals is to “avoid placing trust in any of the parties involved”, namely the ad network, the merchant, or any other intermediaries. In simple terms, it works as follows.
Let’s say the user is browsing a site, site.zzz, that contains ads. The advertisement contains a link in the following format:
<a adCampaignId=YYY adDestination=shop.zzz>
YYY is the ad campaign id (a value between 0 and 63), and shop.zzz is the destination, such as an advertised product at an e-commerce store. Once the user clicks on the link, the browser stores the tuple (site.zzz, shop.zzz, YYY), which records the ad click, and the user is taken to shop.zzz.
Then let’s say the user performs an action for which conversions are tracked, for example a product purchase. The website conforming to the protocol makes the browser perform a request to:
https://site.zzz/.well-known/ad-click-attribution/AAA
Here, AAA is the attribution id denoting the conversion type, again a number (between 0 and 63) meant to encode what the user did on shop.zzz. This request informs the browser that the conversion event happened. The browser matches the stored ad campaign id with the attribution id, and both are sent to the reporting endpoint of site.zzz:
https://site.zzz/.well-known/ad-click-attribution/YYY/AAA
The request is made between 24 and 48 hours after the conversion happens (which makes timing correlation more difficult), informing site.zzz that campaign YYY resulted in “action type” AAA.
This means that at most 64 parallel campaigns can be tracked per website containing ads. This limit is meant to prevent the use of the campaign id as a user identifier. The same applies to the conversion id.
This system has the following properties:
- The total number of exchanged bits is 12: 6 bits for AAA and 6 bits for YYY. This restricts the use of this API as a user identifier in the way cookies are used.
- It is ephemeral: the 12-bit identifier disappears after 7 days (if not reported).
- It makes user profiling difficult due to the reporting delay.
- If the user so wishes, content blockers can easily block it.
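The flow described above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not how WebKit implements it: the `ClickStore` class, its method names, the explicit `now_hours` clock, and the choice to consume a click on its first conversion are all assumptions made for the example. Only the 0-63 id range, the 7-day expiry, the 24 to 48 hour delay, and the URL layout come from the description above.

```python
import random
from dataclasses import dataclass

MAX_ID = 63            # ids are 6-bit values (0-63), as described above
EXPIRY_HOURS = 7 * 24  # unreported clicks disappear after 7 days


@dataclass(frozen=True)
class AdClick:
    campaign_id: int   # YYY from the ad link
    clicked_at: float  # hours on a caller-supplied clock (assumption)


class ClickStore:
    """Hypothetical stand-in for the browser's ad click storage."""

    def __init__(self):
        self._clicks = {}  # (source_site, destination) -> AdClick

    def record_click(self, source_site, destination, campaign_id, now_hours):
        if not 0 <= campaign_id <= MAX_ID:
            raise ValueError("campaign id must be between 0 and 63")
        self._clicks[(source_site, destination)] = AdClick(campaign_id, now_hours)

    def convert(self, source_site, destination, attribution_id, now_hours, rng=random):
        """Return (report_url, delay_hours) for a matching click, else None."""
        if not 0 <= attribution_id <= MAX_ID:
            raise ValueError("attribution id must be between 0 and 63")
        # Assumption: a stored click is consumed by its first conversion.
        click = self._clicks.pop((source_site, destination), None)
        if click is None or now_hours - click.clicked_at > EXPIRY_HOURS:
            return None
        delay_hours = rng.uniform(24, 48)  # report is sent 24 to 48 hours later
        url = (f"https://{source_site}/.well-known/ad-click-attribution/"
               f"{click.campaign_id}/{attribution_id}")
        return url, delay_hours


store = ClickStore()
store.record_click("site.zzz", "shop.zzz", campaign_id=5, now_hours=0)
report_url, delay = store.convert("site.zzz", "shop.zzz", attribution_id=12, now_hours=1)
print(report_url)
```

Note that the report carries only the two 6-bit values and the site name; nothing in it identifies the user, which is the point of the design.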
Observations
The solution has some potentially less evident properties:
- Safari users will be opted in by default. This is meant to encourage the use of this API by websites, though today it may constitute an additional identifier in the presence of others (assuming Safari's anti-tracking measures allow them).
- It does store identifiers, which are mediated by the web browser; as such it should hopefully be possible for users to clear them manually. Their use may be subject to the same constraints as cookies.
- The format allows for encoding 12 bits as well as the effective top-level domain name (+1), like example.com, offering potentially 4096 unique identifiers. Of course, if someone wants to try gaming the system they could try using different domain names, like example.org, example.zzz, etc., expanding the potential number of usable bits. You could imagine, for example, directing different users to different domains based on their IP address. Whether it would be reasonable or realistic to use this scheme is another story.
- It further establishes the web browser as the ultimate gatekeeper in the web economy. Not all stakeholders will like that the position of the web browser is made more powerful (although technically web browsers are in a powerful position anyway).
Furthermore, the specification mentions that “the user agent may use a central clearinghouse to further anonymize ad click attribution requests, should a trustworthy clearinghouse exist”. This is not expanded upon, but it probably means a trusted third party that would receive conversion reports, anonymise them, and forward them on.
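To make the domain-rotation observation above concrete, here is a small back-of-the-envelope calculation. The function names are hypothetical, and the assumption is simply that each extra domain multiplies the number of distinguishable (domain, campaign id, attribution id) combinations.

```python
import math


def distinct_identifiers(domains=1, campaign_bits=6, conversion_bits=6):
    """Number of distinguishable (domain, campaign id, attribution id) triples."""
    return domains * 2 ** campaign_bits * 2 ** conversion_bits


def effective_bits(domains=1, campaign_bits=6, conversion_bits=6):
    """The same capacity expressed in bits."""
    return campaign_bits + conversion_bits + math.log2(domains)


print(distinct_identifiers())            # a single domain
print(distinct_identifiers(domains=16))  # rotating across 16 domains
print(effective_bits(domains=16))
```

Under this assumption, 16 domains would add 4 bits on top of the 12, which illustrates why the per-domain limit alone does not fully bound what a determined party could encode.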
Google proposal
Google announced a different proposal to “fix” the ad conversion problem that emerged due to ad blocking: the Conversion Measurement API. In some respects it works similarly to Apple's proposal, but there are important differences.
Details
The way that advertisement links are defined is as follows:
<a addestination=shop.zzz impressiondata=0x12345678 impressionexpiry=[long] reportingdomain=advertiser.zzz>
The attributes are as follows:
- addestination is the destination of the click.
- impressiondata is a hexadecimal data string (64 bits in size).
- impressionexpiry (in seconds, 7 days by default) signifies when the “impression” (the event of the user clicking on the ad) should be deleted from browser memory.
- reportingdomain is the (optional) desired website where the conversion report for an impression should go (the default is shop.zzz in our case, and it could be overridden with advertiser.zzz, for example).
When the user clicks on the link, the browser memorizes the following: (impressiondata, addestination, reportingdomain, impressionexpiry). The conversion registration is pretty similar to the Apple proposal:
https://<reportingdomain>/.well-known/register-conversion[?conversion-metadata=<metadata>]
The stored data becomes (reporting domain, addestination domain (i.e. shop.zzz), impression data, decoded conversion-metadata, last-clicked attribute), where last-clicked signifies that this resource was the last one clicked.
This data is then reported at one of the reporting windows (2 days from the impression time, 7 days from the impression time, or at the impressionexpiry defined by the embedding ad) by requesting:
https://reportingdomain/.well-known/register-conversion?impression-data=&conversion-data=&last-clicked=
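As an illustration, the report request above could be assembled as follows. This is a sketch under stated assumptions: the function name is hypothetical, and the value encodings (lowercase hex without a prefix for the impression data, a decimal number for the conversion data, `true`/`false` for last-clicked) are guesses for the sake of the example, since the text above does not specify them. The size checks reflect the 64-bit and 3-bit sizes discussed in this post.

```python
from urllib.parse import urlencode


def build_report_url(reporting_domain, impression_data, conversion_data, last_clicked):
    """Assemble the conversion report URL (value encodings are assumptions)."""
    if not 0 <= impression_data < 2 ** 64:
        raise ValueError("impression data must fit in 64 bits")
    if not 0 <= conversion_data < 2 ** 3:
        raise ValueError("conversion data must fit in 3 bits")
    query = urlencode({
        "impression-data": format(impression_data, "x"),  # hex, no 0x prefix
        "conversion-data": conversion_data,
        "last-clicked": "true" if last_clicked else "false",
    })
    return f"https://{reporting_domain}/.well-known/register-conversion?{query}"


print(build_report_url("advertiser.zzz", 0x12345678, 5, True))
```

Unlike the WebKit scheme, the reporting domain is a parameter here, which mirrors the optional reportingdomain attribute on the ad link.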
The conversion metadata is proposed to be 3 bits in size. With impression data of 64 bits, this gives 67 bits in total. In Apple's case that was 12 bits, which is less identifying (at the expense of being less useful for advertisers).
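A quick calculation shows why the difference between 12 and 67 bits matters. The population figure below is a round, illustrative number chosen for the example and is not taken from either proposal.

```python
APPLE_BITS = 6 + 6    # campaign id + attribution id
GOOGLE_BITS = 64 + 3  # impression data + conversion metadata

# Illustrative round number, assumed only for the sake of the comparison.
ASSUMED_POPULATION = 8_000_000_000

# Minimum number of bits needed to give every person a distinct label.
bits_for_unique_label = (ASSUMED_POPULATION - 1).bit_length()

print(2 ** APPLE_BITS)        # distinct values available in the 12-bit scheme
print(bits_for_unique_label)  # bits needed to label everyone uniquely
print(GOOGLE_BITS >= bits_for_unique_label)
print(APPLE_BITS >= bits_for_unique_label)
```

In other words, 12 bits cannot single out an individual at web scale on their own, while a 64-bit field has ample room to do so if it were used that way.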
WebKit vs Google?
A rough and informal comparison:
- Apple’s proposal only allows the ad in the top website frame, while Google’s allows ads in embedded iframes, making the proposal more advanced (e.g. ad scripts placed dynamically?) but also more complex.
- In Apple's case, the reported conversion goes to the site where the user originally clicked on the ad. In Google's case, it seems the report may go to different places, for example directly to the advertiser's site (i.e. advertiser.zzz).
- The amount of information in both schemes is limited by the size of the identifiers, which defines the "information budget" that can be utilised by advertisers. Google wants the data to be bigger to possibly allow learning that a click on a particular ad led to a conversion. Of course, Apple might be less interested in this kind of information than Google is.
Summary
Both proposals attempt to find a balance between user privacy and utility in web advertisement. While the proposals are laid out differently, the information about who decided on these trade-offs, and how, is not necessarily easy to decipher. One thing we should always hope to avoid is having different browsers conform to different standards. Both Safari (on mobile) and Chrome have significant user bases, so technically they could each go their own way unilaterally.
More generally, we are seeing further competition over the future of web architecture, with privacy at center stage. While it only involves US companies, this debate will fortunately happen in the open and will hopefully also involve other interested parties. The place where it will all happen is web standardization.
Ps. Did you like my work? Have comments or are you perhaps interested in another type of analysis? me@lukaszolejnik.com.