Enhancing user transparency with Adstxt

Real-Time Bidding (RTB) is a technology that enables targeting content to mobile and web users. It also has numerous problems:

  • Security, including malvertising (abusing the ads infrastructure to deliver malware), which affects hundreds of millions of user visits.
  • Privacy, as RTB is a mass market for personal data with vague controls (1, 2); it is still unclear how to tackle the problem.
  • Transparency, as users do not know when and how their data are used, for example on which site.
  • Deception engineering, as the technology is being used in political PR and disinformation (I tried to draw attention to this problem in 2016 here).

Some of the issues plaguing the technology affect not just bystanders such as unsuspecting web users, but also publishers, advertisers, and RTB providers. Fraud is one problem. Publishers can be played via a simple scam: abusers purport to sell ads on sites such as CNN, while in reality offering space on obscure, less valuable sites. RTB auction controllers find it difficult to uncover fake ad agencies (the fact that someone is only now looking into such problems shows how slowly the ecosystem is evolving). This directly affects the users of the RTB ecosystem, undermining trust and costing money. No wonder these problems are being taken seriously.

Ads.txt standard

Ads.txt is a standard meant to help. Publishers (i.e. websites) declare which parties are authorized to sell ad space on their sites. It is a simple form of public registry (a text file) that can be inspected before engaging in a transaction. The aim is to enhance trust in the advertising ecosystem. I realized it might also have potential for enhancing user transparency. To validate this, I needed data on the actual use of the standard.
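To make the format concrete, here is a minimal Python sketch of parsing an ads.txt file. It assumes the record layout of the IAB Tech Lab specification: comma-separated fields giving the advertising system's domain, the publisher's account ID on that system, the relationship (DIRECT or RESELLER), and an optional certification authority ID, plus `#` comments and `key=value` variables such as `contact=`. The names `AdsTxtRecord` and `parse_ads_txt` are hypothetical, and the validation is deliberately lenient.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AdsTxtRecord:
    ad_system: str                        # field 1: domain of the advertising system
    account_id: str                       # field 2: publisher's account ID on that system
    relationship: str                     # field 3: "DIRECT" or "RESELLER"
    cert_authority: Optional[str] = None  # field 4: optional certification authority ID


def parse_ads_txt(text: str) -> Tuple[List[AdsTxtRecord], List[Tuple[str, str]]]:
    """Return (records, variables) from an ads.txt body; malformed lines are skipped."""
    records: List[AdsTxtRecord] = []
    variables: List[Tuple[str, str]] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        line = line.split(";", 1)[0]  # ignore extension data after ';'
        fields = [f.strip() for f in line.split(",")]
        if len(fields) in (3, 4):
            relationship = fields[2].upper()
            if fields[0] and fields[1] and relationship in ("DIRECT", "RESELLER"):
                cert = fields[3] if len(fields) == 4 and fields[3] else None
                records.append(AdsTxtRecord(fields[0].lower(), fields[1], relationship, cert))
        elif "=" in line and "," not in line:
            # Variable declarations such as contact=... or subdomain=...
            key, value = line.split("=", 1)
            variables.append((key.strip().lower(), value.strip()))
    return records, variables
```

Each record is the publisher's own claim that a given advertising system may sell its inventory under a given account, which is exactly what makes the file inspectable by third parties.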

Measurement

Since summer 2017, I have studied the adoption and evolution of Ads.txt, collecting data on its adoption and on how the files change over time, based on crawls of 100,000 sites. I previously wrote about some of my ideas and past measurements here and here (/real-time-bidding-transparency-ads-txt-update/). I continued working on the topic and have probably amassed the largest longitudinal dataset of its kind. The latest and final measurement was performed in December 2018.
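A single crawl round over a list of domains could look roughly like the following minimal sketch. It is not the tooling behind these measurements. It assumes each site's file lives at `https://<domain>/ads.txt`, and it treats HTML responses, or files without a single well-formed record, as absent, since sites often answer with a soft-404 page. All function names are hypothetical, and a real 100,000-site crawl would also need concurrency, retries, and rate limiting.

```python
import http.client
import urllib.request
from typing import Dict, Iterable, Optional


def looks_like_ads_txt(content_type: str, body: str) -> bool:
    """Heuristic validity check: not HTML, and at least one well-formed record."""
    if "html" in content_type.lower():
        return False  # many sites answer /ads.txt with an HTML soft-404 page
    for raw in body.splitlines():
        # Drop comments and extension data, then split the comma-separated fields.
        fields = [f.strip() for f in raw.split("#", 1)[0].split(";", 1)[0].split(",")]
        if len(fields) >= 3 and fields[0] and fields[2].upper() in ("DIRECT", "RESELLER"):
            return True
    return False


def fetch_ads_txt(domain: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch https://<domain>/ads.txt and return its body if it looks valid, else None."""
    url = f"https://{domain}/ads.txt"
    try:
        # urlopen follows redirects and raises HTTPError (an OSError) on 4xx/5xx.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            content_type = resp.headers.get("Content-Type", "")
            body = resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, http.client.HTTPException):
        return None
    return body if looks_like_ads_txt(content_type, body) else None


def crawl(domains: Iterable[str]) -> Dict[str, Optional[str]]:
    """Sequentially fetch ads.txt for every domain (no concurrency or rate limiting)."""
    return {domain: fetch_ads_txt(domain) for domain in domains}


def adoption_share(results: Dict[str, Optional[str]]) -> float:
    """Fraction of crawled domains that served a plausible ads.txt file."""
    if not results:
        return 0.0
    return sum(body is not None for body in results.values()) / len(results)
```

For instance, 18,000 plausible files among 100,000 crawled domains gives an adoption share of 0.18. Repeating such a crawl over time and diffing the stored bodies is what turns single snapshots into a longitudinal dataset.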

  • I found 941 hosts involved in the system, far more than during my RTB studies in 2013 and 2014.
  • Far more sites used ads.txt at the end of 2018 than in summer 2017: over 18,000. Data indicate that at Internet scale the number is even larger.

Since my early crawls, adoption has increased more than 170-fold. Ads.txt is now present on over 18% of the 100k most popular websites. Thanks to this self-disclosure, we now have precise data on the prevalence of Real-Time Bidding on the web.

There are now so many participants that the flow graph has become unintelligible. Compare it with the graph in the previous post.

It is now evident that the standard is here to stay: it is stable (though its use on the web keeps evolving), adoption is substantial, and major RTB providers are starting to require it.

Therefore, why not benefit from it too?

RTB transparency with Adstxt

First, the data is a good source of domain names used in the ads ecosystem; some privacy tools simply need lists of hostnames.
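As a minimal sketch, assuming the bodies of ads.txt files collected by a crawl, the hypothetical function below extracts the advertising-system domain (the first field) from every well-formed record and deduplicates them into a sorted hostname list that a privacy tool could consume.

```python
from typing import Iterable, List


def ad_system_hosts(ads_txt_bodies: Iterable[str]) -> List[str]:
    """Collect the distinct advertising-system domains (field 1) across many ads.txt files."""
    hosts = set()
    for body in ads_txt_bodies:
        for raw in body.splitlines():
            line = raw.split("#", 1)[0].split(";", 1)[0]  # strip comments and extensions
            fields = [f.strip() for f in line.split(",")]
            if len(fields) >= 3 and fields[0] and fields[2].upper() in ("DIRECT", "RESELLER"):
                hosts.add(fields[0].lower().rstrip("."))  # normalize case and trailing dot
    return sorted(hosts)
```

The first field is a self-declared string typed by publishers, so a production list would need further normalization (for example, handling `www.` prefixes or typos) before use.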

But the real transparency angle is tools that show when a website is enrolled in the RTB ecosystem. A while ago I made a proof-of-concept Chrome extension that does this; find it here. The extension is very simple: it detects when a site has a valid ads.txt file, interprets its contents, and informs the user.
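The extension itself runs in the browser; the Python sketch below only illustrates the interpretation step, under the assumption that a useful user-facing summary counts the distinct advertising systems a site declares and separates direct accounts from reseller accounts. The function name and message wording are hypothetical and not taken from the extension.

```python
from typing import Optional


def describe_for_user(site: str, ads_txt_body: Optional[str]) -> str:
    """Turn a site's ads.txt (or its absence) into a short, user-facing notice."""
    if ads_txt_body is None:
        return f"{site}: no ads.txt found; RTB enrollment is not self-disclosed."
    direct, resellers = set(), set()
    for raw in ads_txt_body.splitlines():
        # Strip comments and extension data, then split into fields.
        fields = [f.strip() for f in raw.split("#", 1)[0].split(";", 1)[0].split(",")]
        if len(fields) < 3 or not fields[0]:
            continue
        kind = fields[2].upper()
        if kind == "DIRECT":
            direct.add(fields[0].lower())
        elif kind == "RESELLER":
            resellers.add(fields[0].lower())
    if not direct and not resellers:
        return f"{site}: ads.txt present but contains no valid records."
    total = len(direct | resellers)  # a system may appear both as direct and as reseller
    return (f"{site} declares {total} advertising system(s) "
            f"({len(direct)} with direct accounts, {len(resellers)} with reseller accounts).")
```

Counting distinct systems rather than raw lines keeps the notice short, since large publishers often list the same system under many account IDs.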

I am sure there are even more creative uses out there. My primary observation is that even though the information was not intended for users, it can be repurposed to serve a good cause.

Finally, a more structured research paper with further details is available here: Enhancing user transparency in online ads ecosystem with site self-disclosures

Summary on privacy, transparency, and disinfo

The standard provides some modest potential to increase user transparency. However, we should expect more changes to further enhance the security, privacy, and transparency posture of RTB. In 2013 and later, while doing privacy research, I saw the potential of these technologies for disinformation. The privacy and transparency of certain technologies are indeed closely linked to disinformation, and today the problem is graver.

In general, I find it difficult to understand why in 2019 we still cannot see the list of bidders involved in the auction for a particular ad on the website we are browsing. This would not only significantly increase user transparency, but also seriously help in fighting disinformation. Ads.txt shows how additional contextual information can be used in research and measurement, and also to enhance user transparency. Doing so is technically simple.

Did you like the post? Feel free to reach out: me@lukaszolejnik.com

P.S. I sometimes do public speaking. If you would like me to speak on this or another topic, feel free to reach out too.