Privacy analysis of... Search in the page?
I never thought it would come to this, but this is a privacy analysis of “ctrl-f”. You know, the find-in-page functionality: when the user wants to find an occurrence of the word “cat”, the Ctrl-F shortcut (or a manual menu selection) opens the find-in-page prompt. So exciting?
Perhaps not. But this small overview is motivated by Google’s proposal to standardise a feature letting websites see when the user has used the find-in-page functionality (coming soon to the Chrome browser). The proposal introduces a beforematch browser event that can be hooked with JavaScript. This way it would be possible for the website to detect the rough location of the searched-for content. Still, no exact data is shared. But this is risky because it plays with the well-known (?) web browser user interface (some browsers learned to treat such data as sensitive). This is the core risk.
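A minimal sketch of how a page might hook such an event, assuming the event name from the proposal. In a real page the listener would sit on a DOM element; here Node’s built-in EventTarget stands in for one so the sketch is runnable outside a browser:

```javascript
// Sketch: observing find-in-page activity via the proposed `beforematch`
// event. The page learns *that* and *when* a match landed in a section,
// but not the raw text the user typed into the find prompt.
const searchHits = [];

const section = new EventTarget(); // stand-in for a DOM element

section.addEventListener('beforematch', () => {
  // Record only the timing of the match, not its contents.
  searchHits.push(Date.now());
});

// Simulate the browser finding a match in this section:
section.dispatchEvent(new Event('beforematch'));

console.log(searchHits.length); // 1
```

Even this minimal listener already gives the page reliable timing information about the user’s searching behaviour.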
Today, websites do not have a standard way of knowing what the user is searching for: the typed words are not signalled to the website. While there are some non-standard quirks (e.g. 1, 2, 3) that allow a site to discover the actual content the user was looking for, or to detect the Ctrl-F key combination, web browsers do not treat such actions as abusive behaviour. Some developers want such functionality, and some websites already introduce a custom search prompt (for example GitHub, where it makes sense!). It does not happen everywhere, though.
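One of those quirks, detecting the Ctrl-F/Cmd-F key combination, can be sketched roughly like this. In a page the listener would go on `document`; Node’s EventTarget stands in here, and the key properties are attached by hand since Node’s plain Event lacks them:

```javascript
// Sketch: a page guessing that find-in-page was opened by watching for
// the Ctrl-F / Cmd-F keydown. This reveals only that the find UI was
// probably opened, not what the user then types into it.
const doc = new EventTarget(); // stand-in for the browser `document`
let findLikelyOpened = false;

doc.addEventListener('keydown', (e) => {
  if ((e.ctrlKey || e.metaKey) && e.key === 'f') {
    findLikelyOpened = true;
  }
});

// Simulate the keystroke; Node's plain Event has no key fields,
// so they are attached by hand for this sketch:
const ev = new Event('keydown');
ev.ctrlKey = true;
ev.key = 'f';
doc.dispatchEvent(ev);

console.log(findLikelyOpened); // true
```

Note the limit of this trick: the site sees only that the shortcut was pressed, nothing about the search itself (and it misses users who open the prompt via the menu).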
Introducing a standard web browser mechanism would make this simpler. But would it also signal that tampering with the user interface in this way “is fine”?
Protection of the user interface
For security and privacy reasons, web browsers tightly protect the user interface; it is a sensitive thing. Browsers make it difficult (or impossible) for websites to modify the standard browser user interface, which is meant to be something the user can trust. After all, users need to trust this interface, since they often provide sensitive data through it. This is, for example, why web pages today are not allowed to modify permission prompts (“this site wants to know your location… do you allow?”).
More generally, what are the privacy risks of websites snooping on find-in-page activity?
Profiling is a risk: websites could detect what the user is intrinsically interested in, interested enough to manually type the keyword.
An event like beforematch would make it possible to reliably learn the exact timing and patterns of user searches (even if not their precise contents), relying far less on inference.
Data leaks are a risk if the user types sensitive keywords (personal data, for example?) and the website is in a position to detect them, for instance by inference.
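The inference risk can be sketched concretely. Suppose (hypothetically) a page wraps candidate sensitive keywords in separate hidden regions and attaches a beforematch listener to each: whichever listener fires tells the page which keyword the user searched for, even though the event itself carries no text. Node’s EventTarget simulates the DOM elements here:

```javascript
// Hypothetical inference sketch (not taken from the spec): one hidden
// region per candidate keyword. The event carries no text, yet the
// identity of the firing region leaks which keyword matched.
const candidates = ['diabetes', 'pregnancy', 'bankruptcy'];
const leaked = [];

const regions = new Map();
for (const word of candidates) {
  const region = new EventTarget(); // stand-in for a hidden element containing `word`
  region.addEventListener('beforematch', () => leaked.push(word));
  regions.set(word, region);
}

// Simulate the browser matching the user's search inside one region:
regions.get('pregnancy').dispatchEvent(new Event('beforematch'));

console.log(leaked); // ['pregnancy']
```

The page never sees the typed text, but with enough prepared regions it can narrow the search term down by elimination.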
For the beforematch feature, the most important precaution is that it does not go beyond today’s status quo: it does not provide the raw text of what the user is searching for. But the specification rightly notes:
Note that this event exposes more information to the page than would otherwise be available. In particular, the page can know which section of text was found using find-in-page, fragment navigation, and scroll-to-text navigation.
Of course, then it gets complicated. Interference with the established web browser user interface matters also because of potential upcoming (?) modifications to the web architecture.
Summary
Over the previous years we have learned a lot about designing features with privacy in mind, and how hard it can be. It does not stop being fascinating.
Did you like this writeup? Do you have a comment or remark? Feel free to get in touch (me@lukaszolejnik.com)