Have you ever visited a website to check out something you want to buy, only to be inundated with ads for that product on other websites?
I’ve been asked about this enough times that I thought it would be a good exercise to try to explain how it works.
This blog post explains how websites that are totally unrelated to each other seem to know what you are doing online.
It all starts with the Cookie
Before we describe what a cookie is (in the context of the web), let’s start by describing why they are needed.
The web is mostly a disconnected system. That’s an odd thing to say when the web is almost synonymous with being connected. What disconnected means in this context is that when you use your browser to visit a website, the browser connects to the server that hosts that website, gets the web page and disconnects. The web server only knows about you for that very brief period. In more technical terms, HTTP, the protocol browsers use to talk to web servers, is stateless: each request is handled on its own, without any memory of the previous ones.
This is what allows one website to potentially serve millions of requests. It doesn’t need to hold information about you in memory for very long.
Well, if that’s the case, how come we have websites that do “remember” us, where we can log in and see data that is ours?
That’s where cookies come in. A cookie is a small piece of information that a website asks your browser to store on your computer.
When you visit a website, your browser sends an HTTP request to a web server that is listening at the address of that website. The web server sends back an HTTP response with the contents of the website (HTML) and some “headers”. A header is just a key-value pair (for example: Server: TheServerName).
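To make “headers are just key-value pairs” concrete, here is a minimal Python sketch that splits a raw HTTP response into header names and values. The response text and the parse_response_headers function are made up for illustration; a real client would use a proper HTTP library rather than splitting strings by hand.

```python
def parse_response_headers(raw_response: str) -> dict[str, list[str]]:
    """Return the headers of a raw HTTP/1.1 response as key-value pairs.

    Header names are lowercased because HTTP header names are case-insensitive.
    Values are kept in lists because some headers (such as Set-Cookie) can
    appear more than once in the same response.
    """
    head, _, _body = raw_response.partition("\r\n\r\n")
    header_lines = head.split("\r\n")[1:]  # skip the status line, e.g. "HTTP/1.1 200 OK"
    headers: dict[str, list[str]] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return headers


# A made-up response from a hypothetical server.
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Server: TheServerName\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body>Hello!</body></html>"
)
print(parse_response_headers(raw))
# {'server': ['TheServerName'], 'content-type': ['text/html']}
```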
There’s a header named Set-Cookie which, when present in an HTTP response, will make your browser create a cookie for the website you are visiting. It’s actually a little more complex than this: the cookie is tied to the domain of the website, but for this discussion it’s OK to think of a cookie as belonging to one website.
Also, a cookie is just a small piece of text that your browser saves on your computer.
Here’s an example of a Set-Cookie header from a response from myawesomewebsite.com:
Set-Cookie: the_cookie_name=the_cookie_value; path=/; expires=Tue, 06 Jun 2028 21:50:30 GMT; domain=.myawesomewebsite.com
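If you want to poke at a header like this yourself, Python’s standard library can parse it with http.cookies.SimpleCookie. This is only a sketch of reading the header; it doesn’t reproduce everything a browser does with it.

```python
from http.cookies import SimpleCookie

header = (
    "Set-Cookie: the_cookie_name=the_cookie_value; path=/; "
    "expires=Tue, 06 Jun 2028 21:50:30 GMT; domain=.myawesomewebsite.com"
)

# SimpleCookie.load() expects the header's value, so drop the "Set-Cookie:" name first.
_, _, value = header.partition(":")
cookie = SimpleCookie()
cookie.load(value.strip())

morsel = cookie["the_cookie_name"]  # a Morsel holds one cookie plus its attributes
print(morsel.value)       # the_cookie_value
print(morsel["path"])     # /
print(morsel["expires"])  # Tue, 06 Jun 2028 21:50:30 GMT
print(morsel["domain"])   # .myawesomewebsite.com
```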
After this response, every time you visit myawesomewebsite.com until 06 June 2028, your browser will send a Cookie header in the request like this:
Cookie: the_cookie_name=the_cookie_value
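A real browser’s cookie store handles many more rules (paths, the Secure and SameSite attributes, third-party rules and so on), but the behavior described above can be sketched in a few lines: remember the cookie together with its domain and expiry date, and attach a Cookie header to later requests for a matching, unexpired host. SimpleCookieJar, StoredCookie and their methods are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import List, Optional


@dataclass
class StoredCookie:
    name: str
    value: str
    domain: str        # e.g. "myawesomewebsite.com" (leading dot removed)
    expires: datetime  # timezone-aware expiry time


class SimpleCookieJar:
    """Hypothetical, heavily simplified model of a browser's cookie store."""

    def __init__(self) -> None:
        self._cookies: List[StoredCookie] = []

    def store(self, name: str, value: str, domain: str, expires: str) -> None:
        # "expires" uses the HTTP date format, e.g. "Tue, 06 Jun 2028 21:50:30 GMT".
        self._cookies.append(
            StoredCookie(name, value, domain.lstrip("."), parsedate_to_datetime(expires))
        )

    def cookie_header_for(self, host: str, now: datetime) -> Optional[str]:
        """Return the Cookie header the browser would send to `host`, if any."""
        pairs = [
            f"{c.name}={c.value}"
            for c in self._cookies
            # A cookie for example.com also matches subdomains such as www.example.com,
            # but not unrelated hosts like notexample.com (hence the leading dot).
            if (host == c.domain or host.endswith("." + c.domain)) and now < c.expires
        ]
        return "Cookie: " + "; ".join(pairs) if pairs else None


jar = SimpleCookieJar()
jar.store("the_cookie_name", "the_cookie_value", ".myawesomewebsite.com",
          "Tue, 06 Jun 2028 21:50:30 GMT")

before = datetime(2025, 1, 1, tzinfo=timezone.utc)
after = datetime(2029, 1, 1, tzinfo=timezone.utc)
print(jar.cookie_header_for("myawesomewebsite.com", before))  # Cookie: the_cookie_name=the_cookie_value
print(jar.cookie_header_for("othersite.com", before))         # None
print(jar.cookie_header_for("myawesomewebsite.com", after))   # None (expired)
```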
The easiest example to imagine is a website that allows its users to pick a background color. In the response to the request to change the background color, the web server will add a header like this: Set-Cookie: background_color=blue ...
Now every time you go to that website, your browser will “tell” the website that you want the background to be blue, and the way it will do this is by including the Cookie header in the request: Cookie: background_color=blue.
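On the server side, the background color example could look roughly like the Python sketch below. The handler names, the set of allowed colors and the default color are assumptions made up for illustration; the point is that the server writes the preference with Set-Cookie once and then reads it back from every later Cookie header.

```python
from http.cookies import SimpleCookie
from typing import Optional

# Hypothetical set of colors this example site offers, plus a default.
ALLOWED_COLORS = {"white", "blue", "green"}
DEFAULT_COLOR = "white"


def handle_set_color(color: str) -> str:
    """Build the Set-Cookie header sent back when the user picks a color."""
    cookie = SimpleCookie()
    cookie["background_color"] = color
    cookie["background_color"]["path"] = "/"
    return cookie.output()  # "Set-Cookie: background_color=<color>; Path=/"


def handle_page_request(cookie_header_value: Optional[str]) -> str:
    """Render the page using the color the browser sent back in its Cookie header."""
    color = DEFAULT_COLOR
    if cookie_header_value:
        cookies = SimpleCookie()
        cookies.load(cookie_header_value)
        sent = cookies.get("background_color")
        # Never trust cookie contents blindly: the user can edit them.
        if sent is not None and sent.value in ALLOWED_COLORS:
            color = sent.value
    return f'<body style="background-color: {color}">...</body>'


print(handle_set_color("blue"))                      # Set-Cookie: background_color=blue; Path=/
print(handle_page_request("background_color=blue"))  # <body style="background-color: blue">...</body>
print(handle_page_request(None))                     # <body style="background-color: white">...</body>
```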
The mechanism that allows you to log in to a website is very similar to this. When you visit a login page and enter the correct username and password, the web server will send back a response with a Set-Cookie header whose contents will allow the web server to find your data. The value of this cookie is usually either a long random identifier that the server looks up in its own records, or data that the server signs (and often encrypts) with a secret key, so that only the server can create a valid value. This way a user can’t just change the value and potentially sign in as another person.
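One common way to make a cookie value that only the server can create is to sign it with an HMAC. The Python sketch below assumes a hypothetical cookie named session and made-up helpers make_session_cookie and read_session_cookie. Note that it only signs the value: anyone can still read “alice” in it, but changing it breaks the signature. Encrypting the value as well would also hide its contents.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Hypothetical server-side secret; in practice it is generated once and kept private.
SECRET_KEY = secrets.token_bytes(32)


def _sign(data: str) -> str:
    return hmac.new(SECRET_KEY, data.encode(), hashlib.sha256).hexdigest()


def make_session_cookie(user_id: str) -> str:
    """Build a Set-Cookie header whose value only this server can produce."""
    return f"Set-Cookie: session={user_id}.{_sign(user_id)}; Path=/; HttpOnly; Secure"


def read_session_cookie(value: str) -> Optional[str]:
    """Return the user id if the cookie value is authentic, otherwise None."""
    user_id, sep, signature = value.rpartition(".")
    if not sep:
        return None
    # compare_digest avoids leaking timing information during the comparison.
    if hmac.compare_digest(signature, _sign(user_id)):
        return user_id
    return None


header = make_session_cookie("alice")
cookie_value = header.split("session=", 1)[1].split(";", 1)[0]
print(read_session_cookie(cookie_value))  # alice

forged = cookie_value.replace("alice", "bob", 1)  # try to sign in as someone else
print(read_session_cookie(forged))        # None
```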
Tracking users with cookies
I’m sure you’ve seen messages similar to this: “We use cookies to improve your experience on our site and to show you relevant advertising“.
The way that cookies can be used to “show you relevant advertising” hinges on how they are treated by the browser. Every time your browser makes a request to a website for which it has a cookie, it will send that cookie.
Also, when the request originates from a different website than the one the cookie is for, the browser will usually add a header named Referer (yes, it’s misspelled, but that’s the way it is in the spec so we have to live with it). That header contains the URL of the page the request originated from, although depending on the browser’s referrer policy it may contain only the scheme and domain of that page rather than its full URL.
This is simpler to understand with an example. Say you go to Wikipedia to read about the World Cup. While doing that, you find something interesting in the references section that you want to explore, for example a page about the 2018 qualification that is hosted on another website you have visited before (e.g. sofascore.com).
When you click on the link to sofascore.com, your browser will add the Cookie header to the request for sofascore.com, and it will also add a Referer header with the URL from Wikipedia, for example: Referer: https://en.wikipedia.org/wiki/2018_FIFA_World_Cup.
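Put together, the request your browser sends looks roughly like the output of this Python sketch. The page path and the visitor_id cookie are hypothetical placeholders; real browsers also add many more headers, and under a stricter referrer policy the Referer line may carry only https://en.wikipedia.org/ instead of the full article URL.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


def build_get_request(url: str, cookies: Dict[str, str], referer: Optional[str] = None) -> str:
    """Return a simplified raw HTTP request like the one a browser sends for `url`."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [f"GET {target} HTTP/1.1", f"Host: {parts.hostname}"]
    if cookies:
        # All cookies the browser holds for this site go into a single Cookie header.
        lines.append("Cookie: " + "; ".join(f"{k}={v}" for k, v in cookies.items()))
    if referer:
        lines.append(f"Referer: {referer}")
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical path and cookie; only the Referer value comes from the example above.
request = build_get_request(
    "https://www.sofascore.com/some-page",
    cookies={"visitor_id": "abc123"},
    referer="https://en.wikipedia.org/wiki/2018_FIFA_World_Cup",
)
print(request)
```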
How can this be exploited?
For simplicity, imagine that the original website you visited is named tracker. Now imagine tracker sets a cookie for you with a unique value (which will act as an identifier) and a very long expiration date. Also, tracker has an agreement with loads of other websites, and those websites embed a 1×1 pixel transparent image hosted by tracker in their pages.
Each time you visit one of those websites, your browser will make a request to tracker for the transparent image and send the cookie with your identifier along with the Referer header. This way, not only does tracker know who you are, it also knows which websites you are visiting.
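Here is a minimal Python sketch of what tracker’s side of this could look like. Everything in it is hypothetical: the handle_pixel_request function, the tracker_id cookie name, the in-memory visits log and the .example page URLs. A real tracker would sit behind a web server and use a database. Also keep in mind that modern browsers increasingly block or partition third-party cookies like this one, so this shows the classic technique rather than what every browser allows today.

```python
import base64
import secrets
from http.cookies import SimpleCookie
from typing import Dict, List, Optional, Tuple

# A commonly used 1x1 transparent GIF, base64-encoded.
PIXEL_GIF = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")

# visitor id -> pages (Referer values) on which the pixel was loaded
visits: Dict[str, List[str]] = {}


def handle_pixel_request(headers: Dict[str, str]) -> Tuple[Dict[str, str], bytes]:
    """Hypothetical tracker endpoint: serve the pixel and record where it was seen."""
    response_headers = {"Content-Type": "image/gif"}

    visitor_id: Optional[str] = None
    if "Cookie" in headers:
        jar = SimpleCookie()
        jar.load(headers["Cookie"])
        if "tracker_id" in jar:
            visitor_id = jar["tracker_id"].value

    if visitor_id is None:
        # First time we see this browser: hand out a unique, long-lived identifier.
        # SameSite=None; Secure is needed for the cookie to be sent on cross-site requests.
        visitor_id = secrets.token_hex(16)
        response_headers["Set-Cookie"] = (
            f"tracker_id={visitor_id}; Path=/; Max-Age={10 * 365 * 24 * 3600}; "
            "SameSite=None; Secure"
        )

    referer = headers.get("Referer")
    if referer:
        visits.setdefault(visitor_id, []).append(referer)

    return response_headers, PIXEL_GIF


# First page view: no cookie yet, so the tracker assigns one.
first_headers, body = handle_pixel_request({"Referer": "https://news.example/article-1"})
tracker_cookie = first_headers["Set-Cookie"].split(";", 1)[0]  # "tracker_id=..."

# Later, a different site embeds the same pixel and the browser sends the cookie back.
second_headers, _ = handle_pixel_request(
    {"Cookie": tracker_cookie, "Referer": "https://shop.example/shoes"}
)
print(visits)  # one visitor id mapped to both pages
```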
The tracker can then sell your information to other websites, which can choose what they want to do with it, mainly showing you ads related to things they think you are interested in. It’s as simple as that.
This technique is known as a web beacon (or tracking pixel), and it’s one of the simplest. There are much more elaborate techniques that use a combination of features to “tag” you even if you delete your browser cookies.
In case you are wondering why a cookie is called a cookie, maybe it’s because it’s like a fortune cookie: it has a message inside. I actually don’t know, but if you do, please leave a comment.