Using HTTP Referer Header to Prevent Hotlinking

Posted on August 11, 2021

Wikipedia on HTTP referer:

Some proxy and firewall software will also filter out referrer information, to avoid leaking the location of non-public websites. This can, in turn, cause problems: some web servers block parts of their website to web browsers that do not send the right referrer information, in an attempt to prevent deep linking or unauthorised use of images (bandwidth theft).

Bandwidth theft refers to the practice of embedding, on one’s own site, images or other resources hosted by a third party, without the third party’s permission. This is also known as hotlinking. For example:

<p>Welcome to my site!</p>
<p>Enjoy this helpful diagram!</p>
<img src="http://competitor-site.com/helpful-diagram.png" />
<p>Contact me if you found this diagram helpful.</p>

This would lead the web browser to request the helpful diagram from the competitor’s server, but the diagram would be displayed on my site. This is a problem because my competitor likely does not want to incur costs to help me reach customers. Furthermore, visitors to my site would be likely to assume that I had produced, or at least licensed, the diagram. Unfortunately for my competitor, they would not be able to leverage copyright law to force me to remove the image reference from my page, as I am only providing a reference to the copyrighted work, not a copy of the work itself.

Now, is the referer header really a good way to prevent hotlinking? It might prevent the most common case, where visitors load the hotlinking page in a standard web browser, which fills in the referer header honestly. However, at least in its simplest implementation, where we just check that the referer is our own domain, it is not effective against other applications, which can set the referer header to any value they like and so pass the check.
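To make the "simplest implementation" concrete, here is a minimal sketch of such a check in Python. The host names in ALLOWED_HOSTS and the function name referer_allowed are hypothetical, and the request headers are assumed to arrive as a plain dictionary keyed by "Referer"; a real server framework would expose them differently.

```python
from urllib.parse import urlsplit

# Hypothetical set of hosts that we own and allow to embed our resources.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def referer_allowed(headers: dict[str, str]) -> bool:
    """Return True if the Referer header names one of our own hosts."""
    referer = headers.get("Referer", "")
    # hostname is None when the header is missing or has no host part.
    host = urlsplit(referer).hostname
    return host in ALLOWED_HOSTS


# A browser rendering a page on another site sends that site's URL: blocked.
print(referer_allowed({"Referer": "http://my-site.example/index.html"}))

# A client that forges the header passes the check.
print(referer_allowed({"Referer": "https://example.com/"}))

# A client behind a proxy that strips the header is blocked, even if legitimate.
print(referer_allowed({}))
```

The second and third calls show the two weaknesses discussed here: the check trusts a value that the client controls, and it turns away visitors whose software removes the header.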

Consider the issue from a more fundamental security perspective. Essentially, preventing hotlinking is an access control problem. We would like to allow access to certain resources only for the purpose of displaying them on another page that we own. As long as we are working with standard web technologies, we cannot accomplish this objective perfectly, as we cannot control how users will use the data once we have sent it to them.

Let us try reducing the access control requirement to something that should still be satisfactory: Access to certain resources should only be allowed for the purpose of satisfying references on pages that we own.

The simplest solution to this new objective is to eliminate the references entirely and send the resources as data URLs embedded in the HTML page. This would satisfy our access control requirement, but it might increase data transfer requirements: an embedded resource cannot be cached separately from the page, so we will have to transmit it again with every page that uses it. In many cases, this may not be a problem, if the images are only referenced on one or two pages. But for things like a company logo that appear on every page, the drawbacks are more significant.
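As a rough illustration of the data URL approach, the following Python sketch builds an img tag whose image bytes travel inside the HTML itself. The helper names to_data_url and img_tag are made up for this example, and the image bytes are a placeholder rather than a real PNG.

```python
import base64
import html


def to_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def img_tag(data: bytes, mime_type: str, alt: str) -> str:
    """Build an <img> element that carries the image inline."""
    src = to_data_url(data, mime_type)
    return f'<img src="{src}" alt="{html.escape(alt)}" />'


# Placeholder bytes standing in for the contents of helpful-diagram.png.
fake_image = bytes(300)
tag = img_tag(fake_image, "image/png", "Helpful diagram")

# Base64 turns every 3 bytes into 4 characters, so the payload grows by a third.
print(len(fake_image), len(base64.b64encode(fake_image)))
```

Because the page no longer contains a URL for the image, there is nothing for another site to hotlink. The cost is the extra transfer described above, made somewhat worse by the base64 size overhead.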

If we must use references to the resources that require access control, then we may not be able to enforce the “purpose” part of the access control requirement (technical means can rarely validate client intent), but we can tie references to page accesses. We need to have some way for the client to prove that it received the reference from a page that we own. A one-time token should be sufficient here. While we cannot prevent the client from sharing the token with others, we can at least ensure that the number of requests that we will have to fulfill is no more than the number of requests to the page that provides the reference.

Whether the cost of implementing such a system is worth the benefit of preventing hotlinking depends on the situation. But this kind of system is probably best implemented by providing special references in our main page, and not by providing a special token in the referer header. The referer header carries a single value per page, so if we tried to use it, we would need to replace the one-time token with a multiple-use token that remains valid for as many requests as there are resources referenced on the page, which adds significant complexity.
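The one-time token idea can be sketched as follows. This is only an illustration under several assumptions: the class name OneTimeTokens, the token query parameter, and the in-memory dictionary are all hypothetical, and a real deployment would also need token expiry and storage shared between server processes.

```python
import secrets


class OneTimeTokens:
    """Issue a token for each reference we hand out; honor each token once."""

    def __init__(self) -> None:
        # Maps each outstanding token to the resource path it was issued for.
        self._pending: dict[str, str] = {}

    def reference(self, resource: str) -> str:
        """Return a "special reference" to embed in a page that we serve."""
        token = secrets.token_urlsafe(16)
        self._pending[token] = resource
        return f"{resource}?token={token}"

    def redeem(self, resource: str, token: str) -> bool:
        """Allow the request only if the token is unused and matches."""
        if self._pending.get(token) != resource:
            return False
        del self._pending[token]  # A token is good for one request only.
        return True


tokens = OneTimeTokens()

# While rendering our own page, generate the image URL with a fresh token.
url = tokens.reference("/helpful-diagram.png")
path, _, token = url.partition("?token=")

print(tokens.redeem(path, token))  # First request: served.
print(tokens.redeem(path, token))  # Same URL copied elsewhere: refused.
```

Since every token is created during a page render and removed on first use, the server never fulfills more resource requests than it issued references, which is the bound described above. Each resource on the page gets its own token, which is what the referer header approach would struggle to express.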