Entity Tags (Etags)

This page was originally created on and last edited on .

Introduction

Entity Tags, or ETags, are another way of handling 304 responses. A 304 response is the web server's way of telling the browser that the version it has in its cache is still the latest version, even though the cache time may have expired. In that case there is no need to download the file again, wasting time and bandwidth.

The way it works is that when a browser requests a page that it has already downloaded in the past, the browser sends the time of its cached copy in the If-Modified-Since HTTP request header, for example:

If-Modified-Since:Tue, 01 Sep 2015 11:28:37 GMT

This basically says "Hey web server, I have a copy of the file I'm asking you for, but it's from 1st September 2015 at 11:28am - is that alright to use, or is there a newer version available? If there's a newer file, send me that one, and if not then just tell me to continue using the one I've got". A 304 response is the way the web server tells the web browser to keep using the file it has. This also sets up a new expiry based on your caching settings, so the web browser won't check again until the cache expiry time is up.
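As a rough illustration of that conversation, here is a minimal Python sketch of the check a server might do. The function name check_if_modified_since and the sample dates are made up for this example, and real web servers handle more edge cases (invalid dates, dates in the future and so on).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def check_if_modified_since(header_value: str, last_modified: datetime) -> int:
    """Return the status code a server might send for a conditional GET."""
    cached_at = parsedate_to_datetime(header_value)
    # HTTP dates only have one-second resolution, so drop microseconds.
    last_modified = last_modified.replace(microsecond=0)
    if last_modified <= cached_at:
        return 304  # Not Modified: the browser keeps using its cached copy
    return 200  # a newer version exists: send the full file


header = "Tue, 01 Sep 2015 11:28:37 GMT"
unchanged = datetime(2015, 9, 1, 11, 28, 37, tzinfo=timezone.utc)
updated = datetime(2015, 9, 2, 9, 0, 0, tzinfo=timezone.utc)

print(check_if_modified_since(header, unchanged))  # 304
print(check_if_modified_since(header, updated))    # 200
```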

This all works very well, but the timestamp isn't a very good way of checking whether the file has changed. There are a few instances where the file timestamp cannot be updated (e.g. if the files are put on the web server by a process that is not able to set the timestamp due to OS restrictions), may have changed even if the resource hasn't (e.g. if you have two web servers with a load balancer in front of them, each with its own timestamp for the same file), or may even be older (e.g. if you restore a file from backup).

ETags instead try to give a better idea of whether the file is fresh, one that is not just based on the timestamp of the file. Exactly what the ETag is based on is up to each website, and the web browser doesn't need to know how it is calculated. It could be a checksum of the file contents, a value that changes if the file or its size has changed, or perhaps even just the last modified date (in which case there is no real difference between using ETags and the normal If-Modified-Since header).

How to set it up

ETags are configured in your web server. In Apache this is as simple as adding the following config:

FileETag MTime Size

This tells Apache to send ETags based on a combination of modified time and size. Now when the web browser requests a file from the web server that it already has a copy of, it will send two headers:

If-Modified-Since:Tue, 01 Sep 2015 01:28:37 GMT
If-None-Match:"2e92-51ea5781a9b40-gzip"

The If-Modified-Since header is the usual timestamp-based header, and If-None-Match is set to the ETag value. When an If-None-Match header exists, it is used in preference to If-Modified-Since, so the ETag in the request is checked against the ETag of the file on the server. If they are the same then a 304 is returned, otherwise the whole file is returned.
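To make that order of precedence concrete, here is a small Python sketch. The function conditional_status is hypothetical and simplified: it does exact string comparison of ETags, splits the header on commas, and ignores weak validators (the W/ prefix), which a real server would need to handle.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping


def conditional_status(request_headers: Mapping[str, str],
                       current_etag: str,
                       last_modified: datetime) -> int:
    """Decide between 304 and 200 for a GET, giving If-None-Match priority."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # The header may list several ETags separated by commas, or be "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or current_etag in candidates:
            return 304
        return 200  # ETag differs: If-Modified-Since is not consulted

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        cached_at = parsedate_to_datetime(if_modified_since)
        if last_modified.replace(microsecond=0) <= cached_at:
            return 304
    return 200


last_modified = datetime(2015, 9, 1, 1, 28, 37, tzinfo=timezone.utc)
headers = {
    "If-Modified-Since": "Tue, 01 Sep 2015 01:28:37 GMT",
    "If-None-Match": '"2e92-51ea5781a9b40-gzip"',
}

# Same ETag on the server: 304.
print(conditional_status(headers, '"2e92-51ea5781a9b40-gzip"', last_modified))  # 304
# Different ETag on the server: 200, even though the timestamp still matches.
print(conditional_status(headers, '"3001-51ea5781a9b40-gzip"', last_modified))  # 200
```

The second call shows why the ETag matters: the date alone would have produced a 304, but it is never looked at once If-None-Match is present.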

Support

ETags are part of the HTTP/1.1 specification, so they enjoy widespread support in web browsers and web servers. However, the usefulness of ETags depends on the implementation, and this has some issues (see the downsides below).

The Downsides

The whole point of using ETags is that they are supposed to give us a more accurate indication of whether a file has changed, rather than just basing it on the timestamp. This sounds great in principle, however the way they have been implemented often hugely reduces their effectiveness.

Original implementations on Apache used to default to basing the ETag on the inode, which is basically a link to the physical file on the server. This meant that if you had two or more load-balanced web servers, the ETag was different on each server. If, for some reason, you ended up being bounced back and forth between several servers, you got different ETags and so would download the file again even if it had not changed. This is no longer relevant as the defaults changed a while ago, but you still stumble across blogs recommending not to use ETags for this reason, when that's no longer true - there are much better reasons not to use ETags! :-)
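The inode problem is easy to see with a toy example. This Python sketch is not Apache's actual code: the FileStat type, the make_etag function, the ETag format and the inode numbers are all made up for illustration.

```python
from typing import NamedTuple


class FileStat(NamedTuple):
    inode: int
    size: int
    mtime: int  # seconds since the epoch


def make_etag(stat: FileStat, use_inode: bool) -> str:
    """Build a quoted ETag from hex-encoded file metadata (illustrative format)."""
    parts = [stat.size, stat.mtime]
    if use_inode:
        parts.insert(0, stat.inode)
    return '"' + "-".join(format(p, "x") for p in parts) + '"'


# The same file deployed to two servers: identical size and timestamp,
# but each filesystem assigned its own inode number (values are made up).
server_a = FileStat(inode=131074, size=11922, mtime=1441070917)
server_b = FileStat(inode=524301, size=11922, mtime=1441070917)

print(make_etag(server_a, use_inode=True) == make_etag(server_b, use_inode=True))    # False
print(make_etag(server_a, use_inode=False) == make_etag(server_b, use_inode=False))  # True
```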

The first thing to be aware of is that ETags should be based on the file as delivered, so a gzipped and a non-gzipped version of the same file should return different ETags. In reality so many browsers accept gzip that this is unlikely to be an issue in itself, but it has caused some other problems as a side effect. Apache tried to fix ETags to work properly with gzip and proxies in 2008 and instead introduced a bug which has never been fixed since! This basically stops 304 responses for gzipped content, so you have the choice of 304 responses based on ETags occasionally, or gzip all the time. As you definitely want gzip, this effectively equates to killing ETags for Apache users.

Apache 2.2 and 2.5 added a new DeflateAlterETag directive to attempt to address the issue, but it's a bit of a hack, doesn't fully address the issue, and isn't supported in Apache 2.4. So the net result is: don't use ETags if using Apache. I see regular posts on stackoverflow.com and serverfault.com about this bug as people struggle to understand why 304 responses are not being used.

Try it yourself if running Apache 2.4 with ETags: load a page in your browser (www.apache.org being a good example), open developer tools, and reload the page using F5 (making sure the Disable Cache option is turned off). Gzipped content will incorrectly load with a 200 response, but your non-gzipped content (e.g. images) will correctly be used from cache and return a 304 response:

Apache Bug when using ETag and GZip
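One way to picture what goes wrong is the Python model below. It is a simplified sketch and rests on an assumption not spelled out above: that the "-gzip" suffix is added to the ETag on the way out, while the conditional check compares the incoming If-None-Match against the ETag of the uncompressed file. The function names and the ETag format are hypothetical, and none of this is Apache's actual code.

```python
def file_etag(size: int, mtime_us: int) -> str:
    """ETag computed for the file on disk (hypothetical format)."""
    return f'"{size:x}-{mtime_us:x}"'


def add_gzip_suffix(etag: str) -> str:
    """Assumed behaviour of the compression step: tag the ETag as gzipped."""
    return etag[:-1] + '-gzip"'


def status_for(if_none_match: str, size: int, mtime_us: int) -> int:
    """Assumed conditional check: compare against the uncompressed file's ETag."""
    return 304 if if_none_match == file_etag(size, mtime_us) else 200


size, mtime_us = 0x2E92, 0x51EA5781A9B40
sent_to_browser = add_gzip_suffix(file_etag(size, mtime_us))

print(sent_to_browser)                              # "2e92-51ea5781a9b40-gzip"
print(status_for(sent_to_browser, size, mtime_us))  # 200, although nothing changed
print(status_for(file_etag(size, mtime_us), size, mtime_us))  # 304 without the suffix
```

Under that assumption the browser faithfully sends back the ETag it was given, but the server never recognises it, so gzipped content always gets a full 200 response.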

Even if Apache didn't have the above confusing bug, I'm not convinced ETags are so great anyway. Ideally an ETag would tell you if the file contents are different. Apache allows you to create an ETag based on a combination of one or more of: inode, timestamp or file size. Now we know inode is a bad idea, so that leaves us with timestamp and/or file size. Doing both is probably best, so that any change generates a new ETag - but that still leaves you with cases where the file timestamp changes but the contents don't, which could be fairly frequent depending on your release process to your web server. So you don't really gain that much by using ETags compared with the standard If-Modified-Since headers (which are based on timestamp anyway).

Ideally ETags would be based on a version number (which is difficult to configure at web server level) or a file checksum (though I do appreciate there could be a performance impact if the web server had to calculate a checksum for each file served, rather than just read standard file descriptor data as current ETag implementations do). Unless the web server implements that, there are only negligible benefits to ETags and quite a lot of downsides based on implementation issues and bugs.
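The difference between the two approaches can be sketched in a few lines of Python. The functions metadata_etag and checksum_etag are hypothetical (the first only mimics the spirit of timestamp and size, not Apache's exact format), and the file name and timestamps are made up. The example redeploys a file with identical contents but a new timestamp.

```python
import hashlib
import os
import tempfile


def metadata_etag(path: str) -> str:
    """ETag from size and modification time, similar in spirit to MTime Size."""
    st = os.stat(path)
    return f'"{st.st_size:x}-{st.st_mtime_ns:x}"'


def checksum_etag(path: str) -> str:
    """ETag from a hash of the contents; this has to read the whole file."""
    with open(path, "rb") as f:
        return '"' + hashlib.sha256(f.read()).hexdigest()[:16] + '"'


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "style.css")
    with open(path, "w") as f:
        f.write("body { margin: 0; }\n")
    os.utime(path, (1441070917, 1441070917))  # original deploy time
    before = metadata_etag(path), checksum_etag(path)

    os.utime(path, (1441157317, 1441157317))  # redeploy: same bytes, new timestamp
    after = metadata_etag(path), checksum_etag(path)

print(before[0] == after[0])  # False: the metadata ETag changed
print(before[1] == after[1])  # True: the checksum ETag is stable
```

The checksum version avoids the needless download, at the cost of reading every file to hash it (or caching the hash somewhere).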

Incidentally, in case you think I'm Apache bashing, nginx has the same file size and timestamp implementation, and if you do a Google search for "nginx etag proxy gzip" you'll see it has its own issues and confusion too (though for a different reason to the above, to do with nginx working in reverse proxy mode - but still, it's leading to confusion). And IIS also has a confusing ETag implementation (or at least used to, according to various posts).

Summary

ETags are a great idea in theory, and you'll find many performance recommendations to turn them on, but I'm going to go against the tide and say they are not worth the hassle in the current implementations used by web servers, and advise you to turn them off. The default time-based If-Modified-Since headers are almost as good, and don't suffer from the above problems.

Also, 304 responses are not that huge a performance benefit compared to regular caching. They are useful when people explicitly reload a page and don't need to reload assets (where I think people will be more forgiving of slowness), or when an asset is still in the cache but expired (which may not be that frequent, depending on how the web browser handles its cache). They also still require a network request and response - which, unless it's a large file, will take almost as long for a 304 response as for the full 200 download.
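A back-of-the-envelope calculation shows why. The Python sketch below uses a deliberately crude model (one round trip plus the time to transfer the body) and made-up numbers for round-trip time, bandwidth and file sizes. It ignores headers, connection setup, TLS and TCP slow start, so treat the results as orders of magnitude only.

```python
def transfer_time_ms(body_bytes: int, rtt_ms: float, bandwidth_mbps: float) -> float:
    """Rough time for one request/response: a round trip plus sending the body."""
    body_ms = (body_bytes * 8) / (bandwidth_mbps * 1_000_000) * 1000
    return rtt_ms + body_ms


RTT_MS = 50.0          # assumed round-trip time
BANDWIDTH_MBPS = 20.0  # assumed download speed

for label, size in [("10 KB stylesheet", 10_000), ("2 MB image", 2_000_000)]:
    full = transfer_time_ms(size, RTT_MS, BANDWIDTH_MBPS)
    not_modified = transfer_time_ms(0, RTT_MS, BANDWIDTH_MBPS)  # a 304 has no body
    print(f"{label}: 200 ~{full:.0f} ms, 304 ~{not_modified:.0f} ms")
```

With these assumed numbers the small stylesheet takes roughly 54 ms as a full download against 50 ms for a 304, while the large image is where the 304 really pays off.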

So, all in all, I'm going to go with recommending not to use ETags. Let me know below if you agree or disagree.
