Cache Control

Article by Port80 Software

What Is Caching and How Does It Apply to the Web?

Caching is a well-known concept in computer science: when a program repeatedly accesses the same set of instructions, a massive performance benefit can be realized by storing those instructions in RAM, sparing the program from accessing the disk thousands or even millions of times during execution. Caching on the web is similar: it avoids a roundtrip to the origin web server each time a resource is requested by retrieving the file from the local browser cache or from a proxy cache closer to the user.

The most commonly encountered caches on the web are those found in a user's web browser, such as Internet Explorer, Mozilla, or Netscape. When a web page, image, or JavaScript file is requested through the browser, each of these resources may be accompanied by HTTP header directives that tell the browser how long the object can be considered fresh, that is, how long the resource can be retrieved directly from the browser cache rather than from the origin or proxy server. Since the browser is the cache closest to the end user, it offers the maximum performance benefit whenever content can be stored there.

The Common HTTP Request/Response Cycle

When a browser makes an initial request to a web site - for example, a GET request for a home page - the server normally returns a 200 OK response code and the home page. The server also returns a 200 OK response and the specified object for each of the dependent resources referenced in that page, such as graphics, style sheets, and JavaScript files. Without appropriate freshness headers, a subsequent request for this home page results in a "conditional" GET request for each resource. This conditional GET validates the freshness of the resource using whatever validator is available for comparison, usually the Last-Modified timestamp for the resource (if Last-Modified is not available, as with many dynamic files, an unconditional GET is sent that circumvents caching altogether). A roundtrip to the server determines whether the Last-Modified value of the item on the server matches that of the item cached in the browser: either a fresh version is returned from the server with a 200 OK response, or a 304 Not Modified response is returned, indicating that the file in the cache is still valid and can be used again.
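The server side of this validation can be sketched in a few lines of Python. This is an illustration of the decision the specification describes, not code from the article or any particular server; the function name and its string return values are chosen here for clarity.

```python
from email.utils import parsedate_to_datetime

def validate_conditional_get(resource_last_modified, if_modified_since):
    """Server-side freshness check for a conditional GET.

    Both arguments are RFC 1123 date strings as carried in HTTP headers.
    Returns "304 Not Modified" when the cached copy is still valid,
    otherwise "200 OK" (the full resource would be re-sent).
    """
    if if_modified_since is None:
        return "200 OK"  # unconditional GET: always send the resource
    cached = parsedate_to_datetime(if_modified_since)
    current = parsedate_to_datetime(resource_last_modified)
    return "304 Not Modified" if current <= cached else "200 OK"

last_mod = "Wed, 01 Jan 2003 10:00:00 GMT"

# First visit: no validator yet, so the full resource comes back.
print(validate_conditional_get(last_mod, None))      # 200 OK

# Repeat visit: the browser sends If-Modified-Since; file unchanged.
print(validate_conditional_get(last_mod, last_mod))  # 304 Not Modified

# The file was updated after the browser cached it.
print(validate_conditional_get("Thu, 02 Jan 2003 10:00:00 GMT",
                               last_mod))            # 200 OK
```

Note that even the 304 case costs a full network roundtrip, which is exactly the overhead the next section addresses.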

A major downside to validations such as this is that, for most files on the web, the roundtrip for the validation request can take almost as long as actually returning the resource from the server. When done unnecessarily, this wastes bandwidth and slows page load time. With the dominant Internet Explorer browser under its default settings and without sufficient cache control headers, a conditional GET is sent at the beginning of each new session. This means that most repeat visitors to a web site or application are needlessly requesting the same resources over and over again.

Fortunately, there is an easy way to eliminate these unnecessary roundtrips to the server. If a freshness indicator is sent with each of the original responses, the browser can pull the files directly from the browser cache on subsequent requests, without any need for server validation, until the expiration time assigned to the object is reached. When a resource is retrieved directly from the user's browser cache, the responsiveness of the web site is dramatically improved, giving the perception of a faster site or application. This cache retrieval behavior can be seen clearly when a user revisits a web page during the same browser session - images render almost instantaneously. When the correct cache control headers are incorporated into a server's responses, this performance improvement and reduction of network traffic will persist across any number of subsequent sessions, as long as the object is not expired and the user does not manually empty his or her cache.
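The browser-side decision is simple: while the expiration time has not passed, the cached copy may be used with no server contact at all. A minimal sketch of that test (illustrative names, not any browser's actual code):

```python
from datetime import datetime, timedelta, timezone

def can_serve_from_cache(expires_at, now=None):
    """Browser-side freshness test: until the assigned expiration
    time passes, the cached copy is used with no server roundtrip."""
    now = now or datetime.now(timezone.utc)
    return now < expires_at

# Suppose a logo was cached on June 1 with a 30-day freshness lifetime.
cached_on = datetime(2003, 6, 1, tzinfo=timezone.utc)
expires_at = cached_on + timedelta(days=30)

print(can_serve_from_cache(expires_at, cached_on + timedelta(days=5)))   # True
print(can_serve_from_cache(expires_at, cached_on + timedelta(days=45)))  # False
```

Only after the second case (freshness expired) does the browser fall back to the conditional GET described above.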

Somewhere Between a Rotting Fish and Burning Currency

Web caching via HTTP headers is outlined in HTTP/1.1, section 13 of RFC 2616. The language of the specification itself helps to define the desired effects and tradeoffs of caching on the web. The specification offers suggestions for explicit warnings given by a user agent (browser) when it is not making optimal use of caching. For cases in which a browser's settings have been configured to prevent the validation of an expired object, the spec suggests that the user agent could display an image of a stale fish. For cases in which an object is unnecessarily validated (or re-sent, despite its being fresh) the suggested image is of burning currency, so that the user does not inadvertently consume excess resources or suffer from excessive latency.

Cache control headers can be used by a server to provide a browser with the information it needs to accurately handle the resources on a web site. By setting values to prevent the caching of highly volatile objects, one can help to ensure that files are not given the rotting fish treatment. In the case of objects that are by design kept consistent for long periods of time, such as a web site's navigation buttons and logos, setting a lengthy expiration time can avoid the burning currency treatment. Whatever the desired lifetime of an object is, there is no good reason why each resource should not have cache related headers sent along with it on every response.

A variety of cache-related headers are superior to the Last-Modified header because they can do more than merely provide a clue about the version of a resource: they allow administrators to control how often a conditional GET, or validation trip to the web server, occurs. Well-written cache control headers ensure that a browser always retrieves a fresh resource when necessary, or tell the browser how long a validation check can be forestalled. This can dramatically reduce network chatter by eliminating superfluous validations and help keep the pipe clear for essential trips to retrieve dynamic or other files.

The Expires header (for HTTP/1.0) and the Cache-Control header with its max-age directive (for HTTP/1.1) allow a web developer to define how long each resource is to remain fresh. The value specifies a period of time after which a browser needs to check back with the server to validate whether or not the object has been updated.
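Since older caches understand only Expires while HTTP/1.1 caches prefer max-age, a server would typically emit both. A small sketch of building the pair (the helper name is ours; the header values follow the formats the two specifications require):

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def freshness_headers(lifetime_seconds):
    """Build both freshness headers for a response: Expires for
    HTTP/1.0 caches and Cache-Control: max-age for HTTP/1.1 caches.
    When both are present, HTTP/1.1 clients honor max-age."""
    expires = datetime.now(timezone.utc) + timedelta(seconds=lifetime_seconds)
    return {
        "Cache-Control": "max-age=%d" % lifetime_seconds,
        # HTTP dates are RFC 1123 strings in GMT:
        "Expires": format_datetime(expires, usegmt=True),
    }

# A logo that is safe to cache for 30 days:
headers = freshness_headers(30 * 24 * 3600)
print(headers["Cache-Control"])  # max-age=2592000
```

Note that max-age is relative (seconds from the response), so it is immune to the clock-skew problems an absolute Expires date can suffer.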

By using a cache control header inspection tool such as the Cacheability Engine, it is easy to see from the Last-Modified header value how long many images, style sheets, and JavaScript files go unchanged. Oftentimes, a year or more has passed since certain images were last modified. This means that all validation trips to the web server over the past year for these objects were unnecessary, incurring higher bandwidth expenditures and slower page loads for users.
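The arithmetic behind that observation is straightforward. A hypothetical helper for the same check, using only a resource's Last-Modified header value:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def days_unchanged(last_modified, now=None):
    """How many whole days a resource has gone unmodified, judged
    from its Last-Modified header: a rough guide to how long a
    freshness lifetime it could safely be given."""
    now = now or datetime.now(timezone.utc)
    return (now - parsedate_to_datetime(last_modified)).days

# An image whose Last-Modified is well over a year old:
print(days_unchanged("Mon, 07 Jan 2002 09:00:00 GMT",
                     datetime(2003, 6, 1, tzinfo=timezone.utc)))  # 509
```

Every one of those 509 days of validation requests could have been avoided with an appropriate Expires or max-age value.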

Cache Control Does the Heavy Lifting in Web Server Performance

Formulating a caching policy for a web site - determining which resources should not get cached, which resources should, and for how long - is a vital step in improving site performance. If the resource is a dynamic page that includes time-sensitive database queries on each request, then cache control headers should be used to ensure that the page is never cached. If the data in a page is specific to the user or session, but not highly time-sensitive, cache control directives can restrict caching to the user's private browser cache. Any other unchanging resources that are accessed multiple times throughout a web site, such as logos or navigational graphics, should be retrieved only once during an initial request and after that validated as infrequently as possible.
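The three cases above map directly onto Cache-Control directives. A sketch of such a policy (the rule structure and lifetimes here are illustrative choices, not part of the article):

```python
def cache_policy(resource):
    """Map a resource's characteristics to a Cache-Control header.
    Illustrative rules; real policies would be tuned per site."""
    if resource["dynamic"] and resource["time_sensitive"]:
        # Time-sensitive database output: force revalidation every time.
        return {"Cache-Control": "no-cache"}
    if resource["user_specific"]:
        # Per-user data: allow only the browser's private cache,
        # and only briefly.
        return {"Cache-Control": "private, max-age=300"}
    # Stable site furniture (logos, nav graphics): cache for a year.
    return {"Cache-Control": "public, max-age=31536000"}

print(cache_policy({"dynamic": True, "time_sensitive": True,
                    "user_specific": False}))
print(cache_policy({"dynamic": False, "time_sensitive": False,
                    "user_specific": False}))
```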

This raises the question: who knows best what the caching policies should be? On Microsoft IIS web servers, the web site administrator currently has sole access to cache control related headers through the MMC control panel, suggesting that the admin is best suited to determine the policies. More often, however, it is the application developer who is most familiar with the resources on the web site, and who is thus best qualified to establish and control the policies. Giving developers the ability to set cache control headers for resources across their site using a rules-based approach is ideal because it allows them to set appropriate policies for single objects, for a directory of objects, or based on MIME types. An added benefit of developer cache management is less work for site administrators.
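A rules-based approach can be as simple as an ordered table matched against each request. The table below is a hypothetical sketch of the idea (the rule syntax and lifetimes are ours, not any product's format):

```python
import fnmatch

# Hypothetical rules table, first match wins: a single object,
# a directory of objects, then a MIME-type rule.
RULES = [
    ("/index.html",   "no-cache"),
    ("/images/*",     "public, max-age=2592000"),
    ("mime:text/css", "public, max-age=604800"),
]

def header_for(path, mime_type):
    """Pick the Cache-Control value for a request from the rules table."""
    for pattern, value in RULES:
        if pattern.startswith("mime:"):
            if pattern[5:] == mime_type:
                return value
        elif fnmatch.fnmatch(path, pattern):
            return value
    return "no-cache"  # conservative default when no rule matches

print(header_for("/images/logo.gif", "image/gif"))  # public, max-age=2592000
print(header_for("/styles/site.css", "text/css"))   # public, max-age=604800
```

With a table like this, the developer states policy once and every response across the site picks up the right header automatically.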

Try out CacheRight for Microsoft IIS for cache control and centralized cache rules management.

No matter what your web site or application does, or what your network environment includes, performance can be improved by implementing caching policies for the resources on your site. By eliminating unnecessary roundtrips to the server, cache control can go a long way towards achieving the dual objectives of sending as little as possible as infrequently as possible. However, in cases where resources are dynamic or rapidly changing, a roundtrip to the server is unavoidable for each request. In these situations, the payload can still be minimized by using HTTP Compression.
