ASP.NET's Cache Bonus

Overview

ASP.NET supports two important forms of caching: page caching (including sub-page caching) and data caching.

Page caching allows the server to store a copy of the output of a dynamic page, and to use this to respond to requests instead of running the dynamic code again. Sub-page caching does the same thing, but for parts of pages.

Data caching allows a web application to store commonly used objects so that they are available for use across all pages in the site. While it was possible to do this kind of thing in ASP, using the Application object, it's a whole lot better in ASP.NET.

In this paper I shall assume ASP.NET 1.1, but at the end I will look at the changes due in ASP.NET 2.0. All code will be in C# rather than the other, lesser, .NET languages.

Page Caching

If you cache a page, you store a copy of the output of the page. So when a user is presented with a cached copy of a page, the code behind the page is not run. This may seem obvious, but it's worth remembering that any code which sets caching options is only run when the page is served up uncached, not when it derives from the cache.

There are three approaches to setting page caching. The first is to use the declarative @OutputCache directive at the top of the ASP.NET page. The second is to set cache options in code. The third is to use metadata attributes in code (but we won't be discussing this third option here).

The declarative @OutputCache directive looks like this, though you wouldn't want to use all of the attributes simultaneously:

<%@ OutputCache Duration="" Location="" VaryByParam="" VaryByCustom="" VaryByHeader="" %>

We can explain the implications of these settings by considering the answers to the following three questions.

1: Where should the caching take place?
2: How long should the caching last?
3: What variations of the page are to be cached?

Where?

In the normal scheme of things, when you make a request to a website there are two types of cache that may satisfy your request without it ever reaching the origin server. The first is the personal cache maintained by the user's web browser (a 'browser cache'). The second is a shared cache present on the network between the user's browser and the origin server (a 'proxy cache'). The page output cache maintained by ASP.NET on the web server is a further type of cache, but when you specify page caching you can also, by setting the Location parameter, make use of the first two sorts of cache. [Footnote: This actually blurs a distinction between the general caches maintained by the web server IIS, and the caches maintained by ASP.NET proper. But the distinction isn't really that interesting for the developer - if you really feel the need to know about http.sys and kernel-mode caching, look them up on Google.]

The following table shows the possible values of the Location parameter (ie. the values of the OutputCacheLocation enumeration), and the implications of these settings.

Any (default): The page can be cached on any of the browser, a proxy server, or the web server.
Client: The page is cached only on the browser.
Downstream: The page can be cached on the browser or a proxy server, but not on the web server.
None: The page is not cached.
Server: The page is cached only on the web server.
ServerAndClient: The page can be cached on the browser or on the web server, but not on a proxy server.

When caching is allowed to take place somewhere other than the server, what happens behind the scenes is that ASP.NET manipulates the HTTP 1.1 headers that deal with caching. We don't need to go into this in any detail here; there's some background information in the appendix entitled 'Standard Caching over the Internet' at the end of this paper.

How Long?

Pages remain in the cache for some length of time, determined by the Duration parameter. The value is set in seconds.

As we shall see later, if page caching is set programmatically rather than declaratively, one can also set a 'sliding expiration', which pushes the expiry time back every time the page is requested.
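To make the difference concrete, here is a small model of absolute versus sliding expiration. It is a sketch only: ExpiryModel and its members are hypothetical, not ASP.NET types, and simply encode the rule that an absolute expiry is fixed when the page is cached, whereas a sliding expiry is pushed back by each request that arrives before it.

```csharp
using System;

// Hypothetical model of a cached page's lifetime (not an ASP.NET type).
public sealed class ExpiryModel
{
    private readonly TimeSpan duration;
    private readonly bool sliding;

    public DateTime ExpiresAt { get; private set; }

    public ExpiryModel(DateTime cachedAt, TimeSpan duration, bool sliding)
    {
        this.duration = duration;
        this.sliding = sliding;
        ExpiresAt = cachedAt + duration;
    }

    public bool IsExpired(DateTime now) => now >= ExpiresAt;

    // Called when a request is answered from the cache.
    public void OnRequest(DateTime now)
    {
        // Sliding expiration restarts the clock on every hit;
        // absolute expiration keeps the original deadline.
        if (sliding && !IsExpired(now))
        {
            ExpiresAt = now + duration;
        }
    }
}
```

For example, with a 60-second duration and a request arriving after 50 seconds, the absolute entry still expires at 60 seconds, while the sliding entry now lasts until 110 seconds.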

What Page Variations?

Sometimes it's inappropriate to cache a single version of a page, in particular when the page needs to respond in different ways to different inputs. A standard example found in the literature is a page which returns flags of different countries depending upon which value is specified in the querystring. So, for instance

http://server/myPage.aspx?Country=US

would give a page showing the US flag, whereas

http://server/myPage.aspx?Country=UK

would give a page showing the UK flag.

Clearly in this case one should not simply cache the first page output - this would mean that whatever page was shown first would be presented for the cache duration, regardless of the querystring value. Instead you'd specify

VaryByParam="Country"

to ensure that pages which differ with regard to the 'country' parameter (which can be passed either in the querystring or as a form variable) are cached independently.

A weakness with this kind of example, however, is that the explicit use of querystrings is much less common in ASP.NET; we generally use web controls to maintain and pass information transparently as form variables. However, for simple web controls such as the TextBox control, the ID of the control is (usually) also the name of the form variable used to post information back to the web page. So if the page contained a TextBox with an ID of 'Country', then the VaryByParam declaration shown above would cache a separate copy of the page for each value posted.

Note, though, that this kind of trick wouldn't work with complex web controls such as the Calendar, which uses a more complex postback mechanism. If one wanted to use VaryByParam to vary the cache by the date selected in a calendar (say), then one would probably have to use client-side code to set the value of a hidden form variable on date selection, and vary the cache by the value of that hidden form variable.

Notes on VaryByParam:

- You have to include the VaryByParam parameter as part of the @OutputCache directive, even when you don't want to vary the cache by parameters. In that case you set its value to 'none'.

- If you set the value of the VaryByParam parameter to '*' then any difference in querystring or form variables will result in independent caching.

- Specific multiple parameters can be given in a semicolon-delimited list.
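Taken together, these rules decide which request parameters form part of the key under which a page variety is stored. The following sketch models that decision. VaryByParamModel.BuildKey is a hypothetical helper (ASP.NET's real cache keys are internal and certainly look different), and it assumes the caller passes a case-insensitive dictionary of querystring and form values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VaryByParamModel
{
    // Builds a key identifying one cached variety of a page (hypothetical format).
    // varyByParam uses the directive's syntax: "none", "*", or "a;b;c".
    // parameters should be a case-insensitive dictionary of querystring/form values.
    public static string BuildKey(string path, string varyByParam,
                                  IDictionary<string, string> parameters)
    {
        IEnumerable<string> names;
        if (string.Equals(varyByParam, "none", StringComparison.OrdinalIgnoreCase))
        {
            names = Enumerable.Empty<string>();   // one copy serves every request
        }
        else if (varyByParam == "*")
        {
            names = parameters.Keys;              // any difference in any parameter counts
        }
        else
        {
            names = varyByParam.Split(';')
                               .Select(n => n.Trim())
                               .Where(n => n.Length > 0);
        }

        // Sort the names so that parameter order doesn't create extra varieties.
        IEnumerable<string> parts = names
            .OrderBy(n => n, StringComparer.OrdinalIgnoreCase)
            .Select(n =>
            {
                parameters.TryGetValue(n, out string value);
                return n.ToLowerInvariant() + "=" + (value ?? "");
            });
        return path + "?" + string.Join("&", parts);
    }
}
```

With VaryByParam="Country", requests for Country=US and Country=UK get different keys, while two Country=US requests that differ only in some other parameter share one.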

The VaryByHeader parameter is similar to the VaryByParam parameter, except that it applies to HTTP header values rather than querystring and form variables. It is, however, an optional part of the @OutputCache directive, so you don't have to worry about setting its value to 'none'.

The VaryByCustom parameter allows closer control over which page varieties are cached independently. It does this by working in conjunction with the GetVaryByCustomString method, which has the following signature and should be overridden in the web application's Global.asax file:

public virtual string GetVaryByCustomString(HttpContext context, string custom);

The idea is that whenever a page is requested, a custom string specified in the directive gets passed to this method along with the page context, and the method returns a string of its own to indicate which page variety it belongs to. The page is then either run, or returned from the cache if another page of its variety has been cached there.
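The flow just described can be pictured as an output cache keyed by the string that GetVaryByCustomString returns. In the sketch below, OutputCacheModel and its Serve method are hypothetical stand-ins for ASP.NET's internal output cache; they show only that the page code runs once per variety, and that later requests of the same variety receive the stored output.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of an output cache keyed by the vary-by-custom string.
public sealed class OutputCacheModel
{
    private readonly Dictionary<string, string> stored = new Dictionary<string, string>();

    // Number of times page code actually ran.
    public int Executions { get; private set; }

    // varyString: what GetVaryByCustomString returned for this request.
    // runPage: renders the page; invoked only on a cache miss.
    public string Serve(string varyString, Func<string> runPage)
    {
        if (stored.TryGetValue(varyString, out string cachedHtml))
        {
            return cachedHtml;          // same variety already cached: page code skipped
        }
        Executions++;
        string html = runPage();
        stored[varyString] = html;
        return html;
    }
}
```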

For example, you might have as your version of the GetVaryByCustomString method the following:

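public override string GetVaryByCustomString(HttpContext context, string custom)
{
    if (custom == "varyOne")
    {
        // Request.Cookies returns an HttpCookie (or null), not a string.
        HttpCookie cookie = context.Request.Cookies["isSpecial"];
        if (cookie != null && cookie.Value == "true")
        {
            return "special";
        }
        else
        {
            return "normal";
        }
    }
    else if (custom == "varyTwo")
    {
        ...
    }

    // Let the base class handle anything else (it understands "browser").
    return base.GetVaryByCustomString(context, custom);
}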


If you then had in your OutputCache directive the entry

VaryByCustom="varyOne"

your page would end up cached in two varieties, depending upon the value of the 'isSpecial' cookie passed in each page request.

Note, though, that you couldn't use this procedure to base caching upon a property of a web control. The decision to return a page from the cache is made before, and preempts, the running of any page code.

Setting the OutputCache parameters programmatically

Instead of declaratively including an OutputCache directive, you can programmatically set the page caching options by manipulating the HttpCachePolicy object associated with the page (this is available via Page.Response.Cache; don't confuse it with the Page.Cache object, which is the data cache discussed later). [Footnote: I have found cases where I've been unable to replicate the behaviour generated by the @OutputCache directive in code, however. I'm not sure if these are bugs or features.]

The benefit of programmatically setting the output cache parameters is that you get a few more options to play around with. In this section we'll concentrate on the extra options, rather than just demonstrating how to emulate the settings in the @OutputCache directive described above.

As we've noted before, you can set up sliding expiration, so that pages which are frequently requested stay in the cache longer. This is useful if you have a limited amount of memory available for caching. The following code specifies a sliding expiration time of one minute after each request:

Response.Cache.SetExpires(DateTime.Now.AddSeconds(60));   // expire 60 seconds from now...
Response.Cache.SetSlidingExpiration(true);                // ...measured from the latest request
Response.Cache.SetCacheability(HttpCacheability.Public);  // send Cache-control: public
Response.Cache.SetValidUntilExpires(true);                // ignore client no-cache headers


Note the last two calls in the above. SetCacheability states that the cache should be Public, which sets the Cache-control HTTP header in the response to 'public' (ASP.NET sends 'private' by default). It's not clear why this is necessary, but do it because it appears in the documentation.

The SetValidUntilExpires call specifies that the cache should ignore attempts by the user to invalidate the ASP.NET cache (by sending appropriate headers in their requests, such as Cache-control: no-cache). This is not set to true by default when the output cache is specified programmatically, although it is when the output cache is specified declaratively.

The second useful thing one can do when setting the output cache details programmatically is to set up a callback function in which a decision can be made, on each page request, to retain or invalidate the cache.

The following example, taken from the MSDN documentation, shows a callback function and the code used to hook it up to the cache:

// Register the callback, e.g. in Page_Load:
Response.Cache.AddValidationCallback(new HttpCacheValidateHandler(Validate), null);

// Static, so that the registered delegate does not keep this page instance alive.
public static void Validate(HttpContext context, Object data, ref HttpValidationStatus status)
{
    if (context.Request.QueryString["Valid"] == "false")
        status = HttpValidationStatus.Invalid;
    else if (context.Request.QueryString["Valid"] == "ignore")
        status = HttpValidationStatus.IgnoreThisRequest;
    else
        status = HttpValidationStatus.Valid;
}


Here the caching mechanism is set to call the 'Validate' method before deciding whether to return a page from the cache. This method sets the status parameter to one of the three values of the HttpValidationStatus enumeration. The following table describes the implications of each value:

IgnoreThisRequest: The cache is not invalidated, but the page is executed as normal rather than returned from the cache.
Invalid: The cache is invalidated, and the page is executed as normal.
Valid: The cache is validated, and the page is returned from the cache.
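The effect of each status can be sketched with a model of a single cached page. ValidationStatus mirrors HttpValidationStatus, and ValidatingCacheModel is hypothetical: the real logic lives inside ASP.NET's output cache, and this sketch only encodes the three rows of the table.

```csharp
using System;

// Mirrors the three values of System.Web.HttpValidationStatus.
public enum ValidationStatus { Invalid, IgnoreThisRequest, Valid }

// Hypothetical model: one cached copy of one page.
public sealed class ValidatingCacheModel
{
    private string cached;                      // null means nothing is cached

    public int Executions { get; private set; }

    public string Serve(Func<ValidationStatus> validate, Func<string> runPage)
    {
        if (cached != null)
        {
            switch (validate())
            {
                case ValidationStatus.Valid:
                    return cached;              // answered from the cache
                case ValidationStatus.IgnoreThisRequest:
                    Executions++;
                    return runPage();           // run the page, keep the old copy
                case ValidationStatus.Invalid:
                    cached = null;              // drop the copy, then treat as a miss
                    break;
            }
        }
        Executions++;
        cached = runPage();                     // executed as normal, and cached again
        return cached;
    }
}
```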

We shall see later that a feature of data caching in ASP.NET is the ability to invalidate a cached object when a file changes. The @OutputCache directive offers no equivalent (although the Response.AddFileDependency method provides one in code), but the behaviour can also be emulated using the callback function above, which can check whether the relevant file has changed and set the status to Invalid if it has.
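One way to write such a callback is to compare the file's last-write time with the time at which the page was cached. The sketch below isolates that decision; FileValidation.IsStillValid is a hypothetical helper, and passing the cache time as the data argument of AddValidationCallback is an assumption about how you might wire it up rather than a documented recipe.

```csharp
using System;
using System.IO;

public static class FileValidation
{
    // Hypothetical helper: true if a page cached at cachedAtUtc may still be used,
    // i.e. the file exists and has not been written to since then.
    // Times are compared in UTC to avoid daylight-saving surprises.
    public static bool IsStillValid(string path, DateTime cachedAtUtc)
    {
        if (!File.Exists(path))
        {
            return false;   // a deleted file also invalidates the cached page
        }
        return File.GetLastWriteTimeUtc(path) <= cachedAtUtc;
    }
}
```

Inside the validation handler the result would then be mapped onto the real enumeration, along the lines of status = FileValidation.IsStillValid(path, (DateTime)data) ? HttpValidationStatus.Valid : HttpValidationStatus.Invalid, where data is the DateTime.UtcNow value supplied when the callback was registered.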

Sub-Page Caching

One of the good features of ASP.NET is that it is object-oriented, and thus gives developers the advantages of encapsulation and reuse of components. One of the component technologies it uses is 'user controls', which have much the same API as ASP.NET web forms but need to be embedded in web forms to be accessed.

One of the ways in which user controls are similar to ASP.NET web forms is that they support caching. And because user controls are embedded parts of ASP.NET pages, this means that they can be used to implement sub-page caching - the caching of parts of a page whilst the other parts are dynamically and programmatically generated in the usual way.

Caching for user controls can be specified declaratively, using the @OutputCache directive, but there are some differences when this directive is used for user controls. The attributes it accepts are as follows:

<%@ OutputCache Duration="" VaryByParam="" VaryByControl="" Shared="" %>

Notice that we've lost Location, VaryByCustom and VaryByHeader. On the other hand, we've gained VaryByControl and Shared. So let's see what they do.

To explain VaryByControl we'll need to first look at a background idea about control naming.

When you embed a user control in a web form, ASP.NET automatically takes care of potential naming conflicts. For instance, suppose that you have a user control that contains a DropDownList called 'Countries', and you embed this user control - which itself has the ID of 'MyUserControl' - on a web form. Because the web form might already have a control with an ID of 'Countries', the DropDownList contained in the user control is given a name based upon its position in the page hierarchy: in the example case it would be something like 'MyUserControl:Countries'.

When a simple web control is given a 'compound' name of the type just described, the form variable that carries its value also has a compound name (in the case above, MyUserControl:Countries; the underscored variant, MyUserControl_Countries, is what appears in the HTML id attribute). This means that specifying a VaryByParam attribute in a user control's OutputCache directive is a tricky matter: the parameter names that get passed will vary depending upon the context in which the user control finds itself.

The point of the VaryByControl attribute is to fix this problem with VaryByParam. If you specify the name of a control under VaryByControl then the caching engine will examine the appropriate parameter for that control, ignoring any changes due to differences of context.

The Shared attribute takes note of the fact that a common use of user controls is to repeat content across multiple web pages. By default, when you set a user control to be cached, it is cached only relative to the page in which it is embedded. So, for instance, if user control U is embedded in pages P1 and P2, then the caching of U will proceed independently for P1 and P2; the first request to each of P1 and P2 will run U's code. But if you set the value of the Shared attribute to 'true', then U will be cached regardless of the page in which it is embedded. In this scenario, the first request to P2 could return U from the cache if a previous request had been made to P1 and had resulted in U being placed into the cache.
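The difference that Shared makes amounts to a difference in how the cached fragment is keyed. In this hypothetical model (FragmentKey is not an ASP.NET type, and the real keys are internal), an unshared fragment's key includes the hosting page, so P1 and P2 each get their own copy of U, whereas a shared fragment's key leaves the page out.

```csharp
using System;

public static class FragmentKey
{
    // controlType: identifies the user control, e.g. its .ascx path.
    // pagePath: the page in which the control is embedded.
    public static string For(string controlType, string pagePath, bool shared)
    {
        return shared
            ? controlType                       // one cached copy for every hosting page
            : pagePath + "|" + controlType;     // one cached copy per hosting page
    }
}
```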

One issue we should note here is that when a user control is served up from the cache, it is not available for programmatic access (not even read-only access) from any other object: the page's reference to it is null. If an attempt is made to access the object in this circumstance, a run-time error occurs. Any code that interacts with a user control that may be cached should therefore check for its existence before accessing its members. The check goes something like this:

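if (myUserControl != null)
{
    // The control was instantiated on this request, so its members are safe to use.
    // ...do something with myUserControl...
}
else
{
    // The control's output came from the cache: there is no instance, so ignore it.
}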


Finally, a couple of interesting questions that I don't know the answer to: What determines the size of the output cache? How does ASP.NET go about invalidating cache entries when memory runs low?

Data Caching

Data caching is the storing of data internal to a web application, such that different pages can access the data. This is typically a good idea where one's application needs to be able to present a static set of data in lots of interesting ways (thus making it difficult to use output caching), but where the overheads involved in collecting the data are high.

In legacy ASP, it was possible to store data in the Application object, but there were limitations on what kinds of data you could store, and the whole thing felt somewhat flaky. ASP.NET resolves these limitations, however, and also adds various useful data caching features.

As with legacy ASP, however, it is a mistake to cache database connections. The .NET data providers already maintain a pool of database connections, returning them to the pool for reuse when they are closed or disposed of, and a developer shouldn't attempt to pre-empt this with caching.

Simple Data Caching

The data cache is available through the Cache property of either the Page or the HttpContext class. There is one cache per 'application domain' (which for most purposes just maps to an application). It presents as an associative object array, so you can write to or read from it in the following simple ways:

Cache["myKey"] = cachedObject;
object ob = Cache["myKey"];


Of course, if you were going to use such lines of code in a useful code block, it would look something like the following, where you initially test for the existence of the cache object:

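object ob = Cache["myKey"];
if (ob == null)
{
    // Not cached (or already evicted): build it and store it.
    ob = GetBigObject();
    Cache["myKey"] = ob;
}
// Keep using the local reference ob, which stays valid even if the cache drops the item.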


Note that it's important that this test be conducted each time that the cached object is accessed, because the ASP.NET data caching engine is capable of dropping objects from the cache when memory is running low, or when more important bits of data are added, or whenever it otherwise gets the urge.

To add objects to the data cache in the way shown above misses out on lots of the extra goodies associated with data caching, though. When you add an item to the cache using the Cache.Add or Cache.Insert methods, you can set up expiration policies (both absolute and sliding), a priority level, a callback method for when the item is removed from the cache, and various 'cache dependencies'. The two methods differ in how they treat an existing item with the same key: Cache.Add leaves it in place and returns it (or returns null if there was no such item), whereas Cache.Insert overwrites it and returns nothing. Below we'll look at the last three of these features.
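Since the difference between Add and Insert is easy to get wrong, here is a small model of it. AddInsertModel is a hypothetical, dictionary-backed stand-in for System.Web.Caching.Cache that ignores expiration, priorities and dependencies, and shows only what happens when the key is already in use.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the Add/Insert semantics of the ASP.NET data cache.
public sealed class AddInsertModel
{
    private readonly Dictionary<string, object> items = new Dictionary<string, object>();

    public object Get(string key) =>
        items.TryGetValue(key, out object value) ? value : null;

    // Like Cache.Add: an existing entry is left alone and returned;
    // null is returned when the key was free and the new item was stored.
    public object Add(string key, object value)
    {
        if (items.TryGetValue(key, out object existing))
        {
            return existing;
        }
        items[key] = value;
        return null;
    }

    // Like Cache.Insert: always stores the item, replacing any existing entry.
    public void Insert(string key, object value)
    {
        items[key] = value;
    }
}
```

With the real Cache class, the full overloads of both methods take the extra settings as arguments, in the order key, value, CacheDependency, absolute expiration (DateTime), sliding expiration (TimeSpan), CacheItemPriority and CacheItemRemovedCallback; Cache.NoAbsoluteExpiration and Cache.NoSlidingExpiration fill whichever expiration slot you don't need.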

Priority Levels

The priority level of a cache datum is set by a value of the CacheItemPriority enumeration. The possible values of this enumeration, in rough order of importance, are:

NotRemovable
High
AboveNormal
Normal
Default (=Normal)
BelowNormal
Low

When the server comes to free up memory from the cache, it removes items nearer the bottom of the list first (I don't know the exact details of the algorithm that's used, such as whether it gives any weight to the sizes of objects, but I don't think that it's enormously important to know this).
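Purely as an illustration of 'bottom of the list first', here is a model that frees space by discarding the lowest-priority items and never touches NotRemovable ones. It is explicitly not ASP.NET's algorithm: Priority mirrors the CacheItemPriority values, and PriorityCacheModel with its Trim method is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Same ordering as System.Web.Caching.CacheItemPriority (Default equals Normal).
public enum Priority { Low = 1, BelowNormal, Normal, AboveNormal, High, NotRemovable, Default = Normal }

// Hypothetical model of priority-based trimming.
public sealed class PriorityCacheModel
{
    private readonly Dictionary<string, Priority> items = new Dictionary<string, Priority>();

    public void Insert(string key, Priority priority) => items[key] = priority;

    public bool Contains(string key) => items.ContainsKey(key);

    // Removes up to 'count' items, lowest priority first; returns the keys removed.
    public List<string> Trim(int count)
    {
        List<string> victims = items
            .Where(kv => kv.Value != Priority.NotRemovable)
            .OrderBy(kv => kv.Value)
            .Take(count)
            .Select(kv => kv.Key)
            .ToList();
        foreach (string key in victims)
        {
            items.Remove(key);
        }
        return victims;
    }
}
```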

Callback Method

It is possible to set up a callback method, matching the CacheItemRemovedCallback delegate, to respond to the event of an object being removed from the data cache. The method can have any name, but needs a signature like the following:

public void OnCacheItemRemoved(string key, object value, CacheItemRemovedReason reason)

I don't know in what circumstances one would make use of this functionality, though.
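One plausible use is to log why items disappear, or to reload an item when the file it depends on changes. The sketch below models the mechanism: RemovalReason mirrors the four CacheItemRemovedReason values (Removed, Expired, Underused, DependencyChanged), while CallbackCacheModel and its members are hypothetical and much simpler than the real cache.

```csharp
using System;
using System.Collections.Generic;

// Mirrors System.Web.Caching.CacheItemRemovedReason.
public enum RemovalReason { Removed = 1, Expired, Underused, DependencyChanged }

// Hypothetical model of a cache that notifies owners when items are removed.
public sealed class CallbackCacheModel
{
    private sealed class Entry
    {
        public object Value;
        public Action<string, object, RemovalReason> OnRemoved;
    }

    private readonly Dictionary<string, Entry> items = new Dictionary<string, Entry>();

    // Unlike the real cache, replacing an entry here does not fire the old callback.
    public void Insert(string key, object value, Action<string, object, RemovalReason> onRemoved)
    {
        items[key] = new Entry { Value = value, OnRemoved = onRemoved };
    }

    public object Get(string key) =>
        items.TryGetValue(key, out Entry entry) ? entry.Value : null;

    // Removes the entry first, then reports why, so the callback
    // is free to insert a fresh copy under the same key.
    public void Remove(string key, RemovalReason reason)
    {
        if (items.TryGetValue(key, out Entry entry))
        {
            items.Remove(key);
            entry.OnRemoved?.Invoke(key, entry.Value, reason);
        }
    }
}
```

In real ASP.NET code, bear in mind that the callback may not run as part of any page request, so it should not rely on HttpContext.Current being available.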

Cache Dependencies

Suppose that you have a local XML file which contains data, and you don't want every page to have to read the data anew. What you might do is place the data into the data cache using the kind of conditional procedure described above. But suppose further that your XML file was occasionally updated. This would pose a dilemma: either accept a latency period in which the site is using old data, or else reload the XML data into the cache more frequently than you would like.

Cache dependencies offer a way out of this dilemma, because it is possible to make an object drop from the cache whenever a particular file is amended or deleted. Setting up such a cache dependency would remove the problematic latency period, while still only requiring you to reload the XML data when it has changed.

In fact, there are three different types of objects that you can set up cache dependencies on: files, directories, and other cached objects (and a single object can have multiple dependencies). Where a cached object depends upon a file or a key, the situations in which it is invalidated are straightforward: when the file or key is amended or deleted. But the documentation doesn't make it clear exactly what dependency on a directory comprises, so I took it upon myself to test this. The results of this testing (on a Windows XP box) suggest that when a cache object depends upon a directory, it is invalidated exactly when:

i. the directory is deleted or renamed (although the CacheDependency object may make this difficult by acquiring a lock on the directory); or

ii. any file directly within the directory is renamed, created, deleted, or altered; or

iii. any object within a direct subdirectory of the directory is created or deleted.

So now you know.
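Dependency on other cached keys can be pictured as a chain: when a key changes or is removed, everything depending on it goes too, and so on down the line. In ASP.NET this is expressed by passing a CacheDependency (for example one built with new CacheDependency(fileNames, cacheKeys)) when the item is added; the KeyDependencyModel below is a hypothetical stand-in that models only the cascade between keys, not files or directories.

```csharp
using System.Collections.Generic;

// Hypothetical model of key-to-key cache dependencies.
public sealed class KeyDependencyModel
{
    private readonly Dictionary<string, object> items = new Dictionary<string, object>();
    // parent key -> keys whose entries depend on it
    private readonly Dictionary<string, List<string>> dependents = new Dictionary<string, List<string>>();
    // dependent key -> keys it depends on
    private readonly Dictionary<string, string[]> parentsOf = new Dictionary<string, string[]>();

    public bool Contains(string key) => items.ContainsKey(key);

    public void Insert(string key, object value, params string[] dependsOn)
    {
        Invalidate(key);                        // overwriting an item counts as a change
        items[key] = value;
        parentsOf[key] = dependsOn;
        foreach (string parent in dependsOn)
        {
            if (!dependents.TryGetValue(parent, out List<string> list))
            {
                list = new List<string>();
                dependents[parent] = list;
            }
            list.Add(key);
        }
    }

    public void Remove(string key) => Invalidate(key);

    // Drops 'key' and, recursively, everything that depends on it.
    private void Invalidate(string key)
    {
        // Unhook 'key' from the items it depended on.
        if (parentsOf.TryGetValue(key, out string[] parents))
        {
            parentsOf.Remove(key);
            foreach (string parent in parents)
            {
                if (dependents.TryGetValue(parent, out List<string> siblings))
                {
                    siblings.Remove(key);
                }
            }
        }
        // Cascade to the items that depended on 'key'.
        if (dependents.TryGetValue(key, out List<string> children))
        {
            dependents.Remove(key);
            foreach (string child in children)
            {
                Invalidate(child);
            }
        }
        items.Remove(key);
    }
}
```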

ASP.NET 2.0

ASP.NET 2.0 (codenamed Whidbey) is the next version of ASP.NET to be released. According to the pre-release documentation, there are two significant updates to caching.

Firstly, it will be possible to set up a cache dependency (both for output caches and data caches) on database entries. This is clearly a natural progression from basing cache dependencies on files - in the example case, the data picked up from the XML file could well have been stored in a database. According to the reports, SQL Server 7.0 and 2000 will only support table-level dependencies (ie. the dependency will only be responsive to table-level events), but subsequent versions of SQL Server will support row-level dependencies.

The second change is that it will be possible to write custom dependency classes. This will allow cache dependency to be extended into new areas, guaranteeing excitement and fun for years to come.

Appendix 1: Standard Caching over the Internet

On the standard model of content provision, a browser makes a request for an object (a web page, or image, say). This request goes to the place where the authoritative versions of these objects are stored - the origin server - which satisfies the request.

A cache sits on the route from the browser to the origin server, and tries to satisfy requests without bothering the origin server. Ideally, this both saves bandwidth and improves latency (the time taken to satisfy a request).

It is useful to distinguish between a browser cache, which sits on the user's computer, and a proxy cache, which sits on a network between the user's computer and the origin server. Generally, a proxy cache will satisfy requests from multiple users, and cache objects from multiple servers. In addition, the term 'reverse proxy cache' (or similar) is sometimes used for a proxy cache that sits directly in front of an origin server and caches objects only from it.

Which objects are cached, and for how long, is determined both by rules associated with caches themselves, and rules associated with the HTTP protocol. There are two important concepts to understand:

Freshness: a cached object is considered to be fresh if it is deemed acceptable to be sent to the requesting user without checking with the origin server. When a cached object is not fresh, it is described as stale.

Validation: where an object is stale, the origin server may be asked to OK the cached object without having to resend it; this is validation. It can be appreciated that validating an object will usually take less time and fewer resources than resending it.

HTTP 1.0 contains headers which exert some control over how objects are cached between the browser and the origin server. The following headers are amongst those used in the protocol:

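Date: the date and time at which the response was generated, to the granularity of a second. (The related Last-Modified header gives the time at which the object itself last changed.)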

Expires: allows the server to specify a date/time after which the object is considered to be stale.

If-Modified-Since: sent by the browser, this allows the server to return a validation response (a 304 Not Modified status) rather than returning the requested document.

Pragma: no-cache: sent by the browser, this should force all proxies to pass through the request to the origin server, even if a fresh copy is available.
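The interplay of Expires and If-Modified-Since can be expressed in a few lines. This is a simplified sketch rather than an HTTP implementation: it ignores Cache-control, clock skew and the many refinements in the specification, the Http10Cache name is hypothetical, and it assumes the server knows each object's last modification time (which HTTP carries in the Last-Modified header).

```csharp
using System;

public static class Http10Cache
{
    // A cached copy is fresh until its Expires time; after that it is stale.
    public static bool IsFresh(DateTime expiresUtc, DateTime nowUtc) => nowUtc < expiresUtc;

    // The origin server's answer to a request: 304 if the object has not changed
    // since the time the browser supplied, otherwise 200 with the full document.
    // HTTP dates have one-second granularity, so the server's sub-second part is dropped.
    public static int StatusFor(DateTime lastModifiedUtc, DateTime? ifModifiedSinceUtc)
    {
        if (ifModifiedSinceUtc == null)
        {
            return 200;                         // unconditional request
        }
        DateTime truncated = new DateTime(
            lastModifiedUtc.Ticks - lastModifiedUtc.Ticks % TimeSpan.TicksPerSecond,
            DateTimeKind.Utc);
        return truncated <= ifModifiedSinceUtc.Value ? 304 : 200;
    }
}
```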

HTTP 1.1 contains many more headers which allow a more fine-grained control over caching. Here we note just a few.

ETag: this is used in conjunction with other headers (notably If-None-Match), and allows the server to set a unique identifier on each new version of a file. It is useful in cases where pages change rapidly, and date-based validation (which is only granular to seconds) is unable to distinguish between different versions.
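ETag validation follows the same pattern as If-Modified-Since, but compares opaque identifiers instead of dates: the browser sends the tag it holds in an If-None-Match request header, and the server answers 304 if that tag still matches the current version. This hypothetical ETagCheck sketch handles a single tag, a comma-separated list, or *, treats tags as exact strings, and ignores weak validators (W/"...").

```csharp
using System;
using System.Linq;

public static class ETagCheck
{
    // currentETag: the tag of the server's current version, quotes included, e.g. "\"v2\"".
    // ifNoneMatch: the request's If-None-Match header value, or null if absent.
    public static int StatusFor(string currentETag, string ifNoneMatch)
    {
        if (string.IsNullOrEmpty(ifNoneMatch))
        {
            return 200;                         // nothing to validate against
        }
        bool matches = ifNoneMatch.Trim() == "*" ||
                       ifNoneMatch.Split(',')
                                  .Select(tag => tag.Trim())
                                  .Contains(currentETag);
        return matches ? 304 : 200;
    }
}
```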

Cache-control: this is the main header used to control caching. When embedded in a request, it can specify that a cached copy should not be used without checking with the origin server, or that nothing should be stored, that the user is willing to accept stale pages, that the user will only accept pages no older than a given age, etc. When embedded in a response, it can override the standard caching policies of a cache, by allowing or forbidding caching of the file, or by requiring that the file be revalidated even if the user is willing to receive stale pages, etc.
