Monday, June 27, 2005

The Marvelous Conditional Get

Consuming content with caching

I am going to highlight two ways to take advantage of HTTP 1.1 caching mechanisms when consuming content. Caching can significantly improve performance and reduce the network bandwidth load on the content provider's server. I think it is only good etiquette to use caching wherever possible in your applications. Static web documents (including HTML, CSS, JavaScript, XML and XSL) are usually cacheable and can be retrieved as such. I found both of these techniques while writing my newsfeed aggregator application.

Using URLConnection with a Proxy Server

The first method takes advantage of an external web proxy cache server (such as those powered by Squid). It is possible to set a JVM system property (-Dhttp.proxyHost) so that all content is obtained through a web proxy cache, but this is not always the behaviour you require. The document describing the per-connection alternative said that it is not well documented, and it still isn't, which is why I am reproducing it here.
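For completeness, the JVM-wide approach looks something like this (the host and port are placeholders; 3128 is Squid's default, and MyAggregator stands in for your application's main class):

java -Dhttp.proxyHost=myProxy.com -Dhttp.proxyPort=3128 MyAggregator

or, equivalently, set programmatically before any connections are opened:

// Route all subsequent java.net HTTP connections through the proxy
System.setProperty("http.proxyHost", "myProxy.com");
System.setProperty("http.proxyPort", "3128");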

You can read the original document at Using URLConnection with a Proxy Server.

Essentially, instead of the usual:


URL url = new URL("http://www.javablogs.com/ViewDaysBlogs.action?view=rss");
URLConnection c = url.openConnection();

you can do the following, and the web proxy cache will be used as an intermediary. This reduces the load on the destination server if the document is static (or is dynamically generated but supports caching), and if the web proxy cache server is located closer to your application's server it will also improve response times.


URL url = new URL(
        "http",         // protocol
        "myProxy.com",  // host name or IP of the proxy server to use
        -1,             // proxy port, or -1 for the protocol's default port
        "http://www.javablogs.com/ViewDaysBlogs.action?view=rss");
                        // the original URL, specified in the "file" parameter
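Putting it together, a minimal sketch of fetching through the proxy might look like this (myProxy.com is a placeholder for a real proxy host):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class ProxyFetch {

    public static void main(String[] args) throws Exception {
        // Because the absolute URL travels in the "file" parameter,
        // the request line sent to the proxy contains the full
        // original URL, which is the form proxies expect
        URL url = new URL("http", "myProxy.com", -1,
                "http://www.javablogs.com/ViewDaysBlogs.action?view=rss");
        URLConnection c = url.openConnection();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(c.getInputStream()));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
        in.close();
    }
}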

ROME sub-project: RomeFetcher

In my newsfeed aggregator I initially used the above technique to obtain syndicated feeds. I then discovered I could use RomeFetcher, which is specifically designed to retrieve newsfeeds and supports the HTTP 1.1 conditional GET mechanism (i.e. Last-Modified and ETag handling). Example code can be found on the RomeFetcher site. The Jakarta Commons FeedParser looks like it will also support the conditional GET mechanism, but there have been no releases of it as yet.
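Along the lines of the example code on the RomeFetcher site, usage is roughly as follows (a sketch, assuming the com.sun.syndication package names of the current ROME releases):

import java.net.URL;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.fetcher.FeedFetcher;
import com.sun.syndication.fetcher.impl.FeedFetcherCache;
import com.sun.syndication.fetcher.impl.HashMapFeedInfoCache;
import com.sun.syndication.fetcher.impl.HttpURLFeedFetcher;

public class FetcherExample {

    public static void main(String[] args) throws Exception {
        // The cache remembers Last-Modified and ETag values per feed
        // URL, so repeat fetches become conditional GETs automatically
        FeedFetcherCache feedInfoCache = HashMapFeedInfoCache.getInstance();
        FeedFetcher fetcher = new HttpURLFeedFetcher(feedInfoCache);
        SyndFeed feed = fetcher.retrieveFeed(
                new URL("http://www.javablogs.com/ViewDaysBlogs.action?view=rss"));
        System.out.println(feed.getTitle());
    }
}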

The conditional GET mechanism is comprehensively explained in HTTP Conditional Get for RSS Hackers.
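In essence, the client stores the Last-Modified and ETag validators from one response and echoes them back on the next request; here is a bare-bones sketch using plain HttpURLConnection (in real code you would also read and keep the feed body):

import java.net.HttpURLConnection;
import java.net.URL;

public class ConditionalGet {

    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.javablogs.com/ViewDaysBlogs.action?view=rss");

        // First request: note the validators the server sends back
        HttpURLConnection first = (HttpURLConnection) url.openConnection();
        long lastModified = first.getLastModified();
        String eTag = first.getHeaderField("ETag");
        first.getInputStream().close();

        // Second request: echo the validators back to the server
        HttpURLConnection second = (HttpURLConnection) url.openConnection();
        second.setIfModifiedSince(lastModified);
        if (eTag != null) {
            second.setRequestProperty("If-None-Match", eTag);
        }
        if (second.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
            System.out.println("304: the cached copy is still current");
        } else {
            System.out.println("200: content changed, re-read the body");
        }
    }
}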

Producing cacheable content

The two methods above take advantage of caching when consuming content. Most modern web servers serve static documents with intrinsic support for the caching mechanism. Dynamically generated documents, by their very nature, do not support caching unless they are explicitly written to do so. Depending on the characteristics of the generated content, this can sometimes be remedied.

If your content is updated periodically then it is relatively easy to modify the servlet code to support caching. In a servlet it is simply a case of overriding the getLastModified method as appropriate for the content.
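A minimal sketch, assuming a hypothetical NewsServlet whose content only changes when its lastUpdated field is advanced; the servlet container uses the returned timestamp to answer If-Modified-Since requests with 304 Not Modified without invoking doGet:

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class NewsServlet extends HttpServlet {

    // Hypothetical: advance this whenever the underlying content changes
    private volatile long lastUpdated = System.currentTimeMillis();

    protected long getLastModified(HttpServletRequest req) {
        // Truncate to whole seconds, since HTTP dates have
        // one-second resolution; return -1 if unknown
        return (lastUpdated / 1000) * 1000;
    }

    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("text/plain");
        resp.getWriter().println("Content generated at " + lastUpdated);
    }
}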

See e.g.: Utilizing browser and proxy server cache

1 comment:

Mark McLaren said...

Actually, I had a "penny dropped" moment. It is really easy to produce content that supports the conditional GET. All you need to do is install a cache servlet filter.

e.g.: Cache Filter or Two Servlet Filters Every Web Application Should Have
Note: Comment imported. Original by markmc website: http://content.mark-mclaren.info/ at 2005-07-21 21:45