Tuesday, August 08, 2006

XSLT generated Tag clouds (inspired by IE7Beta3)

I recently installed IE7Beta3 (I have only recently upgraded my home PC to XP). Straight away I was drawn to the feed aggregator, which is now integral to the browser. I was very impressed; I like the sleek styling. The aggregator is not perfect by any means: managing feeds looks like a bit of a pain, and it wouldn't import a valid OPML feed list that I had.

See Internet Explorer 7's superior feed handling for an overview, with pictures, of IE7Beta3's feed aggregator.

I am slightly concerned that the feed reader looks so much like a web page. This hides the complexity of the RSS platform from the user, but at the same time it could make you think that RSS should always behave like this. What IE7Beta3 is doing is a little more complicated than your average XSL transformation of an RSS feed.

The Filter by category feature looks very nice. Then it struck me: the Filter by category part of the page is a variant of a tag cloud. Only feeds that support the <category> element can be rendered like this; as far as I am aware, that limits this behaviour to RSS 2.0 and Atom.

As with any fancy interface, I started to think about how it works, and I thought I'd have a stab at producing a tag cloud using XSLT alone (I had a quick look and couldn't find anybody else doing exactly this on the web).

My XSLT makes use of something called the Muenchian Method (me neither!). I found this described in the Grouping and Counting sections of Dave Pawson's XSLT Questions and Answers. It turns out that the Muenchian Method isn't actually that complicated once you get started (you have to be a little careful with sorting). So a little time later I have my XSL that can transform an Atom or RSS 2.0 feed (containing category elements) into a tag cloud.
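To make the grouping-and-counting step concrete, here is a rough sketch of the same idea in Python rather than XSLT. It is not the stylesheet itself and it does not use the Muenchian Method (which is specific to XSLT keys); it only shows what gets computed: one entry per distinct category, with a count. The function names, the five size levels and the linear scaling are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

ATOM_NS = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 namespace


def category_counts(feed_xml):
    """Count how often each category is used by the items/entries of a feed."""
    root = ET.fromstring(feed_xml)
    counts = Counter()
    # RSS 2.0: <item><category>name</category></item>
    for item in root.iter("item"):
        for cat in item.findall("category"):
            if cat.text and cat.text.strip():
                counts[cat.text.strip()] += 1
    # Atom: <entry><category term="name"/></entry>
    for entry in root.iter(ATOM_NS + "entry"):
        for cat in entry.findall(ATOM_NS + "category"):
            if cat.get("term"):
                counts[cat.get("term")] += 1
    return counts


def tag_cloud(counts, levels=5):
    """Return (name, count, size_level) tuples, sorted by name.

    size_level runs from 1 (least used) to `levels` (most used); the
    linear scaling is an assumption, not something taken from IE7.
    """
    if not counts:
        return []
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    cloud = []
    for name in sorted(counts, key=str.lower):
        if span == 0:
            level = 1  # every category is used equally often
        else:
            level = 1 + (counts[name] - lo) * (levels - 1) // span
        cloud.append((name, counts[name], level))
    return cloud
```

Each size level could then be mapped to a CSS class to get the familiar big-and-small tag cloud look.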

Here is the XSLT that creates tag clouds from RSS 2.0 and Atom feeds

Here is a tag cloud that I created from the complete Atom feed generated by my blog
Here is a tag cloud that I created from an RSS 2.0 feed called Recently Added and Updated Feeds from Microsoft (I think this comes pre-installed with IE7Beta3)

The CSS for the tag cloud was stolen from How to Make a Tag Cloud for Movable Type Blogs.

The next steps in implementing the IE7Beta3 style interface would be the sort by title and sort by date functions. Sorting by date would probably be easier with Atom feeds, because the Atom date format is very simple. With RSS 2.0 feeds you can't guarantee which date format you will get, which makes sorting more of a challenge. Filtering and sorting simple stuff is relatively easy to do with XSLT.
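As a rough illustration of the date problem, the sketch below (in Python, purely for illustration) normalises both styles to one comparable value before sorting. It assumes Atom dates follow RFC 3339 and that RSS 2.0 dates are at least close to the RFC 822 style; anything else is treated as unparseable and sorted last. The function names are made up for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Stand-in value for dates that cannot be parsed, so they sort last.
_OLDEST = datetime.min.replace(tzinfo=timezone.utc)


def parse_feed_date(text):
    """Parse an Atom or RSS 2.0 style date string; return None if neither fits."""
    text = text.strip()
    # Atom style, e.g. "2006-08-08T10:00:00Z". Older Python versions do not
    # accept a trailing "Z", so rewrite it as an explicit UTC offset.
    iso = text[:-1] + "+00:00" if text.endswith(("Z", "z")) else text
    try:
        parsed = datetime.fromisoformat(iso)
    except ValueError:
        try:
            # RSS 2.0 style, e.g. "Tue, 08 Aug 2006 10:00:00 GMT"
            parsed = parsedate_to_datetime(text)
        except (TypeError, ValueError):
            return None
    if parsed.tzinfo is None:
        # Assumption: treat dates without a time zone as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def sort_newest_first(items):
    """Sort (title, date_string) pairs by date, newest first."""
    def key(item):
        return parse_feed_date(item[1]) or _OLDEST
    return sorted(items, key=key, reverse=True)
```

Sorting by title needs none of this, which is part of why it is the easier of the two functions.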

Two things stop this XSLT from being run inside the client browser.

  • Obtaining remote feeds

    You probably need to be able to run the XSLT against XML feeds obtained from a remote source, so we need a mechanism to obtain or proxy the feed. Also, if you are obtaining remote feeds it would be polite to use the conditional GET mechanism if possible. This suggests a server-side implementation, maybe using the ROME fetcher.
  • Passing parameters back into the XSLT

    In order to initiate the Filter by category, Sort by date and Sort by title behaviour we need to be able to pass parameters back into the XSLT. Passing parameters into an XSL stylesheet in the client browser is a fairly nightmarish prospect. Again, this suggests a server-side application would be best.
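The conditional GET mentioned in the first point boils down to sending back the validators the server gave you last time. Here is a minimal Python sketch, assuming you have stored the ETag and Last-Modified values from the previous response. The function names and the shape of the returned tuple are invented for this example, and a real fetcher would also want caching and error handling.

```python
import urllib.error
import urllib.request


def conditional_request(url, etag=None, last_modified=None):
    """Build a GET request that carries the validators from the last fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return urllib.request.Request(url, headers=headers)


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None if the feed is unchanged."""
    request = conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep using the cached copy
            return None, etag, last_modified
        raise
```

On the first fetch you pass no validators and keep whatever comes back; on later fetches a `None` body means the cached copy is still good.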

Having worked out how to create the Filter by category "tag cloud", I don't think it would be too hard to create a facsimile of the IE7Beta3 feeds interface using JSTL or a servlet.
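On the server side, the filter and sort choices would most likely arrive as request parameters. The sketch below shows that step in Python rather than JSTL or a servlet, purely for illustration. The parameter names `category` and `sort`, the dictionary keys, and the default of newest first are all assumptions; `date` is taken to be any value that already sorts correctly, such as a parsed date.

```python
from urllib.parse import parse_qs


def apply_view(items, query_string):
    """Filter and sort feed items according to a query string.

    items: list of dicts with "title", "date" and "categories" keys
    (a hypothetical shape chosen for this example).
    """
    params = parse_qs(query_string)
    category = params.get("category", [None])[0]
    sort = params.get("sort", ["date"])[0]

    if category is not None:
        # Filter by category: keep only items tagged with it.
        items = [item for item in items if category in item["categories"]]

    if sort == "title":
        return sorted(items, key=lambda item: item["title"].lower())
    # Default: sort by date, newest first.
    return sorted(items, key=lambda item: item["date"], reverse=True)
```

Each link in the tag cloud would then simply point back at the same page with a different `category` value.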