Friday, September 29, 2006

Injecting XML input into XQuery using Spring

I recently went to an XML Access Languages event held jointly by W3C and xmluk.org. Presentations centered on XQuery, XSLT 2.0, XPath 2.0 and SPARQL. The whole event was tremendously interesting and I will probably blog further about this at a later date.

Liam Quin gave a very enjoyable talk on XQuery which specifically caught my eye. Michael Kay (of SAXONICA) also spoke about the relationship of XQuery to XSLT 2.0, XPath 2.0 and XML Schema so I feel particularly well informed on the subject now (there is nothing like hearing it from the horse's mouth!). XQuery looks a bit like a hybrid of SQL and XPath (with FLWOR [For-Let-Where-Order-Return] syntax thrown in) and is particularly useful for accessing XML data across disparate sources.

Following the conference I have been doing some experiments using SAXON, starting with trying out the examples in Bob DuCharme's XML.com article Getting Started with XQuery.

XQuery 1.0 and XSLT 2.0 both support XPath 2.0's doc() and collection() functions for accessing external input XML documents (XSLT 2.0 also keeps its own document() function). These are potentially extremely powerful facilities. Having recent experience with the Spring Framework, this way of doing things was a concern to me. XQuery et al appeared to be, at first glance anyway, advocating a close coupling of documents with processing. This flies in the face of the inversion of control (dependency injection) design pattern which I have learnt to love.

To further illustrate my point, let's say I have some XML source documents that I want to run some XQuery against:

  • they might be on the filesystem
  • they might be in an XML database
  • they might be in a CLOB on a relational database
  • they might be on the web accessed via a URI
  • they might have been returned by a web service
  • they might be DSML format returned from an LDAP server
  • they might be from some combination of the above, I could go on...

The document could be coming from almost anywhere and therefore would need to be accessed using very different mechanisms depending on the situation. Does that mean we need as many XQuery implementations as there are access mechanisms? I would hope not. That said, this seems to be the current situation, with multiple XML database vendors supplying their own implementations of XQuery for their particular databases. This is wrong, surely? I would argue that where the source XML originates is none of the XQuery processor's business and arguably the precise source should not even be detectable from the URI!

SAXON (the free version) includes support for XQuery. The SAXON XQuery processor has native support for accessing XML documents from the filesystem, but as an experiment I thought it would be good to see if I could make use of the Spring Framework to feed the SAXON XQuery processor.

There is already a Spring XML Database Framework that enables Spring to access eXist and Apache Xindice XML database datasources to interact with XQuery but it looked a little complicated for my needs.

I discovered that the URI in the doc() (or document()) and collection() functions is merely a reference to an external document; it need not imply a specific access mechanism. To fool SAXON into accepting my Spring-accessed input data, all I needed to do was implement a Spring-aware URIResolver and CollectionURIResolver. I could then configure SAXON to use those resolvers to access the documents and collections referenced in the XQueries.
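As a rough illustration of the resolver idea (this is not the actual SpringURIResolver, just a sketch using only the standard javax.xml.transform.URIResolver interface), the class below maps logical URIs to XML held in memory. The class name MapBackedURIResolver, the register() method and the urn:example: URIs are all hypothetical; a Spring-aware version would presumably look the document up via a Spring resource instead of a string map, and how the resolver is registered with SAXON is not shown here.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical resolver: the URI is only a lookup key, not a location. */
class MapBackedURIResolver implements URIResolver {
    // Assumption for this sketch: documents are held as in-memory XML strings.
    private final Map<String, String> documents = new HashMap<>();

    void register(String uri, String xml) {
        documents.put(uri, xml);
    }

    @Override
    public Source resolve(String href, String base) {
        String xml = documents.get(href);
        if (xml == null) {
            // Returning null lets the processor fall back to its own resolution.
            return null;
        }
        // Keep the logical URI as the system id so the processor can report it.
        return new StreamSource(new StringReader(xml), href);
    }
}

class MapBackedURIResolverDemo {
    public static void main(String[] args) {
        MapBackedURIResolver resolver = new MapBackedURIResolver();
        resolver.register("urn:example:books", "<books><book>XQuery</book></books>");

        Source known = resolver.resolve("urn:example:books", null);
        System.out.println(known.getSystemId());                          // prints urn:example:books
        System.out.println(resolver.resolve("urn:example:missing", null)); // prints null
    }
}
```

The point of the design is that the query only ever sees the logical URI; swapping the map for a database, web service or LDAP lookup would not require any change to the query itself.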

What follows is by no means fully featured (it is hard-wired to read from a Spring resource) but it could be extended to support multiple data access mechanisms. I achieved my ends via two fairly simple Java beans and a test program.

SpringXQuery performs the XQuery itself.
It is responsible for loading the XQuery query file (using the Spring resource loader).
It is used to configure the collection and document URI resolvers.

SpringURIResolver provides the URI resolution.
You can configure a map of collections to use, using a map of maps.
You can configure a map of documents.

Other files are:

Spring's application context configuration file
A simple test program

and also the XQuery files and example XML files used in Bob DuCharme's XQuery article.

Incidentally I made use of Spring's MapFactoryBean in order to make my Spring configuration a little bit cleaner. I also made use of a tip I found, Spring: Locating Application Relative Resources, to ensure the Spring resource loader works.

Surprisingly enough it works; this is despite the fact that I do not fully understand all the intricacies of what I am implementing!

It looks to me that when using XQuery across multiple datasources performance is always likely to be an issue, and this partly explains why all these database vendors have vendor-specific implementations. I might argue that performance issues should be addressed on a separate level and, IMHO, are not a sufficient argument for re-implementing an entire language! Roll on javax.xml.xquery...

Saturday, September 02, 2006

Using XSLT 2.0 to emulate IE7 feed reader appearance (including filter by category, date and title sorting)

I have managed to create an XSLT 2.0 stylesheet that emulates the appearance of IE7's feed reader including sorting functionality.

To see the stylesheet in action click here

Download it here

As with my XSLT 2.0 tagcloud experiment, I have again made use of W3C's Online XSLT 2.0 Service and again used Microsoft's RSS 2.0 feed, Recently Added and Updated Feeds (oh the irony!).

The hard bit was sorting by RFC 822 dates. I had to use <xsl:analyze-string> to convert the RFC 822 date string into a sortable format. I found lots of information about how to do this on Dave Pawson's XSLT 2.0 site (date processing, regular expressions), and it helped a lot to find a regular expression close to the one I needed for matching RFC 822 dates (in Jorgen Thelin's blog entry).
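The same regular-expression approach can be sketched outside XSLT. The Java below is only an illustration of the idea, not the stylesheet's code: the pattern, the class name Rfc822SortKey and the toSortKey() method are assumptions made for this example. It pulls the day, month name, year and time out of an RFC 822 date and rearranges them into a year-first string that sorts correctly as plain text. It assumes four-digit years and English three-letter month names, and it ignores the time zone, so it only orders correctly when all dates in the feed share one zone.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: turns an RFC 822 date into a text-sortable key. */
class Rfc822SortKey {
    private static final List<String> MONTHS = Arrays.asList(
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec");

    // Optional day name, then day, month name, 4-digit year, hh:mm and optional :ss.
    // Anything after the time (the zone) is matched but ignored in this sketch.
    private static final Pattern RFC822 = Pattern.compile(
        "^(?:[A-Za-z]{3},\\s*)?(\\d{1,2})\\s+([A-Za-z]{3})\\s+(\\d{4})"
        + "\\s+(\\d{2}):(\\d{2})(?::(\\d{2}))?.*$");

    static String toSortKey(String rfc822Date) {
        Matcher m = RFC822.matcher(rfc822Date.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an RFC 822 date: " + rfc822Date);
        }
        int month = MONTHS.indexOf(m.group(2)) + 1;
        if (month == 0) {
            throw new IllegalArgumentException("Unknown month: " + m.group(2));
        }
        int day = Integer.parseInt(m.group(1));
        int year = Integer.parseInt(m.group(3));
        String seconds = (m.group(6) == null) ? "00" : m.group(6);
        // Year-first, zero-padded fields compare correctly as strings.
        return String.format(Locale.ROOT, "%04d-%02d-%02dT%s:%s:%s",
            year, month, day, m.group(4), m.group(5), seconds);
    }

    public static void main(String[] args) {
        // prints 2006-09-29T10:15:00
        System.out.println(toSortKey("Fri, 29 Sep 2006 10:15:00 GMT"));
    }
}
```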

The XSL stylesheet and CSS are by no means perfect: they currently only work with RSS 2.0 feeds that contain RFC 822 format dates, and even then they would benefit from some serious refactoring. BUT I'm really pleased with the result. I really should get out more....

Friday, September 01, 2006

Web 2.0 needs online XSLT transformation engines and XSLT 2.0 generated tagclouds

XSLT 2.0 stylesheet that produces a tag cloud

A few weeks ago I produced an XSL stylesheet that could produce a tag cloud from an RSS 2.0 or Atom feed. This made use of a technique called the Muenchian Method of grouping (named after Oracle man Steve Muench). Having read that XSLT 2.0 contains native grouping functionality (which should be easier to understand), I thought I'd investigate producing a tag cloud with an XSLT 2.0 stylesheet. For some reason Xalan, my favourite XSLT processor, does not yet properly support XSLT 2.0, so I had to use Saxon to do the XSLT 2.0 processing. I discovered a servlet-based demonstration Online XSLT 2.0 Service hosted at W3.org (which also uses Saxon).

Click here for the XSLT 2.0 stylesheet I have written that produces a tagcloud. It makes use of XSLT 2.0's <xsl:for-each-group> element instead of the Muenchian Method. The iframe below should show a tag cloud that is the result of an XSLT transformation of Microsoft's Recently Added and Updated Feeds RSS 2.0 feed, made using W3's online XSLT 2.0 service. [Incidentally, the online service also supports passing parameters into the XSL transformation.]
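For readers unfamiliar with grouping, the essence of what <xsl:for-each-group> does for a tag cloud is to collect equal category values into groups and count the members of each group. The Java below is a rough sketch of that logic only, not a translation of the stylesheet; the class name TagCloudSketch, both method names and the 100% to 300% font-size range are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the grouping step behind a tag cloud. */
class TagCloudSketch {

    /** Groups equal tags together and counts how many times each occurs. */
    static Map<String, Integer> countTags(List<String> tags) {
        // TreeMap keeps the tags in alphabetical order, as tag clouds usually are.
        Map<String, Integer> counts = new TreeMap<>();
        for (String tag : tags) {
            counts.merge(tag, 1, Integer::sum);
        }
        return counts;
    }

    /** Scales a count linearly to a font size between 100% and 300% (arbitrary range). */
    static int fontSizePercent(int count, int maxCount) {
        return 100 + (200 * (count - 1)) / Math.max(1, maxCount - 1);
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("xml", "xslt", "xml", "rss", "xml", "xslt");
        Map<String, Integer> counts = countTags(tags);
        int max = 0;
        for (int c : counts.values()) {
            max = Math.max(max, c);
        }
        // Emit one line per group: tag, number of uses, and the scaled size.
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + " x" + e.getValue() + " -> "
                + fontSizePercent(e.getValue(), max) + "%");
        }
    }
}
```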

Why did I do this?

I thought that using an online XSLT transformation engine would be a neat way to produce tag clouds and such for people using free hosting services like Google's blogger.com. I was thinking that I could host the XSL on Google Pages. The GData-powered Blogger Data API is reported to support entry categories; unfortunately I have not yet got this to work properly. In fact, worse than that, it killed the test blog that I was experimenting with: I now get "We're sorry, but we were unable to complete your request."

Why Web 2.0 needs free online XSLT transformation engine services

You get the idea by now: if we make use of online XSLT transformation services and free hosting services which produce XML, we can really start to use the web as a platform. It is nice to have your own server to tinker with but I would argue that it should not be necessary in the age of Web 2.0.

What is great about all this "Web 2.0" stuff is that we already have all we need to accomplish it. We do not need to wait for any new technologies; they are already here, and we just need reliable services to create new ways to make use of the web. I think that it would be great if Google or Yahoo or somebody hosted a free, high performance online XSLT transformation engine. Blimey, they could even advertise on the front page and I wouldn't care!

Granted, my XSLT tagcloud example might not have brought you around to my way of thinking yet so here is another powerful example where an online XSLT transformation engine would be superb.

Everybody loves AJAX at the moment but there are those painful same domain XMLHttpRequest problems that could require the use of an application proxy and have made On-Demand Javascript and JSON so popular (as used in Yahoo's JSON callback technique). [Incidentally, Google's AJAX Web Search also uses this technique; I will speak no more of this in case I get in trouble ;)].

So you want to write a super-duper AJAX application and host it on a free service. HTML, CSS and JavaScript can be hosted anywhere, but how do we get around those pesky XMLHttpRequest problems if we are relying on free hosted services? This is where an online XSLT transformation engine would come in very handy. So you want to process some external XML but it isn't available in JSON format? The answer is to transform the XML into JSON!

I found an XSL stylesheet that could convert XML into JSON on the eBay developer site. eBay even host an online XSLT service but it is too restrictive to use freely.

Hosting an online transformation engine would be a very good way for a company to showcase their XSL processing hardware (hint, hint, IBM please take note).

Now I know what we need, it is quite frustrating that it isn't already available. If you know different and can tell me where I can access a free, high performance, unrestricted, reliable, online XSLT processor engine please let me know!