Sunday, December 23, 2007

Sparklines (Google Analytics Style) with Google Chart API

At the beginning of December Google released their Google Chart API, which lets you dynamically generate charts. A few weeks ago I did some initial experiments with extended encoding, rendering sine and cosine waves (see also).

I recently logged into Google Analytics and wondered how they achieve their graphs. On the Google Analytics reports there are a fair few Flash components, but some graphs are generated in a very similar way to the Google Chart API, specifically the sparklines. Sparklines (as developed by Edward Tufte) are data-intense, design-simple, word-sized graphics.

A Google Analytics sparkline:


https://www.google.com/analytics/reporting/sparkline?p=vvt1f3iczaycs7vjs6eahxyauuu0znswgdiuzzv8sauimebsc1g9faeggabx8j

I recognised that the Google Analytics URL is using something very similar (if not identical) to extended encoding. Out of curiosity I wanted to see if I could recreate my Google Analytics sparklines using Google's Chart API - Tim Shadel is similarly interested in this topic.

From the Google Chart API Google group I found a post by Uwe Maurer (of Google Zürich) concerning the chart API's future support for sparklines.

I found a blog entry on the topic by Deepak Jois and this led me to some more information about an undocumented API feature that allows you to turn off axes (axes being the plural of axis; apparently "axes" is a heteronym!). This facility is used by Google Finance.

  1. Test to see if the Google Analytics URL is using extended encoding. Success!! As you can see there is an erroneous point at the end of the graph. In the Google Chart API the length of an extended encoded string should be an even number, so when I removed the last character of the Google Analytics encoded data string (which had an odd length) it appeared to work (a sketch of the encoding scheme appears at the end of this post).

    http://chart.apis.google.com/chart?cht=lc&chs=100x50&chd=e:vvt1f3iczaycs7vjs6eahxyauuu0znswgdiuzzv8sauimebsc1g9faeggabx8j
  2. Using the Google Chart API as documented and Uwe's workaround for turning the axes off:


    http://chart.apis.google.com/chart?chxl=0:||1:|&chxs=0,000000,10,0,_|1,000000,10,0,_&cht=lc&chxt=x,y&chs=100x50&chco=0077CC&chm=B,E6F2FA,0,0,0&chd=e:vvt1f3iczaycs7vjs6eahxyauuu0znswgdiuzzv8sauimebsc1g9faeggabx8

  3. Using the undocumented Google Chart API feature as used in Google Finance:


    http://chart.apis.google.com/chart?cht=lfi&chs=75x30&chco=0077CC&chm=B,E6F2FA,0,0,0&chd=e:vvt1f3iczaycs7vjs6eahxyauuu0znswgdiuzzv8sauimebsc1g9faeggabx8

As you can see, the third option produces a sparkline pretty close to the Google Analytics original. It looks like it would be possible to clone the whole Google Analytics dashboard report using the Google Chart API (could be a fun experiment!).
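For reference, here is a minimal sketch of how the extended encoding itself works (my own illustration, not Google's code): each data value in the range 0 to 4095 becomes a pair of characters drawn from a 64-character alphabet, which is why a valid encoded string always has an even length.

// Sketch of Google Chart API extended encoding.
// Each value 0..4095 maps to a two-character pair, so a valid
// encoded string always has an even number of characters.
var EXTENDED_MAP = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
                   "abcdefghijklmnopqrstuvwxyz" +
                   "0123456789-.";

function extendedEncode(values) {
  var encoded = "";
  for (var i = 0; i < values.length; i++) {
    var v = Math.round(values[i]);
    if (v < 0 || v > 4095) {
      encoded += "__"; // the extended encoding marker for a missing value
    } else {
      encoded += EXTENDED_MAP.charAt(Math.floor(v / 64)) +
                 EXTENDED_MAP.charAt(v % 64);
    }
  }
  return encoded;
}

// For example, extendedEncode([0, 2048, 4095]) gives "AAgA.."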

Thursday, July 19, 2007

GME, Google Calendar to hCalendar format mashup

I have created another GME mashup. As with my GeoURL/Google Maps GME mashup, this does not make use of GME in what I would expect to be a typical way. Given the buzzword-friendly technologies involved, I would hope this mashup scores highly on the geek appeal scale!

http://hcalendar.googlemashups.com/

Wednesday, July 18, 2007

A GME, GeoURL and Google Maps mashup

I recently started playing around with the Google Mashup Editor (GME). In my last blog entry I promised to share the experience of my first mashup, so here is my account.

Further information about Google Mashup Editor (GME)

The Katamari Framework was Google's internal codename for the Google Mashup Editor. In essence, GME is an application server extension to the Google Web Toolkit. I also recently discovered that the attractive syntax highlighting in the Web IDE is provided courtesy of the Open Source Codepress project.

The analogy between Tomcat and the GME platform continues. Where JSP pages have application and session scope, GME has two built-in data feeds (${app}, ${user}) that allow users to read and write data specific to the application or to the user.

My GME mashup: GeoURL to Google Maps

http://geourl.googlemashups.com/

The source code for this mashup is available here and here.

My application would probably be considered "very advanced" rather than an exemplar of typical GME usage. What follows are some developer notes concerning the JavaScript approaches I took with this mashup; I hope these insights are useful to somebody. There is always room for improvement but I am happy with my mashup and think it works well enough.

GME makes it very easy to produce certain types of application, especially when interfacing with Google's own applications. However, I have found that GME can also be used to host very generic applications that need not make any use of Google's offerings (you could use Yahoo Maps, for example, and you get bonus points for using MapStraction!).

My intention was to recreate my ancient GeoURL to Google Maps experiment but host nothing myself, with everything necessary running on Google's servers.

What makes generic application development possible on GME is that it can act as an application proxy for RSS/Atom content and you can host JavaScript on it. So if you can acquire your data in RSS/Atom, or as XML/JSON via a Web API (e.g. using on-demand callback methods), then you can host an application that uses that data on GME. There are many, many Web APIs out there for you to play with (see ProgrammableWeb's API list).

As I had done previously, I wanted to create a streaming display of GeoURL sites as I moved around a Google map.

My initial attempt only used GME tag functionality. This mashup didn't really do what I wanted very well and I felt a little restricted in controlling how it behaved, so I dispensed with the gm:map tag in favour of hand coding against the Google Maps API myself.

Submission throttling

Rather than attempt to access the GeoURL feed at the end of every mouse move (as is usual with Google Maps applications), I chose to use the submission throttling technique of only attempting to fetch new feed data at most once every 4 seconds. This should reduce the load on the backend feed, reduce client-side processing and ultimately improve the user experience.
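Here is a minimal sketch of the idea (the function and event names are my own placeholders, not the mashup's actual code):

// Submission throttling: only fetch the GeoURL feed if at least
// four seconds have passed since the last fetch.
var lastFetch = 0;
var THROTTLE_MS = 4000;

function onMapMoveEnd() {
  var now = new Date().getTime();
  if (now - lastFetch >= THROTTLE_MS) {
    lastFetch = now;
    fetchGeoUrlFeed(); // placeholder for the code that requests new feed data
  }
}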

BitPerfect's Google Maps "getOverlays" extension

I have methods that add markers and methods that remove markers. Removing markers from Google Maps is a little difficult, as by default there is no way to get an array of all the visible markers. I experimented with BitPerfect's extension to the Google Maps API that allows better management of markers, but I was still having problems because I had methods simultaneously trying to add markers to and delete markers from my array.

Ensuring mutually exclusive array access

My problem is that sometimes I am trying to add/delete and iterate through an array simultaneously. What I needed was some sort of mechanism by which only one process is allowed to act on my array at any one time. I first experimented with a mutually exclusive (mutex for short) access mechanism that implements something called the Wallace Variation to Lamport's bakery algorithm by Bruce Wallace (basically, you take a number and wait in line). This looked just the ticket (groan!) but I figured that for my mashup I didn't need to maintain a queue of waiting methods. In my mashup it would be sufficient to allow the most recent request to take precedence and throw away any previous requests. So, inspired by Bruce's mutex algorithm, I implemented something that dynamically defines a function and runs it. That function can be changed by each of the various array-accessing processes; the last method to redefine the function is the one that actually gets called (assuming it is not already running).
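A simplified sketch of the idea (assumed names, boiled down from my actual code):

// "Last request wins": every array-accessing process overwrites the
// pending function; a flag stops a second invocation running while
// one is already in progress, and only the most recently assigned
// function actually gets executed.
var pendingTask = null;
var running = false;

function requestArrayAccess(task) {
  pendingTask = task; // a later request simply replaces an earlier one
  if (!running) {
    running = true;
    while (pendingTask != null) {
      var current = pendingTask;
      pendingTask = null;
      current(); // e.g. add markers, delete markers or iterate the array
    }
    running = false;
  }
}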

Feedback welcome

Friday, July 13, 2007

Google Mashups Editor: The Google Application Server

I have recently been playing around with the Google Mashup Editor (GME). Like all good Web 2.0 products GME is in beta and user access is limited at the moment, but IMHO GME is really *really* good! Warning: I am going to be saying "Google" rather a lot in the next few paragraphs!

Imagine something like Tomcat with remotely hosted editable JSP/JSTL pages and a web-based IDE interface and you get close to what GME is about. Everything is hosted remotely and the web-based IDE editor is surprisingly versatile, all things considered. You can even host your code projects on Google Code, which is Google's version of SourceForge and includes a Subversion repository. Your GME applications can be deployed as standalone web pages or as Google Gadgets for iGoogle and Google Desktop (sort of portlets for Google's personalised portals).

The available GME tag library is squarely aimed at creating Google flavoured mashups with Google's existing offerings (Google Maps, Google Calendar, Google Search, Google Base etc.).

For more generic applications development GME provides support for importing blogs/RSS/XML data (all feeds are converted into Atom format). You can access your data using XPath. You can protect your applications behind Google Authentication. You can also embed any JavaScript and CSS that you see fit. It really is an extremely powerful product.

Since it is hosted on Google's own servers, there is no need to apply for messy API keys, everything you need is already there and ready to go.

So you want an application that accesses remote feeds and creates Google Maps but you don't have your own application server? Now you can host it on GME!!!

Behind the scenes Google Mashups is hosted on a server identifying itself as "Google Frontend" and the GML pages are compiled (analogous to JSP) using the Google Web Toolkit. I don't know a great deal about the Google Web Toolkit but essentially it lets you write applications in Java and compile these into JavaScript applications. So it appears that, just as a JSP page is compiled into a servlet, GML pages (Oi Google! The TLA GML is already spoken for) are compiled into JavaScript. There is obviously more going on in the background of this process than what is currently provided by the public version of the Google Web Toolkit. Somewhere inside there must be an application server of some sort.

In one CSS file there is a tantalising mention of something called the "Katamari Framework". I have not found an explanation for what this is yet (I have just posted a question to the GME forum asking about it).

It looks like Google's tag library could be made to work on Tomcat. It looks like there is a GWT-powered application server under the hood.

Are these things and other bits and pieces going to emerge as open source into the public domain at some point? We shall have to wait and see.

I have been experimenting with GME, Google Maps API and GeoURL again! I may share the results of my experiments in a later blog entry.

Wednesday, June 20, 2007

Accessing Spring beans from Quartz jobs

The Spring Framework integrates with the Quartz scheduler in a way that makes Quartz much easier to use. However, in order to use Spring beans with your Quartz jobs you have to deviate slightly from the usual Spring "dependency injection" way of doing things. According to the Spring API documentation this is necessary because Quartz itself is responsible for the lifecycle of its Jobs.

I was recently refactoring my use of Quartz and Spring in my feed aggregator web application. Rather than explain the internal workings of my application at this time, I will explain some features I discovered with reference to James Goodwill's recent simple example of using Quartz and Spring together. James shows how a "cron style" job can easily be created by configuring a Quartz job, a trigger and a SchedulerFactoryBean, and then loading up the application context. In James' example the Spring application context would look something like this:


<beans>
  <!-- Define the Job Bean that will be executed. The class to run is named in the jobClass property. -->
  <bean name="myJob" class="org.springframework.scheduling.quartz.JobDetailBean">
    <property name="jobClass" value="com.gsoftware.common.util.MyJob"/>
  </bean>

  <!-- Associate the Job Bean with a Trigger. Triggers define when a job is executed. -->
  <bean id="simpleTrigger" class="org.springframework.scheduling.quartz.SimpleTriggerBean">
    <property name="jobDetail" ref="myJob"/>
    <property name="startDelay" value="2000"/>
    <property name="repeatInterval" value="10000"/>
  </bean>

  <!-- A list of Triggers to be scheduled and executed by Quartz -->
  <bean class="org.springframework.scheduling.quartz.SchedulerFactoryBean">
    <property name="triggers">
      <list>
        <ref bean="simpleTrigger"/>
      </list>
    </property>
  </bean>
</beans>

Great stuff! You can pass static data into the Quartz job via the JobDetailBean using the JobDataMap mechanism but AFAICT you should not pass Spring beans through this means.
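For completeness, here is what the static-data route looks like (a hedged sketch: jobDataAsMap is the JobDetailBean property for this, and the timeout entry is purely illustrative):

<!-- Pass static data (not Spring beans!) into the job via the JobDataMap -->
<bean name="myJob" class="org.springframework.scheduling.quartz.JobDetailBean">
  <property name="jobClass" value="com.gsoftware.common.util.MyJob"/>
  <property name="jobDataAsMap">
    <map>
      <entry key="timeout" value="5"/>
    </map>
  </property>
</bean>

Inside the job, context.getMergedJobDataMap() should then give you the values back.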

So what if I want my job to be able to access other Spring resources like data access layers etc.? Let us assume I have a data access object layer configured elsewhere in my Spring config (like the example below) and I want my Quartz job to be able to access it.


<!-- A DAO bean which itself may have dependencies on data sources and other stuff -->
<bean name="daoAccess" class="com.someplace.daoImpl">
  <property name="dataSource">
    ...yadda..yadda..yadda...
  </property>
</bean>

I discovered via the "Quartz Method Invocation on Beans" post on the Spring forum that you can pass a reference to the Spring application context via the SchedulerFactoryBean, as in the example shown below:


<bean class="org.springframework.scheduling.quartz.SchedulerFactoryBean">
<propertyy name="triggers">
<list>
<ref bean="simpleTrigger"/>
</list>
</property>
<property name="applicationContextSchedulerContextKey">
<value>applicationContext</value>
</property>
</bean>

You can then access the Spring application context inside the Quartz job, which means you can access any identifiable beans, like the data access object layer bean, from the application context.


import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.SchedulerException;
import org.springframework.context.ApplicationContext;

public class MyJob implements Job {

    private static final String APPLICATION_CONTEXT_KEY = "applicationContext";

    public void execute(JobExecutionContext context) throws JobExecutionException {
        ApplicationContext appCtx = getApplicationContext(context);
        MyDAO dao = (MyDAO) appCtx.getBean("daoAccess");
        // place rest of your Job code here
        System.out.println("EXECUTING QUARTZ JOB");
    }

    private ApplicationContext getApplicationContext(JobExecutionContext context)
            throws JobExecutionException {
        ApplicationContext appCtx;
        try {
            // the key matches the applicationContextSchedulerContextKey configured above
            appCtx = (ApplicationContext) context.getScheduler().getContext().get(APPLICATION_CONTEXT_KEY);
        } catch (SchedulerException e) {
            throw new JobExecutionException(e);
        }
        if (appCtx == null) {
            throw new JobExecutionException(
                "No application context available in scheduler context for key \"" + APPLICATION_CONTEXT_KEY + "\"");
        }
        return appCtx;
    }
}

I mentioned before that I am using Spring and Quartz inside a web application. In this case I am loading the Spring application context via Spring's ContextLoaderListener in the web.xml. Using this particular method of Spring instantiation means that the Spring application context loaded is actually a WebApplicationContext with access to the ServletContext. In my web application it is very useful to be able to check the status of my job via a variable stored in the ServletContext. Armed with the above technique it is now quite easy to access the WebApplicationContext and therefore the underlying ServletContext.


public class MyJob implements Job {

    private static final String APPLICATION_CONTEXT_KEY = "applicationContext";

    public void execute(JobExecutionContext context) throws JobExecutionException {
        ApplicationContext appCtx = getApplicationContext(context);

        WebApplicationContext webCtx = null;
        ServletContext srvCtx = null;
        if (appCtx instanceof WebApplicationContext) {
            webCtx = (WebApplicationContext) appCtx;
            srvCtx = webCtx.getServletContext();
            srvCtx.setAttribute("foo", "bar");
        }
        // place rest of your Job code here
        System.out.println("EXECUTING QUARTZ JOB");
    }

    private ApplicationContext getApplicationContext(JobExecutionContext context)
            throws JobExecutionException {
        // ... shown previously ...
    }
}

I hope these features are useful to people. I sometimes worry that Spring hides its beauty under a bushel a little too much, but I suppose the problem is that Spring provides such an embarrassment of riches it is impossible to highlight everything useful.

Incidentally, in this post I have been experimenting with Google's prettify.js syntax highlighter. Looks good to me, cheers Google!

Thursday, May 31, 2007

Google Gears: Cross browser client-side persistent storage

I just heard that Google have released a beta version of something called Google Gears, along with an accompanying API, blog and developer forum. Essentially, it supplies a Firefox and IE browser plugin that enables "Gears Aware" websites (after they have been explicitly given permission) to read and write data in local client-side storage. It appears that you will even be able to host files in a LocalServer file cache, which means that it should be possible to create offline JavaScript-powered web applications.

Google Gears seems to be yet another example of the trend towards moving more web application processing power onto the client.

To my mind Google Gears looks like it will compete directly with similar novel client-side web technology approaches such as LAJAX and POW. Also, I do not think it would be out of place to consider this in the context of other client-side uses of web technologies such as Google Desktop, Yahoo!'s Widgets (formerly Konfabulator) and Apple's Dashboard.

Yet another very exciting Google offering! Google Gears reportedly already works in concert with Google Reader. Is this the future of the web? Although I am very excited by the technology and the potential of it, it is slightly worrying that Google have a mind to bolt such significant cross-browser technological improvements onto the client when the browser technology is not developing fast enough for their needs (please don't hurt the web, use open standards!!!). We all know Google is special but what if everybody started doing this? Granted Microsoft have been doing this kind of thing for years (I'm thinking specifically of VirtualEarth at the moment) BUT it is the kind of thing that has served to make Microsoft unpopular.

Friday, April 27, 2007

Never Mind the Namespaces: An E4X client

Bob DuCharme published "Never Mind the Namespaces: An XSLT RSS Client" way back in January 2003. It is one of my favourite articles and I have regularly returned to it over the years.

Essentially it demonstrates how the XPath local-name() function can be used to create a single universal XSLT stylesheet that works with the many RSS formats that share a similar construction but different namespaces. As far as the typical XML processor is concerned, a difference in namespace is usually an extremely significant difference; using the local-name() function essentially gives you the power to write XSLT that says "I do not care what namespace the 'item' element is in, just give it to me".

I have started using E4X in real projects recently and it also supports functionality equivalent to the XPath local-name() function. Ignoring how data is loaded into the DOM, the following are pretty much equivalent.


XSL

*[local-name()='item']/*[local-name()='link']

E4X

doc.*::item.*::link

This ability means we can write an E4X script that can render RSS/Atom while paying little heed to namespaces.

The following example is probably not perfect but in theory it could load any RSS/Atom feed and render it. Additional action would be necessary to bypass XMLHttpRequest security constraints when accessing remote feeds.

Assuming you have a decent browser that supports E4X (e.g. Firefox 2), the following should work:

http://content.mark-mclaren.info/e4x_namespaces.html
http://content.mark-mclaren.info/e4x_namespaces.js
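The core of the script boils down to something like this (an illustrative miniature only; it assumes feedText already holds the raw feed XML as a string, and note that Atom links actually live in an href attribute, so a complete version needs a little more work):

// E4X namespace-agnostic feed rendering in miniature.
var doc = new XML(feedText.replace(/^<\?xml[^>]*\?>/, "")); // E4X refuses the XML declaration
var items = doc..*::item;                           // RSS 1.0/2.0
if (items.length() == 0) { items = doc..*::entry; } // Atom
for each (var item in items) {
  document.write(item.*::title + ": " + item.*::link + "<br/>");
}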

Bookmarklets and Firefox Toolbars

Bookmarks and Newsfeeds App Bookmarklets

We recently had usability tests conducted on our portal by an outside usability expert. In common with many portals we offer personalised bookmarks, newsfeeds and contacts functionality. I wrote the bookmarks and newsfeeds portlet applications from scratch so I am intimately familiar with them and pretty much free to tinker with them as I see fit. I have previously open sourced the bookmarks portlet (although I need to update it soon to reflect recent improvements with regard to usability and UTF-8 compatibility).

One of the outcomes of the usability tests was that users wanted a way to interact with the bookmarks and newsfeeds applications without having to access them directly via the portal. This makes sense: they want to be able to hit "add bookmark" and not have to type in the URL and title.

I had seen bookmarklets that offered similar bookmarking facilities for del.icio.us and thought I could try to produce something similar.

Our portlets are all written using Struts Bridge, which means they are quite happy to run as standalone applications in addition to being portlet applications. I extracted the part of the application that was responsible for adding bookmarks/newsfeeds and deployed this as a new standalone application. I thought it necessary to do this because I wanted slightly different behaviour in the bookmarklet context. On the successful submission of a new bookmark or newsfeed I want the application to redirect the user back to where they were so that they can continue browsing.

I started by developing a simple bookmarklet for the portlets bookmarks application (this was really easy!). For the newsfeed bookmarklet I needed slightly more advanced behaviour. So I derived a solution from Martin Dittus' excellent feed links bookmarklet.
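In essence, a bookmarklet of this sort is just a javascript: URL in a bookmark (all on one line in practice). A hedged sketch, with a made-up placeholder host rather than our real endpoint:

javascript:location.href='http://portal.example.ac.uk/bookmarks/add?url='+encodeURIComponent(location.href)+'&title='+encodeURIComponent(document.title);

The standalone add-bookmark application receives the current page's URL and title and, on success, redirects the user straight back to the page they were reading.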

I was very pleased with the new bookmarklets. In the newsfeeds case I could even claim to be performing autodiscovery! There is also the possibility of tweaking our Content Directory to supply contact details in the hCard microformat so that I could derive "add contacts" functionality from something like Left Logic's sexy microformats bookmarklet.

The bookmarklets have the added benefit of being cross browser solutions.

Bookmarks and Newsfeeds Firefox Toolbar

Although I was pleased with the bookmarklets, I felt that installing multiple bookmarklets could be a little cumbersome for the user, and I found it slightly annoying that the bookmarklets did not have proper icons.

So, much as del.icio.us started out with bookmarklets and evolved into a Firefox toolbar, I thought I would try to follow suit. This led me down the path of learning how to create Firefox Toolbars.

I have so far found Firefox Toolbar development to be rather a joy, with much of the technology involved already very familiar to me (XML, JavaScript, E4X, CSS, Zip files). Granted, I have not really toyed with XUL and XPCOM before, but I have found it a very enjoyable and rewarding experience (instant gratification is great, isn't it?). An additional benefit is that I can reuse any XPCOM knowledge gained when I start writing Server Side JavaScript (SSJS) for the Plain Old Webserver (POW).

My first modest toolbar effort contained a couple of static links and slightly modified versions of the "add bookmark" and "add newsfeed" bookmarklets.
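For anyone curious, the skeleton of such a toolbar is a XUL overlay onto the browser window, along these lines (IDs and labels invented for illustration):

<?xml version="1.0"?>
<overlay id="mytoolbar-overlay"
         xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
  <toolbox id="navigator-toolbox">
    <toolbar id="mytoolbar" toolbarname="My Toolbar">
      <toolbarbutton id="add-bookmark-button" label="Add Bookmark"
                     oncommand="addBookmark();"/>
    </toolbar>
  </toolbox>
</overlay>

The oncommand handler points at ordinary JavaScript, which is where the modified bookmarklet code lives.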

As I was having fun with the technology by now, I could not resist extending my initial toolbar to incorporate a newsfeed ticker; this was fairly easy to do and an opportunity to make use of E4X in a real-life context.

CASBar - A CAS Toolbar for Firefox 2

The glue behind our portal and associated web applications is the JA-SIG Central Authentication Service (CAS) for single sign-on. In order for bookmarks or newsfeeds to be interacted with from the toolbar, the user first needs to log in to CAS.

This led me to create a toolbar specifically for use with CAS. Scott Battaglia (Project Lead and Lead Architect on CAS) was sufficiently impressed ("That looks awesome!") with this modest toolbar that it has now become an official sub-project called CASBar. So now I am a CAS sub-project lead, I am quite chuffed, as I really didn't see that one coming!!!


Wednesday, February 28, 2007

Unmarshalling KML with XStream and Polycythemic Data Models

As I mentioned before, I am usually only interested in about 20 percent of what a KML file can contain, just the 2D geometry and nothing else. I want to extract this geometry collection and store it in Java objects for further processing. If I was interested in capturing all the XML information into objects I would normally turn to some kind of XML binding library like Castor, JAXB or XML Beans. I do not want to capture the entire document in Java objects so what are my options?

I could pre-process the XML document with an XSL and create an abridged version of the original XML schema containing only the elements I am interested in. Then I could use XML binding on this abridged XML schema. Although this is possible, it feels a little too much like hard work, performance may suffer (unless I used a streaming transformation engine like STX rather than XSL) and generally it is not a very flexible approach. Therefore, I will not be doing that.

Then I discovered XStream. XStream is often mentioned in the XML binding context but it is subtly different to Castor, JAXB and XML Beans. XStream has a streaming nature (much like SAX); it parses XML and can map whatever it finds into Java classes (and vice versa). Essentially, you need to do a little groundwork in order to establish how your XML data maps to Java objects. XStream makes cunning use of reflection so that the amount of groundwork you need to do is actually relatively slight. XStream does not require that your Java objects implement any specific methods; class variables are all it requires to populate objects. This appears a little strange at first and it can affect how you write your object methods. For example, you cannot assume that any class constructor was ever called while XStream was populating your Java objects. This seems a little odd, as you end up with a pre-populated object without all the usual procedures having been followed, but at the same time it is quite ingenious.

I have created a simple proof of concept that converts KML into a Java object representation of Kml (containing only the 2D geometry elements); you can download the Java source code here; you will also need the XStream library.

In the main program XStream is used to read a KML sample file into simple, hand-crafted Java objects. XStream can then very easily turn this tree of objects back into KML (of sorts). To prove to the more cynical (like myself) that this actually works, I have also included a simple traversal of the newly created Kml object tree. That was scarily simple, wasn't it?

Elsewhere, I have attempted to further extend this example code, making use of the Geotools and JTS project code (and possibly also WKB4J). I am attempting to create FeatureCollections upon which I can then use a bounding box filter and similar things.

My problem is that the Geotools/JTS code seems to be very focussed on conducting GML-based XML parsing itself and binding the results to its own Java object tree. XStream has already produced a Java object tree for me, so I need to be able to skip to the good bits. Alas, so far, my approach of using XStream to convert KML into JTS or GML is getting a little too complicated for me, even if it is fun to play with. I have managed to create a hideous fusion hybrid of KML and GML; if this turns out to be quite useful I will follow in the master's footsteps and not release the XML schema anytime soon. ;)

I am about to go off on one for the next paragraph, so consider yourself warned (this is not criticism, just the ramblings of a madman; I send nothing but positive vibes :) ). As an enthusiastic Java programmer and amateur follower of GIS, I really, really want to like Geotools and JTS and find them a joy to work with. However, sometimes I find it all very complicated. There is great work in there but IMHO it is difficult to get at. When working with Java in the GIS arena there is a real danger of creating what I am calling Polycythemic data models. Polycythaemia is the exact opposite of anaemia (hence my use!). Martin Fowler warns of the dangers of creating Anemic data models, which do very little except act as containers for data. Inversely, Polycythemic models try to do far too much! E.g. the idea of "Spatial DB in a Box" seemed quite good to me until I sat down and thought about what it was suggesting. What makes spatially aware databases like PostGIS good is that they are highly optimised. The approach of a Spatial DB in a Box sounds like a good idea because it would turn a non-spatially aware database into one that was spatially aware; it hides the innate complexity of the GIS-aware layer, but at what cost to performance??? You have to be very careful about how you layer complexity upon complexity. Although it is very tempting to start experimenting with storing WKB objects in Derby or H2 blobs, I suspect that way madness lies...

Friday, February 23, 2007

Tomcat, sprechen Sie UTF-8?: Finally, my Portlet talks UTF-8

I finally got my Tomcat hosted portlet to communicate in UTF-8 throughout. I have had a pretty productive day and I am fairly light-headed and pleased with the relative ease and elegance of the method that finally succeeded, so please forgive me if I start rambling incoherently (for those only interested in the Java stuff, I'll try to confine my inane ramblings to sections denoted with italics).

Incidentally, I did consider calling this entry "Herr Tomcat sprechen sie UTF-8?" but I thought the gender police would come get me. I once read that the reason that men are drawn towards creative pursuits like programming is because they cannot physically give birth. I'm sure ladies who drone on about the pain of childbirth haven't experienced the pain of getting a web application using extensive CSS working across all browsers, okay, perhaps a tad glib, I digress...

Here is a screenshot of my working newsfeeds portlet, showing UTF-8 and all:

screenshot of my newsfeed app

So I have been writing a newsfeeds portlet based on my JSR168 bookmarks portlet and reusing a lot of the same code. Essentially, thanks to my decision to use Struts Bridge and Spring Framework together I think that my bookmarks portlet is pretty well architected. Converting the bookmarks portlet to handle newsfeeds is basically a case of adding a little ROME and AJAX magic into the equation.

The other day I was moaning about Unicode to my long-suffering partner (who does not work with computers)

Me: Unicode...terrible...complicated...grumble, grumble, grumble
Partner: That sounds nice
Me: What sounds nice?
Partner: Working with Unicorns

Er, yeah, well anyway...

Part 1: Database eat my UTF-8

Beating Oracle Database into submission, you will store my UTF-8 damn you.

The first part of the saga was to get my Oracle database to store UTF-8. "But Oracle supports UTF-8," you say. That is true, but this is real life: I do not look after the database, I am far from being the only user, and it is enterprise-wide, has been in service for years, and the character set it currently uses is 7-bit US-ASCII. This is due to change in the near future, but for now at least I have to deal with it. After looking long and hard at various methods to round trip between UTF-8 and 7-bit US-ASCII, none of which I really understood, I found a solution I liked. This is supposed to be only a temporary fix until the Oracle databases are upgraded to UTF-8.

My solution to getting my UTF-8 in and out of a US-ASCII format database is to use base64 encoding. My XBEL XML string, which I store inside CLOBs in both the bookmarks and newsfeeds portlets, is readily encoded using base64 (using a utility class from Commons Codec). I have not noticed any additional performance problems introduced by converting to and from base64 format yet; if anything it almost seems a little faster! Plus, since I know I am only dealing with well-formed XML and base64-encoded strings, I can check whether the first character of the string is the "<" character (the base64 alphabet does not contain this character); this identifies my string as XML format. Using this tell-tale XML signal, I can introduce the base64 encoding/decoding inside my DAO implementation layer and it will continue to work with existing XML format stored data. My DAO can now easily deal with both XML and base64 as appropriate. Also, when the database is upgraded to use UTF-8, I can use the same clue to gradually convert all the encoded strings back to raw XML string format.

Part one: success. A means to store UTF-8 in a non-UTF-8-compatible database, which works with existing XML data and will be reversible in the future.

Part 2: Getting my application to display and receive UTF-8 correctly.

As I mentioned, this portlet is essentially a Struts application, including numerous JSP view pages. To ensure as far as possible that my Struts application outputs UTF-8, I made several changes. Maybe not all of these changes are absolutely necessary but, since I did not know what was stopping it working, I changed everything UTF-8 related that I could.

In struts-config.xml for my Struts bridge portlet I changed the controller entry to look like this:



<controller pagePattern="$M$P" inputForward="false" processorClass="org.apache.portals.bridges.struts.PortletRequestProcessor" contentType="text/html;charset=UTF8"/>

In web.xml I added an additional init-param on the servlet for my Struts bridge portlet:



<servlet>
  <servlet-name>action</servlet-name>
  <servlet-class>org.apache.portals.bridges.struts.PortletServlet</servlet-class>
  <init-param>
    <param-name>config</param-name>
    <param-value>/WEB-INF/struts-config.xml</param-value>
  </init-param>
  <init-param>
    <param-name>content</param-name>
    <param-value>text/html;charset=UTF8</param-value>
  </init-param>
</servlet>

In the XML- and XHTML-producing JSPs:



<%@ page contentType="text/html; charset=utf-8" pageEncoding="UTF-8"%><%--
--%><?xml version="1.0" encoding="UTF-8"?>

In my Struts-tag-produced forms I added:



<html:form action="/Action.do" acceptCharset="UTF-8">

I read somewhere that I could apply page encodings in JSP 2.0 applications using <jsp-property-group> in web.xml, but this seemed to suggest that it would only work with JSP pages that have directly addressable URLs. In my Struts application the JSPs mostly live under WEB-INF and therefore have no directly addressable URL, so I didn't think this technique could be applied.

At this point, having liberally sprinkled UTF-8 references throughout my application, the feeds fetched using AJAX that contained UTF-8 characters displayed correctly (hurray!) BUT this was not the case for the feed titles; the title and URL of the feed were submitted by the user via a form. Something was messing up the encoding between the browser and the server.

I started looking around a bit more and found several people saying that a UTF-8 filter seemed to be of help; however, this was unlikely to help in my particular situation. This is a JSR168 portlet, and portlets, without considerable effort, pay no heed to servlet filters.

In several of the pages I visited, it was suggested that Tomcat 5.5.X was responsible for the poor way in which UTF-8 character encoding was being handled. I could believe that, since I had gone to great lengths in my attempts to ensure everything produced UTF-8, and my application was half working (at least the parts that had not been submitted via forms). A workaround for Tomcat's deficiency was to add:



if (request.getCharacterEncoding() == null) {
    request.setCharacterEncoding("UTF-8");
}

before attempting to retrieve any request parameters. I tried this out and it seemed to work, but for a moment I thought I would need to add this code to all my Struts actions (or at least extend the Struts action class to perform this). Then I remembered something I had been reading the other day. I have started using ServletContextListeners quite a bit recently; these are called when an application first starts up, the Spring Framework uses them to establish contexts, and they are very handy for initialising databases and such. Well, there are other listeners available besides ServletContextListeners, and the one I remembered reading about was the ServletRequestListener. A ServletRequestListener is called every time a request is created and destroyed. Therefore, I could place the above code inside a ServletRequestListener and it would solve my UTF-8 problems. There was an added bonus in using a ServletRequestListener: where a filter would not work in a portlet, a ServletRequestListener does!



package somewhere.web;

import java.io.UnsupportedEncodingException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletRequestEvent;
import javax.servlet.ServletRequestListener;

public class UTF8EncodingRequestListener implements ServletRequestListener {

    public void requestDestroyed(ServletRequestEvent servletRequestEvent) {
    }

    public void requestInitialized(ServletRequestEvent servletRequestEvent) {
        ServletRequest request = servletRequestEvent.getServletRequest();
        String enc = request.getCharacterEncoding();
        if (enc == null) {
            try {
                request.setCharacterEncoding("UTF-8");
            } catch (UnsupportedEncodingException ex) {
                ex.printStackTrace();
            }
        }
    }
}

One last thing: to test UTF-8 I got some Arabic, Russian and Chinese RSS newsfeeds from the BBC OPML feeds list (see above screenshot). All was working fine on my Win2K workstation, but when I tried this on Windows XP my Chinese characters did not work. I tried to access Google China and again the characters did not work. It turns out that East Asian fonts are not installed by default with Windows XP; if you want them you will need to install them yourself (if you have a spare 230 megabytes on your hard disk and care about such things).


Wednesday, February 21, 2007

Extracting coordinates from KML with XSL (e.g. for Google Maps)

Now that the Google Earth KML 2.1 format has an XML Schema, we can use XML validators to say definitively whether a given XML file follows the rules of the KML format (i.e. whether it validates). An XML Schema gives an authoritative description of an XML format, but XML Schema is not an easy format for humans to read. Worse still, KML 2.1 is a very complicated format, and because of this the XML Schema for KML 2.1 is also complex.

Many people who find my blog are looking to release the data that they have trapped inside KML files. The main problem they face is getting the "coordinates" data out of the KML. Mostly, people want to do this because they want to create Google Maps (GMaps) with that data. There are other reasons that we might want to extract this data and what I am about to present is a generic approach to extract this data (even if you do not plan to use it to produce Google Maps).

Firstly, thanks to the XML Schema, I can now examine the structure of the KML elements with more confidence. I spent a little time extracting what I considered to be the important parts of the KML 2.1 file format. There can be a great deal of information inside a KML file but I discovered that I am only interested in about 20% of what a KML file can currently offer. I am intentionally ignoring any of the format that I consider to be specifically useful to Google Earth (e.g. NetworkLinks, Overlays, style information, schema extension mechanisms, 3d models and anything involving altitude). This left me with this:

<kml> - top level root of the XML document
<kml> can contain any number of Feature elements.

Feature elements are <Document>, <Folder>, <Placemark>.
Feature elements can contain <name>, <address>, <description> and other sub-elements.

Geometry elements are <MultiGeometry>, <Point>, <LineString>, <LinearRing>, <Polygon>.

Feature elements

<Document> can contain any number of Feature elements.
<Folder> can contain any number of Feature elements.
<Placemark> can contain any number of Geometry elements.

Geometry elements

<MultiGeometry> can contain any number of Geometry elements.
<Point> contains a single <coordinates> element.
<LineString> contains a single <coordinates> element.
<LinearRing> contains a single <coordinates> element.
<Polygon> contains a maximum of one <outerBoundaryIs> element.
<Polygon> contains any number of <innerBoundaryIs> elements.

<outerBoundaryIs> elements contain a single <LinearRing> element.
<innerBoundaryIs> elements contain a single <LinearRing> element.

<coordinates> elements contain space-separated Cartesian coordinate value triples (e.g. "x1,y1,z1 x2,y2,z2"); if the element is contained inside a <Point>, this string is most likely to contain a single coordinate triple.

Using this abridged description of KML, I can construct an XSLT that can extract co-ordinate information from any KML 2.1 format file (providing that the KML file does not extend the KML schema).


<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:kml="http://earth.google.com/kml/2.1">
<xsl:output method="text" version="1.0" omit-xml-declaration="yes" />

<xsl:variable name="cr"><xsl:text>
</xsl:text></xsl:variable>

<xsl:template match="/ | @* | * | comment() | processing-instruction() | text()">
<xsl:apply-templates select="@* | * | comment() | processing-instruction() | text()" />
</xsl:template>

<xsl:template match="kml:Document | kml:Folder | kml:Placemark">
<xsl:value-of select="name()"/>
<xsl:if test="string-length(kml:name) > 0">
Name: <xsl:value-of select="kml:name"/>
</xsl:if>
<xsl:if test="string-length(kml:address) > 0">
Address: <xsl:value-of select="kml:address"/>
</xsl:if>
<xsl:if test="string-length(kml:description) > 0">
Description: <xsl:value-of select="kml:description"/>
</xsl:if>
<xsl:value-of select="$cr"/>
<xsl:apply-templates />
</xsl:template>

<xsl:template match="kml:MultiGeometry | kml:LineString | kml:Point | kml:LinearRing | kml:Polygon">
<xsl:value-of select="name()"/><xsl:value-of select="$cr"/>
<xsl:apply-templates />
</xsl:template>

<xsl:template match="kml:outerBoundaryIs | kml:innerBoundaryIs">
<xsl:value-of select="name()"/><xsl:value-of select="$cr"/>
<xsl:apply-templates />
</xsl:template>


<xsl:template match="kml:coordinates">
<xsl:call-template name="split">
<xsl:with-param name="str" select="normalize-space(.)" />
</xsl:call-template>
</xsl:template>

<xsl:template name="split">
<xsl:param name="str" />
<xsl:choose>
<xsl:when test="contains($str,' ')">
<xsl:variable name="coord"><xsl:value-of select="substring-before($str,' ')" /></xsl:variable>
<xsl:variable name="first"><xsl:value-of select="substring-before($coord,',')" /></xsl:variable>
<xsl:variable name="second"><xsl:value-of select="substring-before(substring-after($coord,','),',')" /></xsl:variable>
X: <xsl:value-of select="$first" />
Y: <xsl:value-of select="$second" /><xsl:value-of select="$cr"/>
<xsl:call-template name="split">
<xsl:with-param name="str" select="normalize-space(substring-after($str,' '))" />
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:if test="string-length($str) > 0">
<xsl:variable name="first"><xsl:value-of select="substring-before($str,',')" /></xsl:variable>
<xsl:variable name="second"><xsl:value-of select="substring-before(substring-after($str,','),',')" /></xsl:variable>
X: <xsl:value-of select="$first" />
Y: <xsl:value-of select="$second" /><xsl:value-of select="$cr"/>
</xsl:if>
</xsl:otherwise>
</xsl:choose>
</xsl:template>

</xsl:stylesheet>

With a little XSL know-how, the above skeletal XSLT can be modified to insert Google Maps JavaScript as appropriate.

There are approximate parallels between KML and GMaps, e.g. a KML <Point> corresponds naturally to a GMarker and a <LineString> to a GPolyline.
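For instance, the coordinate output of the XSLT could be turned into Maps API v2 calls of this sort (a sketch only; the map variable is assumed to be an initialised GMap2, and remember that KML lists longitude before latitude while GLatLng takes latitude first):

// a KML <Point> with coordinates "-2.6,51.5,0" becomes a GMarker
map.addOverlay(new GMarker(new GLatLng(51.5, -2.6)));

// a <LineString>'s coordinate triples become a GPolyline
map.addOverlay(new GPolyline([new GLatLng(51.5, -2.6), new GLatLng(51.6, -2.5)]));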

I would stop short of creating a universal KML to GMaps solution because I have found specific requirements vary greatly. Larger quantities of co-ordinate data need special handling. The presentation of data, colours and style should be the decision of the developer. Although I have not produced a technique that people without XML, XSLT and JavaScript knowledge can use, I hope this is still useful to somebody.