Wednesday, February 28, 2007

Unmarshalling KML with XStream and Polycythemic Data Models

As I mentioned before, I am usually only interested in about 20 percent of what a KML file can contain: just the 2D geometry and nothing else. I want to extract this geometry collection and store it in Java objects for further processing. If I were interested in capturing all of the XML information in objects, I would normally turn to an XML binding library such as Castor, JAXB or XMLBeans. But I do not want to capture the entire document in Java objects, so what are my options?

I could pre-process the XML document with an XSL stylesheet to produce an abridged version containing only the elements I am interested in, and then use XML binding against a correspondingly abridged XML schema. Although this is possible, it feels a little too much like hard work, performance may suffer (unless I used a streaming transformation engine like STX rather than XSL), and it is generally not a very flexible approach. Therefore, I will not be doing that.

Then I discovered XStream. XStream is often mentioned in the XML binding context, but it is subtly different from Castor, JAXB and XMLBeans. XStream has a streaming nature (much like SAX): it parses XML and maps whatever it finds onto Java classes (and vice versa). You need to do a little groundwork to establish how your XML data maps to Java objects, but XStream makes cunning use of reflection, so the amount of groundwork is relatively slight. XStream does not require your Java objects to implement any specific methods; fields are all it needs to populate them. This appears a little strange at first, and it can affect how you write your object methods. For example, you cannot assume that any constructor was called while XStream was populating your objects. It seems odd to end up with a populated object that has skipped all the usual procedures, but at the same time it is quite ingenious.
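To make the constructor point concrete, here is a minimal sketch that uses only the JDK. It does not call XStream at all; it uses plain Java serialization as a stand-in, because that mechanism also rebuilds a Serializable object without running its constructor. The class name Placemark, its fields and the roundTrip helper are all made up for illustration and are not taken from the downloadable source. The practical lesson should carry over: anything you set up only in a constructor may be missing on a rebuilt object, so accessors are safer when they cope with that.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class ConstructorBypassDemo {

    /** Hypothetical stand-in for a hand-written KML class. */
    static class Placemark implements Serializable {
        private static final long serialVersionUID = 1L;

        // Counts how often the constructor really runs.
        static int constructorCalls = 0;

        private String name;
        // Created only in the constructor and never written out,
        // so it is absent on an object rebuilt from the stream.
        private transient List<String> notes;

        Placemark(String name) {
            constructorCalls++;
            this.name = name;
            this.notes = new ArrayList<>();
        }

        String getName() {
            return name;
        }

        boolean notesWereInitialised() {
            return notes != null;
        }

        // Defensive accessor: works even if the constructor never ran.
        List<String> getNotes() {
            if (notes == null) {
                notes = new ArrayList<>();
            }
            return notes;
        }
    }

    /** Writes the object to a byte array and reads a fresh copy back. */
    static Placemark roundTrip(Placemark original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Placemark) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Placemark copy = roundTrip(new Placemark("Edinburgh Castle"));
        System.out.println("name: " + copy.getName());
        System.out.println("constructor calls: " + Placemark.constructorCalls);
        System.out.println("notes initialised: " + copy.notesWereInitialised());
        System.out.println("notes via accessor: " + copy.getNotes().size());
    }
}
```

The copy has its name, yet the constructor ran only once (for the original) and the constructor-only field arrives as null. The lazy getNotes() accessor is one way to live with that; whether you need it with XStream depends on which fields your XML actually populates.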

I have created a simple proof of concept that converts KML into a Java object representation of Kml (containing only the 2D geometry elements). You can download the Java source code here; you will also need the XStream library.

In the main program, XStream is used to read a KML sample file into simple, hand-crafted Java objects. XStream can then very easily turn this tree of objects back into KML (of sorts). To prove to the more cynical (like myself) that this actually works, I have also included a simple traversal of the newly created Kml object tree. That was scarily simple, wasn't it?
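For a feel of what such a tree and its traversal might look like, here is a rough sketch. The class names (Kml, Document, Placemark, Point), the sample() data and the describe() helper are my own illustrative assumptions, not the classes from the download. To keep the sketch runnable without the XStream jar, the tree is built by hand; with XStream you would typically register aliases with XStream.alias(String, Class) and fill the same kind of tree from fromXML(...). The only KML fact relied on is that a coordinates value reads "longitude,latitude[,altitude]".

```java
import java.util.ArrayList;
import java.util.List;

public class KmlTreeSketch {

    /** Root of the tree; fields mirror the KML elements they stand for. */
    static class Kml {
        Document document = new Document();
    }

    static class Document {
        List<Placemark> placemarks = new ArrayList<>();
    }

    static class Placemark {
        String name;
        Point point; // may be null when the placemark has no Point geometry

        Placemark(String name, Point point) {
            this.name = name;
            this.point = point;
        }
    }

    static class Point {
        String coordinates; // raw KML text: "lon,lat[,alt]"

        Point(String coordinates) {
            this.coordinates = coordinates;
        }

        double longitude() {
            return Double.parseDouble(coordinates.split(",")[0].trim());
        }

        double latitude() {
            return Double.parseDouble(coordinates.split(",")[1].trim());
        }
    }

    /** Hand-built sample tree standing in for what a parser would produce. */
    static Kml sample() {
        Kml kml = new Kml();
        kml.document.placemarks.add(
                new Placemark("Edinburgh Castle", new Point("-3.2008,55.9486,0")));
        kml.document.placemarks.add(new Placemark("No geometry here", null));
        return kml;
    }

    /** Walks the tree and returns one line per placemark that has a point. */
    static List<String> describe(Kml kml) {
        List<String> lines = new ArrayList<>();
        for (Placemark placemark : kml.document.placemarks) {
            if (placemark.point == null) {
                continue; // skip placemarks without 2D point geometry
            }
            lines.add(placemark.name
                    + " @ lon=" + placemark.point.longitude()
                    + ", lat=" + placemark.point.latitude());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe(sample())) {
            System.out.println(line);
        }
    }
}
```

Keeping the classes this plain is deliberate: they only hold what was in the file, and anything cleverer (parsing coordinates, skipping empty placemarks) lives in small methods that tolerate missing data.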

Elsewhere, I have attempted to extend this example code using the GeoTools and JTS project code (and possibly also WKB4J). I am attempting to create FeatureCollections to which I can then apply a bounding box filter and similar operations.

My problem is that the GeoTools/JTS code seems to be very focussed on doing the GML-based XML parsing itself and binding the results to its own Java object tree. XStream has already produced a Java object tree for me, so I need to be able to skip to the good bits. Alas, so far, my approach of using XStream to convert KML into JTS or GML is getting a little too complicated for me, even if it is fun to play with. I have managed to create a hideous fusion hybrid of KML and GML; if this turns out to be quite useful, I will follow in the master's footsteps and not release the XML schema anytime soon. ;)

I am about to go off on one for the next paragraph, so consider yourself warned (this is not criticism, just the ramblings of a madman; I send nothing but positive vibes :) ). As an enthusiastic Java programmer and amateur follower of GIS, I really, really want to like GeoTools and JTS and find them a joy to work with. However, sometimes I find it all very complicated. There is great work in there, but IMHO it is difficult to get at. When working with Java in the GIS arena, there is a real danger of creating what I am calling Polycythemic data models. Polycythemia is the exact opposite of anemia (hence my use!). Martin Fowler warns of the dangers of anemic domain models, which do very little except act as containers for data. Conversely, Polycythemic models try to do far too much! For example, the idea of "Spatial DB in Box" seemed quite good to me until I sat down and thought about what it was suggesting. What makes spatially aware databases like PostGIS good is that they are highly optimised. A Spatial DB in Box sounds like a good idea because it would turn a non-spatially aware database into one that is spatially aware. It hides the innate complexity of the GIS-aware layer, but at what cost to performance? You have to be very careful about how you layer complexity upon complexity. Although it is very tempting to start experimenting with storing WKB objects in Derby or H2 blobs, I suspect that way madness lies...


ismjml said...


I have downloaded and played with your Java code and made some additions. I'd like to make some more, but I am not sure how you feel about this. Let me know if you're OK with it and we can work out how to keep it open and available. Thanks.
Note: Comment imported. Original by Anonymous at 2007-03-19 01:54

ismjml said...

It is okay with me; in fact I am delighted that you have found this useful.

Unless I specify otherwise, everything on my blog is covered by the Creative Commons Attribution 2.0 UK license. This means you are free to copy, distribute, display, perform the work and to make derivative works providing you give me some sort of credit.

Note: Comment imported. Original by markmc at 2007-03-19 08:45

ismjml said...

thanks for the blog
Note: Comment imported. Original by ideaTaxi at 2007-05-21 23:26

ismjml said...

Can you share what changes you have made? I would love to know. I made some changes to fit the KML model I have.
Note: Comment imported. Original by Anonymous at 2007-11-27 06:48