Thursday, June 30, 2005

Wiki migration, XML-RPC and the Jakarta IO Taglib

Some time ago I chose to install a Wiki for the development team in which I work. The engine I chose at that time was UseModWiki. UseModWiki is a monolithic Perl script and as such was very easy to install and configure. One of the other reasons I chose it was that it was being used by the Apache Software Foundation. Since that time Apache have migrated from UseModWiki to MoinMoin. Perl isn't really my first language any more, and I'd like to add a couple of bells and whistles, so I'd be happier using something Java technology based; there are now some quite mature Java based wiki engines about.

My problem, I expect, is a relatively common one. I have quite a bit of content in the UseModWiki instance that I'd like to migrate to my brand spanking new wiki engine (JSPWiki). Now, UseModWiki uses db files for page storage, and I'm not sure I remember enough Perl to write a database extraction utility for this purpose. So what do I do?

It turns out that somebody has written a companion Perl script for UseModWiki to enable it to export its content using XML-RPC.

0xDECAFBAD's XmlRpcToWiki

Cool, but doesn't using XML-RPC in Java sound a little too much like hard work? Well, that is what I thought at first too, until I remembered the Jakarta IO Taglib.

The Jakarta IO taglib supports XML-RPC, so with a little help from JSTL and a Base64 decoding taglib I found on the net, I was able to query the UseModWiki content and write it out to text files. Granted, these are probably a couple of the ugliest JSPs I have ever written (and I've written some doozies in my time). This kind of thing really belongs in a command-line Perl script, but since the taglibs make things so easy and it is a one-off, it won't hurt to do it just this once!


<%@ taglib uri="http://jakarta.apache.org/taglibs/io-1.0" prefix="io" %>
<%@ taglib uri="http://www.servletsuite.com/servlets/base64tag" prefix="base64" %>
<%@ taglib uri="http://java.sun.com/jstl/core" prefix="c" %>
<%@ taglib uri="http://java.sun.com/jstl/xml" prefix="x" %>

<c:set var="textResponse"><%--
--%>
<io:xmlrpc url="http://someserver/cgi-bin/usemod_xmlrpc.pl"><%--
--%>
<io:body><%--
--%>
<methodCall><%--
--%>
<methodName>wiki.getAllPages</methodName><%--
--%>
</methodCall><%--
--%>
</io:body><%--
--%>
</io:xmlrpc><%--
--%>
</c:set>

<c:out value="${textResponse}"/>

<x:parse xml="${textResponse}" var="responseXml"/>

<br />

<ul>
<x:forEach select="$responseXml/methodResponse/params/param/value/array/data/value/string" var="pageName">
<li><x:out select="."/></li>

<c:import url="script2.jsp" var="dummy">
<c:param name="pageName"><x:out select="."/></c:param>
</c:import>
</x:forEach>
</ul>

The JSP page above queries the UseModWiki server using XML-RPC. It receives a list of all the pages on the wiki (as an XML-RPC message) and then processes this list, calling the page below once for every page in the wiki. (The <%-- --%> comment pairs in the markup are only there to swallow the line breaks between tags.)


<%@ taglib uri="http://jakarta.apache.org/taglibs/io-1.0" prefix="io" %>
<%@ taglib uri="http://www.servletsuite.com/servlets/base64tag" prefix="base64" %>
<%@ taglib uri="http://java.sun.com/jstl/core" prefix="c" %>
<%@ taglib uri="http://java.sun.com/jstl/xml" prefix="x" %>
<%@ page import="java.io.*" %>

<c:set var="pageName"><c:out value="${param.pageName}" default="Rubbish" /></c:set>

<c:set var="textResponse"><%--
--%>
<io:xmlrpc url="http://someserver/cgi-bin/usemod_xmlrpc.pl"><%--
--%>
<io:body><%--
--%>
<methodCall><%--
--%>
<methodName>wiki.getPage</methodName><%--
--%>
<params><%--
--%>
<param><%--
--%>
<value><c:out value="${pageName}"/></value><%--
--%>
</param><%--
--%>
</params><%--
--%>
</methodCall><%--
--%>
</io:body><%--
--%>
</io:xmlrpc><%--
--%>
</c:set>

<h1><%= pageContext.getAttribute("pageName") %></h1>

<%
// Write the decoded page text out under the JSPWiki page directory,
// one file per wiki page
File file = new File( "c:/jspwiki/" + pageContext.getAttribute("pageName") + ".txt" );
FileWriter fileWriter = new FileWriter( file );
%>

<x:parse xml="${textResponse}" var="responseXml"/>

<c:set var="output"><%--
--%>
<base64:decode><%--
--%>
<x:out select="$responseXml/methodResponse/params/param/value/base64"/><%--
--%>
</base64:decode><%--
--%>
</c:set>


<h2>Writing....</h2>
<pre>
<c:out value="${output}" escapeXml="false"/>

</pre>
<h2>To</h2>
<h1><%= file.getCanonicalPath() %></h1>

<io:pipe writer="<%= fileWriter %>"><%--
--%>
<c:out value="${output}" escapeXml="false"/><%--
--%>
</io:pipe>

<%
fileWriter.close();
%>

The JSP page above is passed a pageName parameter and queries the UseModWiki for that page's contents. It receives the contents in an XML-RPC message, decodes the Base64-encoded string value back to plain text, and then writes it out to a text file.

It worked: I now have all my UseModWiki pages as plain text files in their original wiki markup. All I needed to do then was convert from the UseMod TextFormatting to the JSPWiki TextFormatting. I found WinGrep came in quite handy for that, although I suppose I could have written it using the Jakarta RegExp taglib; now that would have been a really ugly JSP!
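For the record, the conversion itself boils down to a handful of regular-expression substitutions. Here is a rough sketch in plain Java; the UseMod-to-JSPWiki mappings shown are only illustrative examples, and the substitutions you actually need depend on which markup features were used:


import java.io.*;

// Very rough UseMod-to-JSPWiki markup converter (illustrative substitutions only)
public class MarkupConverter {

    static String convert(String line) {
        // '''bold''' in UseMod becomes __bold__ in JSPWiki (assumed mapping)
        line = line.replaceAll("'''(.*?)'''", "__$1__");
        // == Heading == in UseMod becomes !!! Heading in JSPWiki (assumed mapping)
        line = line.replaceAll("^==\\s*(.*?)\\s*==$", "!!! $1");
        // [[Free Link]] in UseMod becomes [Free Link] in JSPWiki (assumed mapping)
        line = line.replaceAll("\\[\\[(.*?)\\]\\]", "[$1]");
        return line;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        PrintWriter out = new PrintWriter(new FileWriter(args[1]));
        String line;
        while ((line = in.readLine()) != null) {
            out.println(convert(line));
        }
        in.close();
        out.close();
    }
}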


Monday, June 27, 2005

The Marvelous Conditional Get

Consuming content with caching

I am going to highlight two ways to take advantage of HTTP 1.1 caching mechanisms for content consumption. Caching can significantly improve performance and can reduce the network bandwidth load on the content provider's server. I think it is only good etiquette to use caching wherever possible in your applications. Static web documents (HTML, CSS, JavaScript, XML, XSL and so on) are usually cacheable and can be retrieved as such. I found both of these techniques when I was writing my newsfeed aggregator application.

Using URLConnection with a Proxy Server

The first method takes advantage of an external web proxy cache server (such as those powered by Squid). It is possible to set the http.proxyHost system property for the JVM (-Dhttp.proxyHost) so that all content is obtained through a web proxy cache, but this is not always the behaviour you require. The document describing this feature said that it is not well documented, and it still isn't, which is why I am reproducing it here.

You can read the original document at Using URLConnection with a Proxy Server.

Essentially it boils down to this. Instead of the usual:


URL url = new URL("http://www.javablogs.com/ViewDaysBlogs.action?view=rss");
URLConnection c = url.openConnection();

You can do the following instead, and it will use the web proxy cache as an intermediary. This reduces the load on the destination server if the document is static (or dynamically generated but cacheable), and if the web proxy cache server is located closer to your application's server than the destination, it will also improve response times.


URL url = new URL(
    "http",                                                      // protocol
    "myProxy.com",                                               // host name or IP of the proxy server to use
    -1,                                                          // proxy port, or -1 for the protocol's default port
    "http://www.javablogs.com/ViewDaysBlogs.action?view=rss");   // the original URL, specified in the "file" parameter
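
From there the connection is used exactly as it would be without the proxy. Continuing from the url above, reading the feed is just the usual stream handling (a minimal sketch):


URLConnection connection = url.openConnection();
BufferedReader reader = new BufferedReader(
        new InputStreamReader(connection.getInputStream()));
String line;
while ((line = reader.readLine()) != null) {
    // process each line of the feed XML, or hand the stream to a parser instead
}
reader.close();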

ROME sub-project: RomeFetcher

In my newsfeed aggregator I initially used the above technique to obtain syndicated feeds. I then discovered I could use RomeFetcher, which is specifically designed to retrieve newsfeeds and supports the HTTP 1.1 Conditional GET mechanism (i.e. Last-Modified and ETag handling). Example code can be found on the RomeFetcher site. The Jakarta Commons FeedParser looks as though it will also support the conditional GET mechanism, but there have been no releases of it as yet.
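For reference, fetching a feed with RomeFetcher looks roughly like the following (adapted from the example on the RomeFetcher site; the shared cache is what holds the Last-Modified and ETag values between fetches, so repeat requests become conditional GETs):


import java.net.URL;

import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.fetcher.FeedFetcher;
import com.sun.syndication.fetcher.impl.FeedFetcherCache;
import com.sun.syndication.fetcher.impl.HashMapFeedInfoCache;
import com.sun.syndication.fetcher.impl.HttpURLFeedFetcher;

// A single cache instance should be shared so the conditional GET
// information survives between fetches
FeedFetcherCache feedInfoCache = HashMapFeedInfoCache.getInstance();
FeedFetcher feedFetcher = new HttpURLFeedFetcher(feedInfoCache);

SyndFeed feed = feedFetcher.retrieveFeed(
        new URL("http://www.javablogs.com/ViewDaysBlogs.action?view=rss"));
System.out.println(feed.getTitle());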

The conditional get mechanism is comprehensively explained in HTTP Conditional Get for RSS Hackers.

Producing cacheable content

The two methods above take advantage of caching for content consumption. Most modern web servers serve static documents with caching support built in. Dynamically generated documents, by their very nature, do not support caching unless they are explicitly written to do so. Depending on the characteristics of the generated content, this can sometimes be remedied.

If your content is updated periodically, it is relatively easy to modify the servlet code to support caching: in a servlet it is simply a case of overriding the getLastModified method as appropriate for the content.

See e.g.: Utilizing browser and proxy server cache
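
A minimal sketch of what that looks like in practice (the lastUpdated field here is hypothetical; the only requirement is that getLastModified returns the time the content last changed, in milliseconds, or -1 if it is unknown):


import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class FeedServlet extends HttpServlet {

    // Hypothetical timestamp, refreshed by whatever process regenerates the content
    private volatile long lastUpdated = System.currentTimeMillis();

    // The container uses this value to set the Last-Modified header and to answer
    // If-Modified-Since requests with 304 Not Modified, so browsers and proxy
    // caches can avoid re-fetching unchanged content.
    protected long getLastModified(HttpServletRequest request) {
        return lastUpdated; // return -1 if the value is unknown
    }

    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/xml");
        response.getWriter().write("<rss version=\"2.0\">...</rss>");
    }
}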

Friday, June 24, 2005

XBEL and DHTML: a perfect match

XBEL

The XML Bookmark Exchange Language (XBEL) has its origins in the Python community. It is described as an internet "bookmarks" interchange format and has similarities to the OPML format. I first encountered XBEL as the core format of the uPortal bookmarks channel. I have since used it as the main storage format in numerous web applications, including a bookmark manager, a newsfeed aggregator and several "shopping basket" style programs. As you can probably tell from my other blog entries, I like to mix Java with XML technologies, which probably partly explains my infatuation with this format.

Example XBEL tree:
tree.xml

DHTML Tree

I first discovered the DHTML tree widget on a page entitled Unobtrusive DHTML, and the power of unordered lists. This appealed to me because I am required to write accessible web applications, and it looked like a pretty neat solution that does not require JavaScript to be enabled in the browser. It also transfers much of the load associated with rendering tree refreshes from the server to the client side. That DHTML tree was extended by Matt Kruse to add expand/contract functionality. I recently re-implemented the DHTML tree, inspired by the mechanism of D.D. de Kerf's Easy DHTML TreeView but using unordered lists instead of a table layout. The reason I moved to D.D. de Kerf's design was to allow multiple trees to exist simultaneously on a single page without interfering with each other, while sharing the same static CSS and JavaScript files (not perfect yet).
