Thursday, June 14, 2007

Java GIS Collaboration – Don’t hold your breath. (But we are getting closer.)

As far as the broader open source community goes, Java GIS has one really bad habit: the smaller member communities and projects just aren’t very good at working together. Case in point:

You want to obtain Feature objects from an ESRI Shapefile? You know someone has written some open source code that does just that? Where do you look?

You could use the GeoTools code for Shapefile I/O.
You could use the Deegree project code for Shapefile I/O.
You could use the JUMP code for Shapefile I/O.

There are only a couple of technical challenges to overcome. All three code libraries will give you Feature objects that come from a different Feature Model, and you’ll also have to learn a unique data source/data access framework for each project.

How did things get this bad?

I don’t have an answer to that question.

Will things get better?

I hope so. In fact, I have seen a light at the end of a long dark tunnel in this regard. For example, even within the OpenJUMP community we are seeing a quiet push towards collaboration and consolidation. We have contributions from DeeJUMP developers, SkyJUMP developers, SIGLE JUMP developers and other independent developers. Even the developers from Vivid Solutions, the creator of JUMP, occasionally chime in on the mailing list. For the first time we seem to be close to realizing our goal of having OpenJUMP as a common software “platform” that serves as a foundation for plug-in developers looking for a JUMP replacement with a more open development model. (I attribute a good deal of this positive trend to the hard work of Stefan Stienger, the project administrator for OpenJUMP.)

I also see the light at the end of a long, dark tunnel when I consider discussion of collaboration among other Java GIS projects. For example, Jody Garnett and Cory Horner have been encouraging greater collaboration among the different Java GIS projects. Cory Horner has even set up a page at the OSGeo to facilitate the discussion of collaboration in this area:

(I think the OSGeo is the perfect “umbrella” for this type of discussion and the concrete interfaces and implementations that will hopefully result.)

Jody Garnett has also been instrumental in helping guide me around some of the GeoTools maze. He has really made me feel welcome in the GeoTools world. (I hope to assist Jody this summer with a 2007 Google Summer of Code student working on a GeoTools project.) Jody and I also talk “off the mailing lists” about ways we can work together. I can’t tell you how valuable it is in terms of collaboration to have someone at GeoTools that I can just bounce ideas off of. (Jody recently discussed cooperation between members of the Java GIS community on his blog. Click here to see the post.)

Still, we have some major challenges when it comes to collaboration in the Java GIS community, and the light at the end of the tunnel might be a train locomotive.

What are some of these challenges?

I had originally planned on slowly integrating the GeoTools feature model into OpenJUMP, but I almost had a riot on the JPP Developer mailing list when I suggested doing so. It seems the GeoTools project has a reputation for instability, especially within the feature model, and this has to be dealt with regardless of whether or not the reputation has been earned.

I still think there is a lot of disagreement about what a feature model should look like, and where the balance between simplicity and functionality should be found. This disagreement takes place even within projects, let alone between projects. There is no real chance for collaboration on feature models until some of this debate is settled.

If we are to make collaboration between members of the Java GIS community a success we need to be realistic. Am I going to be ripping the feature model out of JUMP so that I can throw in the one for GeoTools or Deegree? I realize now that this is not possible from a practical point of view.

What can we accomplish?

There are at least two things I think we can do to "enable" greater collaboration.

The first thing we can do is to develop a library of code that provides conversion between elements of the projects’ different feature models. This is something I hope to work on for OpenJUMP’s FeatureCache. These converters will allow programs like OpenJUMP to consume feature models from other projects with a layer of insulation against changes in those projects.
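To make the converter idea a little more concrete, here is a minimal sketch in plain Java. Every name in it (FeatureConverter, ArrayFeature, MapFeature) is hypothetical and stands in for a real project's feature class; none of them come from OpenJUMP, GeoTools or Deegree. The geometry is typed as Object only to keep the sketch free of dependencies; in practice it would presumably be a JTS geometry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FeatureConverterSketch {

    /** Hypothetical "project A" feature: parallel arrays of names and values. */
    static final class ArrayFeature {
        final String[] names;
        final Object[] values;
        final Object geometry; // assumption: would be a JTS geometry in practice

        ArrayFeature(String[] names, Object[] values, Object geometry) {
            if (names.length != values.length) {
                throw new IllegalArgumentException("names and values must have the same length");
            }
            this.names = names;
            this.values = values;
            this.geometry = geometry;
        }
    }

    /** Hypothetical "project B" feature: attributes keyed by name. */
    static final class MapFeature {
        final Map<String, Object> attributes = new LinkedHashMap<>();
        Object geometry;
    }

    /** Converts a feature of type S into a feature of type T. */
    interface FeatureConverter<S, T> {
        T convert(S source);
    }

    /** One direction only; the reverse would be a second implementation. */
    static final FeatureConverter<ArrayFeature, MapFeature> ARRAY_TO_MAP = source -> {
        MapFeature target = new MapFeature();
        for (int i = 0; i < source.names.length; i++) {
            target.attributes.put(source.names[i], source.values[i]);
        }
        target.geometry = source.geometry; // geometry is passed through untouched
        return target;
    };

    public static void main(String[] args) {
        ArrayFeature road = new ArrayFeature(
                new String[] {"name", "lanes"},
                new Object[] {"Main St", 2},
                "LINESTRING (0 0, 10 0)"); // placeholder for a real geometry object
        MapFeature converted = ARRAY_TO_MAP.convert(road);
        System.out.println(converted.attributes);
    }
}
```

The point of the interface is the insulation: if the other project changes its feature class, only the converter implementation has to change, not the code that consumes the converted features.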

The second thing we can do is identify areas where a big hunk of legacy code isn’t standing in the way of collaboration. This could be in the one or two areas where the projects already share code.

The JTS geometry library is the best example I can think of. Just about anything we build directly on top of JTS can be used by many of the Java GIS projects. (Support for spatial relationships and topology is one example.) Whatever we do we can’t let a project that is already using JTS replace it with some custom, homegrown geometry library! Not even in the name of OGC compliance! Instead we should get involved with the development of JTS so it can grow and mature. We should ask ourselves how we can accomplish our programming tasks using JTS, instead of writing our own geometry library.

We also have the opportunity to collaborate in areas of new functionality. How many of the Java GIS projects have a set of standard libraries to handle map labels, annotations, and dimensions? I know OpenJUMP doesn’t. That is functionality that isn't tightly coupled to a feature model and could be integrated into OpenJUMP via a standard set of interfaces and implementations. There is certainly other functionality that fits into this category.

How do we start?

In his blog post Jody Garnett suggested we work on a set of Java interfaces that answer the question: “What is a map?”

I think this might be a tough place to start, because a map is a rather complex object. (Should we start with a simpler concept?) Still, I’m willing to give this a try.

So what type of map are we talking about, Jody? Are we talking about a paper map or a digital map? There are some big differences in how I would represent the two types of maps with Java interfaces.

The Sunburned Surveyor


Jody said...

You are correct that "data formats" is the killer app for collaboration - and also that GeoTools has a bad reputation when it comes to keeping its API stable (this is by intention when in RnD mode - when we have something figured out we punt it over to GeoAPI).

But you give me an idea. The fact that the feature model is upsetting our calm is indicative of us solving the wrong problem - the real problem is: a) spatial and b) data access

Feature model is about data representation (and that has nothing to do with the above). Woot.

If you consider a layered OR mapper system you get a couple of neat consequences (think hibernate). One level makes "queries" and grabs the data out of the "formats" and into a staging area. The second level takes information from this staging area and constructs objects.

Let's treat the problem of spatial data formats in a similar fashion. We can agree on a common API for data access - the result giving us a staging area (at the very least arrays of mixed Java objects for the attributes, and JTS objects for the spatial).

It can be up to the next layer (possibly project or application specific) to map from this to their Feature model.
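A minimal sketch of the two layers described above, assuming nothing beyond plain Java. The names StagedRecord, RecordSource and FeatureMapper are hypothetical and are not taken from GeoTools, GeoAPI or JUMP. The geometry is held as Object only to keep the example free of dependencies; the staging area described above would hold JTS objects there.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StagingAreaSketch {

    /** One row in the staging area: raw attribute values plus a geometry. */
    static final class StagedRecord {
        final Object[] attributes;
        final Object geometry; // assumption: a JTS geometry in a real system

        StagedRecord(Object[] attributes, Object geometry) {
            this.attributes = attributes;
            this.geometry = geometry;
        }
    }

    /** Layer one: pulls data out of a format (shapefile, database, ...) into staged records. */
    interface RecordSource {
        List<StagedRecord> read();
    }

    /** Layer two: project-specific mapping from a staged record to that project's feature type. */
    interface FeatureMapper<F> {
        F map(StagedRecord record);
    }

    /** Glue code: read everything from the source and map each record. */
    static <F> List<F> load(RecordSource source, FeatureMapper<F> mapper) {
        List<F> features = new ArrayList<>();
        for (StagedRecord record : source.read()) {
            features.add(mapper.map(record));
        }
        return features;
    }

    public static void main(String[] args) {
        // An in-memory source standing in for a real format reader.
        RecordSource source = () -> Arrays.asList(
                new StagedRecord(new Object[] {"Main St", 2}, "LINESTRING (0 0, 10 0)"),
                new StagedRecord(new Object[] {"Elm St", 1}, "LINESTRING (0 5, 10 5)"));

        // A trivial "feature model": just a descriptive string per record.
        FeatureMapper<String> mapper = r -> r.attributes[0] + " @ " + r.geometry;

        for (String feature : load(source, mapper)) {
            System.out.println(feature);
        }
    }
}
```

Only RecordSource would need to be shared between projects in this arrangement; each project supplies its own FeatureMapper.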

There are of course some performance wrinkles - often you want to decimate your spatial content and stuff it into Java 2d Shapes rather than JTS for performance/rendering. But optimization comes later.
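As a rough illustration of that rendering shortcut, the sketch below thins a coordinate list and writes it straight into a Java 2D path. The class and method names are hypothetical, and keeping every n-th vertex is a deliberately naive stand-in for whatever decimation a real renderer would use.

```java
import java.awt.geom.Path2D;
import java.awt.geom.PathIterator;

public class DecimateSketch {

    /**
     * Builds a Java 2D path from a line's coordinates, keeping the first vertex,
     * every step-th vertex after it, and always the last vertex.
     * Each coordinate is a two-element array {x, y}.
     */
    static Path2D.Double toDecimatedPath(double[][] coords, int step) {
        if (step < 1) {
            throw new IllegalArgumentException("step must be at least 1");
        }
        Path2D.Double path = new Path2D.Double();
        if (coords.length == 0) {
            return path;
        }
        path.moveTo(coords[0][0], coords[0][1]);
        int last = coords.length - 1;
        for (int i = step; i < last; i += step) {
            path.lineTo(coords[i][0], coords[i][1]);
        }
        if (last > 0) {
            path.lineTo(coords[last][0], coords[last][1]); // never drop the end point
        }
        return path;
    }

    /** Counts the segments (moveTo and lineTo entries) stored in a path. */
    static int countSegments(Path2D path) {
        int count = 0;
        for (PathIterator it = path.getPathIterator(null); !it.isDone(); it.next()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        double[][] coords = new double[10][];
        for (int i = 0; i < coords.length; i++) {
            coords[i] = new double[] {i, i};
        }
        Path2D.Double path = toDecimatedPath(coords, 3);
        System.out.println("segments kept: " + countSegments(path));
    }
}
```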

Historical note: You will find in the JUMP projects an early fork of the GeoTools shapefile reading code; in a similar manner, JTS now includes a fork of some Oracle Spatial reading code. There really are not that many solutions around - we just have a bad case of cut and paste and hack.