Wednesday, April 18, 2007

The GeoTools Dilemma

My Interest In GeoTools

My interest in GeoTools has increased in the last couple of months. This has been primarily for two reasons. The first is my need for an ESRI Shapefile parser, which will be a key part of my Feature Cache. (OpenJUMP's current ESRI Shapefile reader places all features contained in the Shapefile into memory. This can cause an exception when reading Shapefiles with a large number of features.) The second reason I'm interested in GeoTools is that I believe collaboration among open source projects is important, and GeoTools is an excellent candidate for this collaboration because it is written in the Java programming language. I recently had a little time to look at the API documentation for some of the GeoTools code that works with Shapefiles. During this process I realized we will face a major challenge in getting OpenJUMP and GeoTools to play nicely together.

The Challenge Of Working Closely With GeoTools – The Feature Model

If you use the “high-level” GeoTools code to work with Shapefiles you can easily obtain Feature objects. However, it is important to note that these Feature objects are not the Feature objects used in JUMP or OpenJUMP, but a GeoTools version of a Feature. When you look at the API docs for both Features you can see there aren't a great deal of differences. Many of the methods are similar. I would even admit that the GeoTools Feature interface contains some improvements, like a design that considers the fact that Features can contain multiple geometries.

However, I think the slight differences in the Feature interfaces in OpenJUMP and GeoTools create a very big problem. (It's not just the differences in this interface, but also the related interfaces used to represent things like Feature attributes and Feature schemas.) The differences in the feature model used in the two code bases mean that many of the “high-level” tools can't be used interchangeably without some tweaking or hacking.

As an example, having classes that allow me to work with Shapefiles does me little good if they provide implementations of the GeoTools Feature interface, because I need to parse or read the Shapefile and obtain implementations of the JUMP Feature interface. I only see a few ways to address this major challenge to code sharing between the two projects:

[1] Modify OpenJUMP to adopt the GeoTools Feature interface and Feature model.

[2] Design and use a “converter” that can change GeoTools Feature objects into JUMP Feature objects and back again.

[3] Develop and maintain the JUMP Feature interface and feature model independently of GeoTools, and lose many of the benefits of collaboration.

The first option may not be practical, because the Feature interface and feature model are such a key part of OpenJUMP. Refactoring the program to use the GeoTools Feature model would be a major undertaking, even for someone like Vivid Solutions. There are also some risky “unknowns” to this approach, including the way the GeoTools project decides what changes are made to this critical interface, and what voice, if any, OpenJUMP developers could have in these changes if we adopted this interface. (In all reasonableness, we don't contribute to GeoTools at this point, and couldn't expect a say even if they would give it to us.)

The third option is the most practical and the easiest, at least in the short term. However, it eliminates what I think could be many future opportunities for the two communities to work together on common code. GeoTools won't need OpenJUMP's participation to survive, but OpenJUMP would definitely benefit from the work being done at GeoTools. We have a very small development team, and I don't think anyone would argue that the GeoTools staff is larger and produces more code.

The second option likely represents the middle ground, but it comes with its own disadvantages. If we use a converter we have to, at a minimum, require at least some of the OpenJUMP developers to be familiar with both the GeoTools and JUMP Feature interfaces and feature models. The GeoTools code base is by no means simple and, like OpenJUMP's, is not well documented. So this is a somewhat difficult thing to ask developers to do. Then there is always the risk that as the GeoTools feature model evolves, converting to and from the JUMP feature model will become more difficult.
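To make the converter idea a little more concrete, here is a minimal sketch of what such a class might look like. Everything in it is an assumption for illustration: NameKeyedFeature and IndexedFeature are hypothetical stand-ins I made up to represent "two feature models that differ slightly". They are not the real GeoTools or JUMP Feature interfaces, and a real converter would also have to deal with schemas, attribute types and geometries.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one feature model: attributes looked up by name.
// This is NOT the real GeoTools Feature interface.
final class NameKeyedFeature {
    private final Map<String, Object> values = new LinkedHashMap<>();

    void set(String name, Object value) { values.put(name, value); }
    Object get(String name) { return values.get(name); }
    // Attribute names in insertion order.
    List<String> names() { return new ArrayList<>(values.keySet()); }
}

// Hypothetical stand-in for the other feature model: an ordered schema of
// names plus values addressed by index. This is NOT the real JUMP Feature interface.
final class IndexedFeature {
    private final List<String> schema;
    private final Object[] values;

    IndexedFeature(List<String> schema) {
        this.schema = new ArrayList<>(schema);
        this.values = new Object[schema.size()];
    }

    void set(int index, Object value) { values[index] = value; }
    Object get(int index) { return values[index]; }
    List<String> schema() { return new ArrayList<>(schema); }
}

// The converter: the one place that has to understand both models.
final class FeatureConverter {
    private FeatureConverter() { }

    // Copies every named attribute into the matching schema position.
    static IndexedFeature toIndexed(NameKeyedFeature source) {
        List<String> names = source.names();
        IndexedFeature target = new IndexedFeature(names);
        for (int i = 0; i < names.size(); i++) {
            target.set(i, source.get(names.get(i)));
        }
        return target;
    }

    // Copies every indexed attribute back under its schema name.
    static NameKeyedFeature toNameKeyed(IndexedFeature source) {
        NameKeyedFeature target = new NameKeyedFeature();
        List<String> names = source.schema();
        for (int i = 0; i < names.size(); i++) {
            target.set(names.get(i), source.get(i));
        }
        return target;
    }
}
```

The point of the sketch is where the knowledge lives. Only FeatureConverter needs to know both models, so only the developers who maintain it need to follow changes on both sides. It is also the class that breaks first if either model evolves, which is the maintenance risk described above.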

I don't know what solution is the best for OpenJUMP and the open source Java GIS community as a whole. I'll be asking members of both the GeoTools and OpenJUMP communities for their input and suggestions. Perhaps there is a better solution that I have not thought of.

This may seem trivial at this point in time. After all, we're only talking about my FeatureCache implementation at this point, and that isn't a real big problem on anyone's radar. However, I think it highlights a problem that will continue to crop up during any effort at increased collaboration between GeoTools and OpenJUMP.

Opportunities For Collaboration Remain

In the worst case scenario some limited opportunities for collaboration between OpenJUMP and GeoTools will remain. Both code bases make use of a common geometry library, the Java Topology Suite. (I believe that GeoTools is currently working on, or has completed, a layer of abstraction on top of JTS. This will allow them to work with other Geometry libraries as well.) As long as the common geometry library remains in place we can share code that manipulates these geometries and that works with aspects of GIS that are closely tied to geometry manipulation and analysis. (Topology would be one example.)

There is also the opportunity to use “low-level” code for data access. For example, even if I can't use the “high-level” code to extract Feature objects from Shapefiles, I can use the “low-level” code to get to the DBF and SHP records. As another example, if I ever get around to writing a good DXF parser I could expose access to the elements of the DXF file in a way that would allow use by the GeoTools project. This would require some forethought on the part of the developers of such data access code. Typically this “low-level” code is hidden in the “high-level” code. Classes would have to be designed in a way that exposed the “low-level” code through a public API, preferably in a way that avoids the baggage of the “high-level” functionality.
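As a rough illustration of that kind of design, the sketch below separates a public “low-level” record API from the “high-level” code that builds features on top of it. All of the names here (RecordSource, RawRecord, FeatureReader) are hypothetical and invented for this example. None of them come from GeoTools or OpenJUMP, and no actual file parsing is shown.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical low-level API: hands out raw records one at a time and
// knows nothing about any feature model.
interface RecordSource<R> extends Iterable<R> { }

// Hypothetical raw record: a record number plus its field values.
final class RawRecord {
    final int number;
    final List<Object> fields;

    RawRecord(int number, List<Object> fields) {
        this.number = number;
        this.fields = fields;
    }
}

// Hypothetical high-level layer: each project supplies its own mapping from
// raw records to its own feature type, so the low-level code is shared
// without dragging either feature model along.
final class FeatureReader<R, F> implements Iterable<F> {
    private final RecordSource<R> source;
    private final Function<R, F> mapper;

    FeatureReader(RecordSource<R> source, Function<R, F> mapper) {
        this.source = source;
        this.mapper = mapper;
    }

    @Override
    public Iterator<F> iterator() {
        final Iterator<R> raw = source.iterator();
        // Records are mapped lazily, one per call to next(), so this layer
        // itself never holds the whole data set in memory.
        return new Iterator<F>() {
            @Override
            public boolean hasNext() { return raw.hasNext(); }

            @Override
            public F next() { return mapper.apply(raw.next()); }
        };
    }
}
```

With something like this, OpenJUMP could pass a mapper that produces JUMP Feature objects and GeoTools could pass one that produces its own, while both share the same record-level code. Because records are pulled one at a time, the same shape would also suit a feature cache that cannot afford to load an entire Shapefile at once.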

Other opportunities for collaboration between GeoTools and OpenJUMP may exist, but I'm not yet sure what they would be. Our ability to share GUI code is hampered by the fact that uDig (the desktop application built on GeoTools) uses SWT and Eclipse RCP, while OpenJUMP is a Swing application.

The Long Term Implications

Without some serious effort and some innovative thinking I'm afraid GeoTools and OpenJUMP will drift farther apart. Realistically, development on GeoTools will progress at a faster pace. I'm not sure what that will mean for OpenJUMP in the long run. GeoTools has the advantage of being designed as a library, which means it can be adopted by other programs. The more programs that use the library, the better its chances of success and long term survival. (This also adds a level of complexity. You need to consider the needs of multiple programs when designing the library. OpenJUMP doesn't currently have this problem.) If we decide collaboration with GeoTools isn't worth the cost I think we need to seriously consider extracting portions of OpenJUMP's code base to an application library. I don't know how we'd find time for this, but I think it would be an important step to remain a viable alternative to GeoTools.

I hope this won't be necessary. There aren't many other players in the open source Java GIS arena, and we'd accomplish a lot more together.

The Sunburned Surveyor