Friday, June 22, 2007

Standards in Software – The Ugly Truth

I want to take a few moments to deal with the ugly truth about software standards. “Standards” have become the hip thing in the GIS world, especially among the open source Java GIS community, since the OGC gained momentum. Here is an example of what I am talking about:

Deegree Project: “deegree is a Java Framework offering the main building blocks for Spatial Data Infrastructures. Its entire architecture is developed using standards of the Open Geospatial Consortium (OGC) and ISO/TC 211 (ISO Technical Committee 211 -- Geographic Information/Geomatics).”

GeoTools Project: “standards compliant methods for the manipulation of geospatial data, for example to implement Geographic Information Systems (GIS). The GeoTools library implements Open Geospatial Consortium (OGC) specifications as they are developed…”

GeoAPI: “The GeoAPI project aims to reduce duplication and increase interoperability by providing neutral, interface-only APIs derived from OGC/ISO Standards.”

What do all of these projects have in common? They are all written in Java, and they all implement one or more OGC standards.

I don’t want this to be a rant about the OGC and my problems with how they operate. (I’ll save that for another post.) I want to try to obtain some level of objectivity, so let’s just examine software standards on their face, regardless of whether they are produced by the OGC or not.

Here are a couple of problems caused by software standards that are not properly designed and executed:

[1] They create obstacles to collaboration.
[2] They can stifle innovation.

I know those sound like some pretty controversial statements. “What is this guy talking about?” you might ask. “I thought software standards were supposed to foster collaboration and innovation.” Software standards are meant to do that. Boats are meant to float. A lot of them end up at the bottom of the ocean too. (Remember the Titanic?) Why does this happen? Because these boats weren’t designed correctly or built properly. That doesn’t mean you can’t achieve your goal of floating with a boat; it just means that you have to go about it in the right manner.

Let’s examine the two problems I mentioned with software standards one at a time.

Poorly Designed and Executed Software Standards Create Obstacles To Collaboration

Poorly designed standards fall victim to the trap of “standard euphoria”. People become so excited about standards, and so focused on implementing them, that they forget to question the technical merit and practicality of the standard they are trying to use and implement. That can lead to a situation where a lot of different software projects are implementing the same poorly designed software standard. Or you get a situation where programmers are throwing a lot of time at implementing a confusing and complex software standard because “everyone else is doing it”. They would be better off spending that time on other functionality for their software or on improving software quality.

Most people would assert that this is an easy problem to overcome. “Just design better standards.” That is often easier said than done. For one thing, the design of software standards often involves meeting the needs of a number of very different interested parties. That is an admirable goal, but it is no easy task.

What if the process of designing the standard is an exclusive one, given to only a few privileged members of a software development community? That can lead to poor design as well. You could be excluding the one programmer who sees the problem in your standard and knows how to fix it. Or you could be ignoring the one programmer who deals with the very real-world issues you are trying to address with your standard. In some sense, software standards developed in this way run contrary to the very core concept of open source software: mutual benefit through shared knowledge.

Return to my example of boat design. Imagine that you are designing a standard for the design of boats, but you decide to only allow companies that design million dollar yachts or container ships to participate. You exclude all the people who build and work on smaller vessels. How good will your standard be? What price will you pay for excluding that pool of valuable boat building and boating experience?

Why would anyone close off the process of designing a software standard like this? Control, control, control.

Make no mistake about that. (If you were a proprietary software company working to design a software standard that would dominate your particular industry, would you tweak it to your competitive advantage, or to someone else’s competitive advantage? Come on now, be honest with me…)

The execution and delivery of a standard can become an obstacle to collaboration as well. Even after a standard is complete the manner in which it is delivered to the community can be important. Will you require people to pay a fee to access your standard? Will you limit the way it is used or implemented? These aspects of standard execution and delivery can also stifle collaboration if not handled properly.

Poorly Designed and Executed Software Standards Create Obstacles To Innovation

In this situation we find that “standard euphoria” is the main culprit. People become so focused on implementing some software standard that they divert time and energy from other areas. That wouldn’t be a problem, unless the standard is poorly designed in the first place.

Another obstacle to innovation comes when members of the community are unwilling to work together on some area of software design because it isn’t going to become adopted as an “official” standard of a particular organization, or because it is still in the process of becoming a standard.

Let me give you one example of this. I have been working towards the goal of increasing collaboration between OpenJUMP and other members of the open source Java GIS community. As part of this goal I identified some functionality that I would like to add to OpenJUMP at some point in the future. I decided to approach a project whose stated goal is to encourage collaboration between Java programmers working on GIS. Although members of the project were very friendly, I was basically told that the project was only concerned with code that implemented OGC/ISO software standards. I didn’t let this slow me down, so I started looking around for an organization or group that would be interested in hosting code that explored some new aspect of GIS not covered by an OGC/ISO standard. I couldn’t find this organization, and I’m not sure that it exists. What does that mean? It means if you really want to work together on some particularly innovative area of GIS software development you’d better be involved in the OGC/ISO standards process. If you don’t want to do that, or if you are excluded from doing that, you had better be prepared to work on your own.

What type of software standards really work?

Does all of the above commentary mean that I think software standards are no good? On the contrary, I think they are one of the best things for software development, if they are properly designed and executed.

What does a software standard need in order to “float” or to meet its goals of encouraging collaboration and innovation?

[1] Software standards need to be designed in a process that is not exclusive and where membership is not based on financial contributions or fees. Membership in the process of designing a standard should be based on an individual’s or organization’s technical skill, capacity for innovation, ability to work with others, knowledge or expertise in the area addressed by the standard, and ability to think critically and objectively.

[2] Software standards need to be executed and delivered in a way that promotes use and adoption by all members of the software industry addressed by the standard, not by only a few members of the community. This means that people should not have to pay for a copy of the standard itself or supporting documentation. This should be made available in a free and open format. (For example, release this documentation under one of the Creative Commons licenses.) The authors of the standard or the organization maintaining the standard should avoid putting unnecessary restrictions on how the standard is used and distributed. Individuals and organizations should not have to pay a fee to verify compliance with a standard.

[3] Software standards should be accepted on the grounds of technical merit and the ability to be practically implemented and used, not because they are rubber-stamped by some multi-company conglomerate.

Of these three (3) elements, I believe that the last two (2) are most important. (Note, however, that it is a lot easier to get #3 if you have #1.) Consider just three common examples of this:

What file format do you use if you want to exchange CAD data between different software programs? (Let’s pretend you don’t want to cough up a bunch of dough for Autodesk’s DWG libraries.)

You use DXF. Why? You use DXF because it works well and it has a freely available specification.

What file format do you use if you want to exchange digital documents and you have no guarantee of the operating system or software platform that the recipients are using?

You use PDF. Why? You use PDF because it works well and it has a freely available specification.

What file format do you use if you want to share simple vector data between GIS programs?

You use ESRI’s Shapefile Format. Why? You use Shapefiles because they work well and the Shapefile format is defined in a freely available specification.


The open source Java GIS community needs to overcome its “standard euphoria”. This is especially important if this euphoria is for standards that are poorly designed and executed, and that don’t meet the three (3) requirements above. Let’s evaluate each software standard for our area of expertise on its technical merit, and not race to implement it just because it is rubber-stamped by an organization that doesn’t follow a design process open to our input. Let’s not dismiss the possibility of working together on a certain aspect of GIS software because it might not be covered by a standard from the same organization.

The Sunburned Surveyor
Posted on 8:36 AM | Categories:

Thursday, June 21, 2007

Organizing Features in OpenJUMP (Simplicity or Flexibility?)

Features are the basic unit of “work” in OpenJUMP. They are the fundamental “thing” that users are interested in manipulating. There are several ways to organize or group features to make the tasks of the user easier to accomplish. OpenJUMP currently uses a single “container” to organize Features from the user’s perspective. This container is a Layer. Users interact with Layers mainly through the Layer List. (Programmers know this as the LayerManager object.)

Layers provide the user with some useful “restrictions” on Feature organization that simplify their work. For example, all of the Features in a Layer share a common set of attributes, or a FeatureSchema. They also follow a common set of styling rules that govern their graphical appearance. Aside from these two restrictions, Layers are pretty flexible. They can store Features with different geometry types. They can share a name with another Layer.

Internally Layers are really just a wrapper around a FeatureCollection. But again, for the sake of simplicity, OpenJUMP places some restrictions on this FeatureCollection. As we mentioned before, it can only contain Features that share a common set of attributes or a FeatureSchema. This FeatureCollection also is not treated as a Feature itself, as can be done, at least in theory, with something like GML.
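
To make that relationship a little more concrete, here is a rough sketch in Java. These are not OpenJUMP's real classes. SketchSchema, SketchFeature and SketchLayer are simplified stand-ins I made up for illustration, and the geometry is reduced to a text label. The only point is to show a Layer wrapping a collection of Features, accepting mixed geometry types, and refusing any Feature that does not share the Layer's schema.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Stand-in for a FeatureSchema: just an ordered list of attribute names. */
final class SketchSchema {
    private final List<String> attributeNames;

    SketchSchema(List<String> attributeNames) {
        this.attributeNames = Collections.unmodifiableList(new ArrayList<>(attributeNames));
    }

    List<String> attributeNames() {
        return attributeNames;
    }
}

/** Stand-in for a Feature: a geometry type label plus the schema it follows. */
final class SketchFeature {
    private final SketchSchema schema;
    private final String geometryType;

    SketchFeature(SketchSchema schema, String geometryType) {
        this.schema = schema;
        this.geometryType = geometryType;
    }

    SketchSchema schema() {
        return schema;
    }

    String geometryType() {
        return geometryType;
    }
}

/** Stand-in for a Layer: wraps a list of features and enforces one shared schema. */
final class SketchLayer {
    private final String name;
    private final SketchSchema schema;
    private final List<SketchFeature> features = new ArrayList<>();

    SketchLayer(String name, SketchSchema schema) {
        this.name = name;
        this.schema = schema;
    }

    String name() {
        return name;
    }

    void add(SketchFeature feature) {
        // Mixed geometry types are allowed; a different schema is not.
        if (feature.schema() != schema) {
            throw new IllegalArgumentException("Feature does not use this Layer's schema");
        }
        features.add(feature);
    }

    List<SketchFeature> features() {
        return Collections.unmodifiableList(features);
    }
}
```

Note that the wrapped list is never itself treated as a Feature in this sketch, which mirrors the restriction described above.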

As we can see, there is a balance that we try to maintain with the design of OpenJUMP’s Feature Model between flexibility and simplicity. In order to have simplicity you must have restrictions on flexibility.

I don’t think we’ve really examined where we would like to have this balance as OpenJUMP developers, but I think the time for this discussion is right now. I think there will be an inevitable push towards more flexibility, as we continually try to find ways to bolt more and more functionality on top of OpenJUMP. I think we need to be cautious about this. One of the greatest assets OpenJUMP has, both from a user’s perspective and from a developer’s perspective, is its simplicity.

I’d like to propose the establishment of some ground rules for our Feature Model. I think it is important to preserve our simplicity by putting into place some common sense restrictions. If a programmer would like to get around these restrictions, let’s ask that he do it with a plug-in, and not by modifying OpenJUMP’s core.

Here are the restrictions that I would like to propose:

[1] Each Layer must be uniquely identified in some way from other Layers. This doesn’t mean that Layers can’t appear to have the same name to the user. We could use a numeric identifier or require that every Layer name in a Category be unique. What do we gain with this restriction? It allows us more opportunity to use Layers as the main containers for Features. For example, this will allow Layers to serve as what is often called a FeatureType or FeatureClass in other GIS programs. (The user would be able to place constraints or validation tests on a Layer.) We are really close to this type of system already, because we require all Features in a Layer to share a common FeatureSchema. It is not that big of a leap to require that Layers also be uniquely identified. This would remove the necessity to uniquely identify FeatureSchema, because each FeatureSchema would belong to a unique FeatureCollection wrapped in a unique Layer.

[2] We will never, ever, ever expose FeatureSchema to the user through the GUI. We don’t need to do this, because all Features in a Layer share the same FeatureSchema. If a user needs to work with some aspect of a FeatureSchema, have them access that functionality through a Layer; don’t expose it directly. What do we gain from this restriction? Things become much simpler from a user’s point of view. They don’t have to worry about FeatureSchema, Feature Types or Feature Classes. They can just think in terms of Layers. Do we lose some flexibility with this restriction? Yes we do. (For example, we won’t be able to let the user apply constraints to multiple Layers that essentially share a common FeatureSchema. They would have to apply these constraints to each Layer individually.) But I think this price is worth preserving simplicity.

[3] We will not allow FeatureCollections that can also act as a Feature themselves. This adds some significant complexity to OpenJUMP's Feature Model. We’ve already had some discussion on the JPP Development list about this. I think an alternative way to facilitate this type of functionality is by allowing associations or relationships between Features. This could be done with a plug-in and would not require modification of the current OpenJUMP/JUMP Feature Model. (I am not ruling out exposure of loose groups of Features. These groups could contain multiple Features from different Layers, but would not be allowed to act as a Feature themselves. This could be accomplished with a plug-in and these groups could be created from selection sets. I think of their purpose as being similar to a block in AutoCAD.)
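
Here is a rough Java sketch of how the three restrictions above might fit together. Everything in it is hypothetical: ProposedLayer, ProposedLayerRegistry and FeatureGroup are names I invented for illustration, not classes from OpenJUMP, and identifying a Feature by a Layer id plus an index is just an assumption to keep the example small. It uses the numeric identifier option from restriction [1], so two Layers may show the same name to the user.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Restrictions [1] and [2]: a Layer with a unique id whose schema stays internal. */
final class ProposedLayer {
    private final long id;
    private final String displayName;
    private final List<String> schemaAttributes; // never handed out as a schema object

    ProposedLayer(long id, String displayName, List<String> schemaAttributes) {
        this.id = id;
        this.displayName = displayName;
        this.schemaAttributes = Collections.unmodifiableList(new ArrayList<>(schemaAttributes));
    }

    long id() { return id; }

    String displayName() { return displayName; }

    /** Schema details are reached through the Layer, never directly. */
    List<String> attributeNames() { return schemaAttributes; }
}

/** Restriction [1]: hands out numeric ids, so display names may repeat. */
final class ProposedLayerRegistry {
    private long nextId = 1;
    private final Map<Long, ProposedLayer> layersById = new HashMap<>();

    ProposedLayer create(String displayName, List<String> attributeNames) {
        ProposedLayer layer = new ProposedLayer(nextId++, displayName, attributeNames);
        layersById.put(layer.id(), layer);
        return layer;
    }

    ProposedLayer find(long id) { return layersById.get(id); }
}

/** Restriction [3]: a loose group that points at Features but is not a Feature. */
final class FeatureGroup {
    /** Identifies one Feature by its Layer id and an index within that Layer. */
    static final class Member {
        final long layerId;
        final int featureIndex;

        Member(long layerId, int featureIndex) {
            this.layerId = layerId;
            this.featureIndex = featureIndex;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Member)) return false;
            Member that = (Member) other;
            return layerId == that.layerId && featureIndex == that.featureIndex;
        }

        @Override
        public int hashCode() {
            return 31 * Long.hashCode(layerId) + featureIndex;
        }
    }

    private final String name;
    private final Set<Member> members = new LinkedHashSet<>();

    FeatureGroup(String name) { this.name = name; }

    String name() { return name; }

    /** Adding the same Feature twice has no effect. */
    void add(long layerId, int featureIndex) {
        members.add(new Member(layerId, featureIndex));
    }

    boolean contains(long layerId, int featureIndex) {
        return members.contains(new Member(layerId, featureIndex));
    }

    int size() { return members.size(); }
}
```

Because a group only stores references, it could live entirely in a plug-in and be built from a selection set, while the Layers and their Features stay exactly as they are.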

I’d like to see some more discussion about finding the balance between flexibility and simplicity in OpenJUMP’s Feature Model. I’d specifically like a decision on what primary Feature “container” we will expose to the user. Will the Layer object remain in this role, or will we expose something else like a Feature Type in its place? If we do so, what will the price to simplicity be? Will we modify the Feature Model to support “Complex Features”, or FeatureCollections that can act as Features themselves? Are we willing to pay the price of lost simplicity for this functionality, or will we ask that this be done through association via a plug-in?

The Sunburned Surveyor
Posted on 8:16 AM | Categories: