Thursday, June 07, 2007

Alternative To A Complex Feature Model

I was thinking about the discussions on more complex feature models that have taken place recently. I was wondering if it was really necessary to modify JUMP's feature model to obtain some of the services that a more complex feature model would allow.

I think that you could use "relationships" to model complex features instead of embedding one feature as the attribute of another feature, as you would in a more complex feature model. (When I say relationship, I mean something similar to the relationships that exist between table records in an RDBMS.)

For example, let’s say that I want to model a municipal road system as a complex feature. This road system contains a number of road segments. You could approach this in two or three different ways:

[1] You could represent the municipal road system as a FeatureCollection. This FeatureCollection would also be a Feature with its own unique attributes, like "NumberOfRoadSegments". It would contain RoadSegment features as members of its collection.

[2] You could represent the municipal road system as a more complex Feature object. One of the attributes of this feature could be a Java Collection or some similar container that held references to the RoadSegment features.

I think that both of these options are somewhat cumbersome, and would require some tweaking of OpenJUMP's innards. (Although the first option is probably less cumbersome than the second.) What if we thought like a relational database designer for a minute and "normalized" our data? If I were designing a traditional RDBMS to model a municipal road system I would use two tables. The MunicipalRoadSystems table would store the attributes of each road system in its table columns, and one of these attributes would be a unique identifier. This would likely be the primary key for the table. The RoadSegments table would store the attributes of individual road segments in its table columns. Each record in the RoadSegments table would also include the primary key or unique identifier for a road system. This foreign key would establish a relationship between the road segment and the road system. (Or I might have a third table that stored both the road segment unique identifier and the road system unique identifier. This would allow a road segment to be part of more than one road system. This would be a relationship table used to model a many-to-many relationship.)

Now, consider a third option for representing a municipal road system based on the "normalized" RDBMS approach.

[3] You create a non-spatial FeatureCollection to represent the MunicipalRoadSystem. (This is similar to the MunicipalRoadSystems table in the RDBMS.) You then create a spatial FeatureCollection that stores a Feature for each road segment. You then design a plug-in that manages relationships between features and presents an API for these purposes to the developer.
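To make the third option more concrete, here is a minimal sketch of the kind of relationship bookkeeping such a plug-in might do. It is not part of OpenJUMP: the class name `FeatureRelationships` and all of its methods are hypothetical, and features are referred to by plain integer IDs as a stand-in for whatever identifier a real plug-in would use. It behaves like the relationship table described above, so a road segment can belong to more than one road system.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical registry of relationships between features, playing the
 * role of a many-to-many "relationship table" in an RDBMS.
 * Features are identified by integer IDs only (an assumption of this sketch).
 */
class FeatureRelationships {
    // parent ID -> child IDs (e.g. road system -> its road segments)
    private final Map<Integer, Set<Integer>> childrenByParent = new HashMap<>();
    // child ID -> parent IDs (a segment may belong to several road systems)
    private final Map<Integer, Set<Integer>> parentsByChild = new HashMap<>();

    /** Records that the child feature belongs to the parent feature. */
    void relate(int parentId, int childId) {
        childrenByParent.computeIfAbsent(parentId, k -> new LinkedHashSet<>()).add(childId);
        parentsByChild.computeIfAbsent(childId, k -> new LinkedHashSet<>()).add(parentId);
    }

    /** Removes a single relationship; the features themselves are untouched. */
    void unrelate(int parentId, int childId) {
        removeLink(childrenByParent, parentId, childId);
        removeLink(parentsByChild, childId, parentId);
    }

    /** Read-only view of the children of a parent (empty if there are none). */
    Set<Integer> childrenOf(int parentId) {
        return Collections.unmodifiableSet(
                childrenByParent.getOrDefault(parentId, Collections.emptySet()));
    }

    /** Read-only view of the parents of a child (empty if there are none). */
    Set<Integer> parentsOf(int childId) {
        return Collections.unmodifiableSet(
                parentsByChild.getOrDefault(childId, Collections.emptySet()));
    }

    private static void removeLink(Map<Integer, Set<Integer>> index, int key, int value) {
        Set<Integer> values = index.get(key);
        if (values != null) {
            values.remove(value);
            if (values.isEmpty()) {
                index.remove(key); // drop empty entries so lookups stay tidy
            }
        }
    }
}
```

With something like this in place, an attribute such as "NumberOfRoadSegments" would not need to be stored on the road system at all; it could be derived by counting `childrenOf(roadSystemId)`.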

I can’t think of anything you could do in the first two approaches that couldn’t be done in the third approach. The third approach also has two distinct benefits:

[1] It doesn’t require modification of OpenJUMP’s current Feature Model, which is both simple and elegant.
[2] It involves the creation of a system for managing relationships between features. Although this can be used to model complex features as I have described, it can also be used to model relationships between simple features. This is a powerful bonus.

When I have to explain what a GIS is to someone unfamiliar with the topic I usually say something like “it’s an intelligent computer-based map”. If the person has some technical skill or is familiar with computers I’ll say something like “it’s a database that can handle location information” instead.

I find that I can often devise solutions to GIS software design problems if I try to remember that a GIS is a “database that can handle location information”. Although this is an oversimplification, I find it allows me to take advantage of relational database design principles.

This alternative to representing complex features is an example of this principle in action.

The Sunburned Surveyor


Dr JTS said...

Good blog post.

It seems to me that you are recapitulating the object-vs-relational debate that was raging strongly in the DB world in the late 90's. It seems like the RDB's have won that round (at least for the time being) on the server side, but the object world is obviously in the ascendant on the client side.

I also think that this debate is mostly about implementation, but doesn't really address the more fundamental issue of what the actual semantic model is. And that's the important question, because that is what is needed to motivate the design of the user interface and tools to present it to the user. To put this more concretely, it would certainly be possible to represent complex features as sets of related tables. But what would the user interface look like on top of this? How would Paul get his view of complex features? What would it mean in terms of UI to have a complex feature with either two geometries at one level, or an attribute which is itself a collection of spatial features? You would still need a way for the user to indicate how he wanted these to be rendered, how sub-features were created and destroyed, etc.

Moreover, a table-based model is not in fact simpler, it's more complex! This is because it is more general, since relationships between tables can model both association and aggregation. In contrast, a complex hierarchical Feature is generally considered to model an aggregation. A concrete implication of this difference is that in a table-based model, you either have to explicitly define or explicitly manage the life-cycle of related sub-features. In an object-based model, this is controlled by the top-level feature, which is a simple model for the user to understand. (RDBs usually handle this situation by implementing "cascading delete on foreign keys" - but you don't get this for free, you have to have a language to define this and the user has to understand how to model this).
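The life-cycle point can be illustrated with a small sketch. Everything in it is hypothetical (the `OwnershipRegistry` class, the `Kind` enum, the integer feature IDs); it is only meant to show that a relationship-based model has to be told which links are aggregations before it can cascade a delete, which is the extra definition work described above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical registry in which every parent/child link is tagged as an
 * ASSOCIATION (child lives independently) or an AGGREGATION (child is owned).
 */
class OwnershipRegistry {
    enum Kind { ASSOCIATION, AGGREGATION }

    private final Set<Integer> liveFeatures = new HashSet<>();
    // parent ID -> (child ID -> kind of link)
    private final Map<Integer, Map<Integer, Kind>> links = new HashMap<>();

    void addFeature(int id) {
        liveFeatures.add(id);
    }

    boolean exists(int id) {
        return liveFeatures.contains(id);
    }

    void link(int parentId, int childId, Kind kind) {
        links.computeIfAbsent(parentId, k -> new LinkedHashMap<>()).put(childId, kind);
    }

    /**
     * Deletes a feature. Children linked by AGGREGATION are deleted with it
     * (a cascading delete); children linked by ASSOCIATION survive.
     */
    void delete(int id) {
        if (!liveFeatures.remove(id)) {
            return; // unknown or already deleted; this also stops cycles
        }
        Map<Integer, Kind> children = links.remove(id);
        // The deleted feature can no longer be anyone's child.
        for (Map<Integer, Kind> siblings : links.values()) {
            siblings.remove(id);
        }
        if (children == null) {
            return;
        }
        for (Map.Entry<Integer, Kind> entry : children.entrySet()) {
            if (entry.getValue() == Kind.AGGREGATION) {
                delete(entry.getKey());
            }
        }
    }
}
```

In a hierarchical object model the equivalent of `Kind.AGGREGATION` is implied by containment, so the user never has to state it.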

To use another metaphor, I sometimes think of tables as the assembly-language of object models. You can represent pretty much any model you want using tables, but *you* have to do all the work of defining and maintaining the model. An object model is presented at a slightly higher level of abstraction, with some reasonable defaults which can make it easier for a user to work with.

My main point here is that I think the choice of object-vs-table is orthogonal to the issues of defining the "mental model" that JUMP presents to users via its UI. *That* is the key thing to design - the rest is just a question of implementation.