Explain GeoPlan

If you’ve ever written code to intersect two featureclasses, you probably realize that there are different approaches to take. The optimal approach really depends on the featureclasses being intersected. If one of the featureclasses is small, it’s probably best to just cache it in memory as something akin to a spatial hashtable. If it is larger, maybe use IFeatureIndex or ISpatialCacheManager. The maximum number of features in featureclassA intersected by a single feature in featureclassB could also be taken into account, and ArcSDE should be able to answer that fairly quickly.
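To make the "spatial hashtable" idea concrete, here is a minimal sketch in plain Python. It is not ArcObjects code: the envelope tuples `(xmin, ymin, xmax, ymax)`, the function names, and the fixed cell size are all assumptions made for illustration. The small featureclass is bucketed into grid cells once, and each query envelope only looks at the cells it touches.

```python
from collections import defaultdict


def cells(env, size):
    """Yield the (col, row) grid cells covered by an envelope (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = env
    for cx in range(int(xmin // size), int(xmax // size) + 1):
        for cy in range(int(ymin // size), int(ymax // size) + 1):
            yield cx, cy


def build_grid(envelopes, size):
    """Hash every feature id of the small featureclass into the cells its envelope covers."""
    grid = defaultdict(list)
    for fid, env in envelopes.items():
        for cell in cells(env, size):
            grid[cell].append(fid)
    return grid


def overlaps(a, b):
    """True when two envelopes share at least a boundary point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidates(grid, envelopes, env, size):
    """Return ids of cached features whose envelopes overlap the query envelope."""
    seen = set()
    for cell in cells(env, size):
        seen.update(grid.get(cell, ()))
    # Sharing a cell is not enough, so confirm with a real envelope test.
    return sorted(fid for fid in seen if overlaps(envelopes[fid], env))


# Hypothetical data: three cached envelopes and a 2-unit grid.
small = {1: (0, 0, 1, 1), 2: (5, 5, 6, 6), 3: (0.5, 0.5, 5.5, 5.5)}
grid = build_grid(small, 2.0)
hits = candidates(grid, small, (0.8, 0.8, 1.2, 1.2), 2.0)
```

This only filters on envelopes. A real intersect would still run the exact geometry test on each candidate pair, and the cell size would need tuning to the data.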

Oracle provides a nice capability called “explain plan”. It would be nice if ArcGIS provided something similar, perhaps “explain geoplan”, that could be used to optimize the various geoprocessing overlay functions by looking at the size of the featureclasses along with the morphology (geomorphology?) of the envelopes of the features. Unlike Oracle though (as far as I can tell) I’d like an option to use a previously defined execution plan when I run an overlay.
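Here is a rough sketch of what I have in mind. Nothing like this exists in ArcGIS as far as I know: `GeoPlan`, `explain_geoplan`, and the `in_memory_limit` threshold are all made up for illustration. The point is that the plan is a plain value that can be inspected, saved, and handed back to an overlay later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoPlan:
    """Hypothetical description of how an overlay would be executed."""
    strategy: str      # "in_memory_hash" or "spatial_index"
    indexed_side: str  # which featureclass gets cached or indexed: "A" or "B"
    reason: str


def explain_geoplan(count_a, count_b, in_memory_limit=50_000):
    """Pick a strategy from feature counts alone (the threshold is an arbitrary assumption)."""
    smaller_count, side = min((count_a, "A"), (count_b, "B"))
    if smaller_count <= in_memory_limit:
        return GeoPlan("in_memory_hash", side,
                       f"featureclass {side} has {smaller_count} features and fits in memory")
    return GeoPlan("spatial_index", side,
                   f"featureclass {side} has {smaller_count} features, too many to cache")


# The plan can be printed, stored, and reused for a later run.
plan = explain_geoplan(1_000, 2_000_000)
print(plan)
```

A fuller version would also look at the envelopes themselves, for example the maximum number of features one envelope intersects, rather than just the counts.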

Distributed Geoprocessing

Explain Geoplan could also be beefed up to recognize when distributed processing is practical. Suppose we have two feature classes such that the envelope of each feature only intersects a small number of other envelopes, e.g. census tracts. In this case it seems that the processing job could be distributed across CPUs in a server cluster.
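A minimal sketch of how such a job might be split up, again in plain Python with envelope tuples `(xmin, ymin, xmax, ymax)` standing in for features. The tiling scheme and every name here are my own assumptions, not anything ArcGIS provides. Each tile is an independent job, so the `mapper` argument could be the built-in `map` or, for example, the `map` method of a `concurrent.futures` executor.

```python
from collections import defaultdict
from itertools import product


def tile_keys(env, size):
    """Return the (col, row) tiles that an envelope touches."""
    xmin, ymin, xmax, ymax = env
    xs = range(int(xmin // size), int(xmax // size) + 1)
    ys = range(int(ymin // size), int(ymax // size) + 1)
    return product(xs, ys)


def partition(layer_a, layer_b, size):
    """Group features of both layers by tile; each tile becomes one independent job."""
    jobs = defaultdict(lambda: ([], []))
    for slot, layer in enumerate((layer_a, layer_b)):
        for fid, env in layer.items():
            for key in tile_keys(env, size):
                jobs[key][slot].append((fid, env))
    # Tiles holding features from only one layer cannot produce intersections.
    return [job for job in jobs.values() if job[0] and job[1]]


def overlay_job(job):
    """Find intersecting envelope pairs inside a single tile."""
    feats_a, feats_b = job
    return {(ida, idb)
            for ida, a in feats_a for idb, b in feats_b
            if a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]}


def distributed_overlay(layer_a, layer_b, size, mapper=map):
    """Run every tile job through `mapper` and merge the results."""
    pairs = set()
    for result in mapper(overlay_job, partition(layer_a, layer_b, size)):
        pairs |= result  # the set union drops pairs reported by more than one tile
    return pairs


# Hypothetical layers: only a1 and b1 overlap.
layer_a = {"a1": (0, 0, 1, 1), "a2": (10, 10, 11, 11)}
layer_b = {"b1": (0.5, 0.5, 1.5, 1.5), "b2": (20, 20, 21, 21)}
pairs = distributed_overlay(layer_a, layer_b, 5.0)
```

This works well precisely in the census-tract case: when each envelope touches only a few tiles, very little work is duplicated across jobs.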

Imagine publishing a geoprocessing model onto a cluster server that would be able to quickly run the model by spreading the load across multiple CPUs.

And maybe not just overlays, how about topology validation too? I recall Clint Brown mentioning how many hours (~50?) it took to validate a geodatabase topology composed of census geography for the entire US. Seems like this could be distributed too.

When ESRI points out that Google Earth can’t run geoprocessing tasks, will Google respond?

While there has been lots of talk about how Google Earth is shaking up the GIS arena, geographers seem to be forgetting Google’s core strength – distributed computing. If Google responds to ArcGIS 9.2, it seems likely they will leverage this strength.


2 comments so far

  1. Steven Citron-Pousty on

    Unless you bought ArcGIS Server 9.2, you cannot use a server cluster to increase the efficiency of GP tasks. GP is single threaded and so there ends the discussion. I am not sure how they worked out multithreading with server but that should offer a ray of hope on that front.

    I 150% agree that GP needs an explain option but I am not holding my breath. It would certainly make sense but I think ESRI sees GP as a black box – they knew the best way to make the tool so there is no need for you to be concerned about what is going on under the hood.

    BTW, PostgreSQL and MySQL also have explain options, and using them helps quite a bit when deciding whether you need to index columns or “denormalize” data.

  2. Administrator on

    When I spoke with a very senior ESRI developer at the UC, he mentioned he is looking into exposing “threads” in the ArcObjects API. Perhaps this exposure would allow us to spread GP loads across servers. Of course it will be hard for ESRI to come up with a licensing scheme for this, but maybe we bloggers can use FUD as motivation … all together now: “if ESRI doesn’t figure this out, then Google will”.
