Archive for June, 2008|Monthly archive page

The Database as an Alternative to the GAC

Based on initial investigations of storing source code in the database, I’m now focusing on storing assemblies in the database, serialized as blobs. Doesn’t make sense to be recompiling all the time. Instead, I just serialize the dll file as a byte array, then store that via memoryblobstream to a blob field. Deserialize using System.Reflection.Assembly.Load(Byte[]). In effect, this allows using a DBMS in lieu of the GAC (Global Assembly Cache).

The deployment issues this addresses are not peculiar to GIS, yet I can’t find much discussion of this. I’d be interested in hearing from anyone who has tried this. Treating code as data is receiving attention these days, so why not store assemblies in the database?

ESRI’s discussion of Enterprise DBMS seems to avoid discussing the inability to centrally manage business rules using triggers in a versioned geodatabase. I’m working on an Editor Extension that manages triggers instantiated from a database assembly cache to compensate for this.

Another common use of editor extensions is to trigger some code in response to editor events. (from “About Editor Extensions” in EDN)

It seems like a domain specific language could be derived that focuses specifically on managing triggers, similar to the approach described here by the Mechanical Bride, except the DSL code would be compiled into an assembly that would then be stored in the database.


Spatial Correlation: Crime and Section8 Vouchers

The Atlantic Monthly has an interesting article suggesting that Section8 vouchers are correlated with crime. I find it odd that no maps are provided (haven’t seen the print edition yet).

On the merged map, dense violent-crime areas are shaded dark blue, and Section8 addresses are represented by little red dots. All of the dark-blue areas are covered in little red dots, like bursts of gunfire. The rest of the city has almost no dots.

A description of Memphis from American Murder Mystery

Faster Calculations in Arcmap

ricky ricardo
Somebody has some splaining to do.

I like to think of myself as a patient person, not in a hurry. However sometimes when I use the Field Calculator in Arcmap my patience is challenged.

I have a featureclass of sidewalks related to a street centerline network. 284000 sidewalks and 54000 street centerlines. When I calculate a field on the sidewalks, setting it equal to a field from the street network, it takes several hours. That’s in a file gdb with the key field indexed. Somebody has some explaining to do.

Last time I read ESRI’s EULA, I recall it prohibiting publication of benchmark statistics, so I have not included precise numbers. Looks like they are following Oracle’s policy.

Unlike Oracle, ArcGIS does not allow me to determine what execution plan it is using (see EXPLAIN PLAN). I’m quite certain it always uses the same plan, which in this case is a very poor one.

To make it faster, I cache the key/value pairs from the street attribute table into memory using a System.Collections.Hashtable. This takes about 1.4 seconds. I then open a cursor on the sidewalks table and loop through it, looking up the value from the hashtable using the key field value from the sidewalk featureclass. This takes less than a minute.

Interestingly an update cursor is slower than using a cursor created via IFeatureclass.Search. I think this is just with a file geodatabase though, on ArcSDE I believe an update cursor is generally faster, assuming proper rollback segment sizes are configured.

Maybe what ESRI should do is to beef up the MemoryRelationShipClass so that it allows the user to examine and/or specify an execution plan. That way I could tell it to use a hashtable when it does the join, alleviating me from having to roll my own.

For a good comparison of hashtables, sortedlists and dictionaries, see this post.

The shortcoming of this approach is that I can’t do field calculations, just simple assignment. Some day I’d like to try using CodeDOM to generate code from an expression the user has provided. It would need to substitute in the field values for field names. Since square brackets are used in C# I’d need different field name delimiters.