Archive for the 'ArcObjects' Category

Faster Calculations in ArcMap

Ricky Ricardo: “Somebody has some ’splaining to do.”

I like to think of myself as a patient person, not one to hurry. However, sometimes when I use the Field Calculator in ArcMap, my patience is tested.

I have a featureclass of sidewalks related to a street centerline network: 284,000 sidewalks and 54,000 street centerlines. When I calculate a field on the sidewalks, setting it equal to a field from the street network, it takes several hours, and that’s in a file gdb with the key field indexed. Somebody has some explaining to do.

The last time I read ESRI’s EULA, I recall it prohibiting publication of benchmark statistics, so I have not included precise numbers. It looks like they are following Oracle’s policy.

Unlike Oracle, ArcGIS does not allow me to determine what execution plan it is using (see EXPLAIN PLAN). I’m quite certain it always uses the same plan, which in this case is a very poor one.

To make it faster, I cache the key/value pairs from the street attribute table into memory using a System.Collections.Hashtable. This takes about 1.4 seconds. I then open a cursor on the sidewalks table and loop through it, looking up the value from the hashtable using the key field value from the sidewalk featureclass. This takes less than a minute.
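
The caching trick above is essentially a hash join. Here is a minimal sketch of the same two passes in plain Python, assuming the street and sidewalk attribute tables are available as lists of dicts (standing in for ArcObjects cursors and rows) and using a Python dict in place of System.Collections.Hashtable. The field names (STREET_ID, NAME, STREET_NAME) are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def hash_join_assign(streets: Iterable[Dict[str, Any]],
                     sidewalks: List[Dict[str, Any]],
                     street_key: str, sidewalk_key: str,
                     source_field: str, target_field: str) -> int:
    """Copy street[source_field] into sidewalk[target_field] via a hash lookup.

    Returns the number of sidewalks that found a matching street.
    """
    # Pass 1: cache key/value pairs from the smaller street table in memory.
    lookup = {row[street_key]: row[source_field] for row in streets}

    # Pass 2: a single sequential pass over the sidewalks; each lookup is
    # O(1) on average instead of a fresh query per sidewalk.
    matched = 0
    for row in sidewalks:
        key = row[sidewalk_key]
        if key in lookup:
            row[target_field] = lookup[key]
            matched += 1
    return matched


# Hypothetical sample rows.
streets = [{"STREET_ID": 1, "NAME": "Main St"}, {"STREET_ID": 2, "NAME": "Oak Ave"}]
sidewalks = [{"SW_ID": 10, "STREET_ID": 2},
             {"SW_ID": 11, "STREET_ID": 1},
             {"SW_ID": 12, "STREET_ID": 99}]  # no matching street
matched = hash_join_assign(streets, sidewalks, "STREET_ID", "STREET_ID", "NAME", "STREET_NAME")
```

Sidewalks without a matching street are simply left untouched; a real tool might instead write a null or log them.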

Interestingly, an update cursor is slower than a cursor created via IFeatureClass.Search. I think this applies only to file geodatabases, though; on ArcSDE I believe an update cursor is generally faster, assuming rollback segments are sized properly.

Maybe what ESRI should do is beef up the MemoryRelationshipClass so that it lets the user examine and/or specify an execution plan. That way I could tell it to use a hashtable when it does the join, sparing me from having to roll my own.

For a good comparison of hashtables, sortedlists and dictionaries, see this post.

The shortcoming of this approach is that I can’t do field calculations, just simple assignment. Someday I’d like to try using CodeDOM to generate code from an expression the user has provided. It would need to substitute field values for field names, and since square brackets already have a meaning in C# (array indexing, for one), I’d need different field-name delimiters.
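
As a rough analogue of the CodeDOM idea, here is a sketch in Python that rewrites a user expression into a function of one row, compiles it once, and then evaluates it per row. The `{FieldName}` delimiter is a hypothetical choice (any delimiter that doesn't clash with the target language would do), and Python's built-in `compile`/`eval` stand in for System.CodeDom.Compiler.

```python
import re
from typing import Any, Callable, Dict

# Hypothetical delimiter: {FieldName}, since square brackets clash with indexing.
FIELD_REF = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def compile_field_expression(expr: str) -> Callable[[Dict[str, Any]], Any]:
    """Turn an expression like '{LENGTH} * 2 + {OFFSET}' into a function of a row dict."""
    # Replace each field reference with a lookup into the row, e.g. row['LENGTH'].
    body = FIELD_REF.sub(lambda m: f"row[{m.group(1)!r}]", expr)
    # Compile once, up front, so syntax errors surface before any row is touched.
    code = compile(f"lambda row: ({body})", "<field-expression>", "eval")
    # Evaluate with no builtins exposed; this is a convenience, not a security sandbox.
    return eval(code, {"__builtins__": {}})


calc = compile_field_expression("{LENGTH} * 2 + {OFFSET}")
result = calc({"LENGTH": 10, "OFFSET": 1})  # -> 21
```

Compiling once and reusing the resulting function is what would keep this fast over hundreds of thousands of rows, compared with re-parsing the expression for every row.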

Source Code in Geodatabase Prototype

I’ve written a proof-of-concept editor extension based on the ideas outlined in my previous post, plus some helpful feedback from Brian Flood (thanks!).

The solution, which includes an installer and a test file gdb, has been uploaded to ArcScripts, right here.

The editor extension maintains a generic List of IExtension objects. The extensions in this list are instantiated in OnStartEditing from source code stored in a table called SourceCode in the edit workspace.

There is also a command on a command bar that lets you browse to a source .cs file and load it into the SourceCode table, after verifying that it compiles without errors.

Note that the “using” statements need to include a full path name to the assembly files referenced by the source code.
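
The load-then-instantiate flow described above can be sketched in Python, with `compile`/`exec` standing in for dynamic C# compilation. Everything here is hypothetical: `Extension` plays the role of IExtension, `source_code_table` stands in for the SourceCode table, and `load_source`/`on_start_editing` mirror the load command and OnStartEditing. It is an illustration of the idea, not the prototype’s code.

```python
from typing import Dict, List


class Extension:
    """Hypothetical stand-in for the IExtension interface."""
    name = "base"

    def startup(self) -> None:
        pass


# Stand-in for the SourceCode table in the edit workspace: one row per source file.
source_code_table: List[Dict[str, str]] = []


def load_source(name: str, source: str) -> None:
    """Store source only after verifying that it compiles, like the load command."""
    compile(source, name, "exec")  # raises SyntaxError if it does not compile
    source_code_table.append({"name": name, "source": source})


def on_start_editing() -> List[Extension]:
    """Compile every stored row and instantiate each Extension subclass it defines."""
    extensions: List[Extension] = []
    for row in source_code_table:
        namespace = {"Extension": Extension}
        exec(compile(row["source"], row["name"], "exec"), namespace)
        for obj in namespace.values():
            if isinstance(obj, type) and issubclass(obj, Extension) and obj is not Extension:
                ext = obj()
                ext.startup()
                extensions.append(ext)
    return extensions


load_source("audit.py", "class AuditExtension(Extension):\n    name = 'audit'\n")
exts = on_start_editing()
```

Verifying compilation at load time means a broken source row is rejected before it can stop an edit session from starting.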

I’m wondering whether this approach might be easier to build than class extensions, as well as easier to maintain. I haven’t tried it, but I suppose it could be made to work with shapefiles as well.

The potential uses of dynamic compilation are intriguing. I’d really like to try this in an IServerObjectExtension. More on that later.

Storing Code in a Geodatabase

It appears that the .NET 2.0 framework will now be part of the standard ArcGIS install. This means it should be possible to dynamically compile code at run time using System.CodeDom.Compiler.

What if we could store code in the geodatabase that would be dynamically compiled at run time?

A major pain when working with class extensions is that people who do not have the DLL installed on their machine cannot even open featureclasses that have extensions. Perhaps it would be possible to store the code for a class extension in the database (optionally obfuscated).

Perhaps this approach could also be applied to support triggers. In ArcCatalog, I’d like to be able to right-click on a field in a geodatabase table (or featureclass) and provide code that would be called for OnCreateFeature, OnDelete, etc. Behind the scenes, the geodatabase would store this code in a GDB_ table, and when a feature is added to a featureclass (or table) it would JIT-compile and call the code. The GDB_ table could also provide a read/write property field (or fields) that would allow me to implement sequences.
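
A minimal sketch of the trigger idea in Python, assuming a hypothetical triggers table keyed by (table name, event name) that holds handler source. The source is compiled lazily the first time its event fires, and a small hypothetical `sequences` store shows how a read/write property could back a sequence. The table, event, and field names are illustrative only.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical GDB_ trigger table: (table name, event) -> handler source code.
trigger_source: Dict[Tuple[str, str], str] = {}
# Compiled handlers, filled lazily the first time an event fires.
_compiled: Dict[Tuple[str, str], Callable[[Dict[str, Any]], None]] = {}
# Hypothetical read/write property rows used to implement sequences.
sequences: Dict[str, int] = {}


def next_value(name: str) -> int:
    """Increment and return a named sequence."""
    sequences[name] = sequences.get(name, 0) + 1
    return sequences[name]


def fire(table: str, event: str, feature: Dict[str, Any]) -> None:
    """Compile the stored handler on first use, then call it with the feature."""
    key = (table, event)
    if key not in trigger_source:
        return  # no trigger registered for this table/event
    if key not in _compiled:
        namespace: Dict[str, Any] = {"next_value": next_value}
        exec(trigger_source[key], namespace)
        _compiled[key] = namespace["handler"]  # convention: the source defines handler()
    _compiled[key](feature)


trigger_source[("Hydrants", "OnCreateFeature")] = (
    "def handler(feature):\n"
    "    feature['ASSET_ID'] = next_value('hydrant_seq')\n"
)
first, second = {}, {}
fire("Hydrants", "OnCreateFeature", first)
fire("Hydrants", "OnCreateFeature", second)
```

A real geodatabase version would also need to persist the sequence value transactionally so that concurrent editors don’t hand out the same number.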

I suppose it should be possible to write Visual Studio add-ins to support editing code stored in a geodatabase.

I guess there’s really no reason the code would need to be C#, VB.NET, or whatever. Maybe a simplified, domain-specific language could be provided that even DBAs could understand. For example, say I want to assign a county ID to a point whenever the point is created or updated, based on the county it falls within. The C# code to accomplish this might be a bit overwhelming, so maybe a simpler language could be provided to express it. Of course, ESRI would need to provide a compiler to generate CIL from the simplified language.
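
To give a sense of what such a DSL statement would hide, here is a minimal sketch of the county assignment in plain Python. It assumes counties are simple polygons given as lists of (x, y) vertices and uses a standard ray-casting point-in-polygon test; the `COUNTY_ID` and `shape` field names are hypothetical. A real implementation would use the geodatabase’s spatial index and geometry engine rather than a linear scan.

```python
from typing import Any, Dict, List, Optional, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, ring: List[Point]) -> bool:
    """Ray-casting test: count edge crossings of a ray running right from pt."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_county(feature: Dict[str, Any], counties: Dict[str, List[Point]]) -> Optional[str]:
    """Set feature['COUNTY_ID'] to the first county whose polygon contains the point."""
    for county_id, ring in counties.items():
        if point_in_polygon(feature["shape"], ring):
            feature["COUNTY_ID"] = county_id
            return county_id
    feature["COUNTY_ID"] = None
    return None


counties = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
pt = {"shape": (15.0, 5.0)}
assign_county(pt, counties)
```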

Maybe this approach would also allow custom features. Custom features were promoted at 8.0, but they never really worked as intended; AFAIK ArcFM is the only custom-feature-based solution in widespread use. What if the CLSID stored in the GDB_Objectclasses table were a key into another table that stored code? Instead of instantiating a COM class when the object class is opened, ArcGIS could JIT-compile the code stored in the geodatabase. If this is possible, perhaps ESRI could provide a base Feature class that we could extend and whose methods we could override.
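
Here is a Python sketch of the CLSID-keyed lookup, under loudly stated assumptions: `Feature` is a hypothetical base class with an overridable `on_store` hook, `class_code` stands in for a table of code keyed by the value in the CLSID column, and the CLSID string is made up. The subclass is compiled once on first use and cached, roughly the way a COM class would otherwise be instantiated.

```python
from typing import Any, Dict, Type


class Feature:
    """Hypothetical base feature class that stored code could extend."""

    def __init__(self, attributes: Dict[str, Any]):
        self.attributes = dict(attributes)

    def on_store(self) -> None:
        """Hook called before the feature is saved; the default does nothing."""


# Hypothetical code table keyed by the CLSID stored for an object class.
class_code: Dict[str, str] = {
    "{hypothetical-clsid-0001}": (
        "class ValveFeature(Feature):\n"
        "    def on_store(self):\n"
        "        self.attributes['STATUS'] = self.attributes.get('STATUS', 'OPEN').upper()\n"
    ),
}
_class_cache: Dict[str, Type[Feature]] = {}


def feature_class_for(clsid: str) -> Type[Feature]:
    """Compile the code stored under clsid (once) and return its Feature subclass."""
    if clsid not in _class_cache:
        namespace: Dict[str, Any] = {"Feature": Feature}
        exec(class_code[clsid], namespace)
        subclasses = [v for v in namespace.values()
                      if isinstance(v, type) and issubclass(v, Feature) and v is not Feature]
        _class_cache[clsid] = subclasses[0]
    return _class_cache[clsid]


valve = feature_class_for("{hypothetical-clsid-0001}")({"STATUS": "closed"})
valve.on_store()
```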

Update: I’ve written a prototype and uploaded it here.

Flash Disk i/o Performance & Rebasing DLLs

SanDisk has a new 32 GB flash disk coming out that supposedly has 100x faster i/o than magnetic disks. For geoprocessing that is constrained by disk i/o, it sure seems like this could improve things.

Still, I wonder how much quicker ArcMap would load. I suspect a lot of the load time comes not from disk i/o but from “rebasing” DLLs. I’ve noticed that ArcMap’s startup time seems to increase more than linearly with the number of extension DLLs being loaded.

Think about it: the more money you spend on ArcMap extensions, the slower it will load.

This MSDN article describes the costs of rebasing.

ArcMap loads lots of DLLs, written independently by different developers, so it’s just not practical for them to coordinate on base addresses. The JIT Extension category improves things, but I still notice slower startup when more extensions are installed, even if they are in the JIT category. If a toolbar that references an extension is turned on, then that extension will load at startup.

Have you ever noticed how Visual Studio loads slowly the first time, but much faster on subsequent loads? This article sheds some light; quote:

“By the way, on the “faster load time” issue, I should mention that when an executable module is unloaded, Windows puts its pages on a “standby” list, a kind of cache from where the module’s pages can be retrieved very efficiently when it is loaded again. So if you load a DLL for a second time, and its pages are still in the standby list, it will load a lot quicker than the first time.”

It seems like ESRI Desktop apps could benefit from a standby list too.

But instead of assigning base addresses by hashing the DLL name, it seems like we could rebase each DLL to the address where it actually ends up being loaded, essentially persisting the standby list to disk.
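
To make the rebasing cost concrete, here is a small Python sketch. `hashed_base` is a hypothetical name-hashing scheme (CRC32 of the DLL name mapped onto 64 KB-aligned slots, since Windows reserves address space in 64 KB units), and `collisions` reports which DLLs have overlapping preferred ranges. Whenever such an overlap exists, the loader has to relocate one of the images. The addresses and sizes in the example are made up.

```python
import zlib
from typing import Dict, List, Tuple

GRANULARITY = 0x10000  # Windows reserves address space in 64 KB units


def hashed_base(name: str, low: int = 0x10000000, slots: int = 0x1000) -> int:
    """Hypothetical scheme: derive a 64 KB-aligned preferred base from the DLL name."""
    return low + (zlib.crc32(name.lower().encode()) % slots) * GRANULARITY


def collisions(images: Dict[str, Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Return pairs of DLLs whose [base, base + size) ranges overlap.

    Overlapping preferred ranges force the loader to rebase one of the pair.
    """
    spans = sorted((base, base + size, name) for name, (base, size) in images.items())
    clashes = []
    for i, (start, end, name) in enumerate(spans):
        for other_start, _, other in spans[i + 1:]:
            if other_start >= end:
                break  # spans are sorted by start, so no later span overlaps either
            clashes.append((name, other))
    return clashes


# Made-up preferred bases and image sizes for three extension DLLs.
images = {
    "a.dll": (0x10000000, 0x30000),
    "b.dll": (0x10020000, 0x10000),  # starts inside a.dll's range
    "c.dll": (0x10040000, 0x10000),
}
clashes = collisions(images)  # -> [("a.dll", "b.dll")]
```

Hashing spreads DLLs around but cannot rule out overlaps like this; persisting the addresses where DLLs actually loaded would, in principle, give a collision-free layout for that particular set of extensions.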

And what is Desktop, Chopped Liver?

James has pointed out how ESRI is focusing more on Server than on Desktop at the upcoming Dev Summit. This is not just in terms of conference resources; there are cool and useful things in ArcGIS Server that have no equivalent in Desktop.

For example, take a look at the FeatureGraphicsLayer, available in AGS. This handy class inherits from DataTable.

So what about Desktop? I can create an in-memory featureclass via an InMemoryWorkspaceFactory (found by trial and error; there’s not much documentation). However, it’s still an IFeatureClass. It would really be better if I had something like a layer that inherits from the ADO.NET DataTable and could be shown with a TableView in ArcMap.

Domain Specific Languages for GIS?

When I first heard about domain-specific languages (DSLs) a couple of years ago, I figured there would soon be folks really digging into them for GIS. The discussion here about GIS scripting is interesting, but a discussion of DSLs would be more helpful. I guess ESRI’s ModelBuilder is a DSL IDE; it looks similar to the one discussed here. I always liked the way Unix allows you to chain commands together into one long command, piping the output of one command into the input of the next. No IDE needed: it’s all command line. Unix-friendly DSLs seem like a natural step for the open-source folks. OGC mentions DSLs here as “GML Application Languages”, but I can’t really follow this to anything concrete.
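
The Unix-pipe style of composition can be sketched in a few lines of Python, with generators playing the role of streams between commands. The stage names `where`, `select`, and `pipe` are hypothetical and not taken from any existing GIS DSL; each stage consumes a stream of features and yields a new one, the way `grep` and `cut` consume and produce lines.

```python
from functools import reduce
from typing import Any, Callable, Dict, Iterable, Iterator

Feature = Dict[str, Any]
Stage = Callable[[Iterable[Feature]], Iterator[Feature]]


def where(field: str, value: Any) -> Stage:
    """Like grep: pass through only features whose field equals value."""
    def stage(features: Iterable[Feature]) -> Iterator[Feature]:
        return (f for f in features if f.get(field) == value)
    return stage


def select(*fields: str) -> Stage:
    """Like cut: keep only the named fields."""
    def stage(features: Iterable[Feature]) -> Iterator[Feature]:
        return ({k: f[k] for k in fields if k in f} for f in features)
    return stage


def pipe(*stages: Stage) -> Stage:
    """Chain stages left to right, like cmd1 | cmd2 | cmd3."""
    return lambda features: reduce(lambda acc, s: s(acc), stages, features)


parcels = [
    {"PIN": "001", "ZONE": "R1", "ACRES": 0.25, "OWNER": "A"},
    {"PIN": "002", "ZONE": "C2", "ACRES": 1.5, "OWNER": "B"},
    {"PIN": "003", "ZONE": "R1", "ACRES": 0.3, "OWNER": "C"},
]
residential = list(pipe(where("ZONE", "R1"), select("PIN", "ACRES"))(parcels))
```

Because each stage is lazy, features stream through the whole chain one at a time, much as lines do through a shell pipeline.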


Flu Geography

I’ve seen a lot of maps showing the spread of avian flu, but I have not heard much discussion about how the flu might alter our spatial behavior or affect internet usage. Likely the first thing will be to keep the kids home from school. Many more employers will let employees work remotely. Instead of going to stores, more people will opt for delivery.

All these changes will increase our reliance on the internet. Internet-facilitated home-schooling might reach critical mass, so that after the flu subsides many children may not re-enroll in public schools. More offices may realize that enabling remote workers is cost-effective, not just during a pandemic. Internet-based grocery shopping will finally become profitable: viral marketing in the most literal sense.

Modeling Infection

Since this blog is about GIS programming, let’s consider Conway’s Game of Life as a type of GIS modeling. It would be interesting to see a tool that lets you load some real-world GIS data into a dynamic layer, push a button, and say “there goes the neighborhood!”. Property values seem to change in an infectious way.

New Urbanism holds that a five-minute walk to get an ice cream cone is the litmus test of neighborhood health. I think this could be modeled. I wonder how the density promoted by New Urbanism will fare after a flu scare; suburban sterility may become more appealing. Anyway, it seems like a Game of Life approach could be used to model some aspects of New Urbanism: use parcels for cells, and distance instead of adjacency.
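
The parcels-for-cells variant can be sketched in Python as follows, assuming each parcel is reduced to a centroid and that two parcels are neighbors when their centroids lie within a chosen radius. Conway’s original thresholds (survive with 2 or 3 live neighbors, be born with exactly 3) are kept here purely as an assumption; a real model would likely tune them. The neighbor search is O(n²) for clarity; a spatial index would be needed for a real parcel layer.

```python
import math
from typing import Dict, Set, Tuple

Centroid = Tuple[float, float]


def neighbors_within(parcels: Dict[str, Centroid], radius: float) -> Dict[str, Set[str]]:
    """Neighbor sets by centroid distance instead of grid adjacency."""
    result: Dict[str, Set[str]] = {pid: set() for pid in parcels}
    ids = list(parcels)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if math.dist(parcels[a], parcels[b]) <= radius:
                result[a].add(b)
                result[b].add(a)
    return result


def step(alive: Set[str], neighbors: Dict[str, Set[str]]) -> Set[str]:
    """One generation with Conway's rules: survive with 2-3 live neighbors, born with 3."""
    nxt = set()
    for pid, near in neighbors.items():
        live = len(near & alive)
        if live == 3 or (pid in alive and live == 2):
            nxt.add(pid)
    return nxt


# Sanity check: on a unit grid, a radius of 1.5 reproduces the classic 8-cell neighborhood,
# so a "blinker" of three parcels should flip from horizontal to vertical.
grid = {f"{x},{y}": (float(x), float(y)) for x in range(3) for y in range(3)}
nbrs = neighbors_within(grid, 1.5)
horizontal = {"0,1", "1,1", "2,1"}
vertical = step(horizontal, nbrs)
```

With irregular parcels, the radius takes over the role of the grid, which is where a five-minute walking distance could plug in.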

Infection of Modelers

These are just ideas, but they may also be thought of as memes. While I really like using ESRI software, sometimes I fear that its proliferation has effectively inoculated the geospatial community, so that ideas that cannot be implemented with ArcGIS are simply ignored. Think of it as herd immunity.

Just as gene therapy has been identified as having potential benefits, maybe the geospatial community should consciously enroll in meme therapy. I think James Fee’s exploration of Manifold serves as a good example of this.

The Long Tail of GIS Consulting

The Long Tail has been used to describe how small niche writers can become successful through the internet. Amazon is often mentioned, but I think eBay is an even better example. Joe Francica has discussed its relevance for geospatial data. I find it relevant to GIS consulting.

Let’s say ESRI comes out with a new ArcObjects release. According to Kirk’s Law, writing code is always more fun than documenting it, so documentation will always lag. This lag presents a business opportunity. Let’s say there’s some cool new functionality offered by ISomeObscureInterface. If you’re like me, you search EDN, then Google. If you’re like some people and you don’t find anything, you whine about how evil ESRI seeks to impurify the precious bodily fluids of the geospatial community by releasing poorly documented ArcObjects libraries.

Or, instead, you could choose to exploit this opportunity: dig in, figure out the interface, and post code showing how to use it. There’s a good chance a prospective customer will find you. I once worked for a company that spent more money on advertising than on developing the skill sets needed to deliver what the ads promised. It brings to mind the movie “How to Get Ahead in Advertising”.

For a business plan, instead of spending money on silly advertisements, it makes more sense to write sample code for poorly documented aspects of ArcObjects.

It is tempting to keep this strategy secret for competitive advantage. However, I think the more consultants that start doing this, the more successful the strategy. So, I encourage others to do as I have done. If you prefer a less mercenary view, think of this as nurturing an ecosystem where distributed cognition can thrive, maybe even resulting in hippy poetry.

Breadth vs Depth

It is just not possible to master all of the ArcObjects libraries. The total area of understanding (breadth × depth) is limited by the number of hours in a day. So the issue is how much time to spend mastering ISomeObscureInterface in depth instead of broadening into ISomeOtherObscureInterface. More specifically, is the long tail just for specialists? Solutions to complex geography problems require skilled generalists. I am hopeful that blogs will provide a marketplace where generalists can thrive as well. So maybe we will see an evolution of synoptic GIS planning discussions.

It all boils down to disintermediation: skilled consultants finding work directly for clients without funding an army of suits.

ArcObjects 9.2 online documentation

Just on a whim, I typed in this link and found the documentation.

Maybe something on an official page links to it and I just don’t see it.

ArcGIS Server in an Elastic Compute Cloud

I’m just trying to think through what it would take to deploy ArcGIS Server to Amazon’s Elastic Compute Cloud.

The hard part appears to be building the AMI (Amazon Machine Image). As far as I can tell, an AMI is similar to a VMware virtual machine image.

In addition to providing a sensible licensing model for this, it would be nice if ESRI provided a “starter” AMI with all the necessary software loaded; then I could just load my customizations and data on top of that and push it into the cloud.

The beauty of the cloud concept is its scalability. I wonder how tricky it would be to add new SOC machines to the cloud. For something like an emergency management system, scalability is hard: typically the load would be very low until something like a hurricane comes along.
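
One way to think about adding SOC (Server Object Container) machines is as a simple capacity rule evaluated against current load. The sketch below is entirely hypothetical: the per-instance capacity, the 25% headroom, and the floor and ceiling are made-up parameters, and it says nothing about how ArcGIS Server or EC2 would actually be told to launch or register an instance.

```python
import math


def soc_instances_needed(requests_per_sec: float, per_instance_capacity: float,
                         minimum: int = 1, maximum: int = 20,
                         headroom: float = 0.25) -> int:
    """How many SOC machines to run for the current load (hypothetical policy).

    Headroom leaves slack for a sudden spike while new instances boot; the result is
    clamped to a floor for quiet days and a ceiling to cap cost.
    """
    if per_instance_capacity <= 0:
        raise ValueError("per_instance_capacity must be positive")
    wanted = math.ceil(requests_per_sec * (1 + headroom) / per_instance_capacity)
    return max(minimum, min(maximum, wanted))


quiet_day = soc_instances_needed(0, 10)      # -> 1
hurricane = soc_instances_needed(100, 10)    # -> 13
```

The hard part for an emergency system is the lag between the spike and the new instances coming online, which is what the headroom term tries, crudely, to absorb.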

Maybe I’m thinking about an emergency management system the wrong way. Maybe what is needed is a lot of small, focused apps that mash together feeds following the Common Alerting Protocol.
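
A small, focused app of that kind would start by flattening Common Alerting Protocol (CAP) alerts into records it can map. Here is a sketch using Python’s standard ElementTree, assuming CAP 1.2 documents (namespace `urn:oasis:names:tc:emergency:cap:1.2`; CAP 1.1 feeds use `...:cap:1.1` instead). The sample alert is invented for illustration. A CAP polygon is a space-separated list of `lat,lon` pairs whose first and last pairs are the same.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

CAP_NS = {"cap": "urn:oasis:names:tc:emergency:cap:1.2"}


def parse_cap(xml_text: str) -> List[Dict[str, str]]:
    """Flatten each <info>/<area> pair of a CAP alert into a simple record."""
    alert = ET.fromstring(xml_text)
    identifier = alert.findtext("cap:identifier", "", CAP_NS)
    records = []
    for info in alert.findall("cap:info", CAP_NS):
        for area in info.findall("cap:area", CAP_NS):
            records.append({
                "identifier": identifier,
                "event": info.findtext("cap:event", "", CAP_NS),
                "severity": info.findtext("cap:severity", "", CAP_NS),
                "areaDesc": area.findtext("cap:areaDesc", "", CAP_NS),
                "polygon": area.findtext("cap:polygon", "", CAP_NS),
            })
    return records


# Invented sample alert for illustration only.
SAMPLE = """<alert xmlns="urn:oasis:names:tc:emergency:cap:1.2">
  <identifier>TEST-001</identifier>
  <sender>alerts@example.org</sender>
  <sent>2007-06-01T12:00:00-05:00</sent>
  <status>Exercise</status>
  <msgType>Alert</msgType>
  <scope>Public</scope>
  <info>
    <category>Met</category>
    <event>Hurricane Warning</event>
    <urgency>Immediate</urgency>
    <severity>Extreme</severity>
    <certainty>Likely</certainty>
    <area>
      <areaDesc>Coastal County</areaDesc>
      <polygon>29.0,-90.0 29.5,-90.0 29.5,-89.5 29.0,-90.0</polygon>
    </area>
  </info>
</alert>"""

records = parse_cap(SAMPLE)
```

Each record could then be turned into a map graphic by splitting the polygon string into coordinate pairs.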
