Archive for the 'Geodatabase' Category

Faster Calculations in ArcMap

[Image: Ricky Ricardo] Somebody has some splaining to do.

I like to think of myself as a patient person, not one to hurry. However, sometimes when I use the Field Calculator in ArcMap, my patience is challenged.

I have a featureclass of sidewalks related to a street centerline network: 284,000 sidewalks and 54,000 street centerlines. When I calculate a field on the sidewalks, setting it equal to a field from the street network, it takes several hours. That's in a file gdb with the key field indexed. Somebody has some explaining to do.

The last time I read ESRI's EULA, I recall it prohibiting publication of benchmark statistics, so I have not included precise numbers. It looks like they are following Oracle's policy.

Unlike Oracle, ArcGIS does not let me see which execution plan it is using (see EXPLAIN PLAN). I'm quite certain it always uses the same plan, which in this case is a very poor one.

To make it faster, I cache the key/value pairs from the street attribute table in memory using a System.Collections.Hashtable. This takes about 1.4 seconds. I then open a cursor on the sidewalks and loop through it, using each sidewalk's key field value to look up the street value in the hashtable. This takes less than a minute.
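In other words, this is a hash join. Here's a minimal sketch of the idea in Python rather than ArcObjects, assuming the attribute tables have already been read into lists of dicts; a Python dict plays the role of the Hashtable, and the field names street_id and speed_limit are made up for illustration. Each table is read once, so the work grows with 54,000 + 284,000 rows instead of (what I suspect the Field Calculator does) one query against the street table per sidewalk.

```python
from typing import Any, Dict, Iterable, List


def build_lookup(streets: Iterable[Dict[str, Any]], key_field: str,
                 value_field: str) -> Dict[Any, Any]:
    """One pass over the street table: cache key -> value in memory."""
    return {row[key_field]: row[value_field] for row in streets}


def assign_from_lookup(sidewalks: List[Dict[str, Any]], lookup: Dict[Any, Any],
                       fk_field: str, target_field: str) -> int:
    """One pass over the sidewalks, with an O(1) average lookup per row.

    Returns the number of rows whose key was found in the lookup.
    """
    matched = 0
    for row in sidewalks:
        key = row[fk_field]
        if key in lookup:
            row[target_field] = lookup[key]
            matched += 1
    return matched


# Hypothetical sample data.
streets = [
    {"street_id": 1, "speed_limit": 30},
    {"street_id": 2, "speed_limit": 45},
]
sidewalks = [
    {"sidewalk_id": 10, "street_id": 1, "speed_limit": None},
    {"sidewalk_id": 11, "street_id": 2, "speed_limit": None},
    {"sidewalk_id": 12, "street_id": 99, "speed_limit": None},  # no matching street
]

lookup = build_lookup(streets, "street_id", "speed_limit")
print(assign_from_lookup(sidewalks, lookup, "street_id", "speed_limit"))  # 2
```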

Interestingly, an update cursor is slower than a cursor created via IFeatureClass.Search. I think this is only true for a file geodatabase, though; on ArcSDE I believe an update cursor is generally faster, assuming proper rollback segment sizes are configured.

Maybe ESRI should beef up the MemoryRelationshipClass so that it lets the user examine and/or specify an execution plan. That way I could tell it to use a hashtable when it does the join, sparing me from having to roll my own.

For a good comparison of Hashtable, SortedList and Dictionary, see this post.

The shortcoming of this approach is that I can't do field calculations, just simple assignment. Someday I'd like to try using CodeDOM to generate code from an expression the user has provided. It would need to substitute field values for field names. Since square brackets have meaning in C#, I'd need different field name delimiters.
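Python has no CodeDOM, but its built-in compile() can play the same role for a sketch: translate the user's expression into source code once, compile it, then call the compiled function for every row. The curly-brace delimiters ({FIELD}) are my own hypothetical choice, as are the field names. Note that eval with empty builtins is not a sandbox, so this is only suitable for trusted expressions.

```python
import re
from typing import Any, Callable, Dict

# Hypothetical delimiter choice: {FieldName}. Square brackets would clash with
# indexing syntax in the host language, as noted above for C#.
FIELD_PATTERN = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def compile_expression(expression: str) -> Callable[[Dict[str, Any]], Any]:
    """Turn '{Field}' references into row lookups and compile the result once."""
    source = FIELD_PATTERN.sub(lambda m: f"row[{m.group(1)!r}]", expression)
    code = compile(f"lambda row: {source}", "<field-calculator>", "eval")
    # Empty builtins limit accidental misuse; this is NOT a security sandbox.
    return eval(code, {"__builtins__": {}})


calc = compile_expression("{LENGTH} * {WIDTH} / 2")
rows = [{"LENGTH": 10.0, "WIDTH": 4.0}, {"LENGTH": 3.0, "WIDTH": 2.0}]
for row in rows:
    row["AREA"] = calc(row)  # compiled once, called per row
print([row["AREA"] for row in rows])  # [20.0, 3.0]
```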

Amazon Dynamo

This graph is from a paper on Amazon’s Dynamo. I suspect other months (not during shopping season) would look quite different.

With a name like "Elastic Compute Cloud", I would expect the price of EC2 to reflect the supply of available compute capacity; isn't that what elasticity of supply is all about? Currently, pricing does not reflect time of use. I wonder how much performance degrades for EC2 users during the Christmas shopping season.

Amazon relaxes the Consistency part of the DBMS ACID requirement in order to achieve availability. Maybe another rule could be relaxed: the one saying keys should have no meaning beyond their use as an ID. If we did this, Peano keys could perhaps be used, providing a spatially enabled, Dynamo-like system.
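As far as I know, "Peano key" in the GIS literature usually refers to what is also called a Morton or Z-order code: the bits of the x and y grid coordinates are interleaved, so cells that are close together in space tend to get numerically close keys. A minimal sketch for 16-bit grid coordinates follows; the to_cell helper and its bounds are hypothetical. One caveat: Dynamo itself hashes keys (MD5) to place them on its ring, which discards this locality, so the benefit would only show up in a store that partitions by key ranges.

```python
def part1by1(n: int) -> int:
    """Spread the low 16 bits of n apart, leaving a zero bit between each."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton_key(x: int, y: int) -> int:
    """Interleave grid coordinates: x bits in even positions, y bits in odd."""
    return part1by1(x) | (part1by1(y) << 1)


def to_cell(value: float, lo: float, hi: float, bits: int = 16) -> int:
    """Quantize a coordinate in [lo, hi] onto a grid of 2**bits cells."""
    max_cell = (1 << bits) - 1
    return min(max_cell, int((value - lo) / (hi - lo) * (1 << bits)))


# A 2x2 block of cells gets four consecutive keys.
print([morton_key(x, y) for y in range(2) for x in range(2)])  # [0, 1, 2, 3]

# A lon/lat pair mapped onto a hypothetical world-wide 16-bit grid.
key = morton_key(to_cell(-98.49, -180.0, 180.0), to_cell(29.42, -90.0, 90.0))
print(key)
```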

Linq to Geodatabase Provider

As suggested by Dave Bouwman and Ron Bruder, I've played around a bit with ArcGIS Diagrammer, developed by Richie Carmichael. I like it. It would be great to see something like this offered as a fully supported product.

I've also watched some of the Linq to SQL videos at MSDN. Keep in mind that Linq is extensible, so while there are currently only a few providers, expect other Linq providers to appear soon, for example Linq to Google.

It sure would be cool if ESRI wrote a Linq to Geodatabase provider.

If ESRI doesn’t do something, I suspect many will opt to bypass ArcSDE altogether and access SQL Server directly via Linq to SQL, once it supports spatial types.

I suppose if ESRI did it right, it would support a user experience similar to the one in this video, but in C# of course :). I suppose, too, it would involve writing a designer with a look and feel similar to ArcGIS Diagrammer.

Sourcecode in Geodatabase Prototype

I’ve written a proof of concept editor extension based on the ideas outlined in my previous post, plus some helpful feedback from Brian Flood (thanks!).

The solution, which includes an installer and test file gdb, has been uploaded to arcscripts, right here.

The editor extension maintains a generic List of IExtensions. The extensions in this list are instantiated at OnStartEditing from source code in a table called SourceCode in the edit workspace.

There is also a command provided on a commandbar that allows you to browse and load a source .cs file into the SourceCode table, after verifying that it compiles without errors.

Note that the “using” statements need to include a full path name to the assembly files referenced by the source code.
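To make the compile-check-then-store and load-at-start-editing steps concrete, here's a Python sketch (not the actual extension, which is C#). SQLite stands in for the edit workspace, the SourceCode table has a hypothetical (name, source) layout, and each stored module is assumed to define a class named Extension with an on_create_feature method. Python's compile() only checks syntax, which is a weaker test than a successful C# compile.

```python
import sqlite3
from typing import Any, Dict, List


def create_source_table(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS SourceCode "
                 "(name TEXT PRIMARY KEY, source TEXT NOT NULL)")


def load_source(conn: sqlite3.Connection, name: str, source: str) -> None:
    """Store source only if it compiles; raises SyntaxError otherwise."""
    compile(source, name, "exec")  # syntax check only, nothing is executed
    conn.execute("INSERT OR REPLACE INTO SourceCode (name, source) VALUES (?, ?)",
                 (name, source))
    conn.commit()


def start_editing(conn: sqlite3.Connection) -> List[Any]:
    """Compile every stored module and instantiate its Extension class."""
    extensions: List[Any] = []
    for name, source in conn.execute("SELECT name, source FROM SourceCode ORDER BY name"):
        namespace: Dict[str, Any] = {}
        exec(compile(source, name, "exec"), namespace)
        extensions.append(namespace["Extension"]())
    return extensions


EXAMPLE_SOURCE = '''
class Extension:
    def on_create_feature(self, feature):
        feature["created_by"] = "stored-code"
'''

conn = sqlite3.connect(":memory:")
create_source_table(conn)
load_source(conn, "stamp_creator", EXAMPLE_SOURCE)

try:
    load_source(conn, "broken", "def oops(:\n    pass")
except SyntaxError:
    print("rejected broken source")

extensions = start_editing(conn)
feature = {"id": 1}
for ext in extensions:
    ext.on_create_feature(feature)
print(feature)  # {'id': 1, 'created_by': 'stored-code'}
```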

I'm wondering if this approach might be easier to use, as well as easier to maintain, than class extensions. I haven't tried it, but I suppose it could work with shapefiles as well.

The potential uses of dynamic compilation are intriguing. I'd really like to try this in an IServerObjectExtension. More on that later.

Storing Code in a Geodatabase

It appears that the .NET 2.0 framework will now be part of the standard ArcGIS install. This means it should be possible to compile code dynamically at run time using System.CodeDom.Compiler.

What if we could store code in the geodatabase that would be dynamically compiled at run time?

A major pain when working with class extensions is that people who do not have the DLL installed on their machine cannot even open featureclasses that have extensions. Perhaps the code for a class extension could be stored in the database instead (optionally obfuscated).

Perhaps this approach could also be applied to support triggers. In ArcCatalog, I'd like to be able to right-click on a field in a geodatabase table (or featureclass) and provide code that would be called for OnCreateFeature, OnDelete, etc. Behind the scenes, the geodatabase would store this in a GDB_ table, and when a feature is added to the featureclass (or table), it would JIT compile and call the code. The GDB_ table could also provide a read/write property field (or fields) that would allow me to implement sequences.
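Here's a rough Python sketch of how such triggers might be dispatched. An in-memory dict stands in for the GDB_ table; triggers are stored as source keyed by table and event, compiled on first use and cached (the JIT part), and a simple counter plays the read/write property field that backs a sequence. The class name, the handler convention, and the Hydrants table are all hypothetical.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical stand-in for a GDB_ table: (table, event) -> trigger source.
# Each trigger body defines "handler(feature, next_value)".
TriggerKey = Tuple[str, str]


class TriggerStore:
    def __init__(self) -> None:
        self._source: Dict[TriggerKey, str] = {}
        self._compiled: Dict[TriggerKey, Callable[..., None]] = {}
        self._sequences: Dict[str, int] = {}  # the read/write "property field"

    def register(self, table: str, event: str, source: str) -> None:
        self._source[(table, event)] = source
        self._compiled.pop((table, event), None)  # force recompilation

    def next_value(self, name: str) -> int:
        """A sequence: read, increment, write back."""
        value = self._sequences.get(name, 0) + 1
        self._sequences[name] = value
        return value

    def fire(self, table: str, event: str, feature: Dict[str, Any]) -> None:
        key = (table, event)
        if key not in self._source:
            return  # no trigger registered for this table/event
        if key not in self._compiled:
            # Compile on first use and cache the result.
            namespace: Dict[str, Any] = {}
            exec(compile(self._source[key], f"{table}.{event}", "exec"), namespace)
            self._compiled[key] = namespace["handler"]
        self._compiled[key](feature, self.next_value)


store = TriggerStore()
store.register("Hydrants", "OnCreateFeature", '''
def handler(feature, next_value):
    feature["asset_id"] = "HYD-%05d" % next_value("hydrant_seq")
''')

for _ in range(2):
    hydrant: Dict[str, Any] = {}
    store.fire("Hydrants", "OnCreateFeature", hydrant)
    print(hydrant)
# {'asset_id': 'HYD-00001'}
# {'asset_id': 'HYD-00002'}
```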

I suppose it should be possible to write Visual Studio add-ins to support editing code stored in a geodatabase.

I guess there's really no reason the code would need to be C#, VB.NET or whatever. Maybe a simplified (domain-specific) language could be provided that even DBAs could understand. For example, say I want to assign a county ID to a point whenever a point is created or updated, based on the county the point falls within. The C# code to accomplish this might be a bit overwhelming, so maybe a simpler language could be provided that expresses this. Of course, ESRI would need to provide a compiler to create the CIL from the simplified language.
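For a sense of what that trigger has to do, here is the core of it as a Python sketch: a ray-casting point-in-polygon test against each county boundary. The county names and square boundaries are made up, boundaries are simple rings without holes, and a point exactly on a shared edge just goes to the first county that matches. A real implementation would more likely run a spatial query against a county featureclass than loop over polygons.

```python
from typing import Any, Dict, List, Optional, Tuple

Point = Tuple[float, float]
Ring = List[Point]


def point_in_ring(pt: Point, ring: Ring) -> bool:
    """Ray casting: count crossings of a ray running from pt in the +x direction."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_county(point: Dict[str, Any], counties: Dict[str, Ring]) -> Optional[str]:
    """What an 'on create/update, set county ID' trigger would do."""
    for county_id, ring in counties.items():
        if point_in_ring((point["x"], point["y"]), ring):
            point["county_id"] = county_id
            return county_id
    point["county_id"] = None
    return None


# Hypothetical county boundaries.
counties = {
    "WEST": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    "EAST": [(10.0, 0.0), (20.0, 0.0), (20.0, 10.0), (10.0, 10.0)],
}
print(assign_county({"x": 3.0, "y": 4.0}, counties))   # WEST
print(assign_county({"x": 15.0, "y": 2.0}, counties))  # EAST
print(assign_county({"x": 25.0, "y": 2.0}, counties))  # None
```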

Maybe this approach would also allow custom features. Custom features were promoted at 8.0, but they never really worked as intended. AFAIK ArcFM is the only custom-feature-based solution in widespread use. What if the CLSID stored in the GDB_Objectclasses table were a key to another table that stored code? Instead of instantiating a COM class when the objectclass is opened, ArcGIS could JIT compile the code stored in the geodatabase. If this is possible, perhaps ESRI could provide us with a base Feature class that we could extend and whose methods we could override.

Update: I've written a prototype and uploaded it here.

Agile Geodatabase Design

An iterative waterfall?

While Agile is on my mind, I thought I’d write a bit about Agile Geodatabase design.

Let’s say I follow ESRI’s steps from the bottom of this page to create a geodatabase.

1. With Microsoft Visio or Rational Software Corporation’s Rational Rose, design a geodatabase in UML and export it to an XML Metadata Interchange (XMI) file or Microsoft Repository. To learn how, see http://support.esri.com/geodatabase/uml.

2. Add the Schema wizard to ArcCatalog.

3. Generate a geodatabase schema from the XMI file or Microsoft Repository with the Schema wizard.

4. Once you have generated the schema, you can modify it with tools in ArcCatalog if needed.

5. Once the schema is ready, you can load data into it.

In step 4, let's say I decide to modify the schema. At that point my schema is out of sync with my CASE model, and keeping the model in sync with the geodatabase can quickly become a pain. Then there is also the pain of keeping Data Access Layer Components (DALCs) in sync with the geodatabase. In essence, these steps represent a waterfall.

So here's my suggestion: ESRI should provide a Geodatabase Designer within Visual Studio. It would provide a look and feel similar to the VS Class Designer, but it would not replace the Class Designer. Instead, it would provide a graphical way of editing an XMI schema containing geodatabase types. A command would allow it to generate code (.NET classes) the way Dave describes here. Likewise, there would also be a command to synchronize the XMI schema with a geodatabase, as well as with the DALCs.
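The synchronize command boils down to diffing two schemas. Here's a minimal Python sketch, assuming both the model (e.g. parsed from XMI) and the geodatabase have already been reduced to plain dictionaries of tables and field definitions; that representation and the output strings are hypothetical, not an ESRI format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FieldDef:
    field_type: str
    length: int = 0


# Hypothetical in-memory form: table name -> field name -> definition.
Schema = Dict[str, Dict[str, FieldDef]]


def diff_schemas(target: Schema, current: Schema) -> List[str]:
    """List the changes that would bring `current` in line with `target`."""
    changes: List[str] = []
    for table in sorted(target.keys() - current.keys()):
        changes.append(f"CREATE TABLE {table}")
    for table in sorted(current.keys() - target.keys()):
        changes.append(f"DROP TABLE {table}")
    for table in sorted(target.keys() & current.keys()):
        want, have = target[table], current[table]
        for name in sorted(want.keys() - have.keys()):
            changes.append(f"ADD FIELD {table}.{name} {want[name].field_type}")
        for name in sorted(have.keys() - want.keys()):
            changes.append(f"DROP FIELD {table}.{name}")
        for name in sorted(want.keys() & have.keys()):
            if want[name] != have[name]:
                changes.append(f"ALTER FIELD {table}.{name}")
    return changes


model = {
    "Sidewalks": {"WIDTH": FieldDef("double"), "MATERIAL": FieldDef("string", 20)},
    "Curbs": {"HEIGHT": FieldDef("double")},
}
geodatabase = {
    "Sidewalks": {"WIDTH": FieldDef("double"), "MATERIAL": FieldDef("string", 10),
                  "INSPECTED": FieldDef("date")},
}
for change in diff_schemas(model, geodatabase):
    print(change)
# CREATE TABLE Curbs
# DROP FIELD Sidewalks.INSPECTED
# ALTER FIELD Sidewalks.MATERIAL
```

Swapping the arguments gives the changes needed to bring the model in line with the geodatabase, which is the other half of the round trip.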

I know this is all rather vague, but my point is that until we have easy-to-use tools supporting the round trips needed in iterative development, we'll end up with waterfall processes. Escher notwithstanding, waterfalls are very un-agile.

Update: Here’s an example of how a designer can be built for Visual Studio.

Neogeography Use Cases, Pretending to be an Architect

More discussion over at High Earth Orbit on the definition of neogeography.

While I’m sure many are tired of seeing this dead horse beaten, I do find value in discussing a use case often addressed by neogeography: crowdsourcing. As High Earth points out, the neo and paleo geographers would both be actors.

The problem is that some of the tools needed to support crowdsourcing are not getting high enough priority from ESRI.

Case in point: ArcGIS Server's GraphicsLayer.WriteToXml method would make crowdsourcing a lot easier. A neogeographer draws graphics on the map, adds some attributes and saves them. Behind the scenes they get saved to disk (via WriteXml, not to ArcSDE via versioning). A paleogeographer opens ArcEditor, loads the graphics layer into a map, converts the graphics to features, edits them and commits them to the geodatabase.

The only problem is a bug in WriteToXml. It was logged in August (NIM011262), but the SP4 documentation doesn't list it as fixed.

The slow resolution of this issue might give neogeographers the impression that ESRI doesn’t place high enough priority on crowdsourcing. The ArcGIS architecture needs to support crowdsourcing.
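The save/reload round trip in that workflow is simple to sketch. This Python version uses xml.etree.ElementTree with a made-up XML layout (point graphics with string attributes); it is not the format GraphicsLayer actually writes.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# A graphic here is ((x, y), {attribute name: text value}); a made-up layout.
Graphic = Tuple[Tuple[float, float], Dict[str, str]]


def graphics_to_xml(graphics: List[Graphic]) -> str:
    """What the neogeographer's 'save' would write to disk."""
    root = ET.Element("graphics")
    for (x, y), attrs in graphics:
        node = ET.SubElement(root, "graphic", {"x": repr(x), "y": repr(y)})
        for name, value in attrs.items():
            ET.SubElement(node, "attribute", {"name": name}).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_graphics(text: str) -> List[Graphic]:
    """What the paleogeographer's tool would read back before making features."""
    root = ET.fromstring(text)
    graphics: List[Graphic] = []
    for node in root.findall("graphic"):
        attrs = {a.attrib["name"]: a.text or "" for a in node.findall("attribute")}
        point = (float(node.attrib["x"]), float(node.attrib["y"]))
        graphics.append((point, attrs))
    return graphics


drawn = [((-98.49, 29.42), {"note": "pothole", "reporter": "neo"})]
text = graphics_to_xml(drawn)
print(xml_to_graphics(text) == drawn)  # True
```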


Pretend to Be An Architect

Speaking of architecture, have you ever noticed how many architects live long lives and remain creative in their later years? Take a look at Johnson, Wright and Venturi.

Contrast this with mathematicians, who seem to die too soon, e.g. Boole, Hamilton and Turing.

I think ageism lurks beneath the surface of the paleo/neo discussion. The GIS community is getting gray. A lot of fresh college grads focus on web design instead of cartography. If we can set an example by aging more gracefully maybe they’d be more interested in trying a few old school concepts. Perhaps the key to aging gracefully is to become more like architects and less like mathematicians.

Spatial is Special, what about Time?

The Swiss are known for clocks and pocket knives, so why didn't they include a watch on this pocket knife?

If you’ve worked much with GIS there’s a good chance you’ve had to go through the why-spatial-is-special routine with a DBA wanting to store geometry as numeric columns within normalized tables.

But what about time?

Say you're using a GPS clock to compute location via time difference of arrival (TDOA). Nanosecond precision is needed (light travels roughly one foot per nanosecond); however, SQL Server's datetime type doesn't support anything finer than about 3.33 milliseconds. This could be overcome by introducing a time column with an ITemporalReference. Internally it would store time as a 64-bit integer along with a domain and scale, just as with spatial types. ITemporalReference would be to ITime what ISpatialReference is to IGeometry. A simpler (though perhaps more confusing) alternative might be to overload the M (measure) value of geometry so that time could be stored as a measure.

At the other end of the scale is geologic time, which falls outside the limits of the .NET DateTime structure. In this case the domain would need to be much larger.
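Here's a rough Python sketch of the ITemporalReference idea; the class and method names are hypothetical. Times are seconds relative to some epoch, stored as non-negative 64-bit integer ticks with an origin and a resolution, which is the same precision-versus-domain trade-off a spatial reference makes. With 1 ns ticks the domain spans only about 292 years, while one-year ticks cover geologic time with room to spare.

```python
from dataclasses import dataclass
from fractions import Fraction

INT64_MAX = 2**63 - 1
SECONDS_PER_YEAR = 31_556_952  # mean Gregorian year: 365.2425 days


@dataclass(frozen=True)
class TemporalReference:
    """Hypothetical time counterpart of a spatial reference.

    A stored tick count maps to a time as:
        seconds = origin + ticks * resolution
    where origin and resolution are exact Fractions of a second.
    """
    origin: Fraction
    resolution: Fraction

    def to_ticks(self, seconds: Fraction) -> int:
        ticks = round((Fraction(seconds) - self.origin) / self.resolution)
        if not 0 <= ticks <= INT64_MAX:
            raise ValueError("time falls outside this reference's domain")
        return ticks

    def from_ticks(self, ticks: int) -> Fraction:
        return self.origin + ticks * self.resolution

    def span_years(self) -> float:
        """Length of the representable domain, in years."""
        return float(INT64_MAX * self.resolution / SECONDS_PER_YEAR)


# Nanosecond ticks for TDOA work: precise, but a short domain.
gps = TemporalReference(origin=Fraction(0), resolution=Fraction(1, 10**9))
t = Fraction(1_234_567_890_123_456_789, 10**9)  # seconds since the epoch
ticks = gps.to_ticks(t)
print(ticks)                       # 1234567890123456789
print(gps.from_ticks(ticks) == t)  # True
print(round(gps.span_years()))     # 292

# One-year ticks starting 5 billion years before the epoch: coarse, but huge.
geologic = TemporalReference(origin=Fraction(-5 * 10**9 * SECONDS_PER_YEAR),
                             resolution=Fraction(SECONDS_PER_YEAR))
ticks66 = geologic.to_ticks(Fraction(-66_000_000 * SECONDS_PER_YEAR))
print(ticks66)                     # 4934000000
```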

From the helpdoc:

What is the best way for storing temporal data - a netCDF file or a relational database? Which one is faster?

Storing temporal data in relational database is just as viable as using a netCDF file. ESRI’s support of netCDF is primarily to support the existing community of netCDF data and users, not to force people to learn about a new file format. The decision should be made based on how you want to create and manage data in your organization.

It looks like netCDF addresses this issue. But what if I don’t want to represent time using netCDF or date columns as ESRI suggests?

SQL Server 2008 Spatial is Standard Issue

SpatialDB Advisor has a good article about the confusion surrounding Oracle Spatial licensing. Given that spatial capabilities in SQL Server 2008 will be standard issue, Oracle might decide to change its licensing.

Speaking of licensing, Microsoft Windows Server 2008 will include virtualization. Microsoft is also building a $500M datacenter here in San Antonio. I'm hoping this means they will provide something similar to Amazon EC2 for .NET. If so, the biggest unknown will be whether ESRI offers a licensing model for 9.3 that lets me run ArcObjects in the cloud.

[Image: cloud] Image stolen from Jeremiah, who discusses cloud computing further here.

Geographic Data Type Support Coming in SQL Server (Katmai)?

I searched but could not find any details beyond this article:

In addition, Katmai will be able to manage different data types including documents, geographic information and XML.

I also found this at Microsoft, but without any details.
