Why GeoProcessing with ArcObjects .NET
This is just to follow up on Sean Gillies' comment:
why you’d want to put proprietary per-server-licensed software in the mix – when the point of Hadoop is to leverage the combination of commodity hardware and open source – escapes me.
I know a lot of places that maintain large geodatabases. I’m thinking I could write ArcGIS Engine applications that would listen to queues for job requests and run them. The Engine licenses would only be $500 per seat (not per app). Also, a lot of sites have spare floating licenses they aren’t using at night. Scaling an existing ArcObjects-based app so that it runs in parallel seems a logical next step to more fully utilize these resources.
One thing that .NET has that Java is missing, as far as I can tell, is System.CodeDom.Compiler. This would allow a job to include source code that each node would download, compile, and run.
I’m using the term geoprocessing here in the general sense – code that processes geodatasets located in a geodatabase without crossing a firewall.
Imagine a website where you submit a job containing C# code that you wrote. For example: create a list of the top 100 properties available for sale anywhere in the US, ranked by a score. Compute the score as 1/miles^2 for the nearest Starbucks, plus 3/miles^2 for each Home Depot or Lowe’s, where miles is the distance from the property to that store (i.e., an inverse-distance-weighted score). Put the result at this URL (an Amazon S3 bucket). The master node would split the job up, run it on multiple scoring machines, combine the results, and put them into the S3 bucket.
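To make the scoring rule concrete, here is a minimal sketch of what a scorer might compute for one property. The post imagines the job code in C#; Python is used here only for brevity. Everything in it is an assumption for illustration: the function names, the use of great-circle (haversine) distance on lat/lon pairs, and the min_miles clamp that keeps a store sitting right next to a property from producing a division by zero.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def miles_between(a, b):
    """Great-circle (haversine) distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def score(prop, coffee_shops, hardware_stores, min_miles=0.05):
    """Inverse-distance-weighted score for one property.

    1/miles^2 for the nearest coffee shop only, plus 3/miles^2 for
    every hardware store. Distances are clamped to min_miles
    (hypothetical parameter) to avoid dividing by zero.
    """
    total = 0.0
    nearest = min((miles_between(prop, s) for s in coffee_shops), default=None)
    if nearest is not None:
        total += 1.0 / max(nearest, min_miles) ** 2
    for store in hardware_stores:
        d = max(miles_between(prop, store), min_miles)
        total += 3.0 / d ** 2
    return total
```

A real scorer would presumably read the property and store locations from the geodatabase and might prefer drive distance over straight-line distance; the formula itself would stay the same.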
Since we want the top 100 overall, the final ranking is something the master node has to do after every scoring node has finished. So the job would include two different code chunks: one for the master, and the other for the scorers.
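The split between the two chunks could look roughly like the sketch below (Python for brevity, not the C# the job would actually carry). The function names and the in-memory lists are hypothetical. The one thing it relies on is that the global top 100 is always contained in the union of each node's local top 100, so each scorer only needs to send back its own best n.

```python
import heapq
from itertools import chain


def scorer_chunk(properties, score_fn, n=100):
    """Scorer side: score every property in this node's partition and
    keep only the local top n as (score, property) pairs."""
    scored = ((score_fn(p), p) for p in properties)
    return heapq.nlargest(n, scored, key=lambda pair: pair[0])


def master_merge(partial_results, n=100):
    """Master side: merge the local top-n lists from all scorers into
    the global top n, highest score first."""
    return heapq.nlargest(n, chain.from_iterable(partial_results),
                          key=lambda pair: pair[0])
```

For example, with four scoring nodes each handling a quarter of the listings, the master would call master_merge on the four lists returned by scorer_chunk and write the result out to the S3 bucket.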
I can’t imagine anyone would ever take the effort to publish a traditional geoprocessing service that does this. Maybe geoprocessing isn’t the right word; maybe we should call it geocompiling, since we are sending it uncompiled source. Or maybe a domain-specific language that the master node compiles into IL would make more sense. More later.