Archive for the 'ArcGIS Server' Category

Why GeoProcessing with ArcObjects .NET

This is just to follow up on Sean Gillies’ comments:

why you’d want to put proprietary per-server-licensed software in the mix – when the point of Hadoop is to leverage the combination of commodity hardware and open source – escapes me.

I know a lot of places that maintain large geodatabases. I’m thinking I could write ArcGIS Engine applications that would listen to queues for job requests and run them. The Engine licenses would only be $500 per seat (not per app). Also, a lot of sites have spare floating licenses they aren’t using at night. Scaling an existing ArcObjects-based app so that it runs in parallel seems a logical next step to more fully utilize these resources.

One thing that .NET has that Java is missing, as far as I can tell, is System.CodeDom.Compiler. This would allow a job to include source code that each node would download, compile and run.
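To make the idea of shipping source with a job concrete, here is a minimal sketch. It uses Python rather than C#, purely for brevity, and stands in for what System.CodeDom.Compiler would do on .NET. The names `load_job` and `score` are hypothetical, and a real node would need sandboxing before running code it did not write.

```python
def load_job(source, entry_point="score"):
    """Compile job source received at runtime and return its entry-point function."""
    namespace = {}
    code = compile(source, "<job>", "exec")  # source text -> code object
    exec(code, namespace)                    # run the definitions into a private dict
    return namespace[entry_point]

# Hypothetical job payload, as a node might download it
job_source = """
def score(distance_miles):
    return 1.0 / distance_miles ** 2
"""

score = load_job(job_source)
print(score(2.0))  # 0.25
```

The node never needs to know what the job does; it only needs to agree with the submitter on the name of the entry point.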

I’m using the term geoprocessing here in the general sense - code that processes geodatasets located in a geodatabase without crossing a firewall.

Imagine a website where you send it a job with C# code that you wrote. For example: create a list of the top 100 properties available for sale anywhere in the US, ranked by a score. Compute the score as an inverse distance weighted sum: 1/miles^2 for the distance to the nearest Starbucks, plus 3/miles^2 for the distance to each Home Depot or Lowe’s. Put the result at this URL (an Amazon S3 bucket). The master node would split this up and run it on multiple scoring machines, combine the results and put them into the S3 bucket.
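The score described above might look something like this. It is only a sketch in Python, not ArcObjects code. It assumes coordinates are already projected so that planar distances come out in miles, and `MIN_MILES` is an assumed floor (not part of the original idea) that keeps a store right next door from dividing by zero.

```python
import math

MIN_MILES = 0.05  # assumed floor to avoid division by zero

def inverse_square(weight, miles):
    return weight / max(miles, MIN_MILES) ** 2

def score_property(prop, coffee_shops, hardware_stores):
    """1/miles^2 to the nearest coffee shop plus 3/miles^2 to every hardware store."""
    score = 0.0
    nearest = min((math.dist(prop, c) for c in coffee_shops), default=None)
    if nearest is not None:
        score += inverse_square(1.0, nearest)
    score += sum(inverse_square(3.0, math.dist(prop, h)) for h in hardware_stores)
    return score

# One coffee shop 2 miles away; hardware stores 1 and 2 miles away
print(score_property((0, 0), [(0, 2), (0, 5)], [(1, 0), (0, 2)]))  # 4.0
```

In the example, the nearest coffee shop contributes 1/4 and the two hardware stores contribute 3/1 and 3/4, for a total of 4.0.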

Since we want the top 100, that is a task the master node would need to determine after each scoring node has completed. So the job would include two different code chunks - one for the master, and the other for the scorers.
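A sketch of those two chunks in Python (the function names are hypothetical). Each scorer trims its own results to its local top n, which is safe because any property in the overall top n must also be in the top n of whichever node scored it. The master then merges the short lists.

```python
import heapq

def top_n_local(scored, n=100):
    """Scorer chunk: keep only the n best (score, property_id) pairs."""
    return heapq.nlargest(n, scored)

def merge_top_n(partials, n=100):
    """Master chunk: combine the partial lists and keep the overall n best."""
    return heapq.nlargest(n, (item for partial in partials for item in partial))

node_a = top_n_local([(0.9, "a1"), (0.2, "a2"), (0.5, "a3")], n=2)
node_b = top_n_local([(0.7, "b1"), (0.95, "b2")], n=2)
print(merge_top_n([node_a, node_b], n=3))
# [(0.95, 'b2'), (0.9, 'a1'), (0.7, 'b1')]
```

This keeps the traffic back to the master small: each scorer sends at most n rows no matter how many properties it scored.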

I can’t imagine anyone would ever take the effort to publish a traditional geoprocessing service that does this. Maybe geoprocessing isn’t the right word; maybe we should call it geocompiling, since we are sending it uncompiled source. Or maybe a domain-specific language that the master node compiles into IL would make more sense. More later.

Parallel GeoProcessing

A large city here in Texas has done a highly detailed sidewalk inventory. In addition, they’ve created “missing sidewalk” segments representing places where sidewalks could be constructed. In order to prioritize construction they are having me write a program that scores each missing sidewalk segment based on a variety of factors. Many of these factors involve proximity to other things like bus stops, office buildings etc. While I don’t know the name for this, it seems like this is a common pattern for GIS modeling: For each sidewalk, buffer it, search for nearby things and compute a score, basically just an application of Tobler’s 1st law of geography.
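The buffer-and-search pattern can be sketched without any GIS library. This is a brute-force Python illustration, not the program described above: it treats a sidewalk as a single straight segment in planar coordinates, and the feature kinds and weights are made up for the example.

```python
import math

def distance_to_segment(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:                      # degenerate segment
        return math.dist(p, a)
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))               # clamp to the segment's ends
    return math.dist(p, (ax + t * dx, ay + t * dy))

def score_segment(a, b, features, buffer_dist, weights):
    """Sum the weights of all features that fall within buffer_dist of the segment."""
    return sum(weights.get(kind, 0.0)
               for kind, point in features
               if distance_to_segment(point, a, b) <= buffer_dist)

features = [("bus_stop", (5, 3)), ("office", (12, 0)), ("bus_stop", (5, 50))]
weights = {"bus_stop": 2.0, "office": 1.0}   # hypothetical weights
print(score_segment((0, 0), (10, 0), features, 4, weights))  # 3.0
```

Checking every feature against every sidewalk is exactly the slow part; a spatial index would cut down the candidates, but the scoring logic stays the same.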

The problem is that this is very slow. To make it faster, I’m considering parallelization - divide up the problem across many machines, each scoring sidewalks in a different area of the city. Once each area is complete, merge together the results from each area.

One machine would be the “master”; the rest would be scorers. The master would divide up the city into rectangular areas and put job requests into an Amazon Simple Queue Service queue. Each scorer machine would read this queue, score the area described by the job, write the results to the URL prescribed by the job, and post a job result message to a different queue (also prescribed by the job). The master would read this queue, fetch the results using the URL, and append them to the master result.
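Here is a rough sketch of the master’s side in Python. The standard library’s `queue.Queue` stands in for an SQS queue, and the message fields (`job_id`, `extent`, `result_url`, `result_queue`) and the example.com URL are assumptions for illustration, not a defined format.

```python
import json
import queue

def split_extent(xmin, ymin, xmax, ymax, cols, rows):
    """Yield (xmin, ymin, xmax, ymax) rectangles that tile the extent."""
    width = (xmax - xmin) / cols
    height = (ymax - ymin) / rows
    for r in range(rows):
        for c in range(cols):
            yield (xmin + c * width, ymin + r * height,
                   xmin + (c + 1) * width, ymin + (r + 1) * height)

def enqueue_jobs(job_queue, extent, cols, rows, result_url_prefix, result_queue_name):
    """Put one JSON job message per rectangle onto the job queue."""
    for i, rect in enumerate(split_extent(*extent, cols, rows)):
        job_queue.put(json.dumps({
            "job_id": i,
            "extent": rect,
            "result_url": f"{result_url_prefix}/area-{i}.json",
            "result_queue": result_queue_name,
        }))

jobs = queue.Queue()  # stand-in for the SQS job queue
enqueue_jobs(jobs, (0, 0, 100, 50), cols=4, rows=2,
             result_url_prefix="https://example.com/results",
             result_queue_name="results")
print(jobs.qsize())  # 8
```

One wrinkle the rectangles hide: a sidewalk near an edge has a buffer that reaches into the neighboring area, so each scorer would need to read features from slightly beyond its own rectangle.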

Since all scorer processes are hitting against the same geodatabase, I suspect the geodatabase would become the bottleneck. Suppose I had an unlimited number of scorer machines at my disposal. Doubling the number of them would not cut execution time in half, but I wonder what the factor would be? What is the optimal area size? Certainly having a tiny area with just a few sidewalks doesn’t make sense, but neither would a huge area. How can we determine optimal size?
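One way to get a feel for the factor is Amdahl’s law, assuming some fixed fraction of the work is serialized at the geodatabase. That assumption, and the 20% figure below, are illustrative guesses, not measurements.

```python
def speedup(n_scorers, serial_fraction):
    """Amdahl's law: speedup over a single scorer when serial_fraction
    of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_scorers)

for n in (1, 2, 4, 8, 16):
    print(n, round(speedup(n, 0.2), 2))
# 1 1.0
# 2 1.67
# 4 2.5
# 8 3.33
# 16 4.0
```

Under that assumption, going from 8 to 16 scorers buys only a 1.2x improvement, and no number of machines gets past 5x. Measuring the real serial fraction on a few runs would say whether more scorers or a faster geodatabase is the better investment.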

There is really no reason the master process needs to be on the same side of the firewall as the scorer processes. This means it should be possible to write a Web Service that allows 3rd parties to submit scoring jobs. For example, realtors could score each available property based on proximity to other features relevant to a particular homebuyer.

What I’m proposing here sure sounds a lot like what Hadoop does. I wonder if we will ever see the day when we can use something like Hadoop for geoprocessing with ArcObjects on .NET?

MapReduce for Large Geodatasets


Here’s an interesting video where Google describes how they use MapReduce to build connectivity in their street data. In ESRI terminology, this is how they clean and build topology using parallel processing. They also briefly mention using it to render map tiles.

They don’t go into detail, but apparently those of us outside Google could do this sort of thing using Hadoop on Amazon EC2.

A challenge with tile caches is keeping them up to date with the vectors they depict. Here is how ESRI does it. I think ESRI needs to allow us to scale tile generation across a large number of CPUs the way Google does. The licensing model needs to allow this. It seems like open source geo software on a paid AMI could be coupled with Hadoop on EC2 to do this.

Once that happens, an agency like a state data center could rebuild tile caches on EC2/S3 nightly from, for example, a statewide vector layer of parcel maps.

I’ve heard rebuilding a geodatabase topology for the nationwide census takes over 24 hours. I bet a MapReduce approach would be much faster for this too.

Neogeography Use Cases, Pretending to be an Architect

More discussion over at High Earth Orbit on neogeography definition.

While I’m sure many are tired of seeing this dead horse beaten, I do find value in discussing a use case often addressed by neogeography: crowdsourcing. As High Earth points out, the neo and paleo geographers would both be actors.

The problem is that some of the tools needed to support crowdsourcing are not getting high enough priority from ESRI.

Case in point: ArcGIS Server’s GraphicsLayer.WriteToXml method would make crowdsourcing a lot easier. A Neogeographer draws graphics on the map, adds some attributes and saves it. Behind the scenes it gets saved to disk (via WriteToXml, not to ArcSDE via versioning). A Paleogeographer opens ArcEditor, loads the GraphicsLayer into a map, converts the graphics to features, edits them and commits them to the geodatabase.

The only problem with this is a bug in WriteToXml. It was logged in August (NIM011262), but the SP4 doc doesn’t mention it as being fixed.

The slow resolution of this issue might give neogeographers the impression that ESRI doesn’t place high enough priority on crowdsourcing. The ArcGIS architecture needs to support crowdsourcing.


Pretend to Be An Architect

Speaking of architecture, have you ever noticed how so many architects live long and remain creative in their later years? Take a look at Johnson, Wright, and Venturi.

Contrast this with mathematicians, who seem to die too soon, e.g. Boole, Hamilton and Turing.

I think ageism lurks beneath the surface of the paleo/neo discussion. The GIS community is getting gray. A lot of fresh college grads focus on web design instead of cartography. If we can set an example by aging more gracefully maybe they’d be more interested in trying a few old school concepts. Perhaps the key to aging gracefully is to become more like architects and less like mathematicians.

More GIS in the Cloud

Peter Batty is looking into EC2 for his new venture:

… thinking seriously about using Amazon EC2 and S3 when we roll out, especially now that Amazon has added new “extra large” servers with 15 GB of memory, 8 “EC2 Compute Units” (4 virtual cores with 2 EC2 Compute Units each), and 1690 GB of instance storage, based on a 64-bit platform - these servers should work well for serious database processing.

Amazon has details on the new instance types Peter refers to here.

With such large amounts of memory available, it seems possible to build some really killer route finding services.

Microsoft is working on something similar to EC2. I just hope ESRI provides 64-bit, and a license policy that allows cloud deployment when Microsoft comes up with something.

In response to EC2 questions, Microsoft CTO Ray Ozzie said:

Amazon Web Services [are] … showing Web 2.0 startups that there might actually be something there with regard to this utility computing model. Whether it’s the right set of services exactly, or whether the way that they’ve designed them is exactly what matches the needs of those potential developers, there are some questions. But I think they’ve done the industry a service by beginning to open people’s [in other words, Microsoft's] eyes to the potential.

I don’t have any announcements at this point in time. But directionally, I think you could see in my presentation that we believe very heavily in this utility computing fabric concept; it’s the only way, even internally focused, it’s the only way we can get scale amongst all the properties we run internally. And I think it just makes sense to offer those services to developers and to enterprise customers over time.

Sounds like the same business case Bezos made for AWS.

It’s not so much that (Amazon Web Services) has something to do with selling books. It’s the inverse: Selling books has a lot to do with this.

GIS for Citywide Wi-Fi - what about Tracking Server?

Cnet has an interesting article describing what is needed for successful citywide Wi-Fi deployment.

One of the common threads weaved through each of these [successful] deployments is that all of these cities have committed to using the Wi-Fi networks for their own purposes whether it be to provide remote access for mobile city workers, automate meter reading, control traffic congestion or enhance public safety.

If this is indeed the case, I think (near) real-time GIS mapping applications need more attention. I heard a rumor over a month ago that ESRI has bought Tracking Server from Northrop Grumman. However, I haven’t seen any press releases. I was told there are no plans to include Tracking Server as part of an EDN subscription. I guess this means Tracking Server will stay primarily in the Defense sector. That’s too bad, I think there are a lot of peaceful applications of this technology.

Tiling Tools

This news about a new chip called Tilera is interesting. Strange naming coincidence - it seems the chip could be useful for keeping map cache tiles updated.

I’ve heard that keeping map tile caches updated is a challenge with ArcGIS Server. I see that the GenerateMapServerCache has a thread count property. I wonder how hard it would be to generalize this to spawn subprocesses on different chips.

Or more generally, would it be possible to write a tile cache generator that could run in Amazon EC2, writing tiles to S3?
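The bookkeeping half of such a generator is easy to sketch in Python. The code below assumes the common Web Mercator XYZ tile scheme, which is not necessarily the tiling scheme an ArcGIS Server cache uses, and the `tiles/{zoom}/{x}/{y}.png` key layout is hypothetical. Rendering and uploading are left out.

```python
import math

def lonlat_to_tile(lon, lat, zoom):
    """Tile column and row containing a lon/lat point (Web Mercator XYZ scheme)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)  # clamp to the grid

def tiles_for_bbox(west, south, east, north, zoom):
    """Yield (zoom, x, y) for every tile touching the bounding box."""
    x0, y0 = lonlat_to_tile(west, north, zoom)  # top-left tile
    x1, y1 = lonlat_to_tile(east, south, zoom)  # bottom-right tile
    for x in range(x0, x1 + 1):
        for y in range(y0, y1 + 1):
            yield zoom, x, y

def tile_key(zoom, x, y):
    """Hypothetical S3 object key for a rendered tile."""
    return f"tiles/{zoom}/{x}/{y}.png"

# Roughly the extent of Texas, at zoom level 6
keys = [tile_key(*t) for t in tiles_for_bbox(-106.7, 25.8, -93.5, 36.5, 6)]
print(len(keys), keys[0])
```

Because each tile is independent, the list of keys can be chopped up and handed to as many EC2 instances as the license allows, with each instance rendering its share and writing straight to S3.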

GIS as a Silo of Babel

The GIS Dev Cafe is asking, “Where is the community?” A community needs a common language.

Lately I’ve felt that GIS is a Silo of Babel quickly crumbling.

God, observing the arrogance of humanity in the construction, resolves to confuse the previously uniform language of humanity, thereby preventing any such future efforts.
Wikipedia on The Tower of Babel

While once we could live comfortably in the silo, we now must build solutions that connect with the rest of the world, requiring us to deal with many languages: SQL, C#, XML, JavaScript …

I don’t find the slogan “GIS is the language of geography” to be very enlightening. It’s not a language - not even a metalanguage. Please don’t get offended, but GIS can best be described as a religion.

When I encounter a problem with a GIS solution I usually search web sites that focus on that particular language. But when the problem is peculiar to GIS namespaces, the resources are not quite there yet.

Geography is about describing where something is, for example, by using a point. But look, there are several Point classes in the new testament (ESRI.ArcGIS.ArcWebService, ESRI.ArcGIS.ADF.ArcGISServer, ESRI.ArcGIS.ADF.Web.Geometry), plus several in the old testament (ESRI.ArcGIS.Geometry, ESRI.ArcGIS.DataSourcesFile.ISMRouterPoint). Juggling between these different namespaces often amounts to an exercise in exegesis.

Right now ArcGIS Server is stuck with yet another chicken vs. egg dilemma. In order for a community to form, there needs to be a language; in order for people to learn a language, there needs to be a community where they can practice conversation.

Maybe a Revival Tent is needed as a third choice (place?) between the Cathedral and the Bazaar. We need to freely share experience, while not necessarily sharing our intellectual property. Perhaps Dave can make such a tent using ArcDeveloper.net.

ESRI has forums, but there’s not much activity there. Maybe everyone is being shy? Perhaps one way out of this is to start speaking in tongues. Stop being so orthodox - get out there, roll on the floor, maybe even handle a few snakes to get into the mood. Better yet, help me.

Text Elements in an ElementGraphicsLayer

In a previous post I incorrectly implied there is no textelement for elementgraphicslayers.

Now I see that a graphicelement may be assigned a TextMarkerSymbol, whose text may be assigned a string.

Now, if only Map.RefreshResource would let me pass an envelope so the whole layer wouldn’t flash (similar to IActiveView.PartialRefresh) my app would be safe for epileptics.

Thoughts on Editing with ArcGIS Server

The editing tools offered by Google My Maps have raised the bar for user experience. Moving, adding and deleting vertices is so simple.

The user experience of tools in the Editor Task of ArcGIS Server does not compare favorably. Better thin client (JavaScript) editing tools are needed.

If you don’t like the Editor Task, you’ll need to purchase ArcEditor or 3rd party tools that use the ArcGIS Engine editor extension in order to edit ArcSDE. After purchasing ArcGIS Server Enterprise, upper management may question the need for this.

Even if ESRI provided better JavaScript though, I still wonder if ArcSDE could hold up on the back end in a crowdsourcing scenario. Perhaps an alternative editing workflow is needed: a Task that supports editing and saving GraphicsLayers in the web tier. The GUI could be like My Maps, but with lots of ADO.NET niceness.

That way after someone in the crowd edits and saves a graphicslayer, a “GIS professional” pulls the saved graphicslayer into an ArcMap (ArcEditor) edit session, cleans it up, blesses it, and loads it into ArcSDE. The tools to do this don’t exist AFAIK, but it doesn’t seem like it would be that difficult to develop them.

Note that this needs either a new graphicslayer (TextElementLayer) or maybe just a new FeatureSymbol-based class (TextSymbol?). Even if this isn’t done with the intent to support crowdsourced editing, it’s still needed just to provide some equivalent to the Draw toolbar in ArcMap.
