MapReduce for Large Geodatasets
Here’s an interesting video where Google describes how they use MapReduce to build connectivity in their street data. In ESRI terminology, this is how they clean and build topology using parallel processing. They also briefly mention using it to render map tiles.
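To make the connectivity idea concrete, here is a minimal sketch of how it might be phrased as a map step and a reduce step. This is my own guess at the shape of the problem, not Google's actual pipeline: the function names, the sample street data, and the choice to snap endpoints by rounding coordinates are all assumptions. The map step emits each segment under its two endpoints, and the reduce step collects every segment that touches the same endpoint.

```python
from collections import defaultdict

def map_segment(segment_id, coords, precision=6):
    """Map step: emit (endpoint, segment_id) for both ends of a line."""
    # Rounding is a crude stand-in for snapping nearby endpoints together.
    for x, y in (coords[0], coords[-1]):
        yield (round(x, precision), round(y, precision)), segment_id

def reduce_node(node, segment_ids):
    """Reduce step: every segment sharing this endpoint connects here."""
    return node, sorted(set(segment_ids))

def build_connectivity(segments):
    """Run map, shuffle, and reduce in one process to mimic a cluster."""
    shuffled = defaultdict(list)
    for segment_id, coords in segments.items():
        for node, sid in map_segment(segment_id, coords):
            shuffled[node].append(sid)  # the "shuffle": group by key
    return dict(reduce_node(node, sids) for node, sids in shuffled.items())

# Hypothetical street segments, each a list of (x, y) vertices.
streets = {
    "main_st": [(0.0, 0.0), (1.0, 0.0)],
    "oak_ave": [(1.0, 0.0), (1.0, 1.0)],
    "elm_rd": [(1.0, 0.0), (2.0, 0.0)],
    "lone_ln": [(5.0, 5.0), (6.0, 5.0)],
}
nodes = build_connectivity(streets)
```

Each node is reduced independently of the others, which is what would let the work spread across many machines. A real topology build would also need to split lines at mid-span intersections, which this sketch ignores.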
They don’t go into detail, but apparently those of us outside Google could do this sort of thing using Hadoop on Amazon EC2.
A challenge with tile caches is keeping them up to date with the vectors they depict. Here is how ESRI does it. I think ESRI needs to let us scale tile generation across a large number of CPUs the way Google does, and the licensing model needs to allow this. It seems like open source geo software on a paid AMI could be coupled with Hadoop on EC2 to do this.
Once that happens, an agency like a state data center could rebuild tile caches on EC2/S3 nightly from, for example, a statewide vector layer of parcel maps.
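A nightly rebuild like that could plausibly be split up the same way. The sketch below is only an illustration under stated assumptions: it uses the common Web Mercator XYZ tile scheme (an ESRI cache may use a different tiling scheme), the parcel bounding boxes are invented, and the reduce step just lists the features a tile would need where a real job would render the image and write it to storage. The map step emits every tile a parcel's bounding box touches, and the reduce step gathers the parcels for each tile.

```python
import math
from collections import defaultdict

def lonlat_to_tile(lon, lat, zoom):
    """Convert a lon/lat point to XYZ tile indices (Web Mercator scheme)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def map_feature(feature_id, bbox, zoom):
    """Map step: emit (tile, feature_id) for each tile the bbox overlaps."""
    min_lon, min_lat, max_lon, max_lat = bbox
    x0, y1 = lonlat_to_tile(min_lon, min_lat, zoom)  # tile y grows southward
    x1, y0 = lonlat_to_tile(max_lon, max_lat, zoom)
    for x in range(x0, x1 + 1):
        for y in range(y0, y1 + 1):
            yield (zoom, x, y), feature_id

def reduce_tile(tile, feature_ids):
    """Reduce step: a real job would render the tile from these features."""
    return tile, sorted(set(feature_ids))

def plan_tiles(features, zoom):
    """Group features by tile in one process to mimic the shuffle."""
    shuffled = defaultdict(list)
    for feature_id, bbox in features.items():
        for tile, fid in map_feature(feature_id, bbox, zoom):
            shuffled[tile].append(fid)
    return dict(reduce_tile(tile, fids) for tile, fids in shuffled.items())

# Hypothetical parcels as (min_lon, min_lat, max_lon, max_lat).
parcels = {
    "parcel_a": (-10.0, -10.0, 10.0, 10.0),
    "parcel_b": (20.0, 20.0, 30.0, 30.0),
}
tiles = plan_tiles(parcels, zoom=1)
```

Feeding in only the parcels that changed since the last run would yield just the tiles that need re-rendering, instead of the whole cache.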
I’ve heard rebuilding a geodatabase topology for the nationwide census takes over 24 hours. I bet a MapReduce approach would be much faster for this too.