Big Processing of Geospatial Data
Geospatial Data has always been Big Data. Now Big Data Analytics for geospatial data is available to allow users to analyze massive volumes of geospatial data. Petabyte archives for remotely sensed geodata were being planned in the 1980s, and growth has met expectations. Add to this the ever increasing volume and reliability of real time sensor observations, and the need for high performance big data analytics for modeling and simulation of geospatially enabled content is greater than ever. In the past, limited access to processing power was a bottleneck: without it, high volume or high velocity collections of geospatial data could not be made useful for many applications. Workstations capable of fast geometric processing of vector geodata brought a revolution in GIS. Now big processing through cloud computing and analytics can make greater sense of data and deliver the promised value of imagery and all other types of geospatial information.
Cloud initiatives have accelerated lightweight client access to powerful processing services hosted at remote locations. The recent ESA/ESRIN "Big Data from Space" event addressed challenges posed by policies for dissemination, data search, sharing, transfer, mining, analysis, fusion and visualization. A wide range of topics, scenarios and technical resources were discussed. In addition to the projects discussed at that event, several other big data initiatives have been launched to increase capabilities for processing geospatial data: the European Commission's Big Data Public Private Forum, the US National Science Foundation's Big Data Science & Engineering, and the US Office of Science and Technology Policy's (OSTP) Big Earth Data Initiative (BEDI).
Big processing of big imagery data
Imagery analysis in the cloud has become a reality through web services on publicly available data as well as on proprietary commercial data and secured business and government data. While OGC web services for accessing and visualizing Landsat data have existed in the lab for nearly a decade, a change in Landsat data licensing recently unleashed this power on the open web. Geoscience Australia's "Unlocking the Landsat Archive" Raster Storage Archive (RSA) makes a petabyte of Landsat surface reflectance values available through a THREDDS server that implements the OGC Web Map Service (WMS) and Web Coverage Service (WCS) interface standards. The European EarthServer project provides access to 100+ terabytes of multi-source, multi-dimensional spatio-temporal data using the open standards of the OGC and the W3C to provide "Big Earth Data Analytics". DigitalGlobe, Astrium, and ITT Exelis offer cloud service access to commercial imagery using services that implement OGC standards. ESRI and Google offer approaches to imagery processing in the cloud. Over the next few months we can anticipate ever increasing access to and value from the world's remote sensing archives.
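To make the WMS access pattern concrete, here is a minimal sketch of how a client might build a WMS 1.3.0 GetMap request. The request parameters are the standard GetMap ones; the endpoint URL and the layer name are hypothetical placeholders, not the address or layers of any service mentioned above.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layer, bbox, width, height, time=None):
    """Build a WMS 1.3.0 GetMap URL.

    bbox is (min_lon, min_lat, max_lon, max_lat). CRS:84 is used so the
    axis order is longitude, latitude.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    if time is not None:
        # Optional TIME dimension, useful for multi-temporal imagery archives
        params["TIME"] = time
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and layer name, for illustration only
url = build_getmap_url(
    "https://example.org/wms",
    "landsat_surface_reflectance",
    (110.0, -45.0, 155.0, -10.0),
    1024,
    768,
    time="2013-01-01",
)
print(url)
```

Because the request is just a URL, a lightweight client such as a browser or a mobile app can ask for a rendered image while the heavy processing stays on the server.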
The next flood of Big Data comes from mobile devices.
Big Data analytics is an effective way to enhance the power of location (see separate blog topic). Erek Dyskant's keynote at FOSS4G NA described how geospatial analytics were used as a major tool in the 2012 US Presidential election using open source, open data, and open standards. Matthew Gentile, with Deloitte Financial Advisory Services, describes how analytics go on-location. Stephen Lawler of Bing Maps sees the future focused on 'where' by using algorithmic extraction to relate entities on the Web, organizing them through a semantic taxonomy and enabling natural access. Joe Francica of Directions Magazine takes Gartner to task for missing that location-based analytics are critical to the understanding of a business's operations. The opportunities are being met by a variety of analytic geospatial processing methods.
GIS moves to processing in the cloud.
Traditionally, GIS has accessed geospatial content stored in a relational database management system typically optimized for datasets of less than a gigabyte. The performance and upper limits on the data space of GIS have been greatly increased with new technologies that make it possible to harness thousands of data servers and processing servers. Cloud computing is the use of resources (hardware and software) that are delivered as a service over a network. Big data can be handled in "for hire" cloud environments or in corporate or government data centers configured using the same types of virtualization and parallel processing technologies that make cloud processing a hot commodity. Apache Hadoop, an open source framework based on Google’s MapReduce, has been used by several projects to perform geospatial processing in the cloud or in cloud-like enterprise data centers. ESRI's Spatial Framework for Hadoop on GitHub allows developers and data scientists to use the Hadoop data processing system for spatial data analysis, including using geometries that implement the OGC Simple Features standard. Oracle's Big Data Appliance uses traditional RDBMS and MapReduce along with OGC standards to mine big spatial data. Norman Barker of Cloudant envisions that “with just a few changes to OGC standards, Cloud services could offer fast, consistent information to location-aware Apps”. In fact, that is happening already.
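The MapReduce idea behind Hadoop can be illustrated with a small, single-machine sketch in plain Python. It counts points per grid cell: the map step assigns each point to a cell, the shuffle step groups values by cell, and the reduce step sums them. This is an illustration of the pattern only, not the Hadoop API or ESRI's framework; the function names and the one-degree cell size are assumptions made for the example.

```python
import math
from collections import defaultdict


def map_point(point, cell_deg=1.0):
    """Map step: emit (grid cell, 1) for a (lon, lat) point."""
    lon, lat = point
    cell = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
    return cell, 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key (the grid cell)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the values emitted for one cell."""
    return key, sum(values)


def count_points_per_cell(points, cell_deg=1.0):
    mapped = (map_point(p, cell_deg) for p in points)
    grouped = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


# Sample (lon, lat) points, made up for illustration
points = [(151.2, -33.9), (151.7, -33.1), (144.9, -37.8), (-0.1, 51.5)]
counts = count_points_per_cell(points)
print(counts)
```

In a real cluster the map and reduce steps run in parallel on many servers, each handling the part of the data stored near it, which is what lets the same simple logic scale to very large spatial datasets.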
Accessing the Cloud with OGC Services.
The OGC Web Processing Service (WPS) Interface Standard was used by Terradue to develop a Cloud Service that allows the dynamic deployment of algorithms on a Hadoop framework. Terradue selected the 52° North WPS open source component as the interface to a Cloud service, and thus WPS-Hadoop was born. The University of Pretoria has implemented "Processing as a Service" in the cloud with WPS (PAAS with WPS). Feng Chia University has provided WPS access to a service oriented architecture (SOA) based debris flow monitoring system. OGC access services (WMS, WCS and the Web Feature Service and Sensor Observation Service (SOS) Interface Standards) are also well suited for service interfaces in a geospatial cloud. Multiple presentations at the European Space Agency's (ESA's) Big Data From Space event featured the use of OGC web services. Efforts like these provide experience that guides OGC members as they continue to improve the standards for use in the geospatial cloud.
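As a rough sketch of what calling a WPS looks like from the client side, the snippet below builds a WPS 1.0.0 Execute request using the key-value-pair (GET) encoding. The endpoint, the process identifier and the input names are hypothetical; they do not describe the WPS-Hadoop deployment or any other service named above.

```python
from urllib.parse import urlencode


def build_wps_execute_url(base_url, process_id, inputs):
    """Build a WPS 1.0.0 Execute URL using the KVP encoding.

    inputs is a dict of simple literal inputs. Values that contain ';' or
    '=' would need extra escaping, which this sketch does not handle.
    """
    # DataInputs is a semicolon-separated list of name=value pairs
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    params = {
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": process_id,
        "DataInputs": data_inputs,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, process name and inputs, for illustration only
wps_url = build_wps_execute_url(
    "https://example.org/wps",
    "example:ndvi",
    {"scene": "scene-001", "resolution": 30},
)
print(wps_url)
```

The client only names the algorithm and its inputs; where and how the algorithm runs (for example, on a Hadoop cluster) is hidden behind the service interface.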
Transaction oriented analytics: big processing of streaming spatial data.
When dealing with an endless and dynamic flow of space-time data, how can an important change be identified? Where on the surface of the earth should one focus one’s attention in the first place? Jonas and Tucker suggest that we will increasingly rely on analytic sensemaking engines to suggest and direct human attention. One example is the GEOINT Data Analytics Cloud (GDAC), a semiotic compute and storage apparatus built on an open-source cloud technology stack for the storage and processing of diverse data at scale. GDAC includes geospatial and temporal context to provide continuous waves of processing to discern relationships within a data space. Relationships become more apparent with each user interaction, each addition of data or each new algorithm that's introduced in a discovery session. Similarly, Streambase is advancing analytics on streaming sensor data through transaction-based processing and real-time information fusion. OGC standards could be applied to transaction oriented analytics applications like these to provide a solid foundation for location and sensors in this new area of big data and big processing.
As more and more GIS functionality is migrated into the cloud, it is only natural that this technology will move beyond simple search and discovery of data to more advanced geo-processing capabilities. Discussions in the OGC have focused on how we get beyond access and encoding standards: “Moving beyond the interface”. This might involve defining: 1) metadata for evaluating algorithms for fitness for use, quality and provenance; 2) a taxonomy of algorithms as the basis for WPS profiles; 3) methods for portability of spatial algorithms between clouds; or 4) continued development of interfaces for models using WPS and OpenMI.
Two upcoming OGC activities will focus on the geospatial aspects of Big Data Processing. This topic will be addressed at the OGC Technical Committee meeting hosted by the European Space Agency in September. These discussions will take place in the OGC Workflow and WPS working group meetings and in other meetings as well. Also, the recently launched OGC Testbed 10 will advance cloud technologies and their relation to mobile applications. It's not too late to get involved as a sponsor or technology provider in the Open Mobility thread of OGC Testbed 10.
- More information and references in this Evernote site.
- Overview of this blog series on geospatial trends
- Next Week's topic: Smart Cities
- @Percivall on Twitter