The Cross-Community Interoperability (CCI) thread seeks to build on interoperability within communities sharing geospatial data and to advance semantic mediation approaches for the discovery, access, and use of data based on heterogeneous data models and heterogeneous metadata models. This thread will explore the creation of domain ontologies and of tools to create, assemble, and disseminate geographic data provided voluntarily by individuals. It will also build integration across all OGC web services, with the intent of providing a better understanding of service content and of the relationships or associations that exist between OGC services and resources/content. The work to be performed in this thread includes the following:
Currently there is no standard data model within communities of interest. For example, among fire and rescue organizations, different local, federal, or international agencies use different symbols and terminology for the same event, location, or building. This makes the sharing of data difficult. The participant shall identify a standard model for the Emergency and Disaster Management community and provide a component (preferably a web service) that will translate between different local, federal, or international sources. The current suggestion is to use the homeland security symbology found at www.fgdc.gov/HSWG/index.html as the basis for the standard model. This needs to be evaluated, and other recommendations noted in the ER, along with missing items and/or issues with using this as the overall standard model. Other areas to explore are whether there are defined subsets of feature classes, what types of subsets should be defined, and what features relate to those subsets. Finally, the participant shall propose a mapping of features to the symbols in the Homeland Security Symbology.
In terms of harmonizing vocabulary/ontology support, the participant shall prioritize the effort first at the national level and then at the international level, as follows:
1) Police – Fire – Ambulance
2) Emergency Services – Military (for joint ops / events)
3) National Police – Interpol – other international police organizations
4) National Emergency Services – UN, Disaster Relief organizations
The participants shall create a component, either a tool or a service, that can map between a dataset and the selected standard model. This tool should allow for data exchange between different communities of practice, for example between the NGA model and the DGIWG model.
The participants shall report all findings in an Ontology Engineering Report, which should include potentially missing symbols, events that are un-mappable, and optimization ideas.
An example of a possible output of this activity is a tool that would take Source A and Source B inputs and translate them to the Common Standard, as depicted below:
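A minimal sketch of such a translation component is shown below. All source names, local codes, and mappings are purely illustrative; none are taken from the HSWG symbology, and a real component would be delivered as a web service rather than in-process lookup tables.

```python
# Sketch of a symbology mediation step: translate local feature codes from
# two hypothetical sources (A and B) into a common standard term.
# All codes and mappings below are illustrative placeholders.

SOURCE_A_TO_STANDARD = {
    "FIRE_STN": "Fire Station",
    "EMS_POST": "Ambulance Station",
}
SOURCE_B_TO_STANDARD = {
    "brandweerkazerne": "Fire Station",
    "ambulancepost": "Ambulance Station",
}

def to_standard(source: str, code: str) -> str:
    """Translate a source-specific feature code to the common standard term."""
    table = {"A": SOURCE_A_TO_STANDARD, "B": SOURCE_B_TO_STANDARD}[source]
    try:
        return table[code]
    except KeyError:
        # Un-mappable codes are exactly the items to be noted in the ER.
        raise ValueError(f"unmappable code {code!r} from source {source}")

print(to_standard("A", "FIRE_STN"))       # Fire Station
print(to_standard("B", "ambulancepost"))  # Ambulance Station
```

Codes that raise the "unmappable" error correspond to the missing symbols and un-mappable events the Engineering Report must capture.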
Volunteered geographic information (VGI) is the harnessing of tools to create, assemble, and disseminate geographic data provided voluntarily by individuals. Some examples of this phenomenon are WikiMapia, OpenStreetMap, and Google Map Maker. These sites provide general base map information and allow users to create their own content by marking locations where various events occurred or certain features exist, but aren’t already shown on the base map. VGI is a special case of the larger Web phenomenon known as user-generated content.
The ability to render, share and exploit VGI data in a meaningful way needs to be further explored. The participant shall demonstrate expanded use of OGC services and standards for VGI access, data linking, and rule-based conflation to include:
a. Link VGI from Twitter or other sources to point features, POI, and gazetteer data (must expand use of VGI beyond Twitter)
b. Link other unstructured open-source/crowd-source information to existing point features, POI, and gazetteer data
c. Address cross-domain semantic mediation to support feature and attribute matching for conflation and data linkage
d. Semantic mediation should expand beyond feature class and name to enable rule-based conflation of selective attributes, for example: status, construction material, height, relation to other features, other identifying designators, etc.
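Item (a) above, linking a VGI report to an existing point feature, can be sketched as a combined proximity and name-similarity test. The gazetteer records, thresholds, and scoring here are illustrative assumptions, not prescribed by this task.

```python
# Sketch: link an unstructured VGI report (name + coordinates) to the
# nearest gazetteer point feature whose name loosely matches.
import math
from difflib import SequenceMatcher

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two lon/lat points."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def link_report(name, lon, lat, features, max_km=5.0, min_ratio=0.8):
    """Return the best-matching gazetteer feature, or None if nothing qualifies."""
    best = None
    for f in features:
        if haversine_km(lon, lat, f["lon"], f["lat"]) > max_km:
            continue
        ratio = SequenceMatcher(None, name.lower(), f["name"].lower()).ratio()
        if ratio >= min_ratio and (best is None or ratio > best[0]):
            best = (ratio, f)
    return best[1] if best else None

# Illustrative gazetteer records.
gazetteer = [
    {"name": "Minch Harbor", "lon": -66.1, "lat": 45.3},
    {"name": "Lake Utopia",  "lon": -66.8, "lat": 45.2},
]
hit = link_report("minch harbour", -66.11, 45.31, gazetteer)
print(hit["name"])  # Minch Harbor
```

Item (d) would extend this by applying the same rule-based comparison to additional attributes (status, height, construction material) rather than name and position alone.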
During OWS-9, the basic components of a standards-based system for querying multiple gazetteers from a single query were demonstrated under the Single Point of Entry Global Gazetteer (SPEGG) activity. Participants demonstrated access to two gazetteers, with non-overlapping geographic coverage, using a single query with Web Feature Service – Gazetteer (WFS-G) best practices, cascading web servers, and semantic mediation.
This SPEGG research forms the foundation for the Virtual Global Gazetteer Client and Enhancements task of the current effort, which will focus on developing a Virtual Global Gazetteer that includes a gazetteer-specific client, develops advanced fault-tolerant capabilities, and opens the service to the wider community for comment.
The second task, Gazetteer Conflation, addresses the issue of gazetteers with overlapping geographic coverage. The assumption for this Web Processing Service task is that gazetteer features representing the same geographic feature may differ in location and spelling. This standards-based task builds on the OWS-9 conflation research but focuses on gazetteer-based point conflation. It will require semantic linking to filter the features and conflation to match the features and transfer attribute and positional information. The output from this process will be either a map display or a concordance (link table) of matched features.
Gazetteer Linking will build on the semantic web, RDF standards, SPARQL, and concordances to rapidly move across gazetteers and other crowdsourced data, giving researchers access to information beyond the limited descriptions in gazetteers. This task will involve encoding gazetteer information in RDF and linking information from other sources. A concordance or data internal to the services or databases will be used to walk across information sources, so no conflation or geometric/attribute matching is required. With this approach, a user of an NGA gazetteer will be able to link to the Geonames.org gazetteer and move from there to other information, such as weather, climate, population, and government descriptions.
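The concordance "walk" described above can be sketched with RDF-style triples. The URIs, predicates, and attribute values below are illustrative; a real implementation would issue SPARQL queries against RDF-encoded gazetteer data rather than filter an in-memory set.

```python
# Sketch of linking across gazetteers via a concordance of RDF-style
# triples: no conflation or geometric matching, just graph traversal.
# All URIs and values are illustrative placeholders.

triples = {
    ("nga:feature/123", "skos:exactMatch", "geonames:2950159"),
    ("geonames:2950159", "ex:population", "3500000"),
    ("geonames:2950159", "ex:climate", "temperate"),
}

def objects(subject, predicate):
    """Return all objects for a given subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Walk from an NGA feature to its GeoNames counterpart, then to other data.
linked = objects("nga:feature/123", "skos:exactMatch")[0]
print(objects(linked, "ex:population"))  # ['3500000']
```

The `skos:exactMatch` link plays the role of the concordance: once it exists, a user of the NGA gazetteer can reach GeoNames attributes (population, climate, and so on) without any matching computation.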
The standards-based work in this thread will enhance the capabilities of current gazetteers, further interoperability among gazetteers, and demonstrate the potential of conflation and linking gazetteers as key elements of the semantic web.
The three research tasks outlined in this thread are all standards-based efforts intended to implement, test, evaluate, and extend the state of the art for geographic names analysis and management.
The first task will focus on the development of a Virtual Global Gazetteer capability that tests the suitability of a standards-based approach for addressing geographic names customer requirements. During OWS-9, a generic client for handling names requests served ably to demonstrate the basic functionality of the overall architecture.
The first task in OGC Testbed 10 will develop a customized gazetteer client so that users can generate queries and access services without understanding the underlying technologies, receiving responses similar to those from non-standards-based systems. The underlying architecture will be similar to the Single Point of Entry Global Gazetteer using the NGA and USGS gazetteers, but will contain enhancements identified in OWS-9.
OGC Testbed 10 should demonstrate a client that allows a user to formulate a query that includes the name, a name string filter for the name, a feature description, country, and a spatial constraint. The results that are returned should be in tabular form, with the ability to search for additional results if a subset of the search results is returned by the query.
The user should be able to query by name, feature description, country, and spatial constraint.
The user should be able to enter a name, including diacritics if appropriate.
The user should be able to select how the name is utilized in the query. Filters may include:
· Starts With
· Ends With
· Fuzzy Match
The user should be able to filter queries by the feature description (also known as the feature designation). This description could reflect the terminology of either agency, i.e., the user should be able to query on USGS feature descriptions or NGA feature designations. These should be expanded to common language descriptions rather than codes. As an example, a user should be able to select USGS feature descriptions and pick a term like ‘summit.’ This term would access information from related NGA feature classes, such as ‘mountain’, ‘hill’, ‘peak’, ‘rock’, etc. The user should also be able to filter features based on the use of NGA terms. In this case, picking a term like ‘hill’ would return USGS ‘summit’ features. These mappings should be displayed to the user on the query form.
The user should be able to filter queries by country, i.e., show only names in Afghanistan. Country names should be expanded to common language descriptions rather than codes.
The user should be able to filter the query using a bounding box, radial search, or near query that will sort the results from closest to furthest away from a given coordinate.
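The query form described in the preceding paragraphs could be assembled into a simple KVP GetFeature request as sketched below. The parameter names (`name_like`, `country`) are illustrative assumptions; a production client would express these constraints as an FES Filter appropriate to the deployed WFS-G version.

```python
# Sketch: assemble a gazetteer query URL from the client form fields
# (name + filter mode, country, bounding box). Parameter names are
# illustrative, not a defined WFS-G encoding.
from urllib.parse import urlencode

def gazetteer_query(base_url, name, mode="fuzzy", country=None, bbox=None):
    """Build a KVP GetFeature request reflecting the query form fields."""
    like = {"starts_with": f"{name}*",
            "ends_with": f"*{name}",
            "fuzzy": f"*{name}*"}[mode]
    params = {
        "service": "WFS", "version": "2.0.0", "request": "GetFeature",
        "typeNames": "gaz:NamedPlace",
        "name_like": like,  # would be an FES PropertyIsLike in practice
    }
    if country:
        params["country"] = country
    if bbox:  # (minx, miny, maxx, maxy)
        params["bbox"] = ",".join(str(v) for v in bbox)
    return base_url + "?" + urlencode(params)

url = gazetteer_query("https://example.org/wfsg", "Kabul",
                      mode="starts_with", country="AF")
print(url)
```

The radial/near query from the paragraph above would add a distance predicate in the same way; sorting nearest-to-furthest is then a client- or mediator-side operation on the returned coordinates.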
Results should be returned in tabular format. If the number of records returned is fewer than the total number of records identified by the query, the user will have the option of seeing the total number of results and paging through them until all records are displayed (see Figure 3-3). Figure 3-3, like all figures in this Statement of Work, is provided for illustration purposes only; the exact content and design may vary. If the results of the query are incomplete due to lack of WFS-G support by any of the servers, any processing errors should be identified as well.
The user shall have the ability to sort the results by name or feature description. In addition, the user shall have the option of sorting the records from nearest to furthest away, based on a near spatial query.
Based on the feedback from OWS-9, the basic architecture underlying the Virtual Global Gazetteer shall be improved and where necessary developed.
Because of limitations between the versions of the WFS-G servers, a ‘thick mediator’ approach may be most practical for the underlying architecture. This will allow differences in capabilities and versions to be addressed by the mediator/cascading service. The research should evaluate the complexity of this approach versus the costs and effort to configure multiple WFS-G services identically.
Because some servers may lack capabilities or experience service problems, fault-tolerant functionality needs to be added to assure consistent service and to give the user an understanding of the results returned.
The Virtual Global Gazetteer should be designed to access and deliver results from the complete NGA and USGS servers (or simulated servers). Performance of the system should be recorded to ensure that results are timely. Note: performance is not normally a part of OGC testbeds, but is needed here to determine the viability of the solution.
Gazetteer conflation is the process of matching entries from multiple names sources, sharing or replacing attribute information, and presenting the fused results to users. This task is becoming more important with the proliferation of international, national, state, and crowdsourced gazetteers. The matching process enables a gazetteer producer to identify common features across sources as well as update and enhance existing sources. Gazetteer Conflation uses point-to-point conflation of data sets with limited attribution, essentially a name and a feature description.
Matching will occur between a source service that contains the original names and a target service that contains the names to be matched to the source. Typically, matches are based on proximity, closeness of name spelling, and agreement of feature designation. The conflation process needs to be sensitive to the nuances of name matching, including the ability to match names that may 1) be spelled slightly differently (Minch Harbor versus Minche Harbor), 2) be spelled using different word orders (Lake Utopia versus Utopia Lake), 3) incorporate abbreviations (Saint George versus St. George), 4) incorporate numbers indicated by numerals or spelled out (Dike 1 versus Dike One), 5) include special symbols as word replacements (Dike # 1 versus Dike Number 1, or Chesapeake & Ohio Canal versus Chesapeake and Ohio Canal), and 6) be missing the generic portion of the name (Minch Harbor versus Minch). These differences may occur singly or in combination.
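Several of the nuances above (abbreviations, symbols, spelled-out numbers, word order) can be handled by normalizing names before comparison. The substitution table below is an illustrative fragment, far from the coverage a real conflation service would need.

```python
# Sketch of name normalization for gazetteer matching. The substitution
# table is illustrative and deliberately tiny.
import re

SUBS = {"st.": "saint", "st": "saint",   # abbreviations
        "#": "number", "&": "and",       # symbol replacements
        "one": "1", "two": "2"}          # spelled-out numbers

def normalize(name: str) -> tuple:
    """Reduce a name to a canonical, order-insensitive token tuple."""
    tokens = re.findall(r"[a-z0-9#&.]+", name.lower())
    tokens = [SUBS.get(t, t) for t in tokens]
    # Sorting makes 'Lake Utopia' and 'Utopia Lake' compare equal.
    return tuple(sorted(tokens))

print(normalize("St. George") == normalize("Saint George"))   # True
print(normalize("Dike # 1") == normalize("Dike Number One"))  # True
print(normalize("Lake Utopia") == normalize("Utopia Lake"))   # True
```

Case 6 (a missing generic, Minch Harbor versus Minch) is not handled by normalization alone and would instead fall to a fuzzy-similarity score computed over the normalized tokens.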
Conflation algorithms also need to be able to match names using variant names as well as official names. For example, if the name Peking is found in a source service and associated with the name Beijing, the conflation algorithm should recognize this relationship and match the name Peking from a target service with Beijing in the source service.
Matching feature designations requires semantic matching, as differing gazetteers may use differing terminology. This step is essential for reducing the search space of candidate names. For example, the NGA feature description Populated Place is equivalent to the Canadian Geographical Names Database (CGND) terms Town, Hamlet, and Unincorporated Area.
Gazetteer matches are typically scored based on measures of proximity and closeness of spelling. A name that is spelled the same and located at the same position would get a perfect score. Scores would be reduced as differences in position and spelling increase. Users would specify a threshold at which two point features are considered to match and the highest scoring feature from the target database exceeding the threshold would be selected as the matched feature.
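The scoring scheme above can be sketched as a weighted combination of a spelling-similarity ratio and a linear distance decay. The equal weights and the 10 km cutoff are illustrative assumptions; real weights would be tuned against ground truth.

```python
# Sketch of a combined gazetteer match score: spelling similarity plus
# proximity. Weights and the distance cutoff are illustrative.
from difflib import SequenceMatcher

def match_score(name_a, name_b, dist_km, max_km=10.0,
                w_name=0.5, w_dist=0.5):
    """Score in [0, 1]; 1.0 means identical name at the same position."""
    spelling = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    proximity = max(0.0, 1.0 - dist_km / max_km)
    return w_name * spelling + w_dist * proximity

# Identical name at the same location scores a perfect 1.0.
print(match_score("Minch Harbor", "Minch Harbor", 0.0))   # 1.0
# The score drops as spelling and position diverge.
print(round(match_score("Minch Harbor", "Minche Harbor", 2.0), 3))
```

A user-specified threshold then decides which candidates count as matches, and the highest-scoring target feature above the threshold is selected, exactly as described above.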
The goal of this task is to demonstrate a standards-based approach to gazetteer conflation using Web Feature Services (WFS), Web Processing Services (WPS), and Transactional Web Feature Services (WFS-T). It will address two specific use cases: 1) Automated Gazetteer Conflation, where one source is known to contain better information than a second source, and 2) Transactional Gazetteer Conflation, based on user interaction.
In the first use case, the goal is to take a WFS-G gazetteer service (referred to as A) and match it with data from another service, WFS-G or WFS (referred to as B), that contains more accurate and/or more current information, displaying the conflated results. The assumption is that the information in A is inferior to the data in B and will be replaced by the information from B in cases where a match is found.
In Figure 1-4, NGA source names are shown in black and New Brunswick target provincial names are shown in red. This illustrates a typical situation, where a locally produced gazetteer contains information that differs from the NGA gazetteer in terms of the positional accuracy of names, as well as the current local spelling. For the purposes of this task, it is assumed that the information in the New Brunswick gazetteer is more accurate than the NGA gazetteer in every respect and will replace the NGA data when matched.
In Figure 1-5, the named features from NGA are shown in black and the named features from New Brunswick are shown in red. The locations of the identical features are connected using red lines, highlighting the positional differences in the two databases.
The conflation results should be displayed on a map. The user should be able to select the type of map display, either 1) an updated version showing the updated source service names or 2) an updated version showing all the names from the source and target services.
The updated NGA data is shown in Figure 1-6. In this case, the names are symbolized to reflect the changes as a result of conflation. Names in black retain the original NGA spelling. Names in red adopt the New Brunswick spelling. Names in italics represent features whose positions have changed from the NGA source. Names not in italics (none in this example) would represent names where the original NGA positional information is retained. The specific changes need to be noted in the map display, although the specific graphic design shown in Figure 1-7 is not required.
Figure 1‑7 Map Display Showing Updated Source Data and Other Target Data
The updated NGA data from Figure 1-6 is combined with the other New Brunswick names data in Figure 1-7. This would form a layer containing all the names information from both sources, symbolized to indicate how the source was modified.
In addition to a map display, the automated conflation process should produce a match table, showing the feature identifier from the source, the feature identifier from the target, and highest match score if there is a match.
Automated Gazetteer Conflation is optimistic, assuming that one data service is superior to another in every respect and can be used to replace the information without inspection. A more realistic scenario, Transactional Gazetteer Conflation, evaluates names one at a time, puts an analyst in the loop, and lets the analyst determine which positional and attribute information is transferred from the target service to the source service. In this use case, the analyst extracts a series of records for conflation and then steps through the set of records.
The Interactive Gazetteer Conflation workflow uses the following steps:
The Match Display should display a map showing the location of the source name and the candidate matches. The source name and information should be displayed, along with a table of candidate matches listing at a minimum the name, distance from the original feature, and spelling score match. The user should be able to select a match record and update the position or name fields and commit the record to the database using WFS-T. Once a record is committed, it should be marked as a ‘Match’. If no record matches, the user should mark the name as having ‘No Match’. Any record which is not committed or matched will have a status of ‘Not Evaluated.’
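When the analyst commits a match, the client issues a WFS-T Update against the source service. The sketch below shows the shape of such a request; the type name, property names, and value encoding are illustrative assumptions, since the actual request must follow the target service's application schema.

```python
# Sketch of the WFS-T Update transaction issued when an analyst commits
# a matched record. Type and property names are illustrative.

def wfst_update(feature_id, new_name, new_lon, new_lat):
    """Build a WFS 2.0 Transaction updating name and position of one feature."""
    return f"""<wfs:Transaction service="WFS" version="2.0.0"
    xmlns:wfs="http://www.opengis.net/wfs/2.0"
    xmlns:fes="http://www.opengis.net/fes/2.0">
  <wfs:Update typeName="gaz:NamedPlace">
    <wfs:Property>
      <wfs:ValueReference>name</wfs:ValueReference>
      <wfs:Value>{new_name}</wfs:Value>
    </wfs:Property>
    <wfs:Property>
      <wfs:ValueReference>position</wfs:ValueReference>
      <wfs:Value>{new_lon} {new_lat}</wfs:Value>
    </wfs:Property>
    <fes:Filter><fes:ResourceId rid="{feature_id}"/></fes:Filter>
  </wfs:Update>
</wfs:Transaction>"""

xml = wfst_update("NamedPlace.42", "Minche Harbor", -66.11, 45.31)
print('rid="NamedPlace.42"' in xml)  # True
```

After the transaction succeeds, the client would flip the record's status to 'Match'; records the analyst rejects become 'No Match', and everything else stays 'Not Evaluated'.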
The Task Console shows the overall progress of the matching process for the selected source data. It will provide an overview of the progress for the selected data being matched, as well as the status of the individual records.
The analyst should be able to export the results of the conflation process to create a concordance table containing the match information and linkages between the sources. This table can be used in the Gazetteer Linking process.
The Government will provide copies of the following databases for testing:
USACE will provide geographic names support for oversight and testing of the proposed solutions.
In OGC Testbed 10, the CCI thread will utilize a Web Processing Service (WPS) to advance data and service discovery. It will investigate, evaluate, and demonstrate, through new and/or existing OGC services, the benefit of semantic mediation approaches for discovering pertinent services or data collections that provide semantically equivalent or relevant data.
The participants shall begin with the architecture results from OWS-9 and evolve them to:
The participants shall advance the management of data provenance in OGC Web Services by properly capturing and propagating that information through OGC services:
The participant shall implement more complex conflation cases:
- Demonstrate cross-domain conflation (e.g. names (gazetteer entries), POI, and points of aeronautical, topographic, and maritime features)
- Update attributes from one domain to the other
- Demonstrate conflation of multiple datasets (3 or more)
- Demonstrate conflation or linking of points, lines, and polygons.
Mature and extend the attribute matching concepts of OWS-9:
- Current mediation is primarily used for feature class and feature name only
- Apply semantic mediation to the feature matching process to determine matches based on descriptive feature class, name, and attribute similarity.
- Apply semantic mediation to additional attributes as part of decision rules to conflate or link data
- Apply semantic mediation to additional attributes as part of conflation business rules to enrich target attribution
- In the WPS, use the best available open-source or participant-provided conflation tools; this is not necessarily a continuation of the OWS-9 tools, and other options may be explored.
The participants shall conduct an engineering study to investigate alternatives and provide a provenance strategy. The study should include the following:
The participants shall implement expanded provenance processes to include:
All Web Service Implementations developed in response to the scenario shall be based on the following:
Hydrographic and hydrologic data have been collected by numerous agencies according to specific conceptual models to fulfill specific needs. For example, Canada and the United States each have a specific conceptual model: the National Hydrographic Network (NHN) and the National Hydrography Dataset (NHD), respectively. Even though efforts have been made to spatially harmonize the data, the semantics, quality, and data structures are still distinct. As a result, accessing, querying, and using multiple hydro data sources together is still an issue.
The World Meteorological Organization has developed and proposed to OGC a hydro model called the HY_Features model. This meta-model addresses most of the requirements representing cross-domain hydro concepts. However, there is a need to demonstrate how it should be used as part of a hydro data infrastructure to support interoperability among such data and services.
Consequently, this part of the CCI thread aims to answer questions like:
Based on the Canadian NHN, the US NHD, and other specific hydro models available, along with the HY_Features model, the participant shall develop a web-based mediation service that can accept queries according to the specific conceptual models and the HY_Features model and return results according to one of the conceptual models identified in the query. The results can be a GML file, a file in the format supported by the specific source, a WFS, etc.
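At its core, such a mediation service translates attributes between the source models through a common pivot of HY_Features-style concepts. The field names below are illustrative placeholders, not the actual NHD or NHN schemas.

```python
# Sketch of attribute mediation between two hydro models via a common
# pivot vocabulary. All field names are illustrative placeholders.

NHD_TO_PIVOT = {"GNIS_Name": "name", "ReachCode": "reach_id"}
NHN_TO_PIVOT = {"nameEn": "name", "nid": "reach_id"}

def mediate(record, source_map, target_map):
    """Translate a record from the source model to the target model."""
    pivot = {source_map[k]: v for k, v in record.items() if k in source_map}
    inverse = {v: k for k, v in target_map.items()}
    return {inverse[k]: v for k, v in pivot.items() if k in inverse}

nhd_record = {"GNIS_Name": "Saint John River", "ReachCode": "01010001000123"}
print(mediate(nhd_record, NHD_TO_PIVOT, NHN_TO_PIVOT))
# {'nameEn': 'Saint John River', 'nid': '01010001000123'}
```

Because the mapping is symmetric through the pivot, the same function answers queries posed in either conceptual model, which is the behavior the mediation service must expose on the Web.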
The test bed shall also demonstrate the use of GeoSPARQL to query hydro data and to provide results that support the Semantic Web and Linked Data.
OGC Testbed 10 Hydro modeling will provide support services to demonstrate the use of linked data to provide feature identity and representation linking services for a set of societally-defined features. Linked representations of the features and the features’ attributes should include but not be limited to Web Feature Services and Sensor Observations Services. Linked data types and relationships should be based on appropriate domain information models such as HY_Features, CityGML, etc.
In addition to the service, the participant shall make recommendations for the development of new hydro models and identify strengths and weaknesses of hydro models used during the test bed. This shall be documented in the ER along with any other findings identified during the test bed. Also, best practices shall be elaborated to support agencies that aim to interoperate with others with respect to their specific hydro data in conjunction with the HY_Features model.
Table 1‑1 CCI Thread Deliverables Summary
|1. Profile Interoperability Engineering Report|
|2. CCI Change Requests – to OGC standards (as needed)|
|3. Virtual Global Gazetteer Client|
|4. Virtual Global Gazetteer Service|
|5. NGA WFS-G|
|6. USGS WFS-G|
|7. Local WFS|
|8. WFS for VGI|
|9. Virtual Global Gazetteer Engineering Report|
|10. CCI OGC Web Services|
|11. Provenance Engineering Report|
|12. CCI OGC Client Applications|
|13. CCI WPS 2.0 Conflation Service|
|14. VGI Component|
|15. Semantic Mediation Service|
|16. Ontology Engineering Report|
|17. Ontology Mapping Component|
|18. VGI Engineering Report|
|19. Hydro Engineering Report|
|20. Hydro Mediation Service|
|21. Hydro Web Services|
|22. WPS Profiles Engineering Report (Added Aug 15, 2013)|
This Engineering Report will include a description of the work performed and an analysis of the DGIWG and NSG profiles for interoperability. It should highlight any issues that may cause interoperability concerns with standard OGC services or between the DGIWG and NSG profiles.
Change Requests to OGC specifications as required. Modifications or enhancements to the OGC suite of standards as needed to support the concept and implementation of Semantic Mediation capabilities.
A client as described in detail in section 1.2.3.
This component is a thick mediator that will be able to cascade different gazetteers as described in detail in section 1.2.3
WFS-G service for the GNS, or GEOnet Names Server, which provides access to the Geographic Names Data Base (GNDB). It serves names for areas outside the United States and its dependent areas, as well as names of undersea features, as described in detail in section 1.2.3.
The GNIS, or Geographic Names Information System, managed by USGS. It contains information about domestic and Antarctic names, as described in detail in section 3.2.2.
Other gazetteers from any part of the world that might be available via WFS or WFS-G (e.g. New Brunswick gazetteer) for use in the conflation work as mentioned in section 1.2.3.
A WFS service that provides Point of Interest data from Volunteered Geographic Information (VGI) sources, such as OpenStreetMap, for use in the conflation work as mentioned in section 4.2.2.
This Engineering Report shall be written in compliance with OGC standards and include optimization ideas, Service change recommendations, and lessons learned.
Enhancements to the WFS, CSW, Semantic Mediator, SPARQL server, as needed to test the architecture. These are described in more detail in the CCI Requirements Section 1.2.5. WMS/WFS/WCS should follow the DGIWG profiles as defined in Section 3.2.6
This Engineering Report shall be written in compliance with OGC standards and include optimization ideas, Service change recommendations, and lessons learned.
A WPS or WPSs that satisfy the requirements as described in section 1.2.4
A VGI software component as described in section 1.2.2.
A WPS that satisfies the requirements as described in section 1.2.4.
The report summarizes findings about the encoding of ontologies and models. It should include optimization ideas and potential issues, such as missing symbols and events that are un-mappable.
A component that can map between a dataset and the selected standard model. This component should allow for data exchange between different communities of practice, for example between the NGA model and the DGIWG model.
The report summarizes findings about the advancements made using VGI resources. The report shall include optimization ideas, service change recommendations, and lessons learned.
The report includes recommendations for the development of new hydro models and identifies strengths and weaknesses of hydro models used during the test bed. This report shall also capture best practices to support agencies that aim to interoperate with others with respect to their specific hydro data in conjunction with the HY_Features model.
A web-based mediation service that can accept queries according to the specific conceptual models and the HY_Features model and return results according to one of the conceptual models identified in the query.
A service enabling Gauging station selection that triggers the calculation and returns the upstream network across US-Can border as a GML file following the NHD or NHN model.
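Behind this deliverable sits an upstream trace over the flow network: from the selected gauging station, walk against the flow direction and collect every contributing reach. The reach identifiers and topology below are illustrative; a real service would operate on NHD/NHN flow tables and serialize the result as GML.

```python
# Sketch of the upstream-network trace behind the gauging-station service.
# Reach IDs and topology are illustrative placeholders.

# upstream_edges[x] lists reaches that flow directly into reach x.
upstream_edges = {
    "R3": ["R1", "R2"],   # R1 and R2 both drain into R3
    "R5": ["R3", "R4"],
}

def trace_upstream(reach):
    """Return every reach upstream of (and including) the given reach."""
    seen, stack = set(), [reach]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(upstream_edges.get(r, []))
    return seen

print(sorted(trace_upstream("R5")))  # ['R1', 'R2', 'R3', 'R4', 'R5']
```

The cross-border aspect means the traversal must continue seamlessly where the network switches from NHD-modeled to NHN-modeled reaches, which is exactly where the mediation service earns its keep.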
This Engineering Report is to include an analysis of the ability of the WPS 2.0 core to support multiple profiles, a clear definition of opportunities for commonality between OGC Testbed 10 WPS profiles, and a description of the unique requirements to support conflation as described in the RFQ.
An ontology represents knowledge as a set of concepts within a domain and the relationships between pairs of concepts; these can be used to model the domain and support reasoning about its entities. An ontology provides a shared vocabulary and taxonomy that models a domain through the definition of objects/concepts together with their properties and relations. The creation of domain ontologies is fundamental to the definition and use of an enterprise architecture framework.
The objective in OGC Testbed 10 for VGI is to continue work on the integration of the emerging flow of less-structured data provided by citizens, which is becoming more and more important and needs to be integrated with structured data sources. OWS-9 developed a process to geocode Twitter feeds. The process is available as a service over the web and uses a gazetteer on the back end to resolve place names. These web services are also based on WPS.
Examples of crowd-sourced data include:
The objective in OGC Testbed 10 for the Gazetteer is to continue work on a Single Point of Entry (SPEGG), as defined in the OWS-9 RFQ Annex B.
The objective in OGC Testbed 10 for the WPS Conflation and Provenance is to support additional data and service discovery.
The viewpoint of Hydro Modeling is to demonstrate how the HY_Features model should be used as part of a hydro data infrastructure to support interoperability among the following data and services:
Initial models and data sources for the test bed:
· US services description
The Information Viewpoint considers the information models and encodings that will make up the content of the services and exchanges to be extended or developed to support this thread.
Many items within the OGC Testbed 10 CCI thread enhance work performed in the OWS-9 CCI thread. The following ERs will be helpful in understanding how the technology has been enhanced.
Below is a listing of the specifications and how they are utilized within the OGC Testbed 10 CCI thread. For more information, links, and details, refer to the annotated bibliography.
The following geospatial-related ISO specifications are included for general informational purposes, as all implementations must be ISO standard compliant:
The computational viewpoint is concerned with the functional decomposition of this thread architecture into a set of services that interact at interfaces. It reflects the components, interfaces, interactions and constraints of the service architecture without regard to their distribution. For more information about each of the elements listed here, please refer to the online annotated bibliography of this RFQ/CFP.
The Web Feature Service and Filter Encodings will be implemented while satisfying the Gazetteer requirements.
WFS 2.0 and FES 1.1 will be used to serve and query data from the different data models.
The Web Feature Service and Filter Encodings will be implemented while satisfying the Gazetteer requirements.
The Web Processing Service (WPS) will be used to advance data and service discovery and investigate, evaluate, and demonstrate a WFS.
OpenSearch will potentially be used to find VGI data to aid in satisfying those requirements.
OWS Context will be utilized in the Linked WPS section to provide means to capture linkages and relationships between different datasets and objects.
The WMS specification will be utilized while working with profile interoperability. In addition, the Hydro Model capability will be demonstrated utilizing a WMS.
The GeoSPARQL specification will be utilized throughout the CCI thread to perform Gazetteer, WPS linkage and Hydro modeling implementations.
The Enterprise, Information, and Computation viewpoints describe a system in terms of its purposes, its content, and its functions. The Engineering viewpoint identifies component types in order to support distributed interaction between the components of the system. Those components interact based upon the services identified and described in the Computational viewpoint. Described below in the Engineering viewpoint is a basic notional scenario to understand how the components would interoperate in normal anticipated usage.
The following scenario is an initial engineering viewpoint and will be refined and modified as needed during the implementation stages by the CCI and Mobility thread participants. It is provided here to give perspective and context for the thread requirements and mission.
Any of the databases and WPS/WCPS processes above should make use of cloud integration from the Open Mobility Thread.
A nuclear dirty bomb takes out an oil platform off the coast of Monterey.