The Automatic Grouping of Sensor Data Layers Using Semantic Clustering and Classification to Group Semantically Similar Sensor Data Layers

Date
2013-01-25
Abstract
The Sensor Web is a growing phenomenon in which an increasing number of sensors collect data in the physical world and make it available over the Internet. Open standards have been proposed and are being implemented to eliminate the problem of semantic interoperability, the goal being to allow systems to share data automatically. Spatial Data Infrastructures (SDIs) are tools that have been developed to manage geospatial data from many different sources. However, there are still problems with interoperability associated with a lack of standardized naming, even with data collected using the same open standard. The objective of this thesis is to automatically group similar sensor data layers, and we propose a methodology that does so based on the phenomenon they measure. Our methodology is based on a unique bottom-up approach that uses text processing, approximate string matching, and semantic string matching of data layers. Text processing includes normalization and tokenization to standardize syntactic differences in the naming. Approximate string matching techniques include Levenshtein Distance, a Length Adjusted Levenshtein Dissimilarity, Jaro Dissimilarity, Jaro-Winkler Dissimilarity, Jaccard Dissimilarity, and Cosine Dissimilarity. For semantic string matching, we use WordNet as a lexical database to compute word pair similarities and derive a set-based dissimilarity function using those similarity scores. These string matching algorithms are used to produce dissimilarity values between data layers, which are in turn used to provide data layer to data layer mappings, similar data layer clusters, and mappings between a set of class names and data layers. For clustering, we tested three different clustering algorithms: K-Medoids, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Hierarchical Agglomerative Clustering (HAC).
We evaluate and discuss the results of our methodology, and introduce a proof-of-concept Virtual SOS service to show the utility of such research.
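The pipeline described in the abstract (normalize and tokenize layer names, compute string dissimilarities, then group layers) can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names, the sample layer names, the 0.5 threshold, and the choice to divide the edit distance by the longer string length are all assumptions made for illustration, and a simple single-link threshold grouping stands in for the K-Medoids, DBSCAN, and HAC algorithms that the thesis actually tested.

```python
import re
from itertools import combinations


def tokenize(name):
    """Normalize a layer name and split it into lowercase tokens."""
    # Insert a space at camelCase boundaries, then split on any
    # run of non-alphanumeric characters (underscores, hyphens, spaces).
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return [t for t in re.split(r"[^A-Za-z0-9]+", spaced.lower()) if t]


def levenshtein(a, b):
    """Classic edit distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def length_adjusted_levenshtein(a, b):
    """Edit distance scaled to [0, 1] by the longer length (assumed formula)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def jaccard_dissimilarity(tokens_a, tokens_b):
    """One minus the ratio of shared tokens to all distinct tokens."""
    sa, sb = set(tokens_a), set(tokens_b)
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


def group_by_threshold(names, threshold=0.5):
    """Single-link grouping: layers closer than the threshold share a group."""
    parent = list(range(len(names)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tokens = [tokenize(n) for n in names]
    for i, j in combinations(range(len(names)), 2):
        if jaccard_dissimilarity(tokens[i], tokens[j]) < threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical layer names that differ only in syntax.
    layers = ["Air_Temperature", "airTemperature", "wind-speed"]
    print(group_by_threshold(layers))
```

In this sketch, normalization removes purely syntactic differences such as case, underscores, and camelCase, so the token-based Jaccard dissimilarity between the first two hypothetical names is zero and they fall into one group. Names that differ in vocabulary rather than spelling would need the WordNet-based semantic matching the abstract describes, which is not shown here.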
Keywords
Geotechnology
Citation
Knoechel, B. C. (2013). The Automatic Grouping of Sensor Data Layers Using Semantic Clustering and Classification to Group Semantically Similar Sensor Data Layers (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. doi:10.11575/PRISM/28017