In this paper we introduce and experimentally assess SemSynX, a novel technique for supporting similarity analysis of XML data via semantic and syntactic heterogeneity/homogeneity detection. Given two XML trees, SemSynX retrieves a list of semantic and syntactic heterogeneity/homogeneity matches of objects (i.e., elements, values, tags, attributes) occurring in certain paths of the trees. A local score that takes into account the path and value similarity is given for each heterogeneity/homogeneity found. A global score that summarizes the number of equal matches as well as the local scores globally is also provided. The proposed technique is highly customizable, and it permits the specification of thresholds for the requested degree of similarity for paths and values as well as for the degree of relevance for path and value matching. It thus makes possible to “adjust” the similarity analysis depending on the nature of the input XML trees. SemSynX has been implemented in terms of a XQuery library, as to enhance interoperability with other XML processing tools. To complete our analytical contributions, a comprehensive experimental assessment and evaluation of SemSynX over several classes of XML documents is provided.
SemSynX: Flexible similarity analysis of XML data via semantic and syntactic heterogeneity/homogeneity detection
Cuzzocrea Alfredo
2016-01-01
Abstract
In this paper we introduce and experimentally assess SemSynX, a novel technique for supporting similarity analysis of XML data via semantic and syntactic heterogeneity/homogeneity detection. Given two XML trees, SemSynX retrieves a list of semantic and syntactic heterogeneity/homogeneity matches of objects (i.e., elements, values, tags, attributes) occurring in certain paths of the trees. A local score that takes into account the path and value similarity is given for each heterogeneity/homogeneity found. A global score that summarizes the number of equal matches as well as the local scores globally is also provided. The proposed technique is highly customizable, and it permits the specification of thresholds for the requested degree of similarity for paths and values as well as for the degree of relevance for path and value matching. It thus makes possible to “adjust” the similarity analysis depending on the nature of the input XML trees. SemSynX has been implemented in terms of a XQuery library, as to enhance interoperability with other XML processing tools. To complete our analytical contributions, a comprehensive experimental assessment and evaluation of SemSynX over several classes of XML documents is provided.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.