Interactive System for Scientific Publication Visualization and Similarity Measurement based on Citation Network

Date

2015

Publisher

Université d'Ottawa / University of Ottawa

Abstract

Online scientific publications are becoming more and more popular, and the number of publications we can access almost instantaneously is rapidly increasing. This makes it more challenging for researchers to pursue a topic, review literature, track research history or follow research trends. Online resources such as search engines and digital libraries help in finding scientific publications; however, most of the time the user ends up with an overwhelming number of linear results to go through. This thesis proposes an alternative system that takes advantage of the citation/reference relations between publications, which gives better insight into the hierarchical distribution of publications around a given topic. We also use information visualization techniques to represent the publications as a network. Our system is designed to automatically retrieve publications from Google Scholar and visualize them as a 2-dimensional graph built from the citation relations. In this graph, the nodes represent the documents and the links represent the citation/reference relations between them. Our visualization system provides a better view of publications, making it easier to identify the research flow, connect publications, and assess similarities and differences between them. It is an interactive web-based system that allows users to get more information about any selected publication and to calculate a similarity score between two selected publications. Traditionally, similar documents are found using Natural Language Processing (NLP), which compares documents by matching their contents. In the proposed method, similar documents are found using the citation/reference relations, that is, the relationships originally entered by the authors themselves. We propose a new path-based metric for measuring the similarity score between any pair of publications. The metric is based on both the number of paths between the two publications and the length of each path: more paths and shorter paths increase the similarity score. We compare our similarity scores with those from Scurtu's Document Similarity [1], which uses the NLP method. We then use the average of the similarity scores collected from 15 users as a ground truth to validate the efficiency of our method. The results indicate that our Citation Network approach yielded better scores than Scurtu's approach.
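The abstract does not give the exact formula of the path-based metric, so the following is only an illustrative sketch of the general idea, not the thesis's method. It assumes an undirected citation graph, counts simple paths up to a hypothetical `max_len` limit, and lets each path contribute `1 / length`, so that more paths and shorter paths both raise the score. The function names and the sample citation pairs are invented for illustration.

```python
def build_graph(citations):
    """Build an undirected adjacency map from (citing, cited) pairs."""
    graph = {}
    for citing, cited in citations:
        graph.setdefault(citing, set()).add(cited)
        graph.setdefault(cited, set()).add(citing)
    return graph


def simple_path_lengths(graph, source, target, max_len):
    """Yield the edge count of every simple path from source to target
    that uses at most max_len edges (iterative depth-first search)."""
    stack = [(source, frozenset([source]), 0)]
    while stack:
        node, visited, length = stack.pop()
        if node == target:
            yield length
            continue  # a simple path ends when it reaches the target
        if length == max_len:
            continue  # do not extend paths beyond the length limit
        for neighbor in graph.get(node, ()):
            if neighbor not in visited:
                stack.append((neighbor, visited | {neighbor}, length + 1))


def path_similarity(graph, a, b, max_len=4):
    """Assumed scoring rule: sum of 1/length over all simple paths.
    This weighting is a placeholder, not the formula from the thesis."""
    if a == b:
        raise ValueError("similarity is defined here for two distinct publications")
    return sum(1.0 / n for n in simple_path_lengths(graph, a, b, max_len))


# Hypothetical citation pairs: (citing publication, cited publication).
citations = [("A", "B"), ("A", "C"), ("C", "B"), ("B", "D")]
graph = build_graph(citations)
print(path_similarity(graph, "A", "B"))
print(path_similarity(graph, "A", "D"))
```

In this toy graph, A and B are joined by one direct link and one two-step path, so they score higher than A and D, which are connected only through longer paths. Any real implementation would need the weighting and path limit defined in the thesis itself.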

Keywords

Information Visualization, Citation Network, Similarity Measures, Document Similarity
