Developing an improved focused crawler for the IDEAL project
Date
2014-05-09
Description
CS 4624 capstone project. The client is Mohamed Magdy Gharib Farag.
Support was provided through NSF grant IIS-1319578: Integrated Digital Event Archiving and Library (IDEAL).
The files provided include the final report, the midterm and final presentations, a poster presented at VTURCS, and related software.
The source code is available at https://github.com/wbonnefond/focused-crawler
Keywords
web crawler, IDEAL, Python, natural language processing, tree-edit distance