Developing an improved focused crawler for the IDEAL project

Abstract

Description

CS 4624 capstone project. The client is Mohamed Magdy Gharib Farag. Support was provided through NSF IIS-1319578: Integrated Digital Event Archiving and Library (IDEAL). The provided files include the final report, the midterm and final presentations, a poster presented at VTURCS, and related software. Our source code is available at https://github.com/wbonnefond/focused-crawler

Keywords

web crawler, IDEAL, Python, natural language processing, tree-edit distance

Citation