Advancing Open Information Extraction Methods to Enrich Knowledge Bases
thesis
posted on 2018-02-08, 00:00 authored by Seyed Iman Mirrezaei
Discovering knowledge from textual sources and subsequently expanding the coverage of knowledge
bases like DBpedia or Google’s Knowledge Graph currently requires either extensive manual work
or carefully designed open information extractors. An open information extractor (OIE) captures triples
from textual resources. Each triple consists of a subject, a predicate/property, and an object. Triples can
be mediated via verbs, nouns, adjectives, or appositions. The research that we conducted in the area
of OIE resulted in the development of two OIE systems, TRIPLEX and TRIPLEX-ST. We focus on
further advancing OIE methods to support the expansion of spatio-temporal information in knowledge
bases.
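
As an illustration only, a triple and its optional spatio-temporal context could be represented roughly as follows. This is a minimal sketch, not the data model used in the thesis: the class name `Triple`, the fields `location` and `time`, and the example sentence are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triple:
    """A (subject, predicate, object) fact, optionally tied to a place and a time."""
    subject: str
    predicate: str
    obj: str
    location: Optional[str] = None  # hypothetical spatial context field
    time: Optional[str] = None      # hypothetical temporal context field

    def has_context(self) -> bool:
        """Return True if the fact carries a spatial or temporal context."""
        return self.location is not None or self.time is not None


# Invented sentence: "Alice Smith, the mayor of Springfield, visited Chicago in 2015."
# An apposition-mediated triple with no context:
appositive = Triple("Alice Smith", "mayor", "Springfield")
# A verb-mediated triple constrained by a temporal context:
contextual = Triple("Alice Smith", "visited", "Chicago", time="2015")
```

Keeping the context in separate optional fields lets the same structure hold both plain triples and triples constrained by a place or a time.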
TRIPLEX extracts triples from grammatical dependency relations involving noun phrases and modifiers
that correspond to adjectives and appositions. TRIPLEX constructs templates that express noun-mediated
triples during its automatic bootstrapping process, which finds sentences that express noun-mediated
triples by leveraging Wikipedia. The templates express how noun-mediated triples occur in
sentences and include rich linguistic annotations. Finally, the templates can be used to extract triples
from previously unseen text.
TRIPLEX-ST is a novel information extraction system that can capture spatio-temporal information
from text. It extends current open-domain information extraction (OIE) systems in several dimensions,
including the ability to extract facts associated with spatio-temporal contexts (i.e., spatio-temporal information
that constrains the facts). The system uses Wikipedia sentences and triples in existing knowledge
bases, such as YAGO, to automatically infer templates during a bootstrapping process. These templates include rich linguistic annotations, and they can be used to extract both facts associated with
spatio-temporal contexts and spatio-temporal facts from previously unseen sentences. TRIPLEX-ST
also includes syntax-based sentence simplification methods, which contribute to improving extraction
effectiveness. Our experiments show that TRIPLEX-ST outperforms a state-of-the-art OIE system on
the extraction of spatio-temporal facts. We also show that our approach can accurately extract useful
new information, in the form of triples connected to spatio-temporal contexts, using a large Wikipedia
dataset.
History
Advisor
F.Cruz, Isabel
Chair
F.Cruz, Isabel
Department
Computer Science
Degree Grantor
University of Illinois at Chicago
Degree Level
- Doctoral
Committee Member
Di Eugenio, Barbara; Liu, Bing; Ziebart, Brian; Martins, Bruno
Submitted date
December 2017
Issue date
2017-09-05