Natural Language Generation for the Semantic Web: Unsupervised template extraction
Date
28/11/2012
Item status
Restricted Access
Author
Duma, Daniel
Abstract
I propose an architecture for a Natural Language Generation system that automatically learns sentence templates, together with statistical document planning, from parallel RDF data and text. To this end, I design, build and test a proof-of-concept system (“LOD-DEF”) trained on unannotated text from the Simple English Wikipedia and RDF triples from DBpedia, with the communicative goal of generating short descriptions of entities in an RDF ontology. Inspired by previous work, I implement a baseline triple-to-text generation system, and I conduct a human evaluation of the LOD-DEF system against both the baseline and human-generated output. LOD-DEF significantly outperforms the baseline on two of the three measures: “non-redundancy” and “structure and coherence”.
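To make the idea of learning sentence templates from parallel RDF data and text more concrete, here is a minimal sketch. It is not the LOD-DEF implementation: the function names `extract_template` and `fill_template`, the flat predicate-to-value dictionary standing in for RDF triples, and the exact string matching are all assumptions chosen for illustration. The sketch replaces literal values found in a sentence with predicate slots, then reuses the resulting template for another entity.

```python
import re


def extract_template(sentence, triples):
    """Turn a sentence into a template by replacing each triple value
    that occurs in it with a [predicate] slot.

    `triples` is a hypothetical simplification: a dict mapping a
    predicate name to its literal value for a single entity.
    """
    if not triples:
        return sentence
    value_to_predicate = {value: pred for pred, value in triples.items()}
    # Try longer values first so a short value cannot split a longer one.
    values = sorted(value_to_predicate, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in values))
    return pattern.sub(
        lambda m: "[" + value_to_predicate[m.group(0)] + "]", sentence
    )


def fill_template(template, triples):
    """Generate text by filling each [predicate] slot from `triples`.

    Raises KeyError if the entity lacks a predicate the template needs.
    """
    return re.sub(r"\[(\w+)\]", lambda m: triples[m.group(1)], template)


# Illustrative data only; these are not taken from the thesis.
lovelace = {"name": "Ada Lovelace", "birthPlace": "London", "birthYear": "1815"}
turing = {"name": "Alan Turing", "birthPlace": "London", "birthYear": "1912"}

template = extract_template("Ada Lovelace was born in London in 1815.", lovelace)
generated = fill_template(template, turing)
print(template)
print(generated)
```

A real system would also need fuzzy matching between triple values and text (dates and names are rarely written identically in both), as well as a way to choose among competing templates, which this sketch leaves out.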