Interpretation of tables in texts
Date
2000
Author
Hurst, Matthew Francis
Abstract
This thesis looks at the issues relating to the development of technology capable of
processing tables as they appear in textual documents so that their contents may
be accessed and further interpreted by standard information extraction and natural
language processing systems. The thesis offers a formal model of the table, together with
the description and evaluation of a system that produces instances of that model
for example tables.
There are three parts to the thesis. The first looks at tables in general terms,
suggests where their complexities are to be found, and reviews the literature dealing
with research into tables in other fields. The second part introduces a layered
model of the table and provides some notational equipment for encoding tables in
these component layers. The final part discusses the design, implementation and
evaluation of a system which produces an instance of the model for the tables found
in a document. It also discusses the design and collection of a corpus of tables used
for the training and evaluation of the system. The thesis catalogues a large number
of phenomena discovered in the corpus collected during the research and provides
appropriate terminology.