Thesis (Ph. D.)--University of Rochester. Dept. of Computer Science, 2014.
Quantifier scope disambiguation (QSD) is one of the most challenging problems in deep natural language understanding (NLU) systems. The most popular approach to QSD is to leave the semantic representation scope-underspecified and to incrementally add constraints that filter out unwanted readings. Scope underspecification has to solve an algorithmic problem: deciding whether a representation with a given set of constraints has any scoping (i.e., whether it is satisfiable) and, if so, efficiently enumerating the possible scopings. The problem is NP-complete in general. It had been an open question whether, within the most popular constraint-based underspecification frameworks, there exists a tractable subset that covers all structures occurring in practice.
I show that the answer to this question is "yes". With no increase in time/space complexity, I extend the previously found tractable subset to include a family of sentences known not to be covered. Moreover, I define a mathematically formalized, yet linguistically justified, notion of "coherence" and prove that all coherent natural language sentences belong to this subset; hence the underspecified representation of every coherent sentence can be solved in polynomial time.
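As a toy illustration of the satisfiability and enumeration problem described above, the following Python sketch treats a scoping as a linear order over scope-bearing operators and a constraint as a pair (a, b) meaning "a must outscope b". This is a deliberate simplification and an assumption made for illustration only: it is not the constraint formalism or the algorithm of the thesis, and the names `enumerate_scopings` and `is_satisfiable` are hypothetical. The brute-force search is exponential in the number of operators, which is exactly the cost a tractable subset is meant to avoid.

```python
from itertools import permutations


def enumerate_scopings(operators, constraints):
    """Yield every linear order of `operators` that respects `constraints`.

    Each constraint (a, b) means "a must outscope b", i.e. a must appear
    before b in the order. Brute force: tries all n! permutations.
    """
    for order in permutations(operators):
        position = {op: i for i, op in enumerate(order)}
        if all(position[a] < position[b] for a, b in constraints):
            yield order


def is_satisfiable(operators, constraints):
    """Return True if at least one scoping satisfies all constraints."""
    return next(enumerate_scopings(operators, constraints), None) is not None


# Hypothetical example: three scope-bearing operators in one sentence.
operators = ["every", "a", "not"]

# With no constraints, every ordering is a possible reading.
unconstrained = list(enumerate_scopings(operators, []))

# One constraint filters out the readings in which "a" outscopes "every".
constrained = list(enumerate_scopings(operators, [("every", "a")]))

# Contradictory constraints leave no reading at all.
contradictory = [("every", "a"), ("a", "every")]
print(len(unconstrained), len(constrained), is_satisfiable(operators, contradictory))
```

Adding constraints only ever shrinks the set of readings, which is the sense in which constraints "filter out" unwanted scopings in an underspecified representation.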
The other way to deal with QSD is to actually resolve the ambiguity using rule-based or statistical methods. Little work has been done on statistical QSD because extensive annotated corpora are lacking. Previous corpora, and hence previous statistical QSD systems, scope only two explicitly quantified (i.e., not definite, indefinite, or bare) noun phrases (NPs) per sentence, mainly because even the hand-annotation of full QSD is very challenging.
I propose the first annotation scheme for QSD, addressing many of the challenges that need to be dealt with in hand-annotation. Using this scheme, we have developed the first corpus of English text annotated with "comprehensive" QSD. In this corpus, a) all NPs in a sentence, regardless of the type of the article, have been scoped; b) the scope of operators such as frequency adverbials and negations has been labeled; and c) distributivity vs. collectivity of plurals has been addressed. Finally, I propose the first comprehensive automatic QSD system, by defining a probabilistic framework for learning to build partial orders. The model has been trained and tested on our corpus. The performance is quite encouraging and could motivate further work in this area.