eCommons

 

Aspects of the P-Norm Model of Information Retrieval: Syntactic Query Generation, Efficiency, and Theoretical Properties

Other Titles

Abstract

A practical information retrieval system must be easy to use by untrained users, and it must provide prompt responses to a user's search requests. In this thesis, these practical aspects of the p-norm model of information retrieval are explored. In addition, a study of theoretical properties of the p-norm model is presented. A syntactic method for generating p-norm queries from parse trees generated by the PLNLP syntactic analyzer is presented. The effectiveness of the syntactically generated queries is shown to be comparable to the effectiveness of manually constructed queries, and much better than that of statistically generated queries. The efficiency of a p-norm retrieval is significantly improved with a new p-norm retrieval algorithm which evaluates the entire document collection in one recursive traversal of the query tree. This algorithm is compared against the straightforward algorithm, which requires a traversal of the query tree for each document that is evaluated. The new algorithm is shown to be better both asymptotically and experimentally. The infinity-one model is introduced as a means of approximating the p-norm model without requiring exponentiation. Experimental results show that infinity-one retrieval is essentially as effective as p-norm retrieval, but much faster. List pruning methods for further efficiency improvements are also introduced and are shown to reduce retrieval time significantly without affecting the precision of top-ranked documents. The retrieval time of the infinity-one model with list pruning is shown to be comparable to that of pure Boolean retrieval. A theoretical study is also presented in which certain Boolean algebra properties, such as associativity, are shown to be unsatisfiable by any extended Boolean system with weak operators. The p-norm model is shown to satisfy all those properties that can be satisfied. In addition, the p-norm model is evaluated with respect to the Waller-Kraft wish list for extended Boolean systems.

Journal / Series

Volume & Issue

Description

Sponsorship

Date Issued

1990-05

Publisher

Cornell University

Keywords

computer science; technical report

Location

Effective Date

Expiration Date

Sector

Employer

Union

Union Local

NAICS

Number of Workers

Committee Chair

Committee Co-Chair

Committee Member

Degree Discipline

Degree Name

Degree Level

Related Version

Related DOI

Related To

Related Part

Based on Related Item

Has Other Format(s)

Part of Related Item

Related To

Related Publication(s)

Link(s) to Related Publication(s)

References

Link(s) to Reference(s)

Previously Published As

http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR90-1128

Government Document

ISBN

ISMN

ISSN

Other Identifiers

Rights

Rights URI

Types

technical report

Accessibility Feature

Accessibility Hazard

Accessibility Summary

Link(s) to Catalog Record