A computationally efficient algorithm for learning topical collocation models
Date
2015
Author
Pate, John K.
Zhao, Zhendong
Du, Lan
Börschinger, Benjamin
Ciaramita, Massimiliano
Steedman, Mark
Johnson, Mark
Abstract
Most existing topic models make the bag-of-words assumption that words are generated independently, and so ignore potentially useful information about word order. Previous attempts to use collocations (short sequences of adjacent words) in topic models have either relied on a pipeline approach, restricted attention to bigrams, or resulted in models whose inference does not scale to large corpora. This paper studies how to simultaneously learn both collocations and their topic assignments. We present an efficient reformulation of the Adaptor Grammar-based topical collocation model (AG-colloc) (Johnson, 2010), and develop a point-wise sampling algorithm for posterior inference in this new formulation. We further improve the efficiency of the sampling algorithm by exploiting sparsity and parallelising inference. Experimental results on text classification, information retrieval and human evaluation tasks across a range of datasets show that this reformulation scales to hundreds of thousands of documents while maintaining the good performance of the AG-colloc model.
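For readers unfamiliar with point-wise sampling in topic models, the following is a minimal, illustrative sketch of a collapsed Gibbs sampler for a plain bag-of-words LDA model. It is not the paper's AG-colloc reformulation (which resamples assignments over collocations rather than single words); it only shows the general idea of resampling one latent assignment at a time from its conditional distribution. The hyperparameters alpha/beta and the toy interface are assumptions made for illustration.

```python
# Illustrative point-wise (collapsed Gibbs) sampler for bag-of-words LDA.
# NOT the AG-colloc algorithm from the paper; shown only to clarify what
# "point-wise sampling" of latent assignments means in a topic model.
import numpy as np

def gibbs_lda(docs, vocab_size, num_topics, alpha=0.1, beta=0.01, iters=200):
    # docs: list of documents, each a list of integer word ids
    rng = np.random.default_rng(0)
    doc_topic = np.zeros((len(docs), num_topics))   # n_{d,k}: topic counts per document
    topic_word = np.zeros((num_topics, vocab_size)) # n_{k,w}: word counts per topic
    topic_total = np.zeros(num_topics)               # n_k: total tokens per topic
    z = []                                           # current topic assignment per token
    for d, doc in enumerate(docs):
        zd = rng.integers(num_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            doc_topic[d, k] += 1
            topic_word[k, w] += 1
            topic_total[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # remove this token's current assignment from the counts
                doc_topic[d, k] -= 1
                topic_word[k, w] -= 1
                topic_total[k] -= 1
                # point-wise conditional:
                # p(k | rest) ∝ (n_{d,k} + alpha) * (n_{k,w} + beta) / (n_k + V*beta)
                p = (doc_topic[d] + alpha) * (topic_word[:, w] + beta) \
                    / (topic_total + vocab_size * beta)
                k = rng.choice(num_topics, p=p / p.sum())
                # add the freshly sampled assignment back into the counts
                z[d][i] = k
                doc_topic[d, k] += 1
                topic_word[k, w] += 1
                topic_total[k] += 1
    return z, doc_topic, topic_word
```

The paper's contribution is to make this style of per-assignment resampling efficient when the latent units are collocations learned jointly with their topics, using sparsity and parallelisation to scale to large corpora.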