Paper

An Approach for Weakly-Supervised Deep Information Retrieval

MPS-Authors
Hui, Kai
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Yates, Andrew
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Fulltext (public)

arXiv:1707.00189.pdf (Preprint), 632KB

Citation

MacAvaney, S., Hui, K., & Yates, A. (2017). An Approach for Weakly-Supervised Deep Information Retrieval. Retrieved from http://arxiv.org/abs/1707.00189.


Cite as: https://hdl.handle.net/11858/00-001M-0000-002E-06C5-C
Abstract
Recent developments in neural information retrieval models have been promising, but a problem remains: human relevance judgments are expensive to produce, while neural models require a considerable amount of training data. In an attempt to fill this gap, we present an approach that, given a weak training set of pseudo-queries, documents, and relevance information, filters the data to produce effective positive and negative query-document pairs. This allows large corpora to be used as neural IR model training data, while eliminating training examples that do not transfer well to relevance scoring. The filters include unsupervised ranking heuristics and a novel measure of interaction similarity. We evaluate our approach using a news corpus with article headlines acting as pseudo-queries and article content as documents, with implicit relevance between an article's headline and its content. By using our approach to train state-of-the-art neural IR models and comparing to established baselines, we find that training data generated by our approach can lead to good results on a benchmark test collection.