Optimal and approximate Q-value functions for decentralized POMDPs

Decision-theoretic planning is a popular approach to sequential decision-making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy, and another that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate to each other, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
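As a rough illustration of the recursive dynamic-programming computation mentioned above, the sketch below computes a finite-horizon Q-value function over joint actions by backward induction on the underlying MDP, i.e. as if the agents could observe the state and act centrally. This corresponds to the kind of optimistic approximation commonly called Q_MDP; it is a minimal sketch, not the authors' implementation. The function name `q_mdp`, the dictionary representation, and the two-state, two-agent toy problem are all hypothetical choices made for illustration.

```python
from itertools import product


def q_mdp(states, joint_actions, T, R, horizon):
    """Hypothetical helper: finite-horizon Q_MDP by backward induction.

    T(s, a) returns a dict {next_state: probability};
    R(s, a) returns the immediate reward for joint action a in state s.
    Q[t][(s, a)] is the expected return of taking joint action a in
    state s at stage t and acting optimally (with full state
    observability) afterwards.
    """
    V_next = {s: 0.0 for s in states}  # value after the last stage is zero
    Q = [None] * horizon
    for t in reversed(range(horizon)):
        Q_t = {}
        for s in states:
            for a in joint_actions:
                expected_future = sum(p * V_next[s2] for s2, p in T(s, a).items())
                Q_t[(s, a)] = R(s, a) + expected_future
        Q[t] = Q_t
        # Centralized, fully observable backup: max over joint actions.
        V_next = {s: max(Q_t[(s, a)] for a in joint_actions) for s in states}
    return Q


# Toy two-agent coordination problem (assumed, for illustration only):
# reward 1 if both agents pick the action matching the hidden state,
# and the next state is drawn uniformly regardless of the action.
states = [0, 1]
joint_actions = list(product([0, 1], repeat=2))


def T(s, a):
    return {0: 0.5, 1: 0.5}


def R(s, a):
    return 1.0 if a == (s, s) else 0.0


Q = q_mdp(states, joint_actions, T, R, horizon=2)
print(Q[0][(0, (0, 0))])  # prints 2.0
```

Because this backup assumes each agent can see the true state and coordinate centrally, the resulting values are optimistic, which is consistent with the upper-bound property stated in the abstract. Extracting decentralized policies from such a function requires a further step, since each agent only has access to its own observation history.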

Disciplines :

Computer science

Identifiers :

UNILU:UL-ARTICLE-2011-703

Author, co-author :

Oliehoek, Frans A.

Spaan, Matthijs T. J.

Vlassis, Nikos ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB)

Language :

English

Publication date :

2008

Journal title :

Journal of Artificial Intelligence Research

ISSN :

1943-5037

Publisher :

Morgan Kaufmann Publishers, San Francisco, United States - California

Volume :

Pages :

289-353

Peer reviewed :

Peer Reviewed verified by ORBi

Additional URL :

http://www.jair.org/media/2447/live-2447-3856-jair.pdf

Commentary :

MARKOV DECISION-PROCESSES COMPLEXITY SYSTEMS

Available on ORBilu :

since 17 November 2013
