Wireless channel selection with restless bandits

Kuhn, J.; Nazarathy, Y.

DOI: https://doi.org/10.1007/978-3-319-47766-4_18

Authors: J. Kuhn; Y. Nazarathy
Year: 2017
Host editors: R.J. Boucherie; N.M. van Dijk
Title: Wireless channel selection with restless bandits
Book title: Markov Decision Processes in Practice
Pages (from-to): 463-485
Number of pages: 23
Publisher: Cham: Springer
ISBN: 9783319477640
ISBN (electronic): 9783319477664
Series: International Series in Operations Research and Management Science, ISSN 0884-8289, vol. 248
Document type: Chapter
Faculty: Faculty of Science (FNWI)
Institute: Korteweg-de Vries Institute for Mathematics (KdVI)
Abstract: Wireless devices are often able to communicate on several alternative channels; for example, cellular phones may use several frequency bands and are equipped with base-station communication capability together with WiFi and Bluetooth communication. Automatic decision support systems in such devices need to decide which channels to use at any given time so as to maximize the long-run average throughput. A good decision policy needs to take into account that, due to cost, energy, technical, or performance constraints, the state of a channel is only sensed when it is selected for transmission. Therefore, the greedy strategy of always exploiting those channels assumed to yield the currently highest transmission rate is not necessarily optimal with respect to long-run average throughput. Rather, it may be favourable to give some priority to the exploration of channels of uncertain quality. In this chapter we model such on-line control problems as a special type of Restless Multi-Armed Bandit (RMAB) problem in a partially observable Markov decision process framework. We refer to such models as Reward-Observing Restless Multi-Armed Bandit (RORMAB) problems. These types of optimal control problems were previously considered in the literature in the context of (i) Gilbert-Elliott (GE) channels (where each channel is modelled as a two-state Markov chain), and (ii) Gaussian autoregressive (AR) channels of order 1. A virtue of this chapter is that we unify the presentation of both types of models under the umbrella of our newly defined RORMAB. Further, since RORMAB is a special type of RMAB, we also present an account of RMAB problems together with a pedagogical development of the Whittle index, which provides an approximately optimal control method. Numerical examples are provided.
Language: English
Persistent Identifier: https://hdl.handle.net/11245.1/527a7237-6a2e-40ef-9ded-3e2647207198
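
Illustration (not part of the record): the setting described in the abstract, where a channel's state is sensed only when it is selected, can be sketched for two-state (GE-type) channels as below. This is a minimal sketch under stated assumptions, not the chapter's method: the transition probabilities `p01` and `p11`, the unit reward for a good channel, all function names, and the greedy (myopic) selection rule are hypothetical choices made here for illustration. The Whittle index policy developed in the chapter is not implemented.

```python
import random


def propagate_belief(omega, p01, p11):
    """Belief that an UNSELECTED channel is good in the next slot.

    omega: current probability that the channel is good.
    p01:   P(good next | bad now); p11: P(good next | good now).
    """
    return omega * p11 + (1.0 - omega) * p01


def observed_belief(state_good, p01, p11):
    """Next-slot belief for the SELECTED channel, whose state was just sensed."""
    return p11 if state_good else p01


def simulate_greedy(n_channels=3, horizon=10_000, p01=0.2, p11=0.8, seed=0):
    """Average throughput of the greedy rule: always pick the highest belief.

    Assumption: a good channel yields reward 1 and a bad channel reward 0.
    """
    rng = random.Random(seed)
    stationary = p01 / (1.0 - p11 + p01)  # long-run probability of "good"
    states = [rng.random() < stationary for _ in range(n_channels)]
    beliefs = [stationary] * n_channels
    total = 0
    for _ in range(horizon):
        # Select the channel currently believed most likely to be good.
        k = max(range(n_channels), key=lambda i: beliefs[i])
        total += 1 if states[k] else 0
        # Only the selected channel is observed; the rest evolve unobserved.
        beliefs = [
            observed_belief(states[i], p01, p11) if i == k
            else propagate_belief(beliefs[i], p01, p11)
            for i in range(n_channels)
        ]
        # Every channel changes state regardless of selection ("restless").
        states = [rng.random() < (p11 if s else p01) for s in states]
    return total / horizon


if __name__ == "__main__":
    print(simulate_greedy())
```

The belief of an unobserved channel drifts towards the stationary probability, so a channel left unselected for long becomes one of uncertain quality. That is the exploration-versus-exploitation tension the abstract refers to.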
