Constrained risk-sensitive deep reinforcement learning for eMBB-URLLC joint scheduling

Zhang, Wenheng; Derakhshani, Mahsa; Zheng, Gan; Lambotharan, Sangarapillai

File(s) under permanent embargo

Reason: Publisher requirement. Embargo will be lifted after publication.

journal contribution

posted on 2024-03-11, 16:24 authored by Wenheng Zhang, Mahsa Derakhshani, Gan Zheng, Sangarapillai Lambotharan

In this work, we employ a constrained risk-sensitive deep reinforcement learning (CRS-DRL) approach for joint scheduling in a dynamic multiplexing scenario involving enhanced mobile broadband (eMBB) and ultra-reliable low-latency communications (URLLC). Our scheduling policy minimizes the adverse impact of URLLC puncturing on eMBB users while satisfying URLLC requirements. Conventional DRL-based algorithms for eMBB/URLLC scheduling prioritize maximizing the expected return. However, for URLLC mission-critical applications, it is crucial to explicitly avoid catastrophic scheduling failures associated with the long tail of the reward distribution. Therefore, robust management of such uncertainties and risks is imperative. Our proposed CRS-DRL algorithm incorporates the conditional Value-at-Risk (CVaR) as the risk criterion for optimization.
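As a rough illustration of the risk criterion named above, the following minimal sketch estimates CVaR from a finite set of sampled returns. It assumes the lower-tail convention (the mean of the worst `alpha` fraction of returns); the function name `empirical_cvar` and the toy return values are hypothetical and are not taken from the paper.

```python
import math


def empirical_cvar(returns, alpha):
    """Estimate lower-tail CVaR: the mean of the worst ceil(alpha * n) returns.

    Assumption: higher return is better, so "risk" lives in the lower tail.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not returns:
        raise ValueError("returns must be non-empty")
    ordered = sorted(returns)  # ascending, so the worst outcomes come first
    k = max(1, math.ceil(alpha * len(ordered)))
    tail = ordered[:k]
    return sum(tail) / len(tail)


# Two hypothetical policies with the same mean return but different tails.
steady = [10.0, 10.0, 10.0, 10.0]
risky = [0.0, 0.0, 20.0, 20.0]

mean_steady = sum(steady) / len(steady)
mean_risky = sum(risky) / len(risky)
cvar_steady = empirical_cvar(steady, 0.25)
cvar_risky = empirical_cvar(risky, 0.25)
```

Both toy policies have the same expected return, yet their CVaR values at level 0.25 differ, which is the kind of tail difference that a purely expectation-based objective cannot distinguish.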

A URLLC queuing mechanism is considered to decrease URLLC drops and increase eMBB throughput compared to the instant scheduling policy. Our architecture is based on the actor-critic model but applies a transfer function to obtain feasible solutions from the unconstrained actor network, and the critic predicts the entire distribution over future returns instead of only its expectation. Numerical results indicate that our CRS-DRL algorithm, under varying CVaR levels, achieves similar expected returns but reduces long-tail behavior for long-term rewards compared to the risk-neutral approach.
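The abstract does not specify the form of the transfer function. Purely as an illustration of the idea of mapping unconstrained actor outputs to a feasible action, the sketch below assumes a single budget constraint (non-negative shares that sum to a fixed resource budget) and uses a softmax scaling. The name `softmax_transfer`, the budget constraint, and the example values are all assumptions, not the authors' method.

```python
import math


def softmax_transfer(raw_outputs, budget):
    """Map unconstrained real-valued outputs to non-negative shares summing to budget.

    Assumption: feasibility means shares >= 0 and sum(shares) == budget.
    """
    if not raw_outputs:
        raise ValueError("raw_outputs must be non-empty")
    if budget < 0:
        raise ValueError("budget must be non-negative")
    peak = max(raw_outputs)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in raw_outputs]
    total = sum(weights)
    return [budget * w / total for w in weights]


# Hypothetical raw actor outputs for three users and a budget of 12 resource units.
shares = softmax_transfer([2.0, -1.0, 0.5], 12.0)
equal_shares = softmax_transfer([0.0, 0.0], 10.0)
```

Under this assumed constraint, any real-valued actor output yields a feasible allocation, so the actor network itself can remain unconstrained during training.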

Funding

Pervasive Wireless Intelligence Beyond the Generations (PerCom)

Engineering and Physical Sciences Research Council

History

School

Mechanical, Electrical and Manufacturing Engineering
Loughborough University, London

Published in

IEEE Transactions on Wireless Communications

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Version

AM (Accepted Manuscript)

Publisher statement

© 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Acceptance date

2024-02-29

ISSN

1536-1276

eISSN

1558-2248

Publisher version

https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=7693

Language

en

Depositor

Wenheng Zhang. Deposit date: 8 March 2024

Keywords

eMBB; deep reinforcement learning; punctured scheduling; resource allocation; risk-sensitive; URLLC

