Reinforcement Learning in a Birth and Death Process: Breaking the Dependence on the State Space - Laboratoire d'Informatique de Grenoble
Conference paper, Year: 2022

Reinforcement Learning in a Birth and Death Process: Breaking the Dependence on the State Space

Abstract

In this paper, we revisit the regret of undiscounted reinforcement learning in MDPs with a birth and death structure. Specifically, we consider a controlled queue with impatient jobs, where the main objective is to optimize a trade-off between energy consumption and user-perceived performance. Within this setting, the diameter $D$ of the MDP is $\Omega(S^S)$, where $S$ is the number of states. Therefore, the existing lower and upper bounds on the regret at time $T$, of order $O(\sqrt{DSAT})$ for MDPs with $S$ states and $A$ actions, may suggest that reinforcement learning is inefficient here. In our main result, however, we exploit the structure of our MDPs to show that the regret of a slightly tweaked version of the classical learning algorithm UCRL2 is in fact upper bounded by $\tilde{O}(\sqrt{E_2 AT})$, where $E_2$ is related to the weighted second moment of the stationary measure of a reference policy. Importantly, $E_2$ is bounded independently of $S$. Thus, our bound is asymptotically independent of the number of states and of the diameter. This result is based on a careful study of the number of visits performed by the learning algorithm to the states of the MDP, which is highly non-uniform.
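To illustrate why a moment of the stationary measure can stay bounded as the state space grows, the following is a minimal sketch, not the paper's construction. It assumes a birth and death chain with a constant arrival rate, a constant service rate (standing in for a fixed reference policy), and a per-job abandonment rate that models impatience. All function names and rate values are hypothetical, and the plain second moment computed here is only a stand-in for the weighted second moment $E_2$, whose exact definition is given in the paper.

```python
def stationary_distribution(num_states, arrival_rate, service_rate, impatience_rate):
    """Stationary measure of a birth and death chain on states 0..num_states-1.

    Assumed (hypothetical) dynamics: births occur at rate `arrival_rate`;
    in state s >= 1, deaths occur at rate service_rate + s * impatience_rate.
    Detailed balance gives pi(s) proportional to
    prod_{k=1..s} arrival_rate / (service_rate + k * impatience_rate).
    """
    weights = [1.0]
    for s in range(1, num_states):
        death_rate = service_rate + s * impatience_rate
        weights.append(weights[-1] * arrival_rate / death_rate)
    total = sum(weights)
    return [w / total for w in weights]


def second_moment(pi):
    """Plain (unweighted) second moment of the state index under pi."""
    return sum(s * s * p for s, p in enumerate(pi))


if __name__ == "__main__":
    # Illustrative rates only; they are not taken from the paper.
    for num_states in (10, 100, 1000):
        pi = stationary_distribution(num_states, 1.0, 1.0, 0.5)
        print(num_states, second_moment(pi))
```

Under these assumptions the death rate grows linearly with the queue length, so the stationary mass decays faster than geometrically and the printed second moment stabilizes once the number of states is moderately large. This matches the intuition that large states are visited very rarely.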
Main file
NoDiam (1).pdf (353.92 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03799394 , version 1 (05-10-2022)
hal-03799394 , version 2 (18-11-2022)
hal-03799394 , version 3 (20-02-2023)

Identifiers

  • HAL Id: hal-03799394, version 2

Cite

Jonatha Anselmi, Bruno Gaujal, Louis-Sébastien Rebuffi. Reinforcement Learning in a Birth and Death Process: Breaking the Dependence on the State Space. NeurIPS 2022 - 36th Conference on Neural Information Processing Systems, Nov 2022, New Orleans, United States. ⟨hal-03799394v2⟩