Dennis J.N.J. Soemers
Vegard Mella
Eric Piette (UCL)
Matthew Stephenson
Cameron Browne
Olivier Teytaud
Transferring trained policies and value functions from one task to another, such as from one game to another with a different board size, board shape, or more substantial rule changes, is a challenging problem. Popular benchmarks for reinforcement learning (RL), such as Atari games and ProcGen, have limited variety, especially in terms of action spaces. Due to a focus on such benchmarks, the development of transfer methods that can also handle changes in action spaces has received relatively little attention. Furthermore, we argue that progress towards more general methods should include benchmarks where new problem instances can be described by domain experts, rather than machine learning experts, using convenient, high-level domain-specific languages (DSLs). In addition to enabling end users to more easily describe their problems, user-friendly DSLs also contain relevant task information which can be leveraged to make effective zero-shot transfer plausibly achievable. As an example, we use the Ludii general game system, which includes a highly varied set of over 1000 distinct games described in such a language. We propose a simple baseline approach for transferring fully convolutional policy-value networks, which are used to guide search agents similar to AlphaZero, between any pair of games modelled in this system. Extensive results, including various cases of highly successful zero-shot transfer, are provided for a wide variety of source and target games.
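The abstract's key architectural point is that fully convolutional policy-value networks have no parameters tied to the board dimensions, which is what makes transfer between games with different board sizes possible at all. A minimal pure-Python sketch (the kernel values and board encoding are illustrative assumptions, not the paper's actual network) shows why: the same fixed-size kernel yields one policy logit per cell for any H×W board.

```python
# Minimal sketch of the fully-convolutional idea: one shared 3x3 kernel,
# applied with zero padding, produces an output of the same spatial size
# as its input. Because the kernel's parameter count is independent of
# the board dimensions, the identical weights can be reused on a 5x5
# board and a 9x9 board alike. Weights and inputs here are arbitrary
# illustrative values, not the paper's trained network.

def conv2d_same(board, kernel):
    """Apply a single 3x3 kernel with zero padding ('same' output size)."""
    h, w = len(board), len(board[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            s = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        s += board[rr][cc] * kernel[dr + 1][dc + 1]
            out[r][c] = s
    return out

# One shared kernel (arbitrary values standing in for trained weights).
kernel = [[0.1, 0.2, 0.1],
          [0.2, 1.0, 0.2],
          [0.1, 0.2, 0.1]]

for size in (5, 9):  # e.g. the "same" game played on two board sizes
    board = [[1.0 if (r + c) % 2 == 0 else 0.0 for c in range(size)]
             for r in range(size)]
    logits = conv2d_same(board, kernel)
    # One logit per board cell, whatever the size: no reshaping or
    # retraining of the convolutional weights is required.
    assert len(logits) == size and len(logits[0]) == size
```

The same reasoning extends to the value head if it is built from spatially global operations (e.g. global average pooling) rather than a flattening step whose weight matrix would hard-code a particular board size.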
Bibliographic reference
Dennis J.N.J. Soemers; Vegard Mella; Eric Piette; Matthew Stephenson; Cameron Browne; et al. Towards a General Transfer Approach for Policy-Value Networks. In: Transactions on Machine Learning Research (2023)
Permanent URL
http://hdl.handle.net/2078.1/281298