Why there's no multiagent policy gradient theorem
I present a derivation of the policy gradient theorem for homogeneous multiagent RL systems. The results are… boring.
AI researcher, waiting to start my new position very soon. Until recently, I worked at AI Redefined, where I built tooling for human-in-the-loop RL. I got my PhD from École Polytechnique; my thesis topic was “Simulating Crowds with Reinforcement Learning”. In the past I was at the University of Warsaw, KTH Royal Institute of Technology, and Aalto University.
In my free time, I’m the maintainer of Gymnasium (the official continuation of OpenAI Gym), sometimes reviewing other people’s bugs, sometimes adding my own bugs… I mean, features. Sometimes I also help with PettingZoo, SuperSuit and other projects in the Farama Foundation.
Erdős number <= 4
PhD Artificial Intelligence
École Polytechnique
MSc Autonomous Systems
KTH Royal Institute of Technology & Aalto University
BSc Physics
University of Warsaw
Ongoing documentation of my solutions to Advent of Code 2020.
I describe the process of making this exact website, to help you avoid some of the frustrations I encountered along the way.