Why there's no multiagent policy gradient theorem
I present a derivation of the policy gradient theorem for homogeneous multiagent RL systems. The results are… boring.
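For readers who haven't seen the single-agent version the post starts from, a standard statement of the policy gradient theorem (using the usual notation: policy $\pi_\theta$, expected return $J(\theta)$, action-value function $Q^{\pi_\theta}$) is:

$$
\nabla_\theta J(\theta) = \mathbb{E}_{s \sim d^{\pi_\theta},\; a \sim \pi_\theta(\cdot \mid s)} \left[ \nabla_\theta \log \pi_\theta(a \mid s) \, Q^{\pi_\theta}(s, a) \right]
$$

where $d^{\pi_\theta}$ is the (discounted) state-visitation distribution induced by $\pi_\theta$. The post asks what changes when many identical agents share the parameters $\theta$.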
AI researcher (officially Research Engineer) at Meta. Until recently, I worked at AI Redefined, where I built tooling for human-in-the-loop RL. I got my PhD from École Polytechnique; my thesis topic was “Simulating Crowds with Reinforcement Learning”. Before that, I was at the University of Warsaw, KTH Royal Institute of Technology, and Aalto University.
Back in the day, I was the maintainer of Gymnasium (the official continuation of OpenAI Gym), sometimes reviewing other people’s bugs, sometimes adding my own bugs… I mean, features. I also helped with PettingZoo, SuperSuit, and other projects in the Farama Foundation.
Erdős number ≤ 4
PhD in Artificial Intelligence, École Polytechnique
MSc in Autonomous Systems, KTH Royal Institute of Technology & Aalto University
BSc in Physics, University of Warsaw
Ongoing documentation of my Advent of Code 2020 solutions.
I describe the process of making this exact website, hoping to help you avoid some of the frustrations I encountered along the way.