Google Research Football (GRF)
Evaluation Metrics:
- Goal Difference
- Win Rate
- TrueSkill Rating (in Kaggle, JIDI AI Competition Platform)
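A minimal sketch of how the first two metrics could be computed from a batch of evaluation matches. The `MatchResult` type and both helper functions are hypothetical names for illustration, not part of GRF. It assumes goal difference is averaged per match and that draws count as non-wins; TrueSkill is left out because it depends on the rating system of the hosting platform.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MatchResult:
    """Final score of one evaluation match (hypothetical container)."""
    goals_for: int      # goals scored by the evaluated agent
    goals_against: int  # goals scored by the opponent


def goal_difference(results: Sequence[MatchResult]) -> float:
    """Average of (goals scored - goals conceded) over all matches."""
    if not results:
        raise ValueError("need at least one match")
    return sum(r.goals_for - r.goals_against for r in results) / len(results)


def win_rate(results: Sequence[MatchResult]) -> float:
    """Fraction of matches won; draws are counted as non-wins (assumption)."""
    if not results:
        raise ValueError("need at least one match")
    wins = sum(1 for r in results if r.goals_for > r.goals_against)
    return wins / len(results)


# Made-up scores, for illustration only.
results = [
    MatchResult(3, 1),
    MatchResult(0, 0),
    MatchResult(1, 2),
    MatchResult(2, 0),
]
print(goal_difference(results), win_rate(results))
```

Some evaluations count a draw as half a win instead; in that case only `win_rate` changes.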
Approaches:
- Original paper baseline
- IMPALA (Importance Weighted Actor-Learner Architectures) [7.8 on Easy]
- Ape-X DQN (Distributed Prioritized Experience Replay with DQN) [6.5 on Easy]
- PPO [2.8 on Easy]
- MARL
- TiZero
- Self-play and JRPO (Joint-ratio Policy Optimization)
- TiKick
- MAPPO; trained on self-play data from an expert agent (WeKick, winner of the Kaggle GRF competition)
- JiDi3rd
Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
Uses the Multi-Agent Transformer (MAT)
Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
Survey of MARL techniques: IPPO, MAPPO (Multi-Agent PPO), HAPPO (Heterogeneous-Agent PPO), A2PO, MAT
- TiZero