This repository contains the implementation of a Proximal Policy Optimization (PPO) agent that controls a humanoid in the Gymnasium MuJoCo environment. The agent is trained to master complex humanoid locomotion using deep reinforcement learning.
Here is a demonstration of the agent's performance after training for 3000 epochs on the Humanoid-v4 environment.
To get started with this project, follow these steps:
- Clone the Repository:

  ```bash
  git clone https://github.com/ProfessorNova/PPO-Humanoid.git
  cd PPO-Humanoid
  ```

- Set Up Python Environment: Make sure you have Python installed (tested with Python 3.10.11).

- Install Dependencies: Run the following command to install the required packages:

  ```bash
  pip install -r req.txt
  ```

  For a proper PyTorch installation, visit pytorch.org and follow the instructions based on your system configuration.

- Install Gymnasium MuJoCo: You need to install the MuJoCo environments to simulate the humanoid (a quick sanity check is sketched after this list):

  ```bash
  pip install "gymnasium[mujoco]"
  ```

- Train the Model (PPO): To start training the model, run:

  ```bash
  python train_ppo.py
  ```

- Monitor Training Progress: You can monitor the training progress by viewing the videos in the `videos` folder or by looking at the graphs in TensorBoard:

  ```bash
  tensorboard --logdir "logs"
  ```
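If you want to verify that the MuJoCo environments are installed correctly (and see one way rollout videos can be produced), the following stand-alone snippet may help. It is an illustrative sketch, not part of this repository; the environment id and the `videos` output folder are assumptions chosen to match the setup described above:

```python
# Illustrative sanity check -- not part of train_ppo.py.
import gymnasium as gym
from gymnasium.wrappers import RecordVideo

# "Humanoid-v5" matches the environment this project targets; adjust if needed.
env = gym.make("Humanoid-v5", render_mode="rgb_array")
env = RecordVideo(env, video_folder="videos", episode_trigger=lambda ep: True)

obs, info = env.reset(seed=0)
for _ in range(500):
    action = env.action_space.sample()  # random actions, just to exercise the simulation
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()  # flushes the recorded video files into the "videos" folder
```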
This project implements a reinforcement learning agent using the Proximal Policy Optimization (PPO) algorithm, a popular method for continuous control tasks. The agent is designed to learn how to control a humanoid robot in a simulated environment.
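At its core, PPO constrains each policy update by clipping the probability ratio between the new policy and the policy that collected the data. A minimal PyTorch sketch of that clipped surrogate loss is shown below; the tensor names are placeholders, and the actual loss in `train_ppo.py` may differ in details such as entropy or value-loss terms:

```python
import torch

def ppo_policy_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective from the PPO paper, returned as a loss to minimize."""
    ratio = torch.exp(new_log_probs - old_log_probs)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()  # negated because optimizers minimize
```

The main components of this implementation are: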
- Agent: The core neural network model that outputs both a policy (a distribution over actions) and value estimates (see the actor-critic sketch after this list).
- Environment: The Humanoid-v5 environment from the Gymnasium MuJoCo suite, which provides a realistic physics simulation for testing control algorithms.
- Buffer: A class for storing the trajectories (observations, actions, rewards, etc.) that the agent collects while interacting with the environment. This data is later used to calculate advantages and train the model (see the GAE sketch after this list).
- Training Script: The `train_ppo.py` script handles the training loop, including collecting data, updating the model, and logging results.
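To make the roles of the Agent and the Buffer concrete, here is a condensed, hypothetical sketch of an actor-critic network and of the generalized advantage estimation (GAE) step that such a buffer typically performs. Layer sizes, the Gaussian policy head, and the discount parameters are assumptions for illustration, not the exact code in this repository:

```python
import torch
import torch.nn as nn

class Agent(nn.Module):
    """Actor-critic model: a Gaussian policy head and a state-value head."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),  # mean of the action distribution
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # state-independent log std
        self.value = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),  # state-value estimate
        )

    def forward(self, obs: torch.Tensor):
        dist = torch.distributions.Normal(self.policy(obs), self.log_std.exp())
        return dist, self.value(obs).squeeze(-1)


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over a time-major rollout (shape [T, n_envs])."""
    T = rewards.shape[0]
    advantages = torch.zeros_like(rewards)
    gae = torch.zeros_like(last_value)
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    returns = advantages + values  # value-function targets
    return advantages, returns
```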
You can customize the training by modifying the command-line arguments:

- `--n-envs`: Number of environments to run in parallel (default: 32).
- `--n-epochs`: Number of epochs to train the model (default: 3000).
- `--n-steps`: Number of steps per environment per epoch (default: 2048).
- `--batch-size`: Batch size for training (default: 16384).
- `--train-iters`: Number of training iterations per epoch (default: 20).
For example:

```bash
python train_ppo.py --n-envs 64 --batch-size 4096 --train-iters 30 --cuda
```
All hyperparameters can be viewed either with `python train_ppo.py --help` or by looking at the `parse_args_ppo()` function in `lib/utils.py`.
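The real `parse_args_ppo()` defines the full option set; a reduced sketch covering only the flags listed above (with the same defaults, everything else omitted) could look roughly like this:

```python
import argparse

def parse_args_ppo() -> argparse.Namespace:
    """Reduced illustration of the training CLI; the actual function has more options."""
    parser = argparse.ArgumentParser(description="PPO training for the MuJoCo humanoid")
    parser.add_argument("--n-envs", type=int, default=32, help="parallel environments")
    parser.add_argument("--n-epochs", type=int, default=3000, help="training epochs")
    parser.add_argument("--n-steps", type=int, default=2048, help="steps per environment per epoch")
    parser.add_argument("--batch-size", type=int, default=16384, help="batch size for training")
    parser.add_argument("--train-iters", type=int, default=20, help="training iterations per epoch")
    parser.add_argument("--cuda", action="store_true", help="train on GPU if available")
    return parser.parse_args()
```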
The following charts provide insight into the agent's performance during training with the current default hyperparameters. (Note: after updating to the Humanoid-v5 environment, I only trained for 1000 epochs. The results are still promising and should match the earlier results with more training.)