This repository contains the implementation of a Proximal Policy Optimization (PPO) agent that controls a humanoid in the Gymnasium MuJoCo environment. The agent is trained to master complex humanoid locomotion using deep reinforcement learning.
Here is a demonstration of the agent's performance after training for 3000 epochs on the Humanoid-v4 environment.
To get started with this project, follow these steps:
- Clone the Repository:

  ```bash
  git clone https://github.com/ProfessorNova/PPO-Humanoid.git
  cd PPO-Humanoid
  ```

- Set Up Python Environment: Make sure you have Python installed (tested with Python 3.10.11).

- Install Dependencies: Run the following command to install the required packages:

  ```bash
  pip install -r req.txt
  ```

  For a proper PyTorch installation, visit pytorch.org and follow the instructions based on your system configuration.

- Install Gymnasium MuJoCo: You need to install the MuJoCo environments to simulate the humanoid (a quick sanity check is sketched after this list):

  ```bash
  pip install "gymnasium[mujoco]"
  ```

- Train the Model (PPO): To start training the model, run:

  ```bash
  python train_ppo.py
  ```

- Monitor Training Progress: You can monitor the training progress by viewing the videos in the `videos` folder or by looking at the graphs in TensorBoard:

  ```bash
  tensorboard --logdir "logs"
  ```
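If you want to verify that the MuJoCo environments are installed correctly (and see one way rollout videos can be produced), the following stand-alone snippet may help. It is an illustrative sketch, not part of this repository; the environment id and the `videos` output folder are assumptions chosen to match the setup described above:

```python
# Illustrative sanity check -- not part of train_ppo.py.
import gymnasium as gym
from gymnasium.wrappers import RecordVideo

# "Humanoid-v5" matches the environment this project targets; adjust if needed.
env = gym.make("Humanoid-v5", render_mode="rgb_array")
env = RecordVideo(env, video_folder="videos", episode_trigger=lambda ep: True)

obs, info = env.reset(seed=0)
for _ in range(500):
    action = env.action_space.sample()  # random actions, just to exercise the simulation
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()  # flushes the recorded video files into the "videos" folder
```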
This project implements a reinforcement learning agent using the Proximal Policy Optimization (PPO) algorithm, a popular method for continuous control tasks. The agent is designed to learn how to control a humanoid robot in a simulated environment.
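At its core, PPO constrains each policy update by clipping the probability ratio between the new policy and the policy that collected the data. A minimal PyTorch sketch of that clipped surrogate loss is shown below; the tensor names are placeholders, and the actual loss in `train_ppo.py` may differ in details such as entropy or value-loss terms:

```python
import torch

def ppo_policy_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective from the PPO paper, returned as a loss to minimize."""
    ratio = torch.exp(new_log_probs - old_log_probs)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()  # negated because optimizers minimize
```

The main components of this implementation are: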
- Agent: The core neural network model that outputs both a policy (a distribution over actions) and value estimates (see the actor-critic sketch after this list).
- Environment: The Humanoid-v5 environment from the Gymnasium MuJoCo suite, which provides a realistic physics simulation for testing control algorithms.
- Buffer: A class for storing the trajectories (observations, actions, rewards, etc.) that the agent collects while interacting with the environment. This data is later used to calculate advantages and train the model (see the GAE sketch after this list).
- Training Script: The `train_ppo.py` script handles the training loop, including collecting data, updating the model, and logging results.
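To make the roles of the Agent and the Buffer concrete, here is a condensed, hypothetical sketch of an actor-critic network and of the generalized advantage estimation (GAE) step that such a buffer typically performs. Layer sizes, the Gaussian policy head, and the discount parameters are assumptions for illustration, not the exact code in this repository:

```python
import torch
import torch.nn as nn

class Agent(nn.Module):
    """Actor-critic model: a Gaussian policy head and a state-value head."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),  # mean of the action distribution
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # state-independent log std
        self.value = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),  # state-value estimate
        )

    def forward(self, obs: torch.Tensor):
        dist = torch.distributions.Normal(self.policy(obs), self.log_std.exp())
        return dist, self.value(obs).squeeze(-1)


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over a time-major rollout (shape [T, n_envs])."""
    T = rewards.shape[0]
    advantages = torch.zeros_like(rewards)
    gae = torch.zeros_like(last_value)
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    returns = advantages + values  # value-function targets
    return advantages, returns
```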
You can customize the training by modifying the command-line arguments:

- `--n-envs`: Number of environments to run in parallel (default: 32).
- `--n-epochs`: Number of epochs to train the model (default: 3000).
- `--n-steps`: Number of steps per environment per epoch (default: 2048).
- `--batch-size`: Batch size for training (default: 16384).
- `--train-iters`: Number of training iterations per epoch (default: 20).
For example:

```bash
python train_ppo.py --n-envs 64 --batch-size 4096 --train-iters 30 --cuda
```
All hyperparameters can be viewed either with `python train_ppo.py --help` or by looking at the `parse_args_ppo()` function in `lib/utils.py`.
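The real `parse_args_ppo()` defines the full option set; a reduced sketch covering only the flags listed above (with the same defaults, everything else omitted) could look roughly like this:

```python
import argparse

def parse_args_ppo() -> argparse.Namespace:
    """Reduced illustration of the training CLI; the actual function has more options."""
    parser = argparse.ArgumentParser(description="PPO training for the MuJoCo humanoid")
    parser.add_argument("--n-envs", type=int, default=32, help="parallel environments")
    parser.add_argument("--n-epochs", type=int, default=3000, help="training epochs")
    parser.add_argument("--n-steps", type=int, default=2048, help="steps per environment per epoch")
    parser.add_argument("--batch-size", type=int, default=16384, help="batch size for training")
    parser.add_argument("--train-iters", type=int, default=20, help="training iterations per epoch")
    parser.add_argument("--cuda", action="store_true", help="train on GPU if available")
    return parser.parse_args()
```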
The following charts provide insight into the agent's performance during training with the current default hyperparameters. (Note: after updating to the Humanoid-v5 environment, I only trained for 1000 epochs. The results are still promising and should match the earlier results with more training.)