
Add Normalization and Action Clipping Wrappers #586


Open · wants to merge 1 commit into main

Conversation

cruz-lucas

I implemented new wrappers for on-device normalization and action clipping, to avoid the CPU–GPU transfers incurred when using Gym's normalization wrappers. This contribution addresses Issue #49. In particular, I added the following (a rough sketch of the running-statistics pattern follows the list):

  • RunningMeanStd: A class to compute running mean and variance for normalization.
  • ClipVecAction: A wrapper that clips continuous actions to a specified range.
  • NormalizeVecObservation: A wrapper that normalizes observations using running statistics computed by RunningMeanStd.
  • NormalizeVecReward: A wrapper that normalizes rewards based on the running statistics of the accumulated (discounted) rewards.

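For reference, here is a minimal sketch of how the running statistics could be maintained entirely on-device. The `RunningMeanStd` name comes from the PR description, but the fields, the `create`/`update`/`normalize` methods, and the use of `flax.struct` are my own assumptions for illustration, not the PR's actual code.

```python
import jax
import jax.numpy as jp
from flax import struct


@struct.dataclass
class RunningMeanStd:
    """Running mean/variance kept as a JAX pytree, updated with a
    Chan-style parallel variance merge so it can run inside jit."""

    mean: jax.Array
    var: jax.Array
    count: jax.Array

    @classmethod
    def create(cls, shape):
        return cls(mean=jp.zeros(shape), var=jp.ones(shape), count=jp.asarray(1e-4))

    def update(self, batch: jax.Array) -> 'RunningMeanStd':
        # Merge per-batch statistics (over the leading vec-env axis)
        # into the running estimates.
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]

        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        m2 = m_a + m_b + jp.square(delta) * self.count * batch_count / total
        return RunningMeanStd(mean=new_mean, var=m2 / total, count=total)

    def normalize(self, x: jax.Array, eps: float = 1e-8) -> jax.Array:
        return (x - self.mean) / jp.sqrt(self.var + eps)
```

An observation wrapper would then carry a `RunningMeanStd` in the environment state, call `update(obs)` on each reset/step, and return the normalized observation; the reward wrapper would track statistics of the accumulated discounted return rather than the raw reward.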
I also added sanity check tests to verify the correct behavior of these wrappers. Please let me know if this doesn't align with Brax's design philosophy.

@btaba
Collaborator

btaba commented Apr 10, 2025

Hi @cruz-lucas , is the running statistic pattern in agents not sufficient for your use-case? Advantages can also be normalized in the PPO agent. ClipVecAction is not implemented, but it's usually simple enough to include the one-line jp.clip in the environment itself. What are your thoughts here?
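For completeness, the in-environment clipping suggested here could look roughly like the sketch below. The `ClipAction` name and the (-1, 1) bounds are illustrative, and I'm assuming the `Wrapper`/`State` classes exposed from `brax.envs.base` in recent Brax versions.

```python
import jax.numpy as jp
from brax.envs.base import State, Wrapper


class ClipAction(Wrapper):
    """Illustrative wrapper: clip continuous actions before stepping the env."""

    def __init__(self, env, low: float = -1.0, high: float = 1.0):
        super().__init__(env)
        self._low = low
        self._high = high

    def step(self, state: State, action: jp.ndarray) -> State:
        # The one-line clip mentioned above, applied before delegating
        # to the wrapped environment.
        return self.env.step(state, jp.clip(action, self._low, self._high))
```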

