Support iterable datasets in GRPO #3226
One thing I still want to mention, but didn't know how to handle, is that we need to set
Thanks for the PR. I don't really understand why we can't use the same approach as for the regular Dataset. Adapting the sampler should be enough, no?
The problem is that torch's
I thought that updating the data collator to do the duplication on the fly would be the easiest solution for an iterable dataset instead. I apologise if your comment was referring to something else. Let me know!
What about
I'm getting this error with both "my" approach and the approach of this PR:
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Yes, this is exactly the error that I referenced in my note here:
For debugging purposes, you can bypass it by setting
Also, I think the map suggestion could work as well; I just hadn't considered that before.
What does this PR do?
This PR solves the issue described in #3213. Additionally, it avoids the need for the PR in #3216.
I implemented support for an `IterableDataset` by overriding the `get_train_dataloader` and `get_eval_dataloader` methods from the `Trainer` class. Now, when GRPO is given an iterable dataset, the batch size is divided by `self.num_generations` and the data collator takes care of duplicating the samples afterwards.

Fixes #3213
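The batching scheme described in this PR can be illustrated with a minimal, library-free sketch. This is not the code from the PR: `make_duplicating_collator` and `iter_batches` are hypothetical names, and plain Python lists stand in for the real dataloader and tensors. The sketch assumes the batch size is a multiple of `num_generations` and that a trailing incomplete chunk is dropped.

```python
from typing import Any, Callable, Iterable, Iterator, List


def make_duplicating_collator(num_generations: int) -> Callable[[List[Any]], List[Any]]:
    """Hypothetical collator: repeat each sample `num_generations` times, keeping copies adjacent."""

    def collate(features: List[Any]) -> List[Any]:
        return [sample for sample in features for _ in range(num_generations)]

    return collate


def iter_batches(
    stream: Iterable[Any],
    batch_size: int,
    num_generations: int,
    collate_fn: Callable[[List[Any]], List[Any]],
) -> Iterator[List[Any]]:
    """Hypothetical stand-in for a dataloader over an iterable (streaming) dataset."""
    if batch_size % num_generations != 0:
        raise ValueError("batch_size must be divisible by num_generations")

    # Read fewer unique prompts per step; the collator restores the full batch size.
    prompts_per_batch = batch_size // num_generations

    chunk: List[Any] = []
    for sample in stream:
        chunk.append(sample)
        if len(chunk) == prompts_per_batch:
            yield collate_fn(chunk)
            chunk = []
    # Assumption: a trailing incomplete chunk is dropped.


if __name__ == "__main__":
    prompts = ({"prompt": p} for p in ["a", "b", "c", "d"])
    collator = make_duplicating_collator(num_generations=2)
    for batch in iter_batches(prompts, batch_size=4, num_generations=2, collate_fn=collator):
        print([item["prompt"] for item in batch])
```

Because the duplication happens after the samples are read, the stream is consumed only once and no index-based sampler is needed, which is the property that matters for an iterable dataset.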
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.