r1-zero

Here are 5 public repositories matching this topic...

om-ai-lab / VLM-R1

Solve Visual Understanding with Reinforced VLMs

reinforcement-learning vlm multimodal llm qwen deepseek-r1 grpo r1-zero vlm-r1 multimodal-r1

Updated Mar 20, 2025
Python

turningpoint-ai / VisualThinker-R1-Zero

Explore the Multimodal “Aha Moment” on 2B Model

reinforcement-learning reasoning r1 post-training multimodal deepseek deepseek-r1 grpo deepseek-r1-zero r1-zero multimodal-journey multimodal-r1

Updated Mar 18, 2025
Python

sail-sg / oat

🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.

thompson-sampling alignment reasoning distributed-training ppo dueling-bandits dpo distributed-rl llm online-rl rlhf llm-aligment online-alignment llm-exploration grpo r1-zero

Updated Mar 10, 2025
Python

sylvain-wei / 24-Game-Reasoning

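A very simple reproduction of DeepSeek-R1-Zero and DeepSeek-R1, using the 24 Game as an example. It applies zero-RL, SFT, and SFT+RL to elicit the LLM's ability to verify and reflect on its own reasoning. A clean, minimal, accessible reproduction of DeepSeek R1-Zero and DeepSeek R1.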

alignment reasoning r1 post-training cot sft o1 24game llm rlhf deepseek r1-zero verl long-cot

Updated Mar 3, 2025
Python

yflyzhang / simpleR1

simpleR1: A Simple R1 Framework

reinforcement-learning deepseek-r1 grpo r1-zero grpotrainer

Updated Mar 19, 2025
Python
