A modular, high-performance, reasoning-centric library for Reinforcement Learning from Human and AI Feedback (RLHF & RLAIF).
pip install -e ".[all]"
State-of-the-art algorithms for RLHF training
Step-by-step guides for every training scenario
PRM, STaR, LoRA, vLLM, and more
Deep dives into advanced functionality
Contribute, report issues, or star the repo
Get started with ThinkRL in seconds