ThinkRL

Training Guides

Comprehensive guides for training models with ThinkRL, from supervised fine-tuning (SFT) to advanced reasoning capabilities.

Learn how to prepare your model with SFT and add reasoning capabilities with Chain-of-Thought training.

Train models directly on preference data without requiring a separate reward model.

Train models to verify reasoning steps and provide process-level rewards.

Train multimodal models with perception-aware policy optimization.