
Quick Start

Train your first RLHF model in just a few lines of code.

Basic RLHF Training

Here's a minimal example to get started with RLHF training using ThinkRL:

train.py
from thinkrl import RLHFTrainer, ModelConfig

# Configure your model
config = ModelConfig(
    model_name_or_path="microsoft/DialoGPT-small",
    algorithm="vapo",  # or "ppo", "reinforce", "grpo", "dapo"
    learning_rate=1e-5,
    batch_size=32
)

# Initialize trainer
trainer = RLHFTrainer(config)

# Start training
trainer.train()
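
In a real run you will also need to tell the trainer where your prompt data and reward model live. A minimal sketch of how that might look, assuming ModelConfig exposes dataset_path and reward_model_path fields (illustrative names, not confirmed API; check the ModelConfig reference for your installed version):

from thinkrl import RLHFTrainer, ModelConfig

# NOTE: dataset_path and reward_model_path are illustrative field names,
# not confirmed ThinkRL API; consult your version's ModelConfig docs.
config = ModelConfig(
    model_name_or_path="microsoft/DialoGPT-small",
    algorithm="ppo",
    dataset_path="./data/prompts.jsonl",   # prompt dataset (hypothetical field)
    reward_model_path="./rm_checkpoint",   # reward model checkpoint (hypothetical field)
)

trainer = RLHFTrainer(config)
trainer.train()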

Chain-of-Thought Training

Train models that produce explicit step-by-step reasoning using the chain-of-thought (CoT) trainer:

train_cot.py
from thinkrl.training import CoTTrainer, CoTConfig

config = CoTConfig(
    model_name_or_path="Qwen/Qwen2.5-7B",
    reasoning_type="cot",
    max_reasoning_steps=10
)

trainer = CoTTrainer(config)
trainer.train()
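
After training, you can sanity-check the model's reasoning with plain Hugging Face generation. A minimal sketch, assuming the trainer saves a standard Hugging Face checkpoint and that ./cot_checkpoint is wherever your run wrote it (the path is illustrative):

from transformers import AutoModelForCausalLM, AutoTokenizer

# "./cot_checkpoint" is an illustrative path; point this at the
# directory your CoTTrainer run actually saved its checkpoint to.
tokenizer = AutoTokenizer.from_pretrained("./cot_checkpoint")
model = AutoModelForCausalLM.from_pretrained("./cot_checkpoint")

prompt = "Q: A train covers 60 km in 45 minutes. What is its speed in km/h?\nA: Let's think step by step."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))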

Command-Line Training

You can also launch training directly from the command line:

# Train with VAPO algorithm
python -m thinkrl.cli.train_rl \
    --algo vapo \
    --model_name_or_path meta-llama/Meta-Llama-3-8B \
    --reward_model_path ./rm_checkpoint \
    --use_vllm True

# Train with CoT
thinkrl cot --model gpt --algorithm vapo

# Train with REINFORCE
thinkrl train --algorithm reinforce
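
The CLI exposes more flags than shown above. Assuming a standard argparse-style interface, you can list the options for your installed version with:

python -m thinkrl.cli.train_rl --help
thinkrl train --help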

Using vLLM for Fast Generation

Enable vLLM to speed up experience collection (rollout generation) during RLHF, often by up to 10x compared to standard generation:

from thinkrl import RLHFTrainer, ModelConfig

config = ModelConfig(
    model_name_or_path="meta-llama/Llama-3-8b",
    algorithm="grpo",
    use_vllm=True,  # Enable vLLM acceleration
    vllm_tensor_parallel_size=2,  # Shard generation across 2 GPUs
)

trainer = RLHFTrainer(config)
trainer.train()
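
vllm_tensor_parallel_size follows the usual vLLM convention: it is the number of GPUs the generation engine shards the model across, so it must not exceed the GPUs you can dedicate to rollouts (and vLLM requires it to divide the model's attention head count). A quick, ThinkRL-independent check of what is available:

import torch

# Count the GPUs visible to this process; vllm_tensor_parallel_size
# must be at most this number.
print(f"Visible GPUs: {torch.cuda.device_count()}")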

Next Steps