ThinkRL

Reasoning & Verification

Training methods for reasoning and verification: Process Reward Models (PRM), STaR, and Dual-System training.

Process Reward Models (PRM)

PRM training teaches a verifier to score each reasoning step, so a reward is assigned at every step instead of only to the final answer.

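config = PRMConfig(
    step_separator="\n",  # delimiter that marks the boundary between reasoning steps
    prm_loss_type="bce",  # binary cross-entropy over per-step labels
)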
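
To make the per-step reward idea concrete, here is a minimal sketch in plain Python that does not use the library. It splits a solution on the step separator and averages a binary cross-entropy loss over per-step correctness labels. The helper names `split_steps` and `step_bce_loss`, the toy solution, and the probabilities are hypothetical; how `PRMConfig` computes its loss internally is not shown in this page.

```python
import math


def split_steps(solution: str, step_separator: str = "\n") -> list:
    """Split a solution into non-empty reasoning steps (hypothetical helper)."""
    return [s.strip() for s in solution.split(step_separator) if s.strip()]


def step_bce_loss(step_probs: list, step_labels: list) -> float:
    """Mean binary cross-entropy between predicted step correctness and labels."""
    if not step_probs or len(step_probs) != len(step_labels):
        raise ValueError("need one label per step and at least one step")
    eps = 1e-7  # clamp probabilities to avoid log(0)
    total = 0.0
    for p, y in zip(step_probs, step_labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(step_probs)


# Toy example: the last step contains an arithmetic error.
solution = "x + 2 = 5\nx = 5 - 2\nx = 4"
steps = split_steps(solution)
labels = [1, 1, 0]       # 1 = step is correct, 0 = step is wrong
probs = [0.9, 0.8, 0.3]  # made-up verifier outputs, one per step
loss = step_bce_loss(probs, labels)
```

Each step contributes its own loss term, so the verifier gets a training signal for the faulty third step even though the first two steps are fine.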

STaR (Self-Taught Reasoner)

Bootstraps reasoning through iterative self-improvement: the model generates rationales, keeps those that reach the correct answer, and is fine-tuned on them.

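config = STaRConfig(
    bootstrap_iterations=5,  # number of generate, filter, fine-tune rounds
    rationalization=True,    # retry failed problems with the correct answer given as a hint
)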
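
The bootstrap loop can be sketched without the library as follows. This is an illustration of the general STaR recipe under simplifying assumptions, not the library's implementation: `star_round` and `make_toy_generate` are hypothetical names, the "model" is a toy that only knows a set of questions, and fine-tuning is replaced by adding the kept questions to that set.

```python
from typing import Callable, Optional

# (question, hint) -> (rationale, answer); hint is the gold answer or None.
Generate = Callable[[str, Optional[str]], tuple]


def star_round(dataset, generate: Generate, rationalization: bool = True):
    """One bootstrap round: keep only rationales that reach the gold answer."""
    kept = []
    for question, gold in dataset:
        rationale, answer = generate(question, None)
        if answer != gold and rationalization:
            # Rationalization: retry with the gold answer supplied as a hint.
            rationale, answer = generate(question, gold)
        if answer == gold:
            kept.append((question, rationale, answer))
    return kept


def make_toy_generate(known: set, answers: dict) -> Generate:
    """Toy stand-in for a language model (hypothetical)."""
    def generate(question, hint):
        if hint is not None:
            return f"Working toward the known answer {hint}.", hint
        if question in known:
            return "Recalled from earlier training.", answers[question]
        return "Unsupported guess.", "?"
    return generate


dataset = [("2+2", "4"), ("3*3", "9")]
answers = dict(dataset)
known = {"2+2"}  # questions the toy model can already solve

# Without rationalization, only the already-solvable problem is kept.
without_hints = star_round(dataset, make_toy_generate(known, answers), rationalization=False)

bootstrap_iterations = 2
for _ in range(bootstrap_iterations):
    kept = star_round(dataset, make_toy_generate(known, answers), rationalization=True)
    known |= {question for question, _, _ in kept}  # stand-in for fine-tuning
```

With rationalization enabled, the problem the toy model initially fails still yields a training example, which is why the option matters for hard problems.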

Dual-System Training

Trains System 1 (intuition) and System 2 (reasoning) capabilities jointly, with a separate weight on each.

config = DualSystemConfig(
    system1_weight=0.3,  # weight on the System 1 (intuition) objective
    system2_weight=0.7,  # weight on the System 2 (reasoning) objective
)
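
One plausible reading of the two weights is a linear combination of per-system losses. The sketch below shows that reading only; whether the library combines the objectives this way is an assumption, and `dual_system_loss` and the loss values are hypothetical.

```python
import math


def dual_system_loss(
    system1_loss: float,
    system2_loss: float,
    system1_weight: float = 0.3,
    system2_weight: float = 0.7,
) -> float:
    """Weighted sum of the two objectives (assumed combination rule)."""
    return system1_weight * system1_loss + system2_weight * system2_loss


# Made-up per-batch losses for the two systems.
total = dual_system_loss(system1_loss=2.0, system2_loss=1.0)
is_expected = math.isclose(total, 0.3 * 2.0 + 0.7 * 1.0)
```

Under this reading, the 0.3 / 0.7 split in the configuration would make the reasoning objective contribute more than twice as much to each update as the intuition objective.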