Features
Reasoning & Verification
Training methods for reasoning and verification: Process Reward Models (PRM), STaR, and Dual-System training.
Process Reward Models (PRM)
Trains a verifier that assigns a reward to each reasoning step rather than only to the final answer.
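To make per-step rewards concrete, here is a minimal standalone sketch of what the `step_separator` and `prm_loss_type="bce"` settings in the configuration below plausibly control. The helpers `split_steps` and `step_bce_loss` are hypothetical names for illustration, not part of the library's API, and the step probabilities are made up rather than produced by a model.

```python
import math

def split_steps(solution: str, step_separator: str = "\n") -> list[str]:
    """Split a solution into non-empty reasoning steps (hypothetical helper)."""
    return [s.strip() for s in solution.split(step_separator) if s.strip()]

def step_bce_loss(step_probs: list[float], step_labels: list[int]) -> float:
    """Mean binary cross-entropy over per-step correctness predictions."""
    if not step_probs or len(step_probs) != len(step_labels):
        raise ValueError("need one probability per labeled step")
    eps = 1e-7  # clamp to avoid log(0)
    total = 0.0
    for p, y in zip(step_probs, step_labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(step_probs)

# Three steps; the last one contains an arithmetic error.
solution = "2 + 3 = 5\n5 * 4 = 20\n20 - 1 = 18"
steps = split_steps(solution)

labels = [1, 1, 0]        # 1 = step is correct, 0 = step is wrong
probs = [0.9, 0.8, 0.3]   # assumed verifier outputs, one per step

loss = step_bce_loss(probs, labels)
print(len(steps), round(loss, 4))
```

Each step contributes its own loss term, so the verifier is penalized for misjudging an intermediate step even when it judges the others correctly.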
config = PRMConfig(
    step_separator="\n",
    prm_loss_type="bce",
)
STaR (Self-Taught Reasoner)
Bootstrap reasoning capabilities through iterative self-improvement.
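The loop behind this can be sketched as follows: generate a rationale for each question, keep the ones that reach the correct answer, optionally retry failures with the answer given as a hint (rationalization), then fine-tune on what was kept and repeat. Everything here is a toy under stated assumptions: `star_bootstrap`, `make_toy_generator`, and `toy_finetune` are hypothetical stand-ins, and the "model" is a lookup table rather than a language model.

```python
from typing import Callable, Optional

Example = tuple[str, str]            # (question, gold answer)
Kept = tuple[str, str, str]          # (question, rationale, answer)
Generator = Callable[[str, Optional[str]], tuple[str, str]]

def star_bootstrap(
    dataset: list[Example],
    generate: Generator,
    finetune: Callable[[list[Kept]], Generator],
    bootstrap_iterations: int = 5,
    rationalization: bool = True,
) -> list[Kept]:
    """Return the rationales kept in the final bootstrap iteration."""
    kept: list[Kept] = []
    for _ in range(bootstrap_iterations):
        kept = []
        for question, gold in dataset:
            rationale, answer = generate(question, None)
            if answer != gold and rationalization:
                # Retry with the gold answer as a hint.
                rationale, answer = generate(question, gold)
            if answer == gold:
                kept.append((question, rationale, gold))
        # Train the next generator on the filtered rationales.
        generate = finetune(kept)
    return kept

# --- Toy stand-ins (assumptions, not a real model) ---
ANSWERS = {"1+1": "2", "2+2": "4", "3+3": "6"}

def make_toy_generator(known: set[str]) -> Generator:
    def generate(question: str, hint: Optional[str]) -> tuple[str, str]:
        if hint is not None:
            return (f"Working backward from {hint}", hint)
        if question in known:
            return (f"Computed {question}", ANSWERS[question])
        return ("Guess", "?")
    return generate

def toy_finetune(kept: list[Kept]) -> Generator:
    return make_toy_generator({q for q, _, _ in kept})

dataset = list(ANSWERS.items())
with_rat = star_bootstrap(dataset, make_toy_generator({"1+1"}), toy_finetune,
                          bootstrap_iterations=2, rationalization=True)
without_rat = star_bootstrap(dataset, make_toy_generator({"1+1"}), toy_finetune,
                             bootstrap_iterations=2, rationalization=False)
print(len(with_rat), len(without_rat))
```

In this toy, rationalization lets the loop recover training examples for questions the initial generator could not solve, which is the role the `rationalization` flag presumably plays in the configuration.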
config = STaRConfig(
    bootstrap_iterations=5,
    rationalization=True,
)
Dual-System Training
Joint training of System 1 (intuition) and System 2 (reasoning) capabilities.
config = DualSystemConfig(
    system1_weight=0.3,
    system2_weight=0.7,
)
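The source does not say how the two weights are applied. One plausible reading, shown here purely as an assumption, is a weighted sum of a System 1 loss (direct answer) and a System 2 loss (answer with explicit reasoning). The function `dual_system_loss` is a hypothetical name, and the loss values are made up.

```python
def dual_system_loss(
    system1_loss: float,
    system2_loss: float,
    system1_weight: float = 0.3,
    system2_weight: float = 0.7,
) -> float:
    """Combine per-system losses as a weighted sum (assumed formulation)."""
    if system1_weight < 0 or system2_weight < 0:
        raise ValueError("weights must be non-negative")
    return system1_weight * system1_loss + system2_weight * system2_loss

# Made-up loss values for one batch.
combined = dual_system_loss(system1_loss=2.0, system2_loss=1.0)
print(round(combined, 2))
```

Under this reading, the defaults above make the reasoning objective contribute more than twice as much to the gradient as the intuition objective.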