Training Guides
SFT & CoT Training
Supervised Fine-Tuning and Chain-of-Thought training for building foundational capabilities.
Supervised Fine-Tuning (SFT)
SFT is the first step in the RLHF pipeline, teaching the model to follow instructions.
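Before configuring the trainer, it can help to confirm the dataset is in instruction-tuning format. A minimal sanity check using the standalone Hugging Face `datasets` library (not part of the thinkrl API) is sketched below; tatsu-lab/alpaca records carry `instruction`, `input`, and `output` fields.
# Quick look at the SFT data format (uses the `datasets` library,
# independent of thinkrl).
from datasets import load_dataset

ds = load_dataset("tatsu-lab/alpaca", split="train")
example = ds[0]

# Alpaca-style records: an instruction, an optional input, and the target output.
print(example["instruction"])
print(example["input"])
print(example["output"])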
from thinkrl.training import SFTTrainer, SFTConfig

config = SFTConfig(
    model_name_or_path="meta-llama/Llama-3-8b",
    dataset_name="tatsu-lab/alpaca",
    output_dir="./sft_checkpoint",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    max_seq_length=2048,
)

trainer = SFTTrainer(config)
trainer.train()
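If the trainer writes a standard Hugging Face checkpoint to output_dir (an assumption about thinkrl's save format, not confirmed here), the fine-tuned model can be smoke-tested with transformers:
# Minimal smoke test of the SFT checkpoint. Assumes thinkrl saves a standard
# Hugging Face checkpoint to `output_dir`; adjust if it does not.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./sft_checkpoint")
model = AutoModelForCausalLM.from_pretrained("./sft_checkpoint")

prompt = "Explain supervised fine-tuning in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))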
Chain-of-Thought Training
Train models to reason step-by-step before producing final answers.
from thinkrl.training import CoTTrainer, CoTConfig

config = CoTConfig(
    model_name_or_path="Qwen/Qwen2.5-7B",
    reasoning_type="cot",  # or "tot" for Tree-of-Thought
    max_reasoning_steps=10,
    thought_separator="<think>",
    dataset_name="your-org/cot-dataset",
)

trainer = CoTTrainer(config)
trainer.train()
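The thought_separator setting suggests each training example interleaves a reasoning span with the final answer. The exact schema of your-org/cot-dataset is not shown here, so the record below is an illustrative assumption, not thinkrl's documented format:
# Illustrative only: one way a CoT training example could be laid out when
# "<think>" marks the reasoning span. The field names and the closing
# "</think>" tag are assumptions, not thinkrl's documented schema.
example = {
    "prompt": "What is 17 * 24?",
    "response": (
        "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408</think>\n"
        "The answer is 408."
    ),
}

print(example["response"])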
Command Line Training
# SFT Training
python -m thinkrl.cli.train_sft \
    --model_name_or_path meta-llama/Llama-3-8b \
    --dataset_name tatsu-lab/alpaca \
    --output_dir ./sft_checkpoint
# CoT Training
python -m thinkrl.cli.train_cot \
    --model_name_or_path ./sft_checkpoint \
    --reasoning_type cot \
    --max_reasoning_steps 10
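After CoT training, generations should contain the reasoning span delimited by the separator. A small post-processing sketch is shown below; it assumes a matching "</think>" closing tag (as in the illustrative record above), which is an assumption rather than thinkrl's documented behavior.
# Split a generation into reasoning and answer, assuming the model emits
# "<think>...</think>" around its reasoning (the closing tag is an assumption).
import re

def split_reasoning(text: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>340 + 68 = 408</think>\nThe answer is 408."
)
print(reasoning)
print(answer)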