
vLLM Integration

Configure vLLM for high-throughput inference during RLHF training.

Installation

pip install -e ".[vllm]"
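A quick way to confirm the optional dependency is available is to import vLLM directly. This check is only a sanity test and is not part of ThinkRL's API:

import vllm

# A successful import confirms the [vllm] extra was installed correctly.
print(vllm.__version__)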

Configuration

# Import path may differ depending on your ThinkRL version.
from thinkrl.config import ModelConfig

config = ModelConfig(
    use_vllm=True,
    vllm_tensor_parallel_size=2,      # number of GPUs used for tensor parallelism
    vllm_gpu_memory_utilization=0.9,  # fraction of GPU memory vLLM may allocate
    vllm_max_model_len=4096,          # maximum sequence length (prompt + generation)
    vllm_enforce_eager=False,         # False enables CUDA graphs for faster decoding
)
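These options map onto vLLM's own engine arguments. As a rough sketch of what ThinkRL configures under the hood (the model name is a placeholder and the exact wiring depends on your ThinkRL version), the equivalent direct vLLM setup looks like this:

from vllm import LLM

# Illustrative only: direct vLLM engine construction with the same settings.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    tensor_parallel_size=2,
    gpu_memory_utilization=0.9,
    max_model_len=4096,
    enforce_eager=False,
)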

Performance Benefits

Up to 10x Faster Generation

vLLM's PagedAttention eliminates most KV-cache fragmentation, making experience collection (rollout generation) during RLHF training up to 10x faster.
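A minimal sketch of batched rollout generation with vLLM; the prompts and sampling settings are illustrative, and ThinkRL's trainer normally drives this loop for you:

from vllm import LLM, SamplingParams

# Engine arguments match the configuration shown above.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=2,
          gpu_memory_utilization=0.9, max_model_len=4096)

sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=512)

prompts = [
    "Explain reinforcement learning from human feedback in one paragraph.",
    "Summarize the benefits of tensor parallelism.",
    # ... one prompt per rollout; lengths can vary freely
]

# A single generate() call batches all prompts; vLLM schedules them
# continuously, so finished completions free memory for new requests.
outputs = llm.generate(prompts, sampling)
for out in outputs:
    print(out.outputs[0].text)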

Continuous Batching

vLLM admits new requests and retires finished ones at every decoding step, so GPU utilization stays high even when response lengths vary widely, as illustrated by the single generate() call in the sketch above.