⚠️ Benchmark Methodology: These numbers represent Peak Batched Throughput (measured using 200 ShareGPT prompts with high concurrency). They demonstrate the maximum aggregate tokens/second the hardware can produce when fully saturated. Because vLLM optimizes for highly concurrent workloads, throughput for a single isolated request (Concurrency = 1) will be naturally lower.