AMD Strix Halo (gfx1151) vLLM Benchmarks


Attention
⚠️ Benchmark Methodology: These numbers are peak batched throughput, measured with 200 ShareGPT prompts at high concurrency. They show the maximum aggregate tokens/second the hardware produces when fully saturated. Because vLLM is optimized for highly concurrent workloads, a single isolated request (concurrency = 1) will see lower throughput than these figures.
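To make the distinction between aggregate and single-request throughput concrete, here is a minimal sketch of how the two could be computed from per-request timings. The `RequestResult` type and both function names are hypothetical and are not taken from vLLM or from this benchmark's scripts. Counting only output tokens is also an assumption; the published figures may count tokens differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RequestResult:
    """Hypothetical record of one completed request (not a vLLM type)."""
    output_tokens: int  # tokens generated for this request
    start_s: float      # wall-clock time the request was sent, in seconds
    end_s: float        # wall-clock time the last token arrived, in seconds


def aggregate_throughput(results: Sequence[RequestResult]) -> float:
    """Total tokens across all requests divided by the overall wall-clock span."""
    if not results:
        raise ValueError("no results")
    total_tokens = sum(r.output_tokens for r in results)
    wall_s = max(r.end_s for r in results) - min(r.start_s for r in results)
    if wall_s <= 0:
        raise ValueError("non-positive wall-clock span")
    return total_tokens / wall_s


def mean_per_request_throughput(results: Sequence[RequestResult]) -> float:
    """Average tokens/second seen by each individual request."""
    if not results:
        raise ValueError("no results")
    rates = [r.output_tokens / (r.end_s - r.start_s) for r in results]
    return sum(rates) / len(rates)
```

When requests overlap in time, the aggregate figure adds up the work done in parallel, so it exceeds what any one request observes. This is why the batched numbers reported here should not be read as the speed of a single request.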