AMD Strix Halo (gfx1151) vLLM Benchmarks

TP1 TP2 TP2 (Eth) TP2 (Thunderbolt)

Attention

Triton ROCm AITER

⚠️ Benchmark Methodology: These numbers are peak batched throughput, measured with 200 ShareGPT prompts at high concurrency. They show the maximum aggregate tokens/second the hardware produces when fully saturated. Because vLLM is optimized for highly concurrent workloads, a single isolated request (concurrency = 1) will see lower throughput than these figures.
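To make the distinction between aggregate and single-request throughput concrete, here is a minimal Python sketch. It is not the benchmark harness used for these results: the `RequestResult` record and both helper functions are hypothetical names introduced for illustration, and it assumes throughput is counted over generated tokens only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestResult:
    """Hypothetical per-request record; not a vLLM type."""
    start_s: float      # wall-clock time the request was sent
    end_s: float        # wall-clock time the last token arrived
    output_tokens: int  # tokens generated for this request


def aggregate_throughput(results: List[RequestResult]) -> float:
    """Total generated tokens divided by the wall-clock span of the whole run."""
    if not results:
        return 0.0
    wall_s = max(r.end_s for r in results) - min(r.start_s for r in results)
    if wall_s <= 0:
        return 0.0
    return sum(r.output_tokens for r in results) / wall_s


def mean_per_request_throughput(results: List[RequestResult]) -> float:
    """Average tokens/second as seen by each individual request."""
    rates = [
        r.output_tokens / (r.end_s - r.start_s)
        for r in results
        if r.end_s > r.start_s
    ]
    return sum(rates) / len(rates) if rates else 0.0


# Illustrative numbers only (not measured): four requests running
# concurrently, each producing 100 tokens over the same 10-second window.
batch = [RequestResult(0.0, 10.0, 100) for _ in range(4)]
print(aggregate_throughput(batch))         # tokens/s across the whole batch
print(mean_per_request_throughput(batch))  # tokens/s seen by one request
```

In this made-up example the aggregate figure is four times the per-request figure, which is why a saturated-batch number should not be read as the speed a single user would observe.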

Benchmark Info