AMD Radeon AI PRO R9700 — MTP Benchmarks

AMD Radeon AI PRO R9700 · 32 GB VRAM

Multi-Token Prediction (MTP) is an experimental speculative decoding feature for `llama.cpp` (see PR #22673). It lets supported models predict multiple tokens per forward pass, which can increase generation speed when the drafted tokens are accepted. These benchmarks compare baseline generation speed, in tokens per second (tok/s), against MTP with 2-token and 3-token drafts (MTP-2 and MTP-3).
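
As a rough illustration of how the speedup columns can be read, the sketch below assumes that speedup is simply the MTP throughput divided by the baseline throughput. That definition, the `speedup` helper, and the numbers in `example` are assumptions made for illustration; they are not measured results from this page.

```python
def speedup(baseline_tps: float, mtp_tps: float) -> float:
    """Ratio of MTP throughput to baseline throughput (1.0 means no change).

    Assumed definition for illustration; the page does not state its formula.
    """
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return mtp_tps / baseline_tps


# Hypothetical tok/s values, not benchmark data.
example = {"baseline": 40.0, "mtp2": 60.0, "mtp3": 70.0}

for key in ("mtp2", "mtp3"):
    ratio = speedup(example["baseline"], example[key])
    print(f"{key}: {ratio:.2f}x")
```

Under this reading, a value above 1.00x means MTP generated tokens faster than the baseline, and a value below 1.00x means the drafting overhead outweighed the gain.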

Model	Toolbox	Baseline tok/s	MTP-2 tok/s	Speedup MTP-2	MTP-3 tok/s	Speedup MTP-3