AMD Radeon AI PRO R9700 ยท 32GB vRAM
Multi-Token Prediction (MTP) is an experimental speculative decoding feature for `llama.cpp` (see PR #22673). It allows supported models to predict multiple tokens per forward pass, significantly increasing generation speed. These benchmarks compare the baseline generation speed against MTP with 2-token and 3-token drafts.