PTI achieves ~2× throughput using a single quantized model (Q5_K_M or better) by running 4 generation streams in one batched decode call. The GPU loads the model weights once per step and produces 4 predictions simultaneously. KV cache overhead is ~0.8 GiB total for all 4 streams. No draft model. No quality loss.
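As a rough illustration of why one batched decode call helps, the toy sketch below counts how often the weights are read when 4 streams are decoded separately versus together. It is not taken from the PTI repository: a small integer matrix stands in for the model, and the function names `decode_separately` and `decode_batched` are hypothetical.

```python
# Toy illustration only: a tiny weight matrix stands in for the model.
# decode_separately / decode_batched are hypothetical names, not PTI's API.

def decode_separately(weights, streams):
    """Run one full pass over the weights for each stream."""
    rows_read = 0
    outputs = []
    for x in streams:
        out = []
        for row in weights:
            rows_read += 1  # each stream re-reads every weight row
            out.append(sum(w * v for w, v in zip(row, x)))
        outputs.append(out)
    return outputs, rows_read


def decode_batched(weights, streams):
    """Run one pass over the weights and apply each row to all streams."""
    rows_read = 0
    outputs = [[] for _ in streams]
    for row in weights:
        rows_read += 1  # each weight row is read once per step
        for i, x in enumerate(streams):
            outputs[i].append(sum(w * v for w, v in zip(row, x)))
    return outputs, rows_read


weights = [[1, 2], [3, 4], [5, 6]]
streams = [[1, 0], [0, 1], [1, 1], [2, -1]]  # 4 generation streams

sep_out, sep_reads = decode_separately(weights, streams)
bat_out, bat_reads = decode_batched(weights, streams)

print(sep_out == bat_out)    # same predictions either way
print(sep_reads, bat_reads)  # weight-row reads: separate vs batched
```

Both paths produce identical outputs, but the batched path reads the weights once instead of four times. The listing reports ~2× rather than 4×, which is plausibly because per-stream work (such as attention over each stream's own KV cache) is not shared; that explanation is an assumption, not something the listing states.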
bigattichouse/packed-twin-inference sits at #717 on the trending leaderboard with a pulse of 18/100. No cross-source channels are firing yet, so GitHub stars are the only signal so far.
It has 9 stars and no fresh weekly delta on record, so the trending placement reflects steady-state interest in the HIP devtools space rather than a 7-day breakout.
Watch-outs: no tagged release on record (treat as pre-stable).
git clone https://github.com/bigattichouse/packed-twin-inference.git
Then follow the README in the cloned directory.