Writing
- 2026-05-07 · Dense Bench, Part 2: The Llama 128K cliff
At 128K context, Llama-3.1-70B-Instruct emits end-of-turn as its first generated token; Qwen-2.5-72B-Instruct doesn't. The 42 tokens past the nominal context limit, the harness flag that caught it, and what to take away if you're shipping long-context Llama at the edge.
- 2026-05-07 · Dense Bench, Part 0: Why I built a benchmark I didn't need
Why existing MLX benchmark data didn't answer the question I had about dense 70B inference at long context on an M3 Ultra with 512GB, and the three properties I needed in a harness before I trusted any of it.
- 2026-05-03 · Dense 70B in MLX on M3 Ultra 512GB: prefill and decode curves from 4K to 128K context
Measured prefill, TTFT, decode, and memory curves for Llama-3.1-70B-Instruct-4bit and Qwen-2.5-72B-Instruct-4bit at 4K–128K context on a Mac Studio M3 Ultra with 512GB. Methodology, raw tables, and a reproducible harness.
- 2026-04-18 · Hello
What this site is, why it exists, and what will live here.