Dense Bench
a series · 3 of 3 published
Dense Bench is the investigation that started with me buying a Mac Studio M3 Ultra 512GB and wanting to know whether I'd bought a workstation or a very expensive TV. The question: can a $10K box do real frontier-lab work on long-context dense LLM inference, or is it a toy? I built a harness because the public benchmarks weren't asking what I wanted to know. The answer turned out to be more interesting — and more conditional — than I expected. Posts are numbered. New entries land here as I finish them.
- 00. the setup. why a Mac Studio, why existing benchmarks weren't enough, what I wanted to find out.
- 01. the numbers: prefill, decode, TTFT, and memory curves from 4K to 128K on two dense 70B-class models.
- 02. one number from the sweep that didn't match my prior, and what I learned chasing it.