abhi ram salammagari · online · fremont, ca

Writing

  • 2026-05-07
    Dense Bench, Part 2: The Llama 128K cliff.

At 128K context, Llama-3.1-70B-Instruct emits end-of-turn as its first generated token. Qwen-2.5-72B-Instruct doesn't. The 42 tokens that pushed past the nominal context limit, the harness flag that caught it, and what to take away if you're shipping long-context Llama at the edge.

  • 2026-05-07
    Dense Bench, Part 0: Why I built a benchmark I didn't need.

    Why existing MLX benchmark data didn't answer the question I had about dense 70B inference at long context on M3 Ultra 512GB — and the three properties I needed in a harness before I trusted any of it.

  • 2026-05-03
Dense 70B in MLX on M3 Ultra 512GB: prefill and decode curves from 4K to 128K context.

    Real prefill, TTFT, decode, and memory curves for Llama-3.1-70B-Instruct-4bit and Qwen-2.5-72B-Instruct-4bit at 4K–128K context on Mac Studio M3 Ultra 512GB. Methodology, raw tables, and a reproducible harness.

  • 2026-04-18
    Hello

    What this site is, why it exists, what will live here.