last updated 2026-06-24
This month: Part 4 of the dense-bench series went up, on agentic tool-calling reliability vs weight quantization. The headline: peak memory doesn't predict whether the agent finishes the job, and the quantization effect flips sign between Qwen and Llama. Five posts now. The series outgrew its "throughput" title.
Now: exploring local agents on MLX — how reliably a quantized 70B can actually drive multi-step tool use end-to-end, not just emit one valid call. The next arm is a constrained-decoding on/off comparison on the same suite.
Also planned: a head-to-head with the DGX Spark on the same matrix once it's set up. The compute-vs-memory shape should look very different on that hardware.
In parallel: migrating the homelab off a flat network. UniFi is in, VLANs exist, the smart bulbs have been quarantined from the work laptop.
Reading / chewing on:
- Sutton, "The Bitter Lesson" — short, ages well.
- Karpathy, "Yes you should understand backprop" — re-reading for the framing, not the math.
- A long-context KV-cache paper that's been on my desk for two weeks.
Not work: trying to read more fiction. Going badly.