abhi ram salammagarionline · fremont, ca


last updated 2026-06-24

This month: Part 4 of the dense-bench series went up, covering agentic tool-calling reliability vs weight quantization. The headline: peak memory doesn't predict whether the agent finishes the job, and the quantization effect flips sign between Qwen and Llama. Five posts now; the series has outgrown its "throughput" title.

Now: exploring local agents on MLX — how reliably a quantized 70B can actually drive multi-step tool use end-to-end, not just emit one valid call. The next arm is a constrained-decoding on/off comparison on the same suite.

Also planned: a head-to-head with the DGX Spark on the same matrix once it's set up. The compute-vs-memory shape will look very different on a discrete-memory accelerator.

In parallel: migrating the homelab off a flat network. UniFi is in, VLANs exist, the smart bulbs have been quarantined from the work laptop.


Reading / chewing on:


Not work: trying to read more fiction. Going badly.