Notebook

Field notes from the studio.

Writing for ML engineers, applied scientists, and the leaders deciding where to bet on AI next.

EVALS

The gap between offline eval and live performance is almost always a coverage problem. A practical playbook.

Jun 2, 2026

AGENTS

Frontier model quality has converged. The remaining alpha is in the tool surface you expose to the agent.

May 26, 2026

RAG

A decision tree from 40+ deployments. The defaults most teams pick are wrong about a third of the time.

May 19, 2026

COST

Five techniques: cascading, distillation, structured outputs, semantic caching, and ruthless prompt compression.

May 11, 2026

SAFETY

42 attack templates that catch real-world misuse before your users find it.

May 2, 2026

GOVERNANCE

Most of it does not. Here is what does, and how to map your existing GRC stack to it.

Apr 22, 2026