Traditional uptime checks miss the failure mode where the service is up but wrong. You need latency and error rates, yes — but also token budgets, refusal rates, embedding drift proxies, and a sampled review queue that humans actually clear.
We treat prompts and retrieval contexts as release artifacts: versioned, diffed, and linked to incidents. When something breaks at 2 a.m., the on-call engineer should answer what changed before guessing why the model "feels off."