All patterns

Last updated: 2026-05-11

Eval Before Autonomy

A release pattern where every new autonomous action is gated by traces and a focused eval set.

When to use it

  • The agent can write, purchase, message, deploy, or mutate state.
  • Prompt/model changes could alter tool behavior.
  • The team needs a reviewable safety story.

Avoid when

  • The feature is read-only and low risk.
  • No one owns the eval dataset after launch.

Implementation notes

  • Turn real failures into eval cases.
  • Track cost and latency with quality.
  • Require traces for every autonomous tool call.