Weights & Biases Weave
Open-source toolkit for tracing, evaluating, and iterating on LLM apps.
Open source
Best when ML teams already use W&B and want LLM traces, evals, and run comparisons alongside their existing experiment tracking.
Selection advice
Choose Weave when your team already treats model iteration as tracked experiments, not one-off debugging sessions.
Best for
- LLM trace inspection in notebooks
- eval-driven iteration loops
- teams with existing W&B workflows
Not ideal for
- teams with no experiment-tracking culture
- production-only ops teams that avoid notebook workflows
Core concepts
- traces
- evals
- scorers
- datasets
- experiments
Minimal implementation shape
Wrap an agent function with Weave tracing, inspect intermediate steps, and compare eval scores across three prompt variants.
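A minimal sketch of that shape, under stated assumptions. `weave.op` is Weave's tracing decorator. Everything else here is hypothetical and exists only for illustration: the three prompt templates, the two-row dataset, the `fake_llm` stand-in (a lookup table, not a model call), and the `exact_match` scorer. The block falls back to a no-op decorator when `weave` is not installed so it runs offline. In a real project you would call `weave.init(...)` with your own project name before running, and could likely replace the manual scoring loop with Weave's `Evaluation` class.

```python
try:
    import weave
    op = weave.op  # Weave's tracing decorator
except ImportError:
    # Fallback so the sketch runs without Weave installed (no tracing).
    def op():
        return lambda fn: fn

# Hypothetical prompt variants to compare.
PROMPT_VARIANTS = {
    "terse": "Answer with one word. Q: {question}",
    "polite": "Please answer the question. Q: {question}",
    "stepwise": "Think step by step, then answer with one word. Q: {question}",
}

# Hypothetical evaluation dataset.
DATASET = [
    {"question": "capital of France", "expected": "Paris"},
    {"question": "capital of Japan", "expected": "Tokyo"},
]

FACTS = {row["question"]: row["expected"] for row in DATASET}


@op()
def build_prompt(template: str, question: str) -> str:
    """Intermediate step: render the prompt for one question."""
    return template.format(question=question)


@op()
def fake_llm(prompt: str) -> str:
    """Stand-in for a model call: answers tersely only when told to."""
    question = prompt.split("Q: ", 1)[1]
    answer = FACTS.get(question, "unknown")
    return answer if "one word" in prompt else f"The answer is {answer}."


@op()
def agent(template: str, question: str) -> str:
    """The traced agent: nested ops appear as child steps in a trace."""
    return fake_llm(build_prompt(template, question))


@op()
def exact_match(expected: str, output: str) -> float:
    """Scorer: 1.0 when the output equals the expected answer exactly."""
    return 1.0 if output.strip() == expected else 0.0


def compare_variants() -> dict:
    """Mean exact-match score for each prompt variant."""
    scores = {}
    for name, template in PROMPT_VARIANTS.items():
        results = [
            exact_match(row["expected"], agent(template, row["question"]))
            for row in DATASET
        ]
        scores[name] = sum(results) / len(results)
    return scores


if __name__ == "__main__":
    # weave.init("prompt-variants-demo")  # hypothetical project name
    print(compare_variants())
```

Because `build_prompt` and `fake_llm` are decorated separately, each would show up as its own step under the `agent` call when tracing is active, which is what makes the intermediate steps inspectable. The scores here only reflect the stub's behavior, not any real model.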