diagnose_pipeline()
The turnkey on-ramp. Runs your pipeline under a Witness tracer, scores the
captured trace with your judge, and invokes the attribution engine — all in
one call.
Signature
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
pipeline | Callable | (required) | A callable instrumented with @span / with span(...). Called once as pipeline(*args, **kwargs) |
*args | Positional arguments forwarded to the pipeline | ||
judge | Callable[[AgentTrace], float] | (required) | Returns the score for the captured trace. Typically wraps a Verdict metric via judge_from_verdict |
rubric | str | (required) | Evaluation rubric passed to the attribution methods |
llm | Callable[[str], str] | (required) | LLM callable for critic and decomposition methods |
ideal | str | None | None | Optional reference output stored on the trace |
trace_metadata | dict | None | None | Trace-level metadata (model name, run id, etc.) |
method | str | "all" | "critic", "decomposition", "ablation", or "all" |
runner | Callable | None | None | Pipeline replay callable for ablation. When omitted, method="ablation" raises; method="all" silently skips ablation |
score_range | tuple[float, float] | (0.0, 1.0) | Score range for delta normalization in ablation |
ablation_placeholder | str | "null" | Placeholder strategy for ablation: "null" or "ideal" |
ablation_budget | int | None | None | Cap ablation runs. None = ablate every span |
**kwargs | Keyword arguments forwarded to the pipeline |
Example
Origin
The attribution engine. Use directly when you already have a captured
AgentTrace and a score.
Constructor
| Parameter | Type | Default | Description |
|---|---|---|---|
llm | Callable[[str], str] | (required) | LLM callable for critic and decomposition |
runner | Callable | None | None | Pipeline replay callable for ablation |
judge | Callable[[AgentTrace], float] | None | None | Scoring callable for ablation. Must be provided together with runner |
score_range | tuple[float, float] | (0.0, 1.0) | Score range for ablation delta normalization |
runner and judge must be provided together — having one without the other
raises ValueError.
Origin.diagnose()
| Parameter | Type | Default | Description |
|---|---|---|---|
trace | AgentTrace | (required) | The execution trace to diagnose |
score | float | (required) | Judge score being explained (typically 0.0–1.0) |
rubric | str | (required) | The rubric the judge used |
method | str | "all" | "critic", "decomposition", "ablation", or "all" |
ablation_placeholder | str | "null" | Ablation placeholder strategy: "null" or "ideal" |
ablation_budget | int | None | None | Cap ablation runs |
Example
Origin.ablation_available
bool — True when both runner and judge are set.
judge_from_verdict()
Adapts any Verdict Metric to Origin’s Callable[[AgentTrace], float]
contract. Duck-typed — no hard Verdict dependency at import time.
Signature
| Parameter | Type | Default | Description |
|---|---|---|---|
metric | Verdict Metric | (required) | Any Verdict metric: LLMJudge, ExactMatch, BleuScore, RougeScore, or custom |
extract_response | Callable | None | None | Extract the response string from the trace. Defaults to the last root span’s output |
extract_messages | Callable | None | None | Extract the messages list from the trace. Defaults to the first root span’s input as a user message |
Example
Using any callable as a judge
judge= accepts any Callable[[AgentTrace], float] — you don’t need Verdict: