diagnose_pipeline & Origin

`diagnose_pipeline()`

The turnkey on-ramp. Runs your pipeline under a Witness tracer, scores the captured trace with your judge, and invokes the attribution engine — all in one call.

from aevyra_origin import diagnose_pipeline

Signature

def diagnose_pipeline(
    pipeline,
    *args,
    judge,
    rubric,
    llm,
    ideal=None,
    trace_metadata=None,
    method="all",
    runner=None,
    score_range=(0.0, 1.0),
    ablation_placeholder="null",
    ablation_budget=None,
    **kwargs,
) -> Attribution

Parameters

Parameter	Type	Default	Description
`pipeline`	`Callable`	(required)	A callable instrumented with `@span` / `with span(...)`. Called once as `pipeline(args, *kwargs)`
`*args`			Positional arguments forwarded to the pipeline
`judge`	`Callable[[AgentTrace], float]`	(required)	Returns the score for the captured trace. Typically wraps a Verdict metric via `judge_from_verdict`
`rubric`	`str`	(required)	Evaluation rubric passed to the attribution methods
`llm`	`Callable[[str], str]`	(required)	LLM callable for critic and decomposition methods
`ideal`	`str \| None`	`None`	Optional reference output stored on the trace
`trace_metadata`	`dict \| None`	`None`	Trace-level metadata (model name, run id, etc.)
`method`	`str`	`"all"`	`"critic"`, `"decomposition"`, `"ablation"`, or `"all"`
`runner`	`Callable \| None`	`None`	Pipeline replay callable for ablation. When omitted, `method="ablation"` raises; `method="all"` silently skips ablation
`score_range`	`tuple[float, float]`	`(0.0, 1.0)`	Score range for delta normalization in ablation
`ablation_placeholder`	`str`	`"null"`	Placeholder strategy for ablation: `"null"` or `"ideal"`
`ablation_budget`	`int \| None`	`None`	Cap ablation runs. `None` = ablate every span
`**kwargs`			Keyword arguments forwarded to the pipeline

Example

from aevyra_origin import diagnose_pipeline
from aevyra_origin.llm import anthropic_llm
from aevyra_origin.judges import judge_from_verdict
from aevyra_verdict import LLMJudge
from aevyra_verdict.providers import get_provider

result = diagnose_pipeline(
    my_agent,
    "I was charged twice — how do I get a refund?",
    judge=judge_from_verdict(LLMJudge(judge_provider=get_provider("anthropic"))),
    rubric="Accurate, grounded in the policy docs, and addresses the user's concern.",
    llm=anthropic_llm(),
    method="all",
)
print(result.render())

`Origin`

The attribution engine. Use directly when you already have a captured AgentTrace and a score.

from aevyra_origin import Origin

Constructor

Origin(
    llm,
    *,
    runner=None,
    judge=None,
    score_range=(0.0, 1.0),
)

Parameter	Type	Default	Description
`llm`	`Callable[[str], str]`	(required)	LLM callable for critic and decomposition
`runner`	`Callable \| None`	`None`	Pipeline replay callable for ablation
`judge`	`Callable[[AgentTrace], float] \| None`	`None`	Scoring callable for ablation. Must be provided together with `runner`
`score_range`	`tuple[float, float]`	`(0.0, 1.0)`	Score range for ablation delta normalization

runner and judge must be provided together — having one without the other raises ValueError.

`Origin.diagnose()`

def diagnose(
    *,
    trace,
    score,
    rubric,
    method="all",
    ablation_placeholder="null",
    ablation_budget=None,
) -> Attribution

Parameter	Type	Default	Description
`trace`	`AgentTrace`	(required)	The execution trace to diagnose
`score`	`float`	(required)	Judge score being explained (typically 0.0–1.0)
`rubric`	`str`	(required)	The rubric the judge used
`method`	`str`	`"all"`	`"critic"`, `"decomposition"`, `"ablation"`, or `"all"`
`ablation_placeholder`	`str`	`"null"`	Ablation placeholder strategy: `"null"` or `"ideal"`
`ablation_budget`	`int \| None`	`None`	Cap ablation runs

Example

from aevyra_origin import Origin
from aevyra_origin.llm import anthropic_llm

origin = Origin(llm=anthropic_llm())
result = origin.diagnose(
    trace=my_trace,
    score=0.31,
    rubric="Accurate, grounded in the policy docs.",
    method="all",
)
print(result.render())

`Origin.ablation_available`

bool — True when both runner and judge are set.

`judge_from_verdict()`

Adapts any Verdict Metric to Origin’s Callable[[AgentTrace], float] contract. Duck-typed — no hard Verdict dependency at import time.

from aevyra_origin.judges import judge_from_verdict

Signature

def judge_from_verdict(
    metric,
    *,
    extract_response=None,
    extract_messages=None,
) -> Callable[[AgentTrace], float]

Parameter	Type	Default	Description
`metric`	Verdict `Metric`	(required)	Any Verdict metric: `LLMJudge`, `ExactMatch`, `BleuScore`, `RougeScore`, or custom
`extract_response`	`Callable \| None`	`None`	Extract the response string from the trace. Defaults to the last root span’s output
`extract_messages`	`Callable \| None`	`None`	Extract the messages list from the trace. Defaults to the first root span’s input as a user message

Example

from aevyra_origin.judges import judge_from_verdict
from aevyra_verdict import LLMJudge
from aevyra_verdict.providers import get_provider

judge = judge_from_verdict(
    LLMJudge(judge_provider=get_provider("anthropic")),
)

# Custom extraction when the default (last root span output) isn't right
judge = judge_from_verdict(
    LLMJudge(judge_provider=get_provider("anthropic")),
    extract_response=lambda trace: trace.nodes[-1].output,
)

Using any callable as a judge

judge= accepts any Callable[[AgentTrace], float] — you don’t need Verdict:

def my_judge(trace) -> float:
    output = trace.nodes[-1].output
    return 1.0 if "refund" in str(output).lower() else 0.0

result = diagnose_pipeline(my_agent, question, judge=my_judge, rubric=rubric, llm=llm)

Getting started

Guides

Tutorials

API reference

diagnose_pipeline & Origin