aevyra-origin diagnoses why your agent failed. Point it at your pipeline and a rubric; it runs the pipeline, grades it with a judge, and returns a ranked list of culprit spans — each with severity, confidence, grounded reasoning, and a
fix_type that tells you exactly where the repair effort belongs.
fix_type="prompt" spans are candidates for Reflex — the others (retrieval,
routing, tool_schema, infrastructure) need a different intervention. Origin tells
you which is which so you don’t waste time rewriting prompts that won’t help.
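
To make that concrete, here is a minimal sketch of a diagnosis call. The package name, the diagnose() entry point, and the result field names are assumptions for illustration, not the confirmed API; see the quick start and API reference below for the real signatures.

```python
# Minimal usage sketch. ASSUMPTIONS: the package exposes a top-level
# diagnose(), and the report carries ranked .attributions with severity,
# confidence, reasoning, and fix_type fields -- names are illustrative.
from aevyra_origin import diagnose  # assumed import path

report = diagnose(
    pipeline=my_pipeline,  # your agent pipeline object (assumed)
    rubric="The answer must cite the retrieved document and state a figure.",
)

# Culprit spans come back ranked, worst first.
for span in report.attributions:
    print(span.fix_type, span.severity, span.confidence)
    print(span.reasoning)
```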
## Where Origin fits
Origin is the diagnosis stage in the Aevyra stack: Witness captures the
execution trace, Verdict scores it, and Origin reads both, pinpoints the
failure, and classifies the fix type. When the fix is in a prompt, Reflex can
act on it automatically. For every other failure type (a bad retrieval index,
an ambiguous tool schema, a mis-routing), Origin points you at the layer that
actually needs the work.
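
A rough sketch of how the hand-off between stages might look in code; the module layout and function names here are assumptions about the stack, not its published interface.

```python
# Illustrative flow only -- module names and signatures are assumed,
# not documented; this just shows the order of the stages.
from aevyra import origin, reflex, verdict, witness  # assumed layout

trace = witness.capture(my_pipeline, query)     # 1. record the execution trace
score = verdict.grade(trace, rubric)            # 2. judge it against the rubric
report = origin.diagnose(trace, score, rubric)  # 3. attribute + classify the fix

for span in report.attributions:
    if span.fix_type == "prompt":
        reflex.repair(span)  # 4. prompt fixes can be automated (assumed call)
```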
## What it diagnoses

| fix_type | What it means | Who fixes it |
|---|---|---|
| prompt | The instructions or context in the prompt need changing | Reflex |
| tool_schema | The tool's input schema is ambiguous; the LLM called it wrong | Schema redesign |
| retrieval | The retrieval step fetched wrong, irrelevant, or missing docs | Index / embedding fix |
| routing | The pipeline sent the query down the wrong branch or tool | Routing logic fix |
| infrastructure | Timeout, rate limit, auth error, or quota exceeded | Ops / infra fix |
| unknown | Origin could not determine the fix type | Manual review |
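
The table maps directly onto a dispatch step. A sketch, reusing the report from the earlier example and assuming each diagnosed span exposes its fix_type as a string (the report shape is an assumption; the fix_type values come from the table above):

```python
# Route each diagnosed span to its owner based on fix_type.
OWNERS = {
    "prompt": "reflex",            # auto-repairable prompt edits
    "tool_schema": "schema-team",  # redesign the tool's input schema
    "retrieval": "search-team",    # fix the index or embeddings
    "routing": "pipeline-team",    # fix the branch/tool selection logic
    "infrastructure": "ops",       # timeouts, rate limits, auth, quota
}

for span in report.attributions:  # report shape assumed, as above
    owner = OWNERS.get(span.fix_type, "manual-review")  # "unknown" falls through
    print(f"{span.fix_type:>15} -> {owner}")
```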
## Three attribution methods
Origin ships three methods that can run independently or together:

**LLM-as-critic** (method="critic") reads the rubric, score, and full trace in
one LLM call and returns a ranked list of culprit spans. Fast, general, works
for any rubric.
**Score decomposition** (method="decomposition") breaks the rubric into its
underlying criteria, attributes each criterion to a span, and aggregates blame
across failed criteria. Better at surfacing distributed failures where multiple
spans each contributed.
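
Conceptually, the aggregation step works like a weighted vote. This is a sketch of the idea, not Origin's actual implementation:

```python
# Conceptual sketch of blame aggregation -- not the real implementation.
# Each failed criterion blames one span; blame sums across criteria, so a
# span that quietly hurt several criteria rises to the top of the ranking.
from collections import defaultdict

blame = defaultdict(float)
for criterion in failed_criteria:  # hypothetical criterion objects
    blame[criterion.attributed_span] += criterion.weight

ranked = sorted(blame.items(), key=lambda item: item[1], reverse=True)
```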
**Ablation** (method="ablation") replaces each span's output with a neutral
placeholder, replays the pipeline via a user-supplied runner, and re-scores.
The only method that makes a causal claim: a large score drop means the span is
genuinely responsible.
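
The runner is the one piece you supply. Here is a sketch of what that contract might look like; the exact signature is an assumption, not the documented interface:

```python
# ASSUMED runner contract: re-execute the pipeline with one span's output
# forced to a neutral placeholder, and return the final output so Origin
# can re-score it. The override mechanism below is hypothetical.
def my_runner(inputs, overrides):
    return my_pipeline.run(inputs, overrides=overrides)

# Conceptually, Origin then does something like this for each span:
#   ablated = my_runner(inputs, {span.id: NEUTRAL_PLACEHOLDER})
#   drop = original_score - judge(ablated)   # a large drop => causal blame
```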
method="all" (default) runs critic and decomposition always (two LLM calls),
adds ablation when you supply a runner, and merges the results with a
corroboration bonus for spans named by multiple methods.
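
Selecting a method might then look like this; the parameter names are again assumptions rather than the confirmed signature:

```python
# Fast path: one method, no replay needed.
report = diagnose(pipeline=my_pipeline, rubric=rubric, method="critic")

# Default path: critic + decomposition always run; ablation joins in only
# because a runner is supplied, and spans named by more than one method
# get a corroboration bonus in the merged ranking.
report = diagnose(
    pipeline=my_pipeline,
    rubric=rubric,
    method="all",
    runner=my_runner,  # enables the causal ablation pass
)
```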
- **Quick start**: Diagnose your first pipeline failure in under 5 minutes
- **Tutorial**: Full walkthrough of a plan-act-respond agent that gets the wrong answer
- **Methods**: Critic, decomposition, ablation, and when to use each
- **API reference**: Attribution, NodeAttribution, PromptAttribution