> ## Documentation Index > Fetch the complete documentation index at: https://docs.aevyra.ai/llms.txt > Use this file to discover all available pages before exploring further. # Introduction > A framework for evaluating and comparing LLM outputs across models and providers. aevyra-verdict runs completions against any combination of models, scores the responses with pluggable metrics, and gives you structured results for comparison — from the terminal or in Python. ## What it does Given a dataset of prompts in OpenAI message format, aevyra-verdict: 1. Sends each prompt to every model you've configured, concurrently 2. Scores each response with your chosen metrics (ROUGE, BLEU, exact match, LLM-as-judge, or custom Python functions) 3. Returns a comparison table with scores, latency, and token usage per model ## When to use it * Choosing between models for a specific task * Catching regressions after a prompt or model change * Measuring the effect of system prompt variations * Benchmarking a locally-running model against hosted APIs ## Supported providers OpenAI, Anthropic, Google (Gemini), Mistral, Cohere, OpenRouter, and any OpenAI-compatible API (vLLM, Ollama, Together, etc.). Run your first eval in under 5 minutes All commands and flags Configure models and local instances ROUGE, BLEU, LLM-as-judge, custom functions