Prerequisites

  • An NVIDIA or AMD GPU (or use --dry-run on CPU)
  • Python 3.10+
  • An Anthropic API key (or any supported LLM provider)

Install

pip install aevyra-forge
export ANTHROPIC_API_KEY=sk-ant-...

Prepare a workload

Forge optimizes against your real traffic. The workload is a JSONL file — one request per line:
{"prompt": "Explain attention in transformers.", "expected_output_tokens": 256}
{"prompt": "Write a Python function to merge two sorted lists.", "expected_output_tokens": 128}
A 50-example starter is bundled:
# Use the bundled sample
ls examples/sample_workload.jsonl
For production use, export a sample of your real traffic from Langfuse, your API gateway logs, or any JSONL source.
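If you are assembling a workload by hand rather than exporting logs, writing the JSONL is straightforward. A minimal Python sketch (the field names match the example above; the `write_workload` helper is illustrative, not part of Forge):

```python
import json

def write_workload(requests, path):
    """Write one JSON object per line -- the JSONL format Forge expects."""
    with open(path, "w") as f:
        for req in requests:
            f.write(json.dumps(req) + "\n")

requests = [
    {"prompt": "Explain attention in transformers.", "expected_output_tokens": 256},
    {"prompt": "Write a Python function to merge two sorted lists.", "expected_output_tokens": 128},
]
write_workload(requests, "workload.jsonl")
```

Each line must be a complete, self-contained JSON object; a trailing comma or a pretty-printed multi-line object will break JSONL parsing.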

Run

aevyra-forge tune \
  --model meta-llama/Llama-3.2-1B-Instruct \
  --device cuda \
  --workload examples/sample_workload.jsonl \
  --max-experiments 10
Forge will:
  1. Auto-detect your GPU via nvidia-smi / rocm-smi
  2. Boot vLLM with the baseline config
  3. Benchmark your workload at up to 8 concurrent requests
  4. Ask the agent LLM to propose a mutation
  5. Boot → bench → keep or revert, repeat
Results are saved to .forge/runs/<run-id>/.
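The keep-or-revert loop in steps 4-5 is essentially greedy hill climbing over server configs. A simplified sketch of the control flow, with toy stand-ins for the benchmark and the agent (all names here are illustrative, not Forge's internal API):

```python
import random

def tune(baseline, bench, propose, max_experiments=10):
    """Greedy keep-or-revert loop: keep a mutation only if it benchmarks better."""
    best_cfg, best_score = baseline, bench(baseline)
    history = []
    for _ in range(max_experiments):
        candidate = propose(best_cfg)   # agent LLM proposes a config mutation
        score = bench(candidate)        # boot the server, benchmark the workload
        kept = score > best_score
        if kept:                        # keep the mutation...
            best_cfg, best_score = candidate, score
        history.append((candidate, score, kept))  # ...or revert and log it
    return best_cfg, best_score, history

# Toy stand-ins: score is throughput, peaking at max_num_seqs == 32.
def bench(cfg):
    return 100 - abs(cfg["max_num_seqs"] - 32)

def propose(cfg):
    return {**cfg, "max_num_seqs": cfg["max_num_seqs"] + random.choice([-8, 8])}

best, score, history = tune({"max_num_seqs": 8}, bench, propose)
```

Because rejected mutations are reverted, the best score is monotonically non-decreasing across experiments, which is why an interrupted run loses at most the in-flight experiment.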

Resume an interrupted run

If a run is interrupted (Ctrl-C, timeout, OOM), resume with no arguments — Forge reads everything from the saved config:
aevyra-forge tune resume

View results

aevyra-forge report .forge/
Prints a summary table with throughput, P99 latency, accuracy, and whether each experiment was kept.
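P99 latency here means the 99th-percentile per-request latency. If you want to recompute it from raw timings, a nearest-rank sketch (the latency values below are made up for illustration):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the ceil(p/100 * n)-th smallest sample."""
    s = sorted(samples)
    k = math.ceil(p / 100 * len(s)) - 1
    return s[min(max(k, 0), len(s) - 1)]

latencies_ms = [120, 130, 95, 400, 110, 105, 102, 118, 250, 99]
p99 = percentile(latencies_ms, 99)  # dominated by the slowest request
p50 = percentile(latencies_ms, 50)
```

With small benchmark samples, P99 is effectively the max, so compare runs on P50 as well before trusting a tail-latency difference.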

Device options

Flag           Hardware     Requirement
--device cuda  NVIDIA GPU   nvidia-smi on PATH
--device rocm  AMD GPU      rocm-smi on PATH
--device cpu   No GPU       --dry-run recommended
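The auto-detection in step 1 of the run amounts to probing for these vendor CLIs on PATH. A minimal sketch of that logic (the `detect_device` name is illustrative, not Forge's API):

```python
import shutil

def detect_device():
    """Pick a device the way the table above describes: probe for the vendor CLI."""
    if shutil.which("nvidia-smi"):
        return "cuda"
    if shutil.which("rocm-smi"):
        return "rocm"
    return "cpu"  # no GPU tooling found; pair with --dry-run

device = detect_device()
```

`shutil.which` mirrors the shell's PATH lookup, so this agrees with whether you can run `nvidia-smi` yourself in the same environment.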

LLM providers

The --llm flag selects the agent model. Format: provider/model.
# Anthropic (default)
--llm anthropic/claude-sonnet-4-6

# OpenRouter
--llm openrouter/meta-llama/llama-3.1-70b-instruct

# OpenAI
--llm openai/gpt-4o
Set the corresponding API key environment variable before running.
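Note that model names can themselves contain slashes (as in the OpenRouter example), so the flag splits on the first slash only. A sketch of the parsing, plus the conventional env-var names (ANTHROPIC_API_KEY appears above; the other two are the usual names for those providers — confirm in your provider's docs):

```python
def parse_llm_flag(value):
    """Split provider/model on the FIRST slash -- model names may contain slashes."""
    provider, _, model = value.partition("/")
    return provider, model

# Conventional API-key env var per provider (assumed names, not from Forge's docs).
ENV_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "openai": "OPENAI_API_KEY",
}

provider, model = parse_llm_flag("openrouter/meta-llama/llama-3.1-70b-instruct")
```

A naive `value.split("/")` would mangle the OpenRouter case; `partition` keeps everything after the first slash intact as the model name.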

Next steps