Prerequisites

  • An NVIDIA or AMD GPU (or use --dry-run on CPU)
  • Python 3.10+
  • An Anthropic API key (or any supported LLM provider)

Install

pip install aevyra-forge
export ANTHROPIC_API_KEY=sk-ant-...

Prepare a workload

Forge optimizes against your real traffic. The workload is a JSONL file — one request per line:
{"prompt": "Explain attention in transformers.", "expected_output_tokens": 256}
{"prompt": "Write a Python function to merge two sorted lists.", "expected_output_tokens": 128}
A 50-example starter is bundled:
# Use the bundled sample
ls examples/sample_workload.jsonl
For production use, export a sample of your real traffic from Langfuse, your API gateway logs, or any JSONL source.
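If you are assembling a workload by hand rather than exporting logs, writing the JSONL is straightforward. A minimal Python sketch (the field names match the example above; the `write_workload` helper is illustrative, not part of Forge):

```python
import json

def write_workload(requests, path):
    """Write one JSON object per line -- the JSONL format Forge expects."""
    with open(path, "w") as f:
        for req in requests:
            f.write(json.dumps(req) + "\n")

requests = [
    {"prompt": "Explain attention in transformers.", "expected_output_tokens": 256},
    {"prompt": "Write a Python function to merge two sorted lists.", "expected_output_tokens": 128},
]
write_workload(requests, "workload.jsonl")
```

Each line must be a complete, self-contained JSON object; a trailing comma or a pretty-printed multi-line object will break JSONL parsing.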

Run

aevyra-forge tune \
  --model meta-llama/Llama-3.2-1B-Instruct \
  --device cuda \
  --workload examples/sample_workload.jsonl \
  --max-experiments 10
Forge will:
  1. Auto-detect your GPU via nvidia-smi / rocm-smi
  2. Boot vLLM with the baseline config
  3. Benchmark your workload at up to 8 concurrent requests
  4. Ask the agent LLM to propose a mutation
  5. Boot → bench → keep or revert, repeat
Results are saved to .forge/runs/<run-id>/.
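The keep-or-revert loop in steps 4-5 is essentially greedy hill climbing over server configs. A simplified sketch of the control flow, with toy stand-ins for the benchmark and the agent (all names here are illustrative, not Forge's internal API):

```python
import random

def tune(baseline, bench, propose, max_experiments=10):
    """Greedy keep-or-revert loop: keep a mutation only if it benchmarks better."""
    best_cfg, best_score = baseline, bench(baseline)
    history = []
    for _ in range(max_experiments):
        candidate = propose(best_cfg)   # agent LLM proposes a config mutation
        score = bench(candidate)        # boot the server, benchmark the workload
        kept = score > best_score
        if kept:                        # keep the mutation...
            best_cfg, best_score = candidate, score
        history.append((candidate, score, kept))  # ...or revert and log it
    return best_cfg, best_score, history

# Toy stand-ins: score is throughput, peaking at max_num_seqs == 32.
def bench(cfg):
    return 100 - abs(cfg["max_num_seqs"] - 32)

def propose(cfg):
    return {**cfg, "max_num_seqs": cfg["max_num_seqs"] + random.choice([-8, 8])}

best, score, history = tune({"max_num_seqs": 8}, bench, propose)
```

Because rejected mutations are reverted, the best score is monotonically non-decreasing across experiments, which is why an interrupted run loses at most the in-flight experiment.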

Resume an interrupted run

If a run is interrupted (Ctrl-C, timeout, OOM), resume with no arguments — Forge reads everything from the saved config:
aevyra-forge tune resume

View results

aevyra-forge report .forge/
Prints a summary table with throughput, P99 latency, accuracy, and whether each experiment was kept.
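P99 latency here means the 99th-percentile per-request latency. If you want to recompute it from raw timings, a nearest-rank sketch (the latency values below are made up for illustration):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the ceil(p/100 * n)-th smallest sample."""
    s = sorted(samples)
    k = math.ceil(p / 100 * len(s)) - 1
    return s[min(max(k, 0), len(s) - 1)]

latencies_ms = [120, 130, 95, 400, 110, 105, 102, 118, 250, 99]
p99 = percentile(latencies_ms, 99)  # dominated by the slowest request
p50 = percentile(latencies_ms, 50)
```

With small benchmark samples, P99 is effectively the max, so compare runs on P50 as well before trusting a tail-latency difference.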

Device options

Flag           Hardware     Requirement
--device cuda  NVIDIA GPU   nvidia-smi on PATH
--device rocm  AMD GPU      rocm-smi on PATH
--device cpu   No GPU       --dry-run recommended
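The auto-detection in step 1 of the run amounts to probing for these vendor CLIs on PATH. A minimal sketch of that logic (the `detect_device` name is illustrative, not Forge's API):

```python
import shutil

def detect_device():
    """Pick a device the way the table above describes: probe for the vendor CLI."""
    if shutil.which("nvidia-smi"):
        return "cuda"
    if shutil.which("rocm-smi"):
        return "rocm"
    return "cpu"  # no GPU tooling found; pair with --dry-run

device = detect_device()
```

`shutil.which` mirrors the shell's PATH lookup, so this agrees with whether you can run `nvidia-smi` yourself in the same environment.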

LLM providers

The --llm flag selects the agent model. Format: provider/model.
# Anthropic (default)
--llm anthropic/claude-sonnet-4-6

# OpenRouter
--llm openrouter/meta-llama/llama-3.1-70b-instruct

# OpenAI
--llm openai/gpt-4o
Set the corresponding API key environment variable before running.
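Note that model names can themselves contain slashes (as in the OpenRouter example), so the flag splits on the first slash only. A sketch of the parsing, plus the conventional env-var names (ANTHROPIC_API_KEY appears above; the other two are the usual names for those providers — confirm in your provider's docs):

```python
def parse_llm_flag(value):
    """Split provider/model on the FIRST slash -- model names may contain slashes."""
    provider, _, model = value.partition("/")
    return provider, model

# Conventional API-key env var per provider (assumed names, not from Forge's docs).
ENV_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "openai": "OPENAI_API_KEY",
}

provider, model = parse_llm_flag("openrouter/meta-llama/llama-3.1-70b-instruct")
```

A naive `value.split("/")` would mangle the OpenRouter case; `partition` keeps everything after the first slash intact as the model name.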

Next steps