Documentation Index

Fetch the complete documentation index at: https://docs.aevyra.ai/llms.txt

Use this file to discover all available pages before exploring further.

aevyra-forge tunes your vLLM deployment overnight. Give it a model, a GPU, and a workload trace. It runs an autonomous loop — propose a config, boot the server, benchmark against your real workload, keep or revert, repeat. By morning you have a deployment recipe that beats hand-tuned defaults, with a full audit trail of every experiment.
pip install aevyra-forge
vLLM exposes roughly 40 serving args. The defaults are conservative — designed to work safely on any GPU, not to max out any specific one. On a T4 with a chat workload the defaults leave significant throughput on the table:
Baseline (defaults):    2718 tok/s   P99: 241 ms
After Forge (8 exps):   3421 tok/s   P99: 187 ms   (+26%)
Forge finds that gain by searching the joint space of batching, caching, and memory knobs — and keeping only the changes that improve the score on your actual workload.
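The "score" can be as simple as a weighted trade-off between throughput and tail latency. A minimal sketch of such an objective, using the benchmark numbers above (this scoring function is illustrative, not Forge's actual objective):

```python
def score(tok_per_s: float, p99_ms: float, latency_weight: float = 1.0) -> float:
    """Higher is better: reward throughput, penalise P99 latency.

    An illustrative objective, not Forge's real one.
    """
    return tok_per_s - latency_weight * p99_ms

baseline = score(2718, 241)   # vLLM defaults, from the benchmark above
tuned = score(3421, 187)      # after 8 Forge experiments
assert tuned > baseline       # the tuned recipe wins under this objective
```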

Where Forge fits

Forge operates on the infrastructure layer. Where Reflex rewrites prompts and Origin diagnoses agent failures, Forge maximises throughput and minimises latency for a model that’s already doing the right thing.

The loop

Each iteration: the agent reads the playbook and experiment history, proposes one targeted change, and Forge measures whether it actually helps. The audit trail captures every decision — config, result, rationale — so you can see exactly how the winning recipe was found.
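In pseudocode terms, the keep-or-revert loop looks roughly like this. The function names and shapes here are illustrative stand-ins for Forge internals, not its real API:

```python
def tune(propose, benchmark, recipe, iterations=8):
    """Greedy keep-or-revert loop: accept a change only if it scores better.

    `propose(recipe, history)` returns a candidate recipe;
    `benchmark(recipe)` boots the server and returns a score.
    Both are hypothetical stand-ins. Returns the best recipe and an
    audit trail recording every experiment, kept or not.
    """
    best_score = benchmark(recipe)
    history = []
    for _ in range(iterations):
        candidate = propose(recipe, history)
        candidate_score = benchmark(candidate)
        kept = candidate_score > best_score
        history.append({"config": candidate, "score": candidate_score, "kept": kept})
        if kept:
            recipe, best_score = candidate, candidate_score
    return recipe, history
```

Reverting is implicit: a losing candidate is logged but never becomes the new baseline, so the next proposal mutates the last known-good recipe.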

Tuning layers

Layer                 What it tunes                                       Status
1. Config             vLLM serving args: batching, caching, parallelism   ✅ v0.1
2. Quantization       INT4/FP8/INT8, KV cache precision                   🔧 v0.2
3. Kernel synthesis   Custom kernels via AutoKernel                       🚧 v0.3
Layer 1 has the highest leverage per experiment because it requires no recompilation. Forge escalates to Layer 2 when Layer 1 converges.
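A Layer-1 recipe is just a set of vLLM serving arguments. A hypothetical example of the kind of config an experiment might propose for a T4 chat workload (the flags are real vLLM args; the values are illustrative, not a recommendation):

```python
# Hypothetical Layer-1 recipe: real vLLM serving flags, illustrative values.
recipe = {
    "--max-num-seqs": 128,              # cap on concurrent sequences per batch
    "--max-num-batched-tokens": 4096,   # token budget per scheduler step
    "--gpu-memory-utilization": 0.92,   # fraction of VRAM vLLM may claim
    "--enable-prefix-caching": True,    # reuse KV cache across shared prefixes
    "--swap-space": 4,                  # GiB of CPU swap for preempted requests
}

# Rendered as a vLLM server command line:
args = " ".join(k if v is True else f"{k} {v}" for k, v in recipe.items())
print(f"vllm serve <model> {args}")
```

Because each candidate is just a flag set, trying one costs a server restart and a benchmark run rather than a recompile, which is why this layer yields the most experiments per GPU-hour.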

Works with any GPU and LLM

Auto-detects NVIDIA and AMD GPUs via nvidia-smi / rocm-smi. Works with any OpenAI-compatible LLM for the agent.
pip install aevyra-forge               # Claude included by default
pip install aevyra-forge[openai]       # add OpenAI / OpenRouter / Together / Groq
Provider     Env var
Anthropic    ANTHROPIC_API_KEY
OpenAI       OPENAI_API_KEY
OpenRouter   OPENROUTER_API_KEY
Ollama       —
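GPU auto-detection amounts to probing the vendor CLIs and seeing which one answers. A minimal sketch of the idea (not Forge's actual detection code):

```python
import shutil
import subprocess

def detect_gpu_vendor() -> str:
    """Classify the GPU by probing vendor CLIs, as a tool like Forge
    might. Illustrative only; returns "nvidia", "amd", or "none".
    """
    for cli, vendor in (("nvidia-smi", "nvidia"), ("rocm-smi", "amd")):
        if shutil.which(cli) is None:
            continue  # CLI not installed; try the next vendor
        try:
            subprocess.run([cli], capture_output=True, check=True, timeout=10)
            return vendor
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
            continue  # CLI present but no working GPU behind it
    return "none"
```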
Python 3.10+. Apache-2.0 licensed.

Quick start

Run your first tuning session in 15 minutes

Tutorial

Dry-run walkthrough with real log output

Concepts: Recipe

The artifact Forge proposes, mutates, and keeps or reverts

Concepts: Playbook

The agent’s instruction manual