aevyra-forge tunes your vLLM deployment overnight. Give it a model, a GPU, and a workload trace. It runs an autonomous loop: propose a config, boot the server, benchmark against your real workload, keep or revert, repeat. By morning you have a deployment recipe that beats hand-tuned defaults, with a full audit trail of every experiment.
Documentation Index
Fetch the complete documentation index at: https://docs.aevyra.ai/llms.txt
Use this file to discover all available pages before exploring further.
Where Forge fits
Forge operates on the infrastructure layer. Where Reflex rewrites prompts and Origin diagnoses agent failures, Forge maximises throughput and minimises latency for a model that’s already doing the right thing.
The loop
Each iteration, the agent reads the playbook and the experiment history and proposes one targeted change; Forge then measures whether that change actually helps. The audit trail captures every decision — config, result, rationale — so you can see exactly how the winning recipe was found.
Tuning layers
| Layer | What it tunes | Status |
|---|---|---|
| 1. Config | vLLM serving args: batching, caching, parallelism | ✅ v0.1 |
| 2. Quantization | INT4/FP8/INT8, KV cache precision | 🔧 v0.2 |
| 3. Kernel synthesis | Custom kernels via AutoKernel | 🚧 v0.3 |
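The propose → boot → benchmark → keep-or-revert loop described above can be sketched as a simple hill climb. This is an illustrative toy, not Forge's actual API: the function names, the `max_num_seqs` knob, and the synthetic scoring function are all stand-ins (a real run would boot a vLLM server and replay your workload trace instead of calling `benchmark`).

```python
import random

def benchmark(config):
    """Stand-in for a real benchmark run against a workload trace.
    This toy score peaks at max_num_seqs == 10."""
    bs = config["max_num_seqs"]
    return 100 + 10 * bs - 0.5 * bs * bs

def propose(config):
    """Propose one targeted change to the current best recipe."""
    candidate = dict(config)
    candidate["max_num_seqs"] = max(
        1, candidate["max_num_seqs"] + random.choice([-2, -1, 1, 2])
    )
    return candidate

def tune(iterations=50, seed=0):
    random.seed(seed)
    best = {"max_num_seqs": 2}          # starting recipe
    best_score = benchmark(best)
    audit_trail = []                     # every experiment is recorded
    for _ in range(iterations):
        candidate = propose(best)
        score = benchmark(candidate)
        kept = score > best_score        # keep only strict improvements
        audit_trail.append({"config": candidate, "score": score, "kept": kept})
        if kept:
            best, best_score = candidate, score
    return best, best_score, audit_trail

best, score, trail = tune()
print(best, round(score, 1))
```

The keep-or-revert rule means the recipe never regresses: a losing experiment is logged in the audit trail but the previous best config stays in place.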
Works with any GPU and LLM
Auto-detects NVIDIA and AMD GPUs via `nvidia-smi` / `rocm-smi`. Works with
any OpenAI-compatible LLM for the agent.
| Provider | Env var |
|---|---|
| Anthropic | ANTHROPIC_API_KEY |
| OpenAI | OPENAI_API_KEY |
| OpenRouter | OPENROUTER_API_KEY |
| Ollama | — |
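A minimal sketch of how detection like this could work (the helper names are hypothetical, not Forge's actual API): check for the vendor CLI on `PATH`, and pick the first provider whose environment variable is set, falling back to Ollama, which needs no key.

```python
import os
import shutil

def detect_gpu_tool():
    """Return the first vendor CLI found on PATH: nvidia-smi for NVIDIA,
    rocm-smi for AMD, or None if neither is installed."""
    for tool in ("nvidia-smi", "rocm-smi"):
        if shutil.which(tool):
            return tool
    return None

# Checked in priority order; matches the provider table above.
PROVIDERS = [
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("openrouter", "OPENROUTER_API_KEY"),
]

def pick_provider(env=os.environ):
    """First provider whose API key is set; Ollama is the keyless fallback."""
    for name, var in PROVIDERS:
        if env.get(var):
            return name
    return "ollama"

print(pick_provider({"OPENAI_API_KEY": "sk-..."}))  # → openai
```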
Quick start
Run your first tuning session in 15 minutes
Tutorial
Dry-run walkthrough with real log output
Concepts: Recipe
The artifact Forge proposes, mutates, and keeps or reverts
Concepts: Playbook
The agent’s instruction manual