Where Forge fits
Forge operates on the infrastructure layer. Where Reflex rewrites prompts and Origin diagnoses agent failures, Forge maximises throughput and minimises latency for a model that’s already doing the right thing.The loop
Each iteration: the agent reads the playbook and experiment history, proposes one targeted change, and Forge measures whether it actually helps. The audit trail captures every decision — config, result, rationale — so you can see exactly how the winning recipe was found.Tuning layers
| Layer | What it tunes | Status |
|---|---|---|
| 1. Config | vLLM serving args: batching, caching, parallelism | ✅ v0.1 |
| 2. Quantization | INT4/FP8/INT8, KV cache precision | 🔧 v0.2 |
| 3. Kernel synthesis | Custom kernels via AutoKernel | 🚧 v0.3 |
Works with any GPU and LLM
Auto-detects NVIDIA and AMD GPUs vianvidia-smi / rocm-smi. Works with
any OpenAI-compatible LLM for the agent.
| Provider | Env var |
|---|---|
| Anthropic | ANTHROPIC_API_KEY |
| OpenAI | OPENAI_API_KEY |
| OpenRouter | OPENROUTER_API_KEY |
| Ollama | — |
Quick start
Run your first tuning session in 15 minutes
Tutorial
Dry-run walkthrough with real log output
Concepts: Recipe
The artifact Forge proposes, mutates, and keeps or reverts
Concepts: Playbook
The agent’s instruction manual