A recipe is the complete deployment specification Forge searches over. Every experiment produces exactly one recipe; the best one is written to `best_recipe.yaml` at the end of a run.
## Structure

### Layers
- **Layer 1 — Config (`config:`)** is the primary search space in v0. Forge tunes vLLM serving arguments that control batching, memory, and caching behaviour. These have the highest leverage per experiment because they require no recompilation.
- **Layer 2 — Quantization (`quant:`)** is scaffolded but not yet implemented. In v0.2+, Forge will tune INT4/FP8/INT8 methods and KV cache precision jointly with Layer 1.
- **Layer 3 — Kernel synthesis (`kernels:`)** hooks into AutoKernel for custom op synthesis. Planned for v0.3+.
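Putting the three layers together, a recipe might look like the sketch below. Only the `config:`, `quant:`, and `kernels:` keys come from this page; every other field name and value here is an illustrative assumption, not the documented schema:

```yaml
# Hypothetical recipe layout — field names beyond the three layer keys
# are assumptions for illustration.
id: r-0042
parent_id: r-0017
generation: 3
config:              # Layer 1 — searched in v0
  max_num_seqs: 128
  gpu_memory_utilization: 0.92
  enable_prefix_caching: true
quant: {}            # Layer 2 — scaffolded, v0.2+
kernels: {}          # Layer 3 — AutoKernel hooks, v0.3+
```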
## Key `VLLMConfig` fields
| Field | vLLM default | What it does |
|---|---|---|
| `max_num_seqs` | 256 | Max concurrent sequences in a batch |
| `max_num_batched_tokens` | 8192 | Max tokens processed per forward pass |
| `enable_prefix_caching` | false | Cache KV state for repeated prefixes |
| `enable_chunked_prefill` | true | Break long prefills into chunks |
| `gpu_memory_utilization` | 0.9 | Fraction of GPU VRAM for KV cache |
| `kv_cache_dtype` | auto | KV cache precision (auto/fp8/fp16/bf16) |
| `tensor_parallel_size` | 1 | Number of GPUs for tensor parallelism |
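The fields above map directly onto vLLM serving flags (underscores become hyphens, booleans are enabled by flag presence). A minimal sketch of that mapping — the `to_vllm_flags` helper is hypothetical, not part of Forge or vLLM:

```python
# Hypothetical helper: render a recipe's config: layer as CLI-style flags.
# The flag naming convention mirrors vLLM's; the helper itself is illustrative.
def to_vllm_flags(config: dict) -> list[str]:
    flags = []
    for key, value in config.items():
        flag = "--" + key.replace("_", "-")
        if isinstance(value, bool):
            # Boolean flags are enabled by presence (e.g. --enable-prefix-caching);
            # false values are simply omitted in this sketch.
            if value:
                flags.append(flag)
        else:
            flags.append(f"{flag}={value}")
    return flags

print(to_vllm_flags({
    "max_num_seqs": 128,
    "gpu_memory_utilization": 0.92,
    "enable_prefix_caching": True,
    "enable_chunked_prefill": False,
}))
# → ['--max-num-seqs=128', '--gpu-memory-utilization=0.92', '--enable-prefix-caching']
```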
## Lineage
Each recipe records its `parent_id` and `generation`. This lets Forge detect convergence, build a diff between any two recipes, and render a clean audit trail.
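The diff between two recipes in a lineage can be sketched as a simple field-by-field comparison. This is an illustrative model (recipes as flat dicts), not Forge's actual implementation:

```python
# Illustrative sketch: diff two recipes' config fields, as a lineage audit
# trail might render it. Not Forge's real data model.
def recipe_diff(parent: dict, child: dict) -> dict:
    """Return {field: (parent_value, child_value)} for fields that changed."""
    keys = set(parent) | set(child)
    return {
        k: (parent.get(k), child.get(k))
        for k in sorted(keys)
        if parent.get(k) != child.get(k)
    }

gen2 = {"max_num_seqs": 256, "gpu_memory_utilization": 0.90}
gen3 = {"max_num_seqs": 128, "gpu_memory_utilization": 0.90,
        "enable_prefix_caching": True}
print(recipe_diff(gen2, gen3))
# → {'enable_prefix_caching': (None, True), 'max_num_seqs': (256, 128)}
```

An unchanged field (here `gpu_memory_utilization`) drops out of the diff, which is what makes generation-over-generation convergence easy to spot.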