tune
Start a new tuning run or resume an interrupted one. The resume subcommand reads all parameters from the run’s config.json, so no flags are needed.
Options
| Flag | Default | Description |
|---|---|---|
| --model | (required) | HuggingFace model ID or local path (e.g. Qwen/Qwen2.5-3B, meta-llama/Llama-3.2-1B-Instruct) |
| --device | cuda | GPU backend: cuda, rocm, or cpu. cuda and rocm auto-detect GPU name and VRAM via nvidia-smi / rocm-smi. Use cpu with --dry-run |
| --workload | (required) | Path to workload JSONL. Each line must have a "prompt" field and optionally "expected_output_tokens" and "arrival_offset_s" |
| --concurrency | 8 | Max concurrent in-flight requests during benchmarking. T4/A10: 8–16. A100/H100: 32–64 |
| --llm | anthropic/claude-sonnet-4-6 | Agent LLM in provider/model format. Examples: openrouter/meta-llama/llama-3.1-70b, openai/gpt-4o, ollama/qwen3:8b |
| --max-experiments | 50 | Total experiment budget across all layers |
| --max-hours | 12.0 | Wall-clock time limit in hours |
| --max-dollars | — | LLM spend cap in USD |
| --accuracy-floor | 0.99 | Minimum acceptable accuracy. Experiments that regress below this are not kept regardless of throughput gains |
| --playbook | (bundled) | Path to a custom playbook .md file. Defaults to the bundled playbook.md |
| --run-dir | .forge | Root directory for run storage |
| --dry-run | false | Skip vLLM; use synthetic bench results. Useful for testing the loop without a GPU |
| --verbose | false | Debug logging |
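
The --workload description above only names the fields, so here is a minimal sketch of producing such a file. It assumes "expected_output_tokens" is an integer and "arrival_offset_s" is a number of seconds; neither type is stated in this reference. The helper name write_workload, the file name workload.jsonl, and the sample prompts are purely illustrative.

```python
import json
from pathlib import Path


def write_workload(path, requests):
    """Write one JSON object per line; every record needs a "prompt"."""
    with Path(path).open("w", encoding="utf-8") as f:
        for req in requests:
            if "prompt" not in req:
                raise ValueError('each workload line must have a "prompt" field')
            f.write(json.dumps(req) + "\n")


# Illustrative records; the prompts and numbers are made up.
# Field types (int tokens, float seconds) are assumptions.
sample_requests = [
    {"prompt": "Summarize the plot of Hamlet in two sentences."},
    {"prompt": "Translate 'good morning' to French.",
     "expected_output_tokens": 16},
    {"prompt": "List three prime numbers.",
     "expected_output_tokens": 32, "arrival_offset_s": 0.5},
]

write_workload("workload.jsonl", sample_requests)
```

The resulting file can then be passed as the --workload argument.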
Layer control
| Flag | Default | Description |
|---|---|---|
| --skip-config | false | Skip Layer 1 config tuning and go straight to Layer 2 quantization |
| --skip-quant | false | Skip Layer 2 quantization |
| --skip-kernel | false | Skip Layer 3 kernel synthesis |
| --max-config-experiments N | — | Cap Layer 1 at N experiments, then escalate to Layer 2 regardless of convergence. Useful on T4 where the config search space is narrow |
| --max-quant-experiments N | — | Cap Layer 2 at N experiments |
Examples
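
As one illustration, a GPU-free smoke test can be assembled from the flags documented above. This is a sketch under assumptions: the executable name aevyra-forge is taken from the resume command mentioned on this page, --dry-run is assumed to be a value-less switch, and workload.jsonl is a placeholder path.

```python
import shlex

# Flags come from the Options table; values are illustrative.
# Assumption: --dry-run is a boolean switch that takes no value.
argv = [
    "aevyra-forge", "tune",
    "--model", "Qwen/Qwen2.5-3B",
    "--workload", "workload.jsonl",  # placeholder path
    "--device", "cpu",
    "--dry-run",
    "--max-experiments", "10",
]

# shlex.join quotes arguments as needed so the line is safe to paste into a shell.
command = shlex.join(argv)
print(command)
```

The same argv list could be handed to subprocess.run(argv, check=True) to launch the run from a script.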
Run directory layout
A run directory that contains experiments.jsonl but no completed.json was interrupted and can be resumed with aevyra-forge tune resume.
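
The interrupted-run rule can be checked from a script. The sketch below only looks for the two file names mentioned above; the function name run_status and the "completed", "interrupted", and "empty" labels are hypothetical and not part of the tool.

```python
from pathlib import Path


def run_status(run_dir):
    """Classify a run directory by which marker files it contains."""
    run_dir = Path(run_dir)
    has_experiments = (run_dir / "experiments.jsonl").is_file()
    has_completed = (run_dir / "completed.json").is_file()
    if has_completed:
        return "completed"
    if has_experiments:
        # Experiments were logged but the run never finished: resumable.
        return "interrupted"
    return "empty"
```

A wrapper script could call run_status on each run directory and invoke the resume command only for those reported as "interrupted".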
report
Print a summary of a completed or in-progress run.
Arguments
| Argument | Description |
|---|---|
| run-dir | Path to a run directory (e.g. .forge/ or .forge/runs/001_2026-05-13T04-10-00) |
Options
| Flag | Default | Description |
|---|---|---|
| --format | table | Output format: table or json |
Output
playbook
Inspect the active playbook.
Subcommands
| Subcommand | Description |
|---|---|
| show | Print the full playbook text to stdout |
| validate | Check the playbook’s structure and YAML front-matter. Exits non-zero if invalid |
Options
| Flag | Default | Description |
|---|---|---|
| --playbook | (bundled) | Path to a custom playbook file. Defaults to the bundled playbook.md |
Examples
LLM providers
The --llm flag follows a provider/model convention shared across the Aevyra stack:
| Provider | Format | Required env var |
|---|---|---|
| Anthropic (default) | anthropic/claude-sonnet-4-6 | ANTHROPIC_API_KEY |
| OpenAI | openai/gpt-4o | OPENAI_API_KEY |
| OpenRouter | openrouter/meta-llama/llama-3.1-70b | OPENROUTER_API_KEY |
| Together AI | together/meta-llama/Llama-3-70b | TOGETHER_API_KEY |
| Groq | groq/llama3-70b-8192 | GROQ_API_KEY |
| Ollama (local) | ollama/qwen3:8b | — |
| Any OpenAI-compatible endpoint | openai/model-name + custom base URL | — |
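
Because model names such as meta-llama/llama-3.1-70b contain slashes themselves, a provider/model string is most naturally split on the first slash only. The sketch below shows that parsing plus a pre-flight check for the required environment variable. It is an assumption about how the convention can be handled, not the tool's actual implementation, and the names split_llm_spec, missing_env_var, and REQUIRED_ENV are hypothetical.

```python
import os

# Provider -> required environment variable, copied from the table above.
# None means the table lists no required variable.
REQUIRED_ENV = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "together": "TOGETHER_API_KEY",
    "groq": "GROQ_API_KEY",
    "ollama": None,
}


def split_llm_spec(spec):
    """Split 'provider/model' on the first slash only."""
    provider, sep, model = spec.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected provider/model, got {spec!r}")
    return provider, model


def missing_env_var(spec, environ=os.environ):
    """Return the name of the unset required variable, or None if satisfied."""
    provider, _ = split_llm_spec(spec)
    var = REQUIRED_ENV.get(provider)
    if var and not environ.get(var):
        return var
    return None
```

For example, split_llm_spec("openrouter/meta-llama/llama-3.1-70b") yields the provider "openrouter" and the model "meta-llama/llama-3.1-70b", and missing_env_var can be called before a long run to fail early when a key is not set.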