Models & Fallbacks

Choosing the right model for each agent is one of the highest-leverage configuration decisions you’ll make. It affects response quality, latency, and cost — sometimes by an order of magnitude.

How models work with AgenFleet

AgenFleet connects your agents to AI models through your own API keys. You register your provider credentials in Settings → Integrations, and token usage is billed directly by your provider to your account — AgenFleet never marks up or resells model tokens.

This means you have full control: use your existing provider contracts, negotiate your own rates, and keep all cost visibility in one place.

Available models

AgenFleet supports multiple AI providers. Once a provider's API key is registered under Settings → Integrations, you can assign that provider's models to any agent. Different agents can use different providers: some on Anthropic, others on OpenAI, and so on.

Supported providers and models:

Provider	Model ID	Category	Input (per 1M tokens)	Output (per 1M tokens)	Best for
Anthropic	`claude-haiku-3.5`	Economy	$0.80	$4.00	High-volume cron jobs, simple Q&A, routing, classification
Anthropic	`claude-sonnet-4-20250514`	Flagship	$3.00	$15.00	Most production tasks — research, drafting, analysis
Anthropic	`claude-opus-4-20250514`	Flagship	$15.00	$75.00	Complex multi-step reasoning, high-stakes output
OpenAI	`gpt-4o`	Flagship	$2.50	$10.00	Broad capability, strong reasoning
OpenAI	`gpt-4o-mini`	Economy	$0.15	$0.60	Lowest-cost option for simple tasks
OpenAI	`o1`	Reasoning	$15.00	$60.00	Extended reasoning chains, complex logic
OpenAI	`o3-mini`	Reasoning	$1.10	$4.40	Reasoning tasks at lower cost than o1
Google	`gemini-2.0-flash`	Economy	$0.10	$0.40	Ultra-low cost, high-volume classification
Google	`gemini-2.0-pro`	Flagship	$1.25	$10.00	Strong multimodal capability
Mistral	`mistral-large-latest`	Flagship	$2.00	$6.00	European data residency, strong instruction following
Mistral	`codestral-latest`	Code	$0.30	$0.90	Code generation and review tasks

Pricing reflects provider standard rates and may vary based on your agreement with them. Token costs are billed directly by the provider to your account — not by AgenFleet.
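
To turn the per-1M-token prices above into a per-request figure, a minimal sketch follows. The `PRICES` dictionary copies a few rows from the table, and `estimate_cost` is a hypothetical helper for illustration, not part of AgenFleet.

```python
# USD per 1M tokens as (input, output), copied from the table above.
PRICES = {
    "claude-haiku-3.5": (0.80, 4.00),
    "claude-sonnet-4-20250514": (3.00, 15.00),
    "claude-opus-4-20250514": (15.00, 75.00),
    "gpt-4o-mini": (0.15, 0.60),
}


def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    input_price, output_price = PRICES[model_id]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# A request with 10,000 input tokens and 1,000 output tokens on Sonnet:
print(f"{estimate_cost('claude-sonnet-4-20250514', 10_000, 1_000):.4f}")  # 0.0450
```

Your real rates may differ if you have negotiated pricing with the provider, so treat the result as an estimate.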

Setting the default model

The model is configured in the model.default field of your agent config:

"model": {
  "default": "claude-haiku-3.5"
}

All sessions and cron jobs for this agent will use this model unless overridden.

Fallback chains

A fallback chain is an ordered list of backup models to try if the primary model is unavailable (due to provider outage, rate limiting, or quota exhaustion).

"model": {
  "default": "claude-haiku-3.5",
  "fallbacks": [
    "claude-sonnet-4-20250514"
  ]
}

AgenFleet tries each model in order using your registered API key. If claude-haiku-3.5 fails, it falls back to claude-sonnet-4-20250514. If that also fails, the job errors and you receive an alert.
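
The ordering described above can be sketched in a few lines of Python. This illustrates the behavior only and is not AgenFleet's implementation: `ModelUnavailable`, `run_with_fallbacks`, and `fake_call` are hypothetical names, and the sketch assumes every failure that triggers a fallback surfaces as a single exception type.

```python
from typing import Callable, Sequence


class ModelUnavailable(Exception):
    """Hypothetical error for an outage, rate limit, or exhausted quota."""


def run_with_fallbacks(
    call_model: Callable[[str], str],
    default: str,
    fallbacks: Sequence[str] = (),
) -> tuple[str, str]:
    """Try the default model, then each fallback in order.

    Returns (model_id, response) for the first model that succeeds and
    raises RuntimeError if every model in the chain fails.
    """
    errors = []
    for model_id in (default, *fallbacks):
        try:
            return model_id, call_model(model_id)
        except ModelUnavailable as exc:
            errors.append(f"{model_id}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def fake_call(model_id: str) -> str:
    """Stand-in for a provider call: Haiku is rate limited, others answer."""
    if model_id == "claude-haiku-3.5":
        raise ModelUnavailable("rate limited")
    return "ok"


print(run_with_fallbacks(fake_call, "claude-haiku-3.5", ["claude-sonnet-4-20250514"]))
# ('claude-sonnet-4-20250514', 'ok')
```

Note that in this chain the fallback is a more expensive model than the default, so a long outage on the primary raises cost for as long as the fallback is in use.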

The fallbacks field takes an array of model IDs, not a single string. A common misconfiguration is passing a string — this will fail validation at startup.

// ✗ Wrong
"fallbacks": "claude-sonnet-4-20250514"

// ✓ Correct
"fallbacks": ["claude-sonnet-4-20250514"]

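If you generate agent configs programmatically, a pre-flight check can catch the string-instead-of-array mistake before startup. The sketch below is an assumption about what such a check could look like; `validate_model_config` is a hypothetical helper, and AgenFleet's own validation may report errors differently.

```python
def validate_model_config(model: dict) -> list[str]:
    """Return a list of problems found in a "model" config block.

    An empty list means the block passed these (illustrative) checks.
    """
    problems = []
    if not isinstance(model.get("default"), str):
        problems.append('"default" must be a model ID string')

    fallbacks = model.get("fallbacks", [])
    if isinstance(fallbacks, str):
        # The common mistake: a bare string instead of an array.
        problems.append(f'"fallbacks" must be an array, e.g. ["{fallbacks}"]')
    elif not isinstance(fallbacks, list) or not all(
        isinstance(item, str) for item in fallbacks
    ):
        problems.append('"fallbacks" must be an array of model ID strings')
    return problems


wrong = {"default": "claude-haiku-3.5", "fallbacks": "claude-sonnet-4-20250514"}
right = {"default": "claude-haiku-3.5", "fallbacks": ["claude-sonnet-4-20250514"]}
print(validate_model_config(wrong))
print(validate_model_config(right))  # []
```
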
Cost optimization strategies

Use Haiku for high-frequency cron jobs

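If an agent runs a task every hour or every 6 hours, model cost adds up fast. At the list prices in the table above, the same cron workload costs roughly 19x more per month on Opus than on Haiku ($15.00 vs. $0.80 per 1M input tokens, $75.00 vs. $4.00 per 1M output tokens), often with no noticeable difference in output quality on routine tasks.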

Rule of thumb: Haiku for cron, Sonnet for chat, Opus for hard problems.
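
As a worked example of how cron cost scales with model choice, the sketch below compares the three Anthropic tiers using the list prices from the table above. The workload is an assumption chosen for illustration: an hourly job over a 30-day month, with 5,000 input tokens and 800 output tokens per run.

```python
RUNS_PER_MONTH = 24 * 30   # hourly cron job over a 30-day month (assumed)
INPUT_TOKENS = 5_000       # per run (assumed)
OUTPUT_TOKENS = 800        # per run (assumed)

# USD per 1M tokens as (input, output), from the pricing table.
PRICES = {
    "claude-haiku-3.5": (0.80, 4.00),
    "claude-sonnet-4-20250514": (3.00, 15.00),
    "claude-opus-4-20250514": (15.00, 75.00),
}


def monthly_cost(model_id: str) -> float:
    """Estimated USD per month for the assumed workload."""
    input_price, output_price = PRICES[model_id]
    per_run = (INPUT_TOKENS * input_price + OUTPUT_TOKENS * output_price) / 1_000_000
    return per_run * RUNS_PER_MONTH


for model_id in PRICES:
    print(f"{model_id}: ${monthly_cost(model_id):.2f}")
# claude-haiku-3.5: $5.18
# claude-sonnet-4-20250514: $19.44
# claude-opus-4-20250514: $97.20
```

Because Opus is priced at 18.75 times Haiku for both input and output in the table, the ratio stays the same for any token mix; only the absolute figures change with volume.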

Right-size the context window

The session.maxContextTokens setting controls how much session history is included per turn. Because that history is sent as input on every turn, a session with 50,000 tokens of history uses about five times as many input tokens per turn as one with 10,000. Prune old sessions regularly and set appropriate limits.

"session": {
  "maxContextTokens": 30000
}
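
To make the effect of a context cap concrete, here is one way a limit like this could be applied: keep the most recent messages that fit the budget and drop the rest. How AgenFleet actually selects history is not described here, so `trim_history` and the per-message token counts are assumptions for illustration.

```python
def trim_history(
    messages: list[tuple[str, int]], max_context_tokens: int
) -> list[tuple[str, int]]:
    """Keep the newest messages whose combined token count fits the budget.

    Each message is (text, token_count), ordered oldest to newest.
    """
    kept = []
    used = 0
    for text, tokens in reversed(messages):
        if used + tokens > max_context_tokens:
            break  # everything older than this is dropped
        kept.append((text, tokens))
        used += tokens
    kept.reverse()  # restore oldest-to-newest order
    return kept


history = [
    ("kickoff notes", 20_000),
    ("research dump", 15_000),
    ("draft review", 10_000),
    ("latest question", 5_000),
]
print([text for text, _ in trim_history(history, 30_000)])
# ['research dump', 'draft review', 'latest question']
```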

Set daily and monthly budgets

Hard limits prevent runaway costs from bugs, prompt injection attempts, or agents stuck in loops:

"limits": {
  "dailyTokenBudget": 200000,
  "monthlyTokenBudget": 3000000,
  "alertThreshold": 0.8
}

When alertThreshold is reached (e.g., 80% of daily budget consumed), you receive a notification. When the budget is exhausted, the agent pauses until the next budget period.
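
The alert-then-pause behavior can be summarized as a small state check. This is a sketch of the logic as described above, not AgenFleet's code; `budget_status` is a hypothetical function, and it assumes the same alertThreshold applies to whichever budget (daily or monthly) you pass in.

```python
def budget_status(tokens_used: int, budget: int, alert_threshold: float) -> str:
    """Classify usage against one budget as "ok", "alert", or "paused"."""
    if tokens_used >= budget:
        return "paused"  # budget exhausted until the next period
    if tokens_used >= budget * alert_threshold:
        return "alert"   # notification threshold crossed
    return "ok"


DAILY_BUDGET = 200_000
ALERT_THRESHOLD = 0.8

print(budget_status(150_000, DAILY_BUDGET, ALERT_THRESHOLD))  # ok
print(budget_status(170_000, DAILY_BUDGET, ALERT_THRESHOLD))  # alert
print(budget_status(200_000, DAILY_BUDGET, ALERT_THRESHOLD))  # paused
```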

Monitor cost per agent

The Fleet dashboard shows token consumption and estimated cost per agent, per day. Use the Cost Optimizer view to identify agents that are significantly over-budget relative to their output value.

Model IDs reference

Use these exact strings in your configuration:

Model	Config ID	Provider
Claude Haiku 3.5	`claude-haiku-3.5`	anthropic
Claude Sonnet 4	`claude-sonnet-4-20250514`	anthropic
Claude Opus 4	`claude-opus-4-20250514`	anthropic
GPT-4o	`gpt-4o`	openai
GPT-4o mini	`gpt-4o-mini`	openai
o1	`o1`	openai
o3-mini	`o3-mini`	openai
Gemini 2.0 Flash	`gemini-2.0-flash`	google
Gemini 2.0 Pro	`gemini-2.0-pro`	google
Mistral Large	`mistral-large-latest`	mistral
Codestral	`codestral-latest`	mistral

Choosing the right model — decision guide

Use Haiku when:

The task is well-defined and routine (daily briefings, monitoring alerts, classification)
You’re running many cron jobs with the same agent
Output quality is “good enough” and cost is a priority
The task doesn’t require multi-step reasoning

Use Sonnet when:

The agent handles varied, unpredictable requests from humans
Quality matters but you need consistent sub-5-second response times
You’re doing research, analysis, or structured drafting
Haiku’s output wasn’t good enough after testing

Use Opus when:

The task involves complex multi-step reasoning with many interdependencies
The output is high-stakes (legal, financial, medical-adjacent)
You need the best possible quality and cost is secondary
Sonnet failed on the task
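
The guide above can be condensed into a rough starting heuristic. The function below is only a sketch: `suggest_model` and its flags are hypothetical, it checks the most demanding criteria first, and it is no substitute for testing a cheaper model on your real tasks before moving up a tier.

```python
def suggest_model(
    *,
    high_stakes: bool = False,
    complex_reasoning: bool = False,
    human_facing: bool = False,
    research_or_drafting: bool = False,
) -> str:
    """Suggest a starting model ID from coarse task traits (illustrative)."""
    if high_stakes or complex_reasoning:
        return "claude-opus-4-20250514"
    if human_facing or research_or_drafting:
        return "claude-sonnet-4-20250514"
    # Routine, well-defined work such as cron jobs and classification.
    return "claude-haiku-3.5"


print(suggest_model())                     # claude-haiku-3.5
print(suggest_model(human_facing=True))    # claude-sonnet-4-20250514
print(suggest_model(high_stakes=True))     # claude-opus-4-20250514
```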