Models & Fallbacks

Choosing the right model for each agent is one of the highest-leverage configuration decisions you’ll make. It affects response quality, latency, and cost — sometimes by an order of magnitude.


AgenFleet connects your agents to AI models through your own API keys. You register your provider credentials in Settings → Integrations, and token usage is billed directly by your provider to your account — AgenFleet never marks up or resells model tokens.

This means you have full control: use your existing provider contracts, negotiate your own rates, and keep all cost visibility in one place.


AgenFleet supports multiple AI providers. After registering your API keys under Settings → Integrations, assign a provider to each agent. Different agents can use different providers: for example, some on Anthropic and others on OpenAI.

Supported providers and models:

| Provider | Model ID | Category | Input (per 1M tokens) | Output (per 1M tokens) | Best for |
| --- | --- | --- | --- | --- | --- |
| Anthropic | claude-haiku-3.5 | Economy | $0.80 | $4.00 | High-volume cron jobs, simple Q&A, routing, classification |
| Anthropic | claude-sonnet-4-20250514 | Flagship | $3.00 | $15.00 | Most production tasks: research, drafting, analysis |
| Anthropic | claude-opus-4-20250514 | Flagship | $15.00 | $75.00 | Complex multi-step reasoning, high-stakes output |
| OpenAI | gpt-4o | Flagship | $2.50 | $10.00 | Broad capability, strong reasoning |
| OpenAI | gpt-4o-mini | Economy | $0.15 | $0.60 | Lowest-cost option for simple tasks |
| OpenAI | o1 | Reasoning | $15.00 | $60.00 | Extended reasoning chains, complex logic |
| OpenAI | o3-mini | Reasoning | $1.10 | $4.40 | Reasoning tasks at lower cost than o1 |
| Google | gemini-2.0-flash | Economy | $0.10 | $0.40 | Ultra-low cost, high-volume classification |
| Google | gemini-2.0-pro | Flagship | $1.25 | $10.00 | Strong multimodal capability |
| Mistral | mistral-large-latest | Flagship | $2.00 | $6.00 | European data residency, strong instruction following |
| Mistral | codestral-latest | Code | $0.30 | $0.90 | Code generation and review tasks |

Pricing reflects provider standard rates and may vary based on your agreement with them. Token costs are billed directly by the provider to your account — not by AgenFleet.
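To make the table concrete, the following is a minimal sketch of how a per-request cost estimate could be computed from the listed prices. The `PRICES` dictionary and `request_cost` function are hypothetical helpers written for illustration, not part of AgenFleet, and the request size (4,000 input tokens, 500 output tokens) is an assumed example.

```python
# Prices in USD per 1M tokens, copied from the table above: (input, output).
PRICES = {
    "claude-haiku-3.5": (0.80, 4.00),
    "claude-sonnet-4-20250514": (3.00, 15.00),
    "claude-opus-4-20250514": (15.00, 75.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gemini-2.0-flash": (0.10, 0.40),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at list prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request size: 4,000 input tokens and 500 output tokens.
    for model in sorted(PRICES, key=lambda m: request_cost(m, 4_000, 500)):
        print(f"{model:28s} ${request_cost(model, 4_000, 500):.4f}")
```

Actual charges depend on your agreement with each provider, so treat the result as an estimate only.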


The model is configured in the model.default field of your agent config:

"model": {
  "default": "claude-haiku-3.5"
}

All sessions and cron jobs for this agent will use this model unless overridden.


A fallback chain is an ordered list of backup models to try if the primary model is unavailable (due to provider outage, rate limiting, or quota exhaustion).

"model": {
  "default": "claude-haiku-3.5",
  "fallbacks": [
    "claude-sonnet-4-20250514"
  ]
}

AgenFleet tries each model in order using your registered API key. If claude-haiku-3.5 fails, it falls back to claude-sonnet-4-20250514. If that also fails, the job errors and you receive an alert.
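The ordering behavior described above can be sketched roughly as follows. This illustrates the general fallback pattern and is not AgenFleet's actual implementation: `run_with_fallbacks`, `AllModelsFailed`, and the `call_model` callable are hypothetical names, and a real client would catch provider-specific errors rather than every exception.

```python
from typing import Callable, Sequence


class AllModelsFailed(Exception):
    """Raised when the default model and every fallback have failed."""


def run_with_fallbacks(
    prompt: str,
    default: str,
    fallbacks: Sequence[str],
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try the default model, then each fallback in order.

    Returns (model_used, response). `call_model` is a hypothetical callable
    that raises an exception on outage, rate limiting, or quota exhaustion.
    """
    errors: list[str] = []
    for model in [default, *fallbacks]:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{model}: {exc}")
    # Every model in the chain failed: surface all errors together.
    raise AllModelsFailed("; ".join(errors))
```

Collecting the per-model errors before raising keeps the final alert informative about why each step in the chain failed.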


If an agent runs a task every hour or every 6 hours, model cost adds up fast. At the list prices above, a daily briefing agent costs about 19x more per month on Opus than on Haiku (18.75x for both input and output tokens), often with no noticeable quality gain on routine tasks.

Rule of thumb: Haiku for cron, Sonnet for chat, Opus for hard problems.
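As a worked example of how scheduling multiplies cost, the sketch below compares Haiku and Opus for an hourly cron job. The token counts per run (3,000 input, 500 output) and the 30-day month are assumptions chosen for illustration, and `monthly_cost` is a hypothetical helper. Prices come from the provider table above.

```python
# USD per 1M tokens (input, output), from the provider table.
PRICES = {
    "claude-haiku-3.5": (0.80, 4.00),
    "claude-opus-4-20250514": (15.00, 75.00),
}


def monthly_cost(
    model: str,
    runs_per_day: int,
    input_tokens: int,
    output_tokens: int,
    days: int = 30,
) -> float:
    """Estimate the monthly USD cost of a scheduled job at list prices."""
    input_price, output_price = PRICES[model]
    per_run = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return per_run * runs_per_day * days


if __name__ == "__main__":
    # Assumed workload: one run per hour, 3,000 input and 500 output tokens.
    haiku = monthly_cost("claude-haiku-3.5", 24, 3_000, 500)
    opus = monthly_cost("claude-opus-4-20250514", 24, 3_000, 500)
    print(f"Haiku: ${haiku:.2f}")          # Haiku: $3.17
    print(f"Opus:  ${opus:.2f}")           # Opus:  $59.40
    print(f"Ratio: {opus / haiku:.2f}x")   # Ratio: 18.75x
```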

The session.maxContextTokens setting controls how much session history is included per turn. Because that history is sent as input on every turn, a session carrying 50,000 tokens of history costs roughly five times as much in input tokens per turn as one carrying 10,000. Prune old sessions regularly and set appropriate limits.

"session": {
  "maxContextTokens": 30000
}
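One plausible way a context limit like this can be enforced is to keep only the newest messages that fit within the budget. The sketch below illustrates that idea. How AgenFleet actually trims history is not documented here, so `trim_history` and the precomputed `"tokens"` field on each message are assumptions.

```python
def trim_history(messages: list[dict], max_context_tokens: int) -> list[dict]:
    """Keep the most recent messages whose token counts fit within the budget.

    Each message is a dict with a precomputed "tokens" count (an assumption;
    a real system would count tokens with the provider's tokenizer).
    """
    kept: list[dict] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        if used + message["tokens"] > max_context_tokens:
            break  # the next-older message would exceed the budget
        kept.append(message)
        used += message["tokens"]
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    history = [
        {"id": 1, "tokens": 20_000},
        {"id": 2, "tokens": 15_000},
        {"id": 3, "tokens": 10_000},
    ]
    # With a 30,000-token limit only the two newest messages fit.
    print([m["id"] for m in trim_history(history, 30_000)])  # [2, 3]
```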

Hard limits prevent runaway costs from bugs, prompt injection attempts, or agents stuck in loops:

"limits": {
  "dailyTokenBudget": 200000,
  "monthlyTokenBudget": 3000000,
  "alertThreshold": 0.8
}

When alertThreshold is reached (e.g., 80% of daily budget consumed), you receive a notification. When the budget is exhausted, the agent pauses until the next budget period.
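The alert and pause behavior can be sketched as a simple status check against the daily budget. This is an illustration under stated assumptions: `Limits` and `budget_status` are hypothetical names, and treating both thresholds as inclusive is a guess rather than documented AgenFleet behavior.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    daily_token_budget: int
    alert_threshold: float  # fraction of the budget, e.g. 0.8


def budget_status(tokens_used_today: int, limits: Limits) -> str:
    """Return "ok", "alert", or "paused" for the current daily usage."""
    if tokens_used_today >= limits.daily_token_budget:
        return "paused"  # budget exhausted: wait for the next period
    if tokens_used_today >= limits.alert_threshold * limits.daily_token_budget:
        return "alert"  # threshold crossed: notify, but keep running
    return "ok"


if __name__ == "__main__":
    limits = Limits(daily_token_budget=200_000, alert_threshold=0.8)
    for used in (100_000, 170_000, 200_000):
        print(used, budget_status(used, limits))
```

With the example configuration, the alert level corresponds to 160,000 of the 200,000 daily tokens.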

The Fleet dashboard shows token consumption and estimated cost per agent, per day. Use the Cost Optimizer view to identify agents that are significantly over-budget relative to their output value.


Use these exact strings in your configuration:

| Model | Config ID | Provider |
| --- | --- | --- |
| Claude Haiku 3.5 | claude-haiku-3.5 | anthropic |
| Claude Sonnet 4 | claude-sonnet-4-20250514 | anthropic |
| Claude Opus 4 | claude-opus-4-20250514 | anthropic |
| GPT-4o | gpt-4o | openai |
| GPT-4o mini | gpt-4o-mini | openai |
| o1 | o1 | openai |
| o3-mini | o3-mini | openai |
| Gemini 2.0 Flash | gemini-2.0-flash | google |
| Gemini 2.0 Pro | gemini-2.0-pro | google |
| Mistral Large | mistral-large-latest | mistral |
| Codestral | codestral-latest | mistral |

Choosing the right model — decision guide

Use Haiku when:

  • The task is well-defined and routine (daily briefings, monitoring alerts, classification)
  • You’re running many cron jobs with the same agent
  • Output quality is “good enough” and cost is a priority
  • The task doesn’t require multi-step reasoning

Use Sonnet when:

  • The agent handles varied, unpredictable requests from humans
  • Quality matters but you need consistent sub-5-second response times
  • You’re doing research, analysis, or structured drafting
  • Haiku’s output wasn’t good enough after testing

Use Opus when:

  • The task involves complex multi-step reasoning with many interdependencies
  • The output is high-stakes (legal, financial, medical-adjacent)
  • You need the best possible quality and cost is secondary
  • Sonnet failed on the task
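The guide above can be condensed into a rough starting-point heuristic. The sketch below is a deliberate simplification: `suggest_model` and its three boolean inputs are hypothetical, and the guide's advice to escalate only after testing a cheaper model still applies.

```python
def suggest_model(*, routine: bool, high_stakes: bool, complex_reasoning: bool) -> str:
    """Suggest a starting model ID from coarse task traits.

    A simplified reading of the decision guide, not an official rule.
    """
    if high_stakes or complex_reasoning:
        return "claude-opus-4-20250514"
    if routine:
        return "claude-haiku-3.5"
    # Varied, human-facing work with no special risk defaults to Sonnet.
    return "claude-sonnet-4-20250514"


if __name__ == "__main__":
    print(suggest_model(routine=True, high_stakes=False, complex_reasoning=False))
    print(suggest_model(routine=False, high_stakes=False, complex_reasoning=False))
    print(suggest_model(routine=False, high_stakes=True, complex_reasoning=False))
```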