Models & Fallbacks
Choosing the right model for each agent is one of the highest-leverage configuration decisions you’ll make. It affects response quality, latency, and cost — sometimes by an order of magnitude.
How models work with AgenFleet
Section titled “How models work with AgenFleet”AgenFleet connects your agents to AI models through your own API keys. You register your provider credentials in Settings → Integrations, and token usage is billed directly by your provider to your account — AgenFleet never marks up or resells model tokens.
This means you have full control: use your existing provider contracts, negotiate your own rates, and keep all cost visibility in one place.
Available models
Section titled “Available models”AgenFleet supports multiple AI providers. Register your API keys under Settings → Integrations and assign a provider to each agent. Agents can use different providers — some on Anthropic, others on OpenAI, etc.
Supported providers and models:
| Provider | Model ID | Category | Input (per 1M tokens) | Output (per 1M tokens) | Best for |
|---|---|---|---|---|---|
| Anthropic | claude-haiku-3.5 | Economy | $0.80 | $4.00 | High-volume cron jobs, simple Q&A, routing, classification |
| Anthropic | claude-sonnet-4-20250514 | Flagship | $3.00 | $15.00 | Most production tasks — research, drafting, analysis |
| Anthropic | claude-opus-4-20250514 | Flagship | $15.00 | $75.00 | Complex multi-step reasoning, high-stakes output |
| OpenAI | gpt-4o | Flagship | $2.50 | $10.00 | Broad capability, strong reasoning |
| OpenAI | gpt-4o-mini | Economy | $0.15 | $0.60 | Lowest-cost option for simple tasks |
| OpenAI | o1 | Reasoning | $15.00 | $60.00 | Extended reasoning chains, complex logic |
| OpenAI | o3-mini | Reasoning | $1.10 | $4.40 | Reasoning tasks at lower cost than o1 |
gemini-2.0-flash | Economy | $0.10 | $0.40 | Ultra-low cost, high-volume classification | |
gemini-2.0-pro | Flagship | $1.25 | $10.00 | Strong multimodal capability | |
| Mistral | mistral-large-latest | Flagship | $2.00 | $6.00 | European data residency, strong instruction following |
| Mistral | codestral-latest | Code | $0.30 | $0.90 | Code generation and review tasks |
Pricing reflects provider standard rates and may vary based on your agreement with them. Token costs are billed directly by the provider to your account — not by AgenFleet.
Setting the default model
Section titled “Setting the default model”The model is configured in the model.default field of your agent config:
"model": { "default": "claude-haiku-3.5"}All sessions and cron jobs for this agent will use this model unless overridden.
Fallback chains
Section titled “Fallback chains”A fallback chain is an ordered list of backup models to try if the primary model is unavailable (due to provider outage, rate limiting, or quota exhaustion).
"model": { "default": "claude-haiku-3.5", "fallbacks": [ "claude-sonnet-4-20250514" ]}AgenFleet tries each model in order using your registered API key. If claude-haiku-3.5 fails, it falls back to claude-sonnet-4-20250514. If that also fails, the job errors and you receive an alert.
Cost optimization strategies
Section titled “Cost optimization strategies”Use Haiku for high-frequency cron jobs
Section titled “Use Haiku for high-frequency cron jobs”If an agent runs a task every hour or every 6 hours, model cost adds up fast. A daily briefing agent running on Haiku vs. Opus can differ by 60x in monthly cost for identical output quality on routine tasks.
Rule of thumb: Haiku for cron, Sonnet for chat, Opus for hard problems.
Right-size the context window
Section titled “Right-size the context window”The session.maxContextTokens setting controls how much session history is included per turn. A session with 50,000 tokens of history costs more per turn than one with 10,000. Prune old sessions regularly and set appropriate limits.
"session": { "maxContextTokens": 30000}Set daily and monthly budgets
Section titled “Set daily and monthly budgets”Hard limits prevent runaway costs from bugs, prompt injection attempts, or agents stuck in loops:
"limits": { "dailyTokenBudget": 200000, "monthlyTokenBudget": 3000000, "alertThreshold": 0.8}When alertThreshold is reached (e.g., 80% of daily budget consumed), you receive a notification. When the budget is exhausted, the agent pauses until the next budget period.
Monitor cost per agent
Section titled “Monitor cost per agent”The Fleet dashboard shows token consumption and estimated cost per agent, per day. Use the Cost Optimizer view to identify agents that are significantly over-budget relative to their output value.
Model IDs reference
Section titled “Model IDs reference”Use these exact strings in your configuration:
| Model | Config ID | Provider |
|---|---|---|
| Claude Haiku 3.5 | claude-haiku-3.5 | anthropic |
| Claude Sonnet 4 | claude-sonnet-4-20250514 | anthropic |
| Claude Opus 4 | claude-opus-4-20250514 | anthropic |
| GPT-4o | gpt-4o | openai |
| GPT-4o mini | gpt-4o-mini | openai |
| o1 | o1 | openai |
| o3-mini | o3-mini | openai |
| Gemini 2.0 Flash | gemini-2.0-flash | |
| Gemini 2.0 Pro | gemini-2.0-pro | |
| Mistral Large | mistral-large-latest | mistral |
| Codestral | codestral-latest | mistral |
Choosing the right model — decision guide
Section titled “Choosing the right model — decision guide”Use Haiku when:
- The task is well-defined and routine (daily briefings, monitoring alerts, classification)
- You’re running many cron jobs with the same agent
- Output quality is “good enough” and cost is a priority
- The task doesn’t require multi-step reasoning
Use Sonnet when:
- The agent handles varied, unpredictable requests from humans
- Quality matters but you need consistent sub-5-second response times
- You’re doing research, analysis, or structured drafting
- Haiku’s output wasn’t good enough after testing
Use Opus when:
- The task involves complex multi-step reasoning with many interdependencies
- The output is high-stakes (legal, financial, medical-adjacent)
- You need the best possible quality and cost is secondary
- Sonnet failed on the task