Cost Management

AgenFleet gives you visibility and control over AI costs at every level — per agent, per job, and across the entire fleet. Token usage is billed directly by your AI provider (e.g., Anthropic) to your account — AgenFleet tracks consumption so you can optimize it, but does not charge for tokens itself.

This page covers how to monitor token usage, set guardrails, and reduce spend without sacrificing output quality.


Every interaction with an agent consumes tokens — the unit of measure for LLM usage. Costs are based on:

  • Input tokens — everything sent to the model: the system prompt (SOUL file), session history, memory results, and the current message
  • Output tokens — the model’s response

Each model has a different per-token price. See Models & Fallbacks for current rates.
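
As a rough illustration of the arithmetic above, the following Python sketch estimates the cost of a single turn from its input and output token counts. The prices and token counts are placeholder values, not real provider rates, and `ModelPrice` and `estimate_turn_cost` are hypothetical names that are not part of AgenFleet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """USD per one million tokens (placeholder values, not real rates)."""
    input_per_mtok: float
    output_per_mtok: float


def estimate_turn_cost(input_tokens: int, output_tokens: int, price: ModelPrice) -> float:
    """Return the estimated USD cost of one turn."""
    input_cost = input_tokens / 1_000_000 * price.input_per_mtok
    output_cost = output_tokens / 1_000_000 * price.output_per_mtok
    return input_cost + output_cost


# Hypothetical price: $1 per million input tokens, $5 per million output tokens.
example_price = ModelPrice(input_per_mtok=1.0, output_per_mtok=5.0)

# Input = system prompt + session history + memory results + current message.
input_tokens = 1_200 + 6_000 + 700 + 100
output_tokens = 500

cost = estimate_turn_cost(input_tokens, output_tokens, example_price)
print(f"Estimated cost for this turn: ${cost:.4f}")
```

In this example the session history is the largest input component, which is why long sessions dominate per-turn cost.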

What drives high cost:

  • Long sessions with extensive history (high input token count per turn)
  • Frequent cron jobs on expensive models (Sonnet/Opus instead of Haiku)
  • Agents with large SOUL files or many extraPaths files loaded per turn
  • High topK memory search results injected per turn

Every agent detail view shows:

  • Today’s token usage — input + output, with cost estimate
  • 7-day chart — daily spend trend
  • Monthly projection — estimated full-month cost at current run rate
  • Cost per session — average tokens per conversation turn
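
A run-rate projection like the one above can be approximated by averaging recent daily costs and scaling to the length of the month. This is a minimal sketch under that assumption; the exact formula AgenFleet uses is not documented here, and `project_monthly_cost` is a hypothetical helper.

```python
import calendar
from datetime import date


def project_monthly_cost(daily_costs: list[float], today: date) -> float:
    """Extrapolate a full-month cost from the average of the daily costs seen so far."""
    if not daily_costs:
        return 0.0
    average_daily = sum(daily_costs) / len(daily_costs)
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return average_daily * days_in_month


# Hypothetical daily costs in USD for the last seven days.
last_week = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.0]
projection = project_monthly_cost(last_week, date(2025, 6, 15))
print(f"Projected cost for the month: ${projection:.2f}")
```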

The Fleet dashboard header aggregates across all agents:

  • Combined daily token consumption
  • Estimated monthly cost
  • Top spenders (agents ranked by token consumption)

The Activity tab logs token usage for every cron job execution and session turn. You can see exactly how much a specific briefing or report cost to generate.


Budgets are configured in the agent’s limits block:

"limits": {
  "dailyTokenBudget": 300000,
  "monthlyTokenBudget": 6000000,
  "alertThreshold": 0.8
}

How limits work:

  • When alertThreshold is reached (e.g., 80% of daily budget), an alert is sent to your notification channel
  • When the daily budget is exhausted, the agent pauses — it will not process new messages or cron jobs for the rest of the day
  • Budget resets at midnight UTC (daily) and on the 1st of each month (monthly)
  • Paused agents resume automatically when the budget period resets
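
The daily rules above can be sketched as a small state check. This is an illustrative model only, not AgenFleet's implementation; `Limits`, `budget_status`, and `next_daily_reset` are hypothetical names, and the monthly budget would follow the same pattern.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Limits:
    daily_token_budget: int
    alert_threshold: float  # fraction of the budget, e.g. 0.8 for 80%


def budget_status(tokens_used_today: int, limits: Limits) -> str:
    """Classify today's usage as 'ok', 'alert', or 'paused'."""
    if tokens_used_today >= limits.daily_token_budget:
        return "paused"
    if tokens_used_today >= limits.alert_threshold * limits.daily_token_budget:
        return "alert"
    return "ok"


def next_daily_reset(now: datetime) -> datetime:
    """Return the next midnight UTC after `now` (expects a timezone-aware datetime)."""
    now_utc = now.astimezone(timezone.utc)
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)


limits = Limits(daily_token_budget=300_000, alert_threshold=0.8)
for used in (100_000, 250_000, 300_000):
    print(used, budget_status(used, limits))

now = datetime(2025, 6, 1, 15, 30, tzinfo=timezone.utc)
print("Resumes at:", next_daily_reset(now).isoformat())
```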

The Cost Optimizer is accessible from the Fleet dashboard sidebar. It analyzes your fleet and surfaces specific recommendations to reduce spend:

High-cost agents on expensive models — agents running Sonnet or Opus that could potentially run on Haiku for their task type. The optimizer flags these with an estimated monthly savings if downgraded.

Low-activity agents — agents with minimal task volume relative to their standing costs. Candidates for consolidation or deactivation.

Session bloat — agents with sessions that have accumulated thousands of messages. Each turn on a bloated session costs more due to larger context window usage. The optimizer shows how much you’d save by pruning them.

High topK memory settings — agents injecting 10+ memory results per turn when 3–5 is typically sufficient.

Each recommendation in the Cost Optimizer is actionable — clicking Apply makes the change directly. Changes to model and topK take effect on the next session turn. Session pruning is immediate.

All changes are logged in the activity audit trail.


Model choice is the single highest-leverage decision. For a routine task such as a daily news briefing, Haiku delivers the same quality output as Opus at roughly 60x lower cost.

Task type                       Recommended model
Daily/weekly briefings          Haiku
Research and analysis           Sonnet
Complex multi-step reasoning    Opus
Simple monitoring/alerting      Haiku
Client-facing chat              Sonnet
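
If you provision agents from a script, the recommendations above can be captured as a simple lookup. This is a sketch only; the lowercase keys and the `recommended_model` helper are illustrative and do not correspond to an AgenFleet API.

```python
# Mapping taken from the recommendations above; keys are illustrative labels.
RECOMMENDED_MODEL = {
    "daily/weekly briefings": "Haiku",
    "research and analysis": "Sonnet",
    "complex multi-step reasoning": "Opus",
    "simple monitoring/alerting": "Haiku",
    "client-facing chat": "Sonnet",
}


def recommended_model(task_type: str) -> str:
    """Return the recommended model for a task type, ignoring case and padding."""
    key = task_type.strip().lower()
    if key not in RECOMMENDED_MODEL:
        raise ValueError(f"No recommendation for task type: {task_type!r}")
    return RECOMMENDED_MODEL[key]


print(recommended_model("Research and analysis"))
```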

Set a monthly calendar reminder to prune sessions older than 30 days. This is the easiest recurring action to reduce cost and improve agent responsiveness simultaneously.
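
To see why pruning pays off, the sketch below models input tokens per turn under the assumption that the accumulated session history is sent with every turn. All token counts are hypothetical, and `turn_input_tokens` is an illustrative helper rather than anything AgenFleet exposes.

```python
def turn_input_tokens(
    prior_turns: int,
    system_tokens: int = 1_000,       # hypothetical SOUL file size
    tokens_per_exchange: int = 400,   # hypothetical size of one past message plus reply
    message_tokens: int = 100,        # hypothetical size of the current message
) -> int:
    """Input tokens for one turn when all prior exchanges are included as history."""
    history_tokens = prior_turns * tokens_per_exchange
    return system_tokens + history_tokens + message_tokens


fresh = turn_input_tokens(prior_turns=0)
bloated = turn_input_tokens(prior_turns=200)
print(f"Fresh session:   {fresh:,} input tokens per turn")
print(f"Bloated session: {bloated:,} input tokens per turn")
print(f"Ratio: {bloated / fresh:.1f}x")
```

Under these assumptions the same message costs far more on the long session, because the history term grows with every turn while the other terms stay fixed.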

If an agent rarely needs historical context (e.g., a daily news briefing that’s always fresh), set topK to 2–3 instead of the default 5. Each injected memory result adds input tokens.
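
The effect of topK on input tokens can be estimated with simple multiplication. The sketch below assumes a hypothetical average of 300 tokens per memory result and 50 turns per day; substitute figures from your own Activity tab.

```python
def memory_tokens_per_day(top_k: int, turns_per_day: int, tokens_per_result: int = 300) -> int:
    """Input tokens added per day by injected memory results (hypothetical sizes)."""
    return top_k * tokens_per_result * turns_per_day


default_cost = memory_tokens_per_day(top_k=5, turns_per_day=50)
reduced_cost = memory_tokens_per_day(top_k=2, turns_per_day=50)
print(f"topK=5: {default_cost:,} tokens/day")
print(f"topK=2: {reduced_cost:,} tokens/day")
print(f"Saved:  {default_cost - reduced_cost:,} tokens/day")
```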

A 3,000-word SOUL file costs more per turn than an 800-word one. Move static reference material (data tables, glossaries) to extraPaths with selective injection rather than loading it every turn.

If you have two agents each running 2 cron jobs per week, consider merging them into one agent with 4 jobs. You save the standing overhead of a second container and a second set of session context.


Token costs flow directly to your provider account, so charges appear on your Anthropic (or other provider) invoice, not your AgenFleet invoice. AgenFleet’s billing dashboard gives you a full usage breakdown to help you understand and attribute that spend.

The usage breakdown includes:

  • Token usage by agent
  • Model mix (Haiku / Sonnet / Opus split)
  • Total input vs. output tokens
  • Estimated cost at current provider rates
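
If you want to reproduce a similar breakdown from your own usage records, the aggregation is straightforward. The sketch below assumes a hypothetical record shape of (agent, model, input tokens, output tokens); it is not an AgenFleet export format, and the agent names are made up.

```python
from collections import defaultdict

# Hypothetical usage records: (agent, model, input_tokens, output_tokens).
records = [
    ("briefing-bot", "Haiku", 120_000, 15_000),
    ("research-bot", "Sonnet", 300_000, 40_000),
    ("briefing-bot", "Haiku", 80_000, 10_000),
]


def summarize(usage):
    """Aggregate tokens by agent and by model, plus input/output totals."""
    by_agent = defaultdict(int)
    by_model = defaultdict(int)
    total_input = 0
    total_output = 0
    for agent, model, input_tokens, output_tokens in usage:
        by_agent[agent] += input_tokens + output_tokens
        by_model[model] += input_tokens + output_tokens
        total_input += input_tokens
        total_output += output_tokens
    total = total_input + total_output
    # Model mix as a fraction of all tokens; empty if there is no usage.
    model_mix = {m: t / total for m, t in by_model.items()} if total else {}
    return {
        "by_agent": dict(by_agent),
        "model_mix": model_mix,
        "total_input": total_input,
        "total_output": total_output,
    }


summary = summarize(records)
for agent, tokens in sorted(summary["by_agent"].items(), key=lambda kv: -kv[1]):
    print(f"{agent}: {tokens:,} tokens")
for model, share in summary["model_mix"].items():
    print(f"{model}: {share:.0%} of tokens")
```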

Your AgenFleet subscription covers platform access, infrastructure, fleet management, and support. See your plan details under Settings → Billing.