Learn AI

    5 min

    Tokens, models, and what it actually costs

    How AI is priced, where your money goes, and how to cap your spend before it surprises you.

    AI assistants are not free. Chat apps come with monthly subscriptions; APIs and agents charge per use. Three concepts let you predict and control your bill.

    Tokens — what you're actually paying for

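    Models don't think in words; they think in tokens. A token is roughly a word or a chunk of one. "Hello" is typically one token, while "Onomatopoeia" splits into about four; exact counts depend on the model's tokenizer. English text averages ~0.75 words per token (about 4 tokens for every 3 words). Code averages ~3 characters per token.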
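
    To make those rules of thumb concrete, here is a minimal sketch that turns them into a rough estimator. It is not a real tokenizer: the function name `estimate_tokens` is hypothetical, and the only inputs are the two ratios quoted above, so expect real counts to differ by model.

```python
def estimate_tokens(text: str, is_code: bool = False) -> int:
    """Rough token estimate from rules of thumb (not a real tokenizer)."""
    if not text.strip():
        return 0
    if is_code:
        # Code: about 3 characters per token.
        return max(1, round(len(text) / 3))
    # English prose: about 0.75 words per token, so tokens = words / 0.75.
    words = len(text.split())
    return max(1, round(words / 0.75))


prose = "Models do not think in words; they think in tokens."
snippet = "for i in range(10):\n    print(i)"

print(estimate_tokens(prose))                  # 10 words  -> 13
print(estimate_tokens(snippet, is_code=True))  # 32 chars  -> 11
```

    An estimate like this is good enough for budgeting; for billing-accurate counts, use the token counts your provider reports for each request.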

    Every request charges you for two things:

    • Input tokens. Your prompt + any history + system prompt + files.
    • Output tokens. The reply the model writes back.

    Output is usually 4–5× more expensive than input per token.
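
    The arithmetic behind a single request can be sketched as follows. The prices used here ($3 in, $15 out per 1M tokens) are assumed for illustration and simply reflect a 5× output premium; `request_cost` is a hypothetical helper, not part of any provider SDK.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Dollar cost of one request; prices are dollars per 1M tokens."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000


# Assumed prices: $3 per 1M input tokens, $15 per 1M output tokens.
cost = request_cost(500, 500, in_price_per_m=3.0, out_price_per_m=15.0)
output_share = (500 * 15.0) / (500 * 3.0 + 500 * 15.0)

print(f"${cost:.4f}")                     # $0.0090
print(f"{output_share:.0%} from output")  # 83% from output
```

    Even with equal token counts on both sides, most of the bill comes from output, which is why trimming long replies often saves more than trimming prompts.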

    The three pricing tiers

    Tier (example models) | Use for | Rough cost per 1M tokens (in / out)
    Cheap & fast (Haiku, GPT-mini, Gemini Flash) | Routine work, classification, simple lookups, summarisation | $0.25–1 / $1–4
    Balanced (Sonnet, GPT, Gemini Pro) | Default for coding CLIs, agents, writing | $3–5 / $15–20
    Heavy reasoning (Opus, GPT-large, Gemini Ultra) | Architecture, research, gnarly debugging | $15 / $75

    Pricing moves fast — these are 2026 ballparks. Check each provider's pricing page for current numbers.

    What workflows actually cost

    • One chat reply (~500 tokens in, ~500 out): $0.001–0.01
    • A coding-CLI session (30 min, lots of file reads): $0.50–3
    • Cloud agent, one PR (reads repo, edits files, runs tests): $1–10
    • Background daemon, 1 day (scheduled tasks every hour): $5–50
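
    The per-workflow ranges above can be rolled up into a rough monthly estimate. This is a sketch only: the dollar ranges are copied from the figures above, while the usage counts (400 replies, 20 sessions, 5 PRs) and the names `WORKFLOW_COST` and `monthly_range` are made up for the example.

```python
# (low, high) dollar ranges per unit of work, copied from the figures above.
WORKFLOW_COST = {
    "chat_reply": (0.001, 0.01),
    "cli_session": (0.50, 3.00),
    "cloud_agent_pr": (1.00, 10.00),
    "daemon_day": (5.00, 50.00),
}


def monthly_range(usage: dict[str, int]) -> tuple[float, float]:
    """Return (low, high) monthly spend for a dict of workflow -> count."""
    low = sum(WORKFLOW_COST[name][0] * count for name, count in usage.items())
    high = sum(WORKFLOW_COST[name][1] * count for name, count in usage.items())
    return low, high


# Hypothetical month: 400 chat replies, 20 CLI sessions, 5 agent PRs.
usage = {"chat_reply": 400, "cli_session": 20, "cloud_agent_pr": 5}
low, high = monthly_range(usage)
print(f"${low:.2f} to ${high:.2f}")  # $15.40 to $114.00
```

    The wide spread between the low and high ends is the point: which tier you use and how much context you send matter more than how many requests you make.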

    Three ways to spend less

    1. Prompt caching. If you send the same context (system prompt, project files) repeatedly, modern APIs can cache it, sometimes cutting the price of repeated input by 80% or more. Most coding CLIs do this automatically; verify by checking your usage dashboard.
    2. Model routing. Use the cheap tier for routine work, escalate to balanced for hard problems, reserve heavy reasoning for the rare gnarly thing. A good tool routes for you; otherwise pick deliberately.
    3. Hard spend caps. Most providers' dashboards let you set a monthly limit. Set one before you start. Treat it as a smoke alarm.
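
    A small sketch of the first two ideas, under stated assumptions. The caching function models an 80% discount on repeated context and ignores cache-write surcharges and cache expiry, both of which vary by provider. The router is just a lookup table built from the task types listed per tier. All names and the example numbers (20,000-token context, 50 requests, $3 per 1M input tokens) are hypothetical.

```python
def input_cost(context_tokens: int, fresh_tokens: int, requests: int,
               in_price_per_m: float, cache_discount: float = 0.0) -> float:
    """Total input cost in dollars over `requests` calls sharing one context.

    The first call pays full price for the shared context; later calls pay
    (1 - cache_discount) of the input price for it. Fresh tokens always pay
    full price. Cache-write surcharges and cache expiry are ignored.
    """
    if requests <= 0:
        return 0.0
    per_token = in_price_per_m / 1_000_000
    first = (context_tokens + fresh_tokens) * per_token
    later = (context_tokens * (1 - cache_discount) + fresh_tokens) * per_token
    return first + later * (requests - 1)


# Task kinds mapped to tiers, following the tier descriptions in this lesson.
TIER_FOR_TASK = {
    "classification": "cheap", "lookup": "cheap", "summarisation": "cheap",
    "coding": "balanced", "writing": "balanced",
    "architecture": "heavy", "research": "heavy", "debugging": "heavy",
}


def pick_tier(task_kind: str) -> str:
    """Unknown work falls back to the balanced tier."""
    return TIER_FOR_TASK.get(task_kind, "balanced")


uncached = input_cost(20_000, 500, 50, in_price_per_m=3.0)
cached = input_cost(20_000, 500, 50, in_price_per_m=3.0, cache_discount=0.80)
print(f"no cache: ${uncached:.3f}, with cache: ${cached:.3f}")
print(pick_tier("classification"), pick_tier("research"))
```

    In this made-up scenario the input bill drops from about $3.08 to about $0.72; the saving grows with the share of each request that is repeated context.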

    Where to check your spend

    Provider | Where
    Anthropic | console.anthropic.com → Usage
    OpenAI | platform.openai.com → Usage
    Google | aistudio.google.com → Billing
    OpenRouter | openrouter.ai/activity (per model, per request)

    The math you actually need. Chat: pennies per message. Coding CLI: dollars per session. Always-on agent: tens of dollars per day. Set a monthly cap at an amount that would be uncomfortable but tolerable to lose, so an accidental runaway never costs you more than that.
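
    If you run your own scripts or agents, you can mirror the provider-side cap with a client-side guard that refuses work once a budget is spent. The sketch below is one simple way to do that; `SpendGuard` and `BudgetExceeded` are hypothetical names, the $50 cap is an example, and this complements the dashboard limit rather than replacing it.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spending past the cap."""


class SpendGuard:
    """Tracks estimated spend and blocks charges beyond a monthly cap."""

    def __init__(self, monthly_cap_usd: float) -> None:
        self.cap = monthly_cap_usd
        self.spent = 0.0

    @property
    def remaining(self) -> float:
        return self.cap - self.spent

    def charge(self, cost_usd: float) -> None:
        # Check before recording, so a rejected charge leaves the total intact.
        if self.spent + cost_usd > self.cap:
            raise BudgetExceeded(
                f"charge of ${cost_usd:.2f} exceeds remaining ${self.remaining:.2f}"
            )
        self.spent += cost_usd


guard = SpendGuard(monthly_cap_usd=50.0)
guard.charge(30.0)
guard.charge(15.0)

try:
    guard.charge(10.0)   # would bring the total to $55, over the $50 cap
    blocked = False
except BudgetExceeded as err:
    blocked = True
    print(f"stopped: {err}")
```

    Call `charge` with the estimated cost before each request and let the exception stop the loop; a runaway agent then halts at the cap instead of continuing to bill.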

    Why is output so much more expensive than input?
    Input tokens can be processed together in one parallel pass; output tokens are generated one at a time, each requiring another pass through the model. GPU compute costs reflect that asymmetry.

    What about flat-rate subscriptions (Claude.ai Pro, ChatGPT Plus)?
    A different pricing model. You pay a fixed monthly fee for human-scale chat usage and never see token counts. A subscription is usually cheaper if you're a heavy individual user; the API is usually cheaper for programmatic or agent use.
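
    One way to sanity-check that trade-off is a break-even count: how many API-priced messages a flat fee would buy. The sketch below assumes a $20 monthly fee and about $0.009 per message (500 tokens in and out at $3 / $15 per 1M tokens); both figures are illustrative assumptions, and `break_even_messages` is a hypothetical helper.

```python
import math


def break_even_messages(monthly_fee_usd: float,
                        cost_per_message_usd: float) -> int:
    """Messages per month at which API spend reaches the flat fee."""
    return math.ceil(monthly_fee_usd / cost_per_message_usd)


# Assumed: $20/month subscription, ~$0.009 per API-priced chat message.
messages = break_even_messages(20.0, 0.009)
print(messages)                # 2223
print(round(messages / 30))    # about 74 messages per day
```

    Real conversations resend their history with every turn, so the per-message cost climbs in long chats and the actual break-even point is lower than this figure suggests.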

    What about local / open models?
    Free per token once you've paid for the hardware. Llama, Mistral, Qwen, and DeepSeek all have downloadable weights. Their quality is catching up but still lags the frontier closed models on hard tasks. Privacy and zero marginal cost are the wins.