DevBolt
Processed in your browser. Your data never leaves your device.

How do I count LLM tokens and estimate API costs?

Paste your text and select a model (GPT-4o, Claude, Gemini, Llama, Mistral, DeepSeek) to see the token count and estimated API cost. The tool uses BPE tokenization, shows context window usage, and lets you compare costs across 19 models from 6 providers. Everything runs in your browser.

Count GPT-4 tokens
Input
Hello, how are you doing today?
Output
Tokens: 8
Cost (GPT-4o): $0.00002 input
Context used: 0.006%
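
The same numbers can be reproduced outside the browser with OpenAI's tiktoken library (a sketch, assuming the tool's in-browser tokenizer matches the reference cl100k_base implementation):

```python
import tiktoken  # pip install tiktoken

# cl100k_base is the BPE encoding this tool applies to all models
enc = tiktoken.get_encoding("cl100k_base")

text = "Hello, how are you doing today?"
tokens = enc.encode(text)

GPT4O_INPUT_PRICE = 2.50 / 1_000_000  # dollars per input token
CONTEXT_WINDOW = 128_000              # GPT-4o context window

print(f"Tokens: {len(tokens)}")                                   # Tokens: 8
print(f"Cost (GPT-4o): ${len(tokens) * GPT4O_INPUT_PRICE:.5f}")   # $0.00002
print(f"Context used: {len(tokens) / CONTEXT_WINDOW:.3%}")        # 0.006%
```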

LLM Token Counter & Cost Calculator

Count tokens and estimate API costs for GPT-4o, Claude, Gemini, and other LLMs. Uses BPE tokenization (cl100k_base).

As you type, the tool displays live counts: input tokens, estimated output tokens, estimated total cost, context window usage (shown against a 128K window), and character, word, and line totals.

Model Pricing Comparison

Model              Input $/1M  Output $/1M  Context
GPT-4.1            $2.00       $8.00        1M
GPT-4.1 mini       $0.40       $1.60        1M
GPT-4.1 nano       $0.10       $0.40        1M
GPT-4o             $2.50       $10.00       128K
GPT-4o mini        $0.15       $0.60        128K
o3                 $2.00       $8.00        200K
o4-mini            $1.10       $4.40        200K
o3-mini            $1.10       $4.40        200K
Claude Opus 4      $15.00      $75.00       200K
Claude Sonnet 4    $3.00       $15.00       200K
Claude 3.5 Haiku   $0.80       $4.00        200K
Gemini 2.5 Pro     $1.25       $10.00       1M
Gemini 2.5 Flash   $0.15       $0.60        1M
Gemini 2.0 Flash   $0.10       $0.40        1M
Llama 4 Maverick   $0.50       $0.50        1M
Llama 4 Scout      $0.20       $0.20        524K
Mistral Large      $2.00       $6.00        128K
DeepSeek V3        $0.27       $1.10        131K
DeepSeek R1        $0.55       $2.19        131K

About Token Counting

This tool uses BPE tokenization (the cl100k_base encoding), the tokenizer used by GPT-4 and GPT-3.5 Turbo. Newer OpenAI models such as GPT-4o and GPT-4.1 use the closely related o200k_base encoding, so their counts are close but not exact, and token counts for other providers (Anthropic, Google, Meta) are approximations — typically within 5-15% of actual counts.

Pricing reflects publicly listed API prices as of March 2026. Actual costs may vary with batch pricing, prompt caching, fine-tuned models, or volume discounts. Output tokens are estimated using your selected output:input ratio.
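
The cost arithmetic is simple enough to sketch. The helper below is hypothetical, takes prices per million tokens, and uses the output:input ratio idea described above:

```python
import tiktoken

def estimate_cost(text: str, input_per_m: float, output_per_m: float,
                  output_ratio: float = 1.0) -> float:
    """Rough per-request cost in dollars, with output tokens
    approximated as a fraction of input tokens."""
    enc = tiktoken.get_encoding("cl100k_base")
    input_tokens = len(enc.encode(text))
    output_tokens = int(input_tokens * output_ratio)
    return (input_tokens * input_per_m
            + output_tokens * output_per_m) / 1_000_000

# GPT-4o list prices from the table above: $2.50/M in, $10.00/M out
cost = estimate_cost("Summarize this report in one paragraph.",
                     2.50, 10.00, output_ratio=0.5)
print(f"${cost:.6f}")
```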

What is a token? Tokens are the basic units in which LLMs process text. A token is roughly 3-4 characters, or about ¾ of a word in English. Code, non-English text, and special characters typically use more tokens per character.
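
That rule of thumb is easy to verify; a quick sketch with arbitrary sample strings:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "English": "The quick brown fox jumps over the lazy dog.",
    "code":    "for (let i = 0; i < arr.length; i++) { sum += arr[i]; }",
    "Chinese": "你好，世界。今天天气很好。",
}

# Code and non-English text yield fewer characters per token
for label, text in samples.items():
    n = len(enc.encode(text))
    print(f"{label:8} {n:3} tokens, {len(text) / n:.1f} chars/token")
```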

Tips & Best Practices

Pro Tip

System prompts and tool definitions count toward your context window

The context window isn't just for your messages — system prompts, function definitions, and tool schemas consume tokens too. A complex system prompt with 20 tool definitions can use 3,000-5,000 tokens before the conversation even starts.
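
A rough way to measure that overhead before deploying (the system prompt and tool schema below are hypothetical examples, and real providers serialize tool definitions in their own formats):

```python
import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

system_prompt = "You are a support assistant. Always cite a knowledge-base article."
tool_schema = {  # one hypothetical tool definition out of many
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Approximate: serialized schemas consume context just like messages do
overhead = len(enc.encode(system_prompt)) + len(enc.encode(json.dumps(tool_schema)))
print(f"~{overhead} tokens used before the first user message")
```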

Common Pitfall

Token count varies dramatically between models for the same text

GPT-4 and Claude use different tokenizers. The same 1,000-word essay might be 1,200 tokens in GPT-4 (cl100k) but 1,400 in Claude. Always count tokens with the specific model's tokenizer — estimates based on 'words ÷ 0.75' are unreliable.
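
Claude's tokenizer isn't published, but the effect is visible even between OpenAI's own encodings (o200k_base stands in here as the second tokenizer):

```python
import tiktoken

text = "Internationalization and localization of user-facing strings."

# The same text, two vocabularies, two different counts
for name in ("cl100k_base", "o200k_base"):
    enc = tiktoken.get_encoding(name)
    print(f"{name}: {len(enc.encode(text))} tokens")
```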

Real-World Example

Estimate API costs before running batch jobs

Processing 10,000 documents through GPT-4o at $2.50/M input tokens and $10/M output tokens adds up fast. Token-count a representative sample, multiply by your dataset size, and calculate costs before running the full batch.
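
A sketch of that workflow (the sample documents, document count, and output ratio below are illustrative assumptions):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# 1. Token-count a representative sample of documents
sample_docs = ["First representative document...",
               "Second representative document..."]
avg_input = sum(len(enc.encode(d)) for d in sample_docs) / len(sample_docs)

# 2. Scale to the full dataset and price it
n_docs = 10_000
output_ratio = 0.25                        # assume summaries ~25% of input length
input_price, output_price = 2.50, 10.00    # GPT-4o $/M from the table above

total_in = avg_input * n_docs
total_out = total_in * output_ratio
cost = (total_in * input_price + total_out * output_price) / 1_000_000
print(f"Estimated batch cost: ${cost:,.2f}")
```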

Security Note

Long prompts can be used to hide prompt injection attacks

Attackers embed malicious instructions deep in long documents, hoping the LLM follows them. If you're processing user-submitted content through an LLM, be aware that longer inputs provide more surface area for prompt injection.
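
Capping accepted input doesn't stop injection outright, but it shrinks the surface area; a minimal sketch, assuming an arbitrary 8,000-token budget:

```python
import tiktoken

MAX_INPUT_TOKENS = 8_000  # arbitrary budget for user-submitted content

def enforce_budget(user_content: str) -> str:
    """Reject oversized user input before it reaches the model."""
    enc = tiktoken.get_encoding("cl100k_base")
    n = len(enc.encode(user_content))
    if n > MAX_INPUT_TOKENS:
        raise ValueError(f"Input is {n} tokens; limit is {MAX_INPUT_TOKENS}")
    return user_content
```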

Frequently Asked Questions

How do LLM token counters calculate tokens from text?
Token counters use tokenization algorithms (BPE or SentencePiece) that split text into subword units. Common words like 'the' are single tokens while uncommon words are split into fragments. On average, one token is approximately 4 characters or 0.75 English words. DevBolt supports 19 models across 6 providers, each using their own tokenizer, so token counts vary between models for the same input.
How do I estimate API costs for LLM calls?
Paste your prompt and select the target model. DevBolt counts input tokens and calculates cost based on per-token pricing. Input and output tokens are priced differently, so estimate the output token count from the expected response length. Total cost = (input tokens × input price) + (output tokens × output price). For batch processing, these per-request costs add up quickly, making accurate estimation essential for budget planning.
Why do different AI models produce different token counts for the same text?
Different models use different tokenizers trained on different datasets, with different vocabulary sizes. A larger vocabulary generally means fewer tokens for the same text, because more whole words map to single tokens. Code-heavy text tokenizes especially differently, since some tokenizers saw more programming-language data during training. The practical impact: the same prompt costs different amounts across providers, and a given context window holds a different amount of text depending on the tokenizer.
