API documentation
Pricing Rules
Token pricing, cache accounting, image charges, and long-context multipliers.
AIGate calculates cost per request. Token models usually bill input, output, cache read, and cache write separately. Image models can use fixed or dynamic per-request pricing.
Token fields
| Field | Meaning |
|---|---|
| input tokens | Normal text input tokens after separate cache/image/audio parts are accounted for. |
| output tokens | Generated text tokens. |
| cache read | Cached prompt tokens read from provider cache. |
| cache write | Tokens written into provider cache. Claude-style 5-minute and 1-hour cache writes can have different rates. |
| image input/output | Image tokens or image-generation call charges when a model reports them. |
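
The separate billing of the four token fields can be sketched as a sum of per-field charges. This is a minimal illustration, not AIGate's implementation: the `Usage` and `Rates` types, the `request_cost` function, and the per-million-token rate unit are all hypothetical names and assumptions, and the rates in the usage example are placeholders rather than real prices.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one request (hypothetical field names)."""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0


@dataclass
class Rates:
    """Prices per one million tokens (assumed unit; values are placeholders)."""
    input: float
    output: float
    cache_read: float
    cache_write: float


TOKENS_PER_RATE_UNIT = 1_000_000


def request_cost(usage: Usage, rates: Rates) -> float:
    """Bill each token field at its own rate and sum the charges."""
    total = (
        usage.input_tokens * rates.input
        + usage.output_tokens * rates.output
        + usage.cache_read_tokens * rates.cache_read
        + usage.cache_write_tokens * rates.cache_write
    )
    return total / TOKENS_PER_RATE_UNIT


# Usage with placeholder rates (not actual AIGate prices)
example_usage = Usage(input_tokens=1_000, output_tokens=500, cache_read_tokens=2_000)
example_rates = Rates(input=2.0, output=8.0, cache_read=0.5, cache_write=2.5)
example_cost = request_cost(example_usage, example_rates)
```

Image and audio charges are left out of this sketch because they may be billed per token or per call, depending on what the model reports.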
Long-context multipliers
Thresholds use total input context
The threshold is based on the full input context length before the cache discount. Cached tokens still count toward the threshold.

| Provider/model family | Threshold | Input | Output | Cache |
|---|---|---|---|---|
| OpenAI models | 272,000 tokens and above | x2 | x1.5 | x2 |
| Google Pro models | 200,000 tokens and above | x2 | x1.5 | x2 |
| Grok models | Above 200,000 tokens | x2 | x2 | x2 |
For example, a Google Pro request with 210K tokens of total input context is over the 200,000-token threshold, so its input and cache portions are billed at the long-context rate (x2) and its output at the long-context output rate (x1.5).
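
The threshold table can be expressed as a small lookup. The sketch below is illustrative only: the family keys (`"openai"`, `"google-pro"`, `"grok"`), the `Rule` and `Multipliers` types, and the `long_context_multipliers` function are hypothetical names. It takes the table wording literally, so "and above" is treated as inclusive and "Above" as exclusive, and it assumes the caller passes the total input context with cached tokens already included.

```python
from typing import NamedTuple


class Multipliers(NamedTuple):
    input: float
    output: float
    cache: float


class Rule(NamedTuple):
    threshold: int
    inclusive: bool  # True: ">= threshold"; False: "> threshold"
    multipliers: Multipliers


# Values copied from the table; the dictionary keys are hypothetical labels.
RULES = {
    "openai": Rule(272_000, True, Multipliers(2.0, 1.5, 2.0)),
    "google-pro": Rule(200_000, True, Multipliers(2.0, 1.5, 2.0)),
    "grok": Rule(200_000, False, Multipliers(2.0, 2.0, 2.0)),
}

BASE = Multipliers(1.0, 1.0, 1.0)


def long_context_multipliers(family: str, total_input_context: int) -> Multipliers:
    """Return price multipliers for a request.

    total_input_context must include cached tokens, because the
    threshold is evaluated before any cache discount.
    """
    rule = RULES.get(family)
    if rule is None:
        return BASE  # assumption: families not in the table have no multiplier
    if rule.inclusive:
        over = total_input_context >= rule.threshold
    else:
        over = total_input_context > rule.threshold
    return rule.multipliers if over else BASE


# The 210K Google Pro example from the text
google_example = long_context_multipliers("google-pro", 210_000)
```

With these rules, a Grok request at exactly 200,000 tokens stays at base pricing, while an OpenAI request at exactly 272,000 tokens is already in the long-context tier.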
Cache hit percent
Cache hit percent is the number of cache read tokens divided by the sum of normal input tokens and cache read tokens. Cache writes are shown separately because writing to the cache is not a hit.
```txt
cache_hit_percent = cache_read_tokens / (input_tokens + cache_read_tokens) * 100
```
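
As a quick check of the formula, here is a minimal Python sketch. The function name mirrors the formula's variable, and returning 0 for a request with no input and no cache reads is an assumption, since the zero-token case is not specified here.

```python
def cache_hit_percent(input_tokens: int, cache_read_tokens: int) -> float:
    """Share of the prompt served from cache, as a percentage."""
    denominator = input_tokens + cache_read_tokens
    if denominator == 0:
        # Assumption: treat an empty prompt as a 0% hit rate
        # instead of dividing by zero.
        return 0.0
    return cache_read_tokens / denominator * 100


# 6,000 cached tokens out of 8,000 total prompt tokens
hit_rate = cache_hit_percent(input_tokens=2_000, cache_read_tokens=6_000)
```

Cache write tokens are deliberately not a parameter, because they do not enter the formula.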