Pricing Rules

Token pricing, cache accounting, image charges, and long-context multipliers.

AIGate calculates cost per request. Token-based models usually bill input, output, cache read, and cache write tokens separately. Image models can use fixed or dynamic per-request pricing.

Token fields

| Field | Meaning |
| --- | --- |
| input tokens | Normal text input tokens, counted after cache, image, and audio tokens are split out and billed separately. |
| output tokens | Generated text tokens. |
| cache read | Cached prompt tokens read from the provider's cache. |
| cache write | Tokens written into the provider's cache. Claude-style 5-minute and 1-hour cache writes can have different rates. |
| image input/output | Image tokens or image-generation call charges, when a model reports them. |
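A minimal sketch of how the separate token fields above combine into a single request cost. It assumes prices are quoted in USD per million tokens and covers token buckets only (no image or audio charges, no long-context multipliers). `TokenPrices`, `TokenUsage`, and `token_cost` are hypothetical names for illustration, not part of the AIGate API.

```python
from dataclasses import dataclass

PER_MILLION = 1_000_000


@dataclass
class TokenPrices:
    # Hypothetical USD prices per million tokens; real rates come from each model's pricing.
    input: float
    output: float
    cache_read: float
    cache_write_5m: float  # Claude-style 5-minute cache write rate
    cache_write_1h: float  # Claude-style 1-hour cache write rate


@dataclass
class TokenUsage:
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int = 0
    cache_write_5m_tokens: int = 0
    cache_write_1h_tokens: int = 0


def token_cost(usage: TokenUsage, prices: TokenPrices) -> float:
    """Multiply each token bucket by its own rate and sum (no long-context multiplier)."""
    return (
        usage.input_tokens * prices.input
        + usage.output_tokens * prices.output
        + usage.cache_read_tokens * prices.cache_read
        + usage.cache_write_5m_tokens * prices.cache_write_5m
        + usage.cache_write_1h_tokens * prices.cache_write_1h
    ) / PER_MILLION
```

With made-up rates of 3.0 input, 15.0 output, 0.3 cache read, and 3.75 for 5-minute cache writes, a request with 10,000 input, 2,000 output, 50,000 cache read, and 1,000 cache write tokens comes to 0.07875 USD. Keeping the two cache write durations as separate fields mirrors the note that they can be billed at different rates.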

Long-context multipliers

Thresholds use total input context
The threshold is based on the full input context length, before any cache discount. Cached tokens still count toward the threshold.
| Provider/model family | Threshold | Input | Output | Cache |
| --- | --- | --- | --- | --- |
| OpenAI models | 272,000 tokens and above | x2 | x1.5 | x2 |
| Google Pro models | 200,000 tokens and above | x2 | x1.5 | x2 |
| Grok models | Above 200,000 tokens | x2 | x2 | x2 |
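The multiplier table can be read as a lookup keyed by model family. The sketch below encodes the three rows, including the difference between "and above" (inclusive) and "above" (exclusive). The family keys (`"openai"`, `"google-pro"`, `"grok"`) and all function names are hypothetical. It also assumes that total input context means normal input plus cache read plus cache write tokens; the page only states that cached tokens count, so treat that sum as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LongContextRule:
    threshold: int      # token count that triggers long-context pricing
    inclusive: bool     # True means ">= threshold", False means "> threshold"
    input_mult: float
    output_mult: float
    cache_mult: float


# Values from the long-context table; the dictionary keys are hypothetical labels.
RULES = {
    "openai": LongContextRule(272_000, True, 2.0, 1.5, 2.0),
    "google-pro": LongContextRule(200_000, True, 2.0, 1.5, 2.0),
    "grok": LongContextRule(200_000, False, 2.0, 2.0, 2.0),
}


def total_input_context(input_tokens: int, cache_read_tokens: int, cache_write_tokens: int) -> int:
    # Assumption: total context is counted before any cache discount, so cached tokens are included.
    return input_tokens + cache_read_tokens + cache_write_tokens


def multipliers(family: str, context_tokens: int) -> tuple[float, float, float]:
    """Return (input, output, cache) multipliers; all 1.0 below the threshold or for unknown families."""
    rule = RULES.get(family)
    if rule is None:
        return (1.0, 1.0, 1.0)
    if rule.inclusive:
        over = context_tokens >= rule.threshold
    else:
        over = context_tokens > rule.threshold
    if not over:
        return (1.0, 1.0, 1.0)
    return (rule.input_mult, rule.output_mult, rule.cache_mult)
```

For example, a Grok request with exactly 200,000 tokens of context stays at normal rates, while an OpenAI request with 10,000 input tokens and 262,000 cache read tokens crosses the 272,000 threshold even though most of it is cached.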

For example, a Google Pro request with 210,000 tokens of total input context is at or above the 200,000-token threshold, so its input and cache tokens are billed at 2x their normal rates and its output tokens at 1.5x.
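The Google Pro example can be worked through numerically. This sketch hard-codes the Google Pro row of the multiplier table and assumes hypothetical USD-per-million-token prices; `google_pro_cost` is an illustrative name, and cache writes are left out for brevity.

```python
# Google Pro long-context rule from the table: 200,000 tokens and above.
THRESHOLD = 200_000
INPUT_MULT, OUTPUT_MULT, CACHE_MULT = 2.0, 1.5, 2.0


def google_pro_cost(input_tokens: int, cache_read_tokens: int, output_tokens: int,
                    input_price: float, cache_read_price: float, output_price: float) -> float:
    """Request cost in USD; prices are hypothetical USD per million tokens."""
    # Cached tokens count toward the threshold even though they are billed at the cache rate.
    context = input_tokens + cache_read_tokens
    if context >= THRESHOLD:
        in_m, out_m, cache_m = INPUT_MULT, OUTPUT_MULT, CACHE_MULT
    else:
        in_m, out_m, cache_m = 1.0, 1.0, 1.0
    return (
        input_tokens * input_price * in_m
        + cache_read_tokens * cache_read_price * cache_m
        + output_tokens * output_price * out_m
    ) / 1_000_000
```

With made-up prices of 1.0 input, 0.25 cache read, and 10.0 output, a request with 10,000 input and 200,000 cache read tokens (210,000 total context) plus 1,000 output tokens costs 0.135 USD. The same request with 180,000 cache read tokens (190,000 total context) stays below the threshold and costs 0.065 USD.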

Cache hit percent

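Cache hit percent is cache read tokens divided by the sum of normal input tokens and cache read tokens, expressed as a percentage. Cache write tokens are reported separately and are not part of the calculation, because writing to the cache is not a hit.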

```txt
cache_hit_percent = cache_read_tokens / (input_tokens + cache_read_tokens) * 100
```
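The formula translates directly into a small function. The only addition is a guard for requests with no input or cache read tokens; returning 0% in that case is an assumption, since the page does not say how an empty denominator is reported. The function name is hypothetical.

```python
def cache_hit_percent(input_tokens: int, cache_read_tokens: int) -> float:
    """Cache read share of (normal input + cache read), as a percentage.

    Cache write tokens are deliberately not a parameter: writing to the cache is not a hit.
    """
    denominator = input_tokens + cache_read_tokens
    if denominator == 0:
        # Assumption: report 0% instead of dividing by zero.
        return 0.0
    return cache_read_tokens / denominator * 100
```

A request with 20,000 normal input tokens and 80,000 cache read tokens has an 80% cache hit rate, regardless of how many tokens it wrote to the cache.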