Token-FIFO Pricing

Token-FIFO Pricing is a pricing model that draws down token balances in a fixed order. Each ONOXIA plan ships with a monthly token budget, which is always consumed first; one-time add-on packages are layered on top and depleted in first-in-first-out (purchase) order, so the never-expiring add-ons are only touched once the monthly budget runs out.
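The draw-down order can be sketched in Python as below. This is a minimal sketch, not ONOXIA's billing code: the `TokenWallet` class, its fields, and the choice to return the uncovered remainder when every balance is empty are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TokenWallet:
    """Hypothetical model of one site's token balances."""
    monthly_remaining: int                        # resets each billing cycle
    addons: deque = field(default_factory=deque)  # remaining tokens per add-on, oldest first

    def consume(self, tokens: int) -> int:
        """Draw `tokens` from the monthly budget, then from add-ons in purchase order.

        Returns the number of tokens that could not be covered (0 if fully covered).
        """
        # 1. The monthly budget is always drawn first.
        take = min(tokens, self.monthly_remaining)
        self.monthly_remaining -= take
        tokens -= take
        # 2. Any remaining demand drains add-on packages oldest-first (FIFO).
        while tokens > 0 and self.addons:
            oldest = self.addons[0]
            take = min(tokens, oldest)
            tokens -= take
            if take == oldest:
                self.addons.popleft()          # package fully used up
            else:
                self.addons[0] = oldest - take  # partially used, stays at the front
        return tokens
```

For example, with 1,000 monthly tokens and add-ons of 500 and 2,000 tokens, a 1,200-token event empties the monthly budget and leaves 300 tokens in the oldest add-on; the newer 2,000-token package stays untouched.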

Purpose

Customers want predictable monthly costs but also the ability to absorb traffic spikes without an overage charge. Token-FIFO gives both: the monthly budget covers steady-state usage, and pre-purchased token packages cushion peaks without expiring.

Scope

Applies to every token-billable event in ONOXIA: inbound chat messages, RAG retrievals (counted at chunk-token cost), outbound model responses, voice transcription, and the email agent. It does not apply to dashboard usage or webhook calls.
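One way to picture the scope rule is a lookup that charges only the listed event types. The source labels and the `billable_tokens` helper are hypothetical; raising on an unknown source is an assumed design choice so that a newly added event type is never billed, or silently ignored, by accident.

```python
# Hypothetical source labels; the names are illustrative, not ONOXIA identifiers.
BILLABLE_SOURCES = frozenset({
    "chat_inbound",         # inbound chat messages
    "rag_retrieval",        # retrieved chunks, counted at chunk-token cost
    "model_response",       # outbound model responses
    "voice_transcription",  # voice transcription
    "email_agent",          # the email agent
})
NON_BILLABLE_SOURCES = frozenset({"dashboard", "webhook"})


def billable_tokens(source: str, tokens: int) -> int:
    """Return how many tokens an event charges against the site's balances."""
    if source in BILLABLE_SOURCES:
        return tokens
    if source in NON_BILLABLE_SOURCES:
        return 0
    # Unknown sources are rejected rather than billed or ignored by default.
    raise ValueError(f"unclassified event source: {source!r}")
```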

Components

  • Monthly budget — resets on the site's billing_cycle_day. Unused tokens do not roll over.
  • Add-on packages — purchased one-time, never expire, queued in purchase order.
  • Consumption — every billable event decrements the current bucket (monthly first, then oldest add-on).
  • Threshold notices — emails when usage reaches 50 %, 80 %, 100 % and 120 % of the monthly budget; usage beyond 100 % is drawn from add-on packages.
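A minimal sketch of how the threshold notices could be triggered, assuming usage is the total number of tokens consumed in the current billing cycle (monthly budget plus add-on draw-down, which is how usage can pass 100 %). The function name and signature are hypothetical.

```python
THRESHOLDS_PCT = (50, 80, 100, 120)  # notice thresholds, in % of the monthly budget


def crossed_thresholds(budget: int, used_before: int, used_after: int) -> list[int]:
    """Return the thresholds (in %) crossed when usage moves from used_before to used_after.

    Assumption: usage counts all tokens consumed this cycle, including tokens
    drawn from add-on packages, so it can exceed 100 % of the monthly budget.
    """
    crossed = []
    for pct in THRESHOLDS_PCT:
        # Compare usage * 100 with budget * pct in integers to avoid float rounding.
        limit = budget * pct
        if used_before * 100 < limit <= used_after * 100:
            crossed.append(pct)
    return crossed
```

Because a threshold fires only when an event moves usage across it, each notice is sent at most once per cycle, provided usage is reset to zero on billing_cycle_day.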

Outputs

  • A predictable monthly invoice plus optional pay-as-you-go top-ups.
  • A token-usage timeline visible in the dashboard, broken down by site and per source (chat vs RAG vs email).
  • A natural soft-cap that throttles abuse without hard-failing customer traffic.
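The per-site, per-source breakdown amounts to summing token counts under a composite key. The sketch below assumes a simplified `(site_id, day, source, tokens)` event tuple; the record shape and the `usage_timeline` name are hypothetical.

```python
from collections import defaultdict


def usage_timeline(events):
    """Sum tokens per (site, day, source) for a dashboard-style timeline.

    `events` is an iterable of (site_id, day, source, tokens) tuples;
    this record shape is a hypothetical simplification.
    """
    totals = defaultdict(int)
    for site_id, day, source, tokens in events:
        totals[(site_id, day, source)] += tokens
    return dict(totals)
```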

Relationships

Token-FIFO Pricing meters the inference that Multi-LLM Language Routing performs for every RAG answer.

Authority

Defined by OCENOX LTD as the canonical billing model across ONOXIA plans.

Version

1.0 — 2026-05-22

Related terms

  • Retrieval-Augmented Generation (RAG) — A pattern that retrieves passages from a private knowledge base before an LLM generates an answer, so responses cite your own documents instead of fabricating facts.
  • Multi-LLM Language Routing — A request-time decision that picks the inference provider best suited for the visitor's language pair — Mistral AI in Paris for European languages (GDPR-compliant), Qwen for Chinese and other Asia-Pacific languages, Gemini Flash as a global fallback.