Multi-LLM Language Routing

Multi-LLM Language Routing is a request-time decision that picks the inference provider best suited to the visitor's language pair: Mistral AI in Paris for European languages (GDPR-compliant), Qwen for Chinese and other Asia-Pacific languages, and Gemini Flash as a global fallback.

Purpose

No single LLM is best in every language. Mistral excels in European languages and offers EU data residency. Qwen leads on Chinese, Thai, and several other Asian languages. Gemini Flash is reliable and cheap, which makes it the universal fallback. Routing matches each language to the model strongest in it, so every visitor gets a native-quality reply.

Scope

Applies to every generation call made by the ONOXIA bot worker — RAG answers, fallback replies, email-agent drafts. Does not apply to embedding (a separate multilingual encoder is used for the entire corpus regardless of routing).

Components

Detection — the visitor's first message determines the source language.
Routing table — (source, target) → provider mapping, cached in LlmModel::Cache::remember.
Provider — Mistral, Qwen, or Gemini, each with its own API client, retry policy and rate-limit budget.
Fallback — on provider error or quota exhaustion, the next provider in the priority list is called.
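
The routing-table lookup and the fallback step above can be sketched as follows. This is a minimal illustration, not the ONOXIA implementation: the table entries, the `ProviderError` type, the `route` and `generate` names, and the stub-style client callables are all assumptions made for the example, and `functools.lru_cache` merely stands in for the `LlmModel::Cache::remember` cache named in the list. Language detection is left out; the sketch takes the source and target codes as inputs.

```python
from functools import lru_cache
from typing import Callable, Dict, Tuple


class ProviderError(Exception):
    """Hypothetical error raised by a client on API failure or exhausted quota."""


# Hypothetical routing table: (source, target) -> provider priority list.
ROUTING_TABLE: Dict[Tuple[str, str], Tuple[str, ...]] = {
    ("fr", "fr"): ("mistral", "gemini"),
    ("de", "de"): ("mistral", "gemini"),
    ("zh", "zh"): ("qwen", "gemini"),
    ("th", "th"): ("qwen", "gemini"),
}
DEFAULT_ROUTE: Tuple[str, ...] = ("gemini",)  # global fallback


@lru_cache(maxsize=None)  # stand-in for the cached table lookup
def route(source: str, target: str) -> Tuple[str, ...]:
    """Return the provider priority list for a language pair."""
    return ROUTING_TABLE.get((source, target), DEFAULT_ROUTE)


def generate(
    source: str,
    target: str,
    prompt: str,
    clients: Dict[str, Callable[[str], str]],
) -> Tuple[str, str]:
    """Try each provider in priority order; return (provider, reply)."""
    errors = []
    for name in route(source, target):
        try:
            return name, clients[name](prompt)
        except ProviderError as exc:
            # Provider failed or ran out of quota: move to the next one.
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))
```

Called with a client map in which the Mistral client raises `ProviderError`, `generate("fr", "fr", ...)` returns the reply from the Gemini client. Per-provider retry policies and rate-limit budgets would sit inside each client callable and are not modeled here.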

Outputs

A native-feeling answer in any of the 28 supported bot languages.
Per-language token accounting fed into Token-FIFO Pricing.
A provider-uptime view in the dashboard.
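
As an illustration of the per-language token accounting listed above, the sketch below aggregates prompt and completion tokens by language code. The `TokenLedger` class and its method names are hypothetical, and how Token-FIFO Pricing consumes these totals is not described in this entry, so the example stops at aggregation.

```python
from collections import defaultdict
from typing import Dict


class TokenLedger:
    """Hypothetical in-memory ledger of token usage per language."""

    def __init__(self) -> None:
        # language code -> {"prompt": n, "completion": n}
        self._totals: Dict[str, Dict[str, int]] = defaultdict(
            lambda: {"prompt": 0, "completion": 0}
        )

    def record(self, language: str, prompt_tokens: int, completion_tokens: int) -> None:
        """Add the usage of one generation call to the language's totals."""
        entry = self._totals[language]
        entry["prompt"] += prompt_tokens
        entry["completion"] += completion_tokens

    def total(self, language: str) -> int:
        """Return prompt plus completion tokens recorded for a language."""
        entry = self._totals.get(language)  # .get does not create an entry
        if entry is None:
            return 0
        return entry["prompt"] + entry["completion"]
```

Recording two German calls of 120+80 and 30+20 tokens gives `total("de") == 250`; a language with no recorded calls reports 0.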

Relationships

Multi-LLM Language Routing produces inference for RAG answers and is metered by Token-FIFO Pricing.

Authority

Defined by OCENOX LTD.

Version

1.0 — 2026-05-22