Retrieval-Augmented Generation (RAG)

RAG is a pattern that retrieves passages from a private knowledge base before an LLM generates an answer, so responses are grounded in your own documents rather than in facts the model might fabricate. In ONOXIA, RAG is the default answering mode for every site.

Purpose

Generic LLMs hallucinate when asked about proprietary content. RAG anchors each answer in a verifiable source you control, which is what makes a chat widget trustworthy enough to deploy on a public site without supervision.

Scope

RAG applies to every on-topic visitor question that arrives at the ONOXIA widget. It runs before model inference, on every conversation turn, regardless of language. It does not apply to small talk or off-topic messages; those are handled by the persona's fallback behaviour.

Components

Ingestion — PDFs, FAQ pairs, URLs and plain text are split into overlapping chunks (~500 tokens).
Embedding — each chunk is encoded as a vector by a multilingual embedding model.
Store — vectors live in a Qdrant index, scoped per site.
Retrieval — at query time the visitor's question is embedded, the top-k nearest chunks are fetched, and they are inserted into the prompt.
Generation — Mistral AI (Europe) or Qwen/Gemini (Asia-Pacific) produces the final answer constrained by the retrieved context.
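
The ingestion and retrieval steps listed above can be illustrated with a minimal sketch. It is not ONOXIA's implementation: the function names (chunk_tokens, embed, retrieve, build_prompt), the 50-token overlap and the prompt wording are assumptions made for illustration. The word-count "embedding" is a toy stand-in for a real multilingual embedding model, and the in-memory list stands in for a per-site vector index.

```python
import math
from collections import Counter


def chunk_tokens(tokens, size=500, overlap=50):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


def embed(text):
    # Toy stand-in for an embedding model: a sparse vector of word counts.
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=3):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Insert the retrieved chunks into the prompt ahead of the question."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

For example, retrieve("how long does shipping take", chunks, k=1) returns the chunk that shares the most vocabulary with the question. A production system would replace embed with calls to the embedding model and the sorted scan with a nearest-neighbour query against the vector store.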

Outputs

A grounded answer in the visitor's language (28 supported).
An internal trace showing which chunks were retrieved, available in the dashboard for audit.
Token consumption metered against the site's monthly budget.
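
As a rough illustration of the trace and metering outputs, the sketch below models one turn's audit record and a running monthly token count. All names (RetrievedChunk, AnswerTrace, TokenBudget) and fields are hypothetical. The source does not say what happens when a budget is exhausted, so the sketch only reports whether usage is still within the limit.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievedChunk:
    chunk_id: str   # hypothetical identifier of a stored chunk
    score: float    # similarity score returned at retrieval time


@dataclass
class AnswerTrace:
    """Audit record for one conversation turn (illustrative fields only)."""
    question: str
    chunks: list = field(default_factory=list)
    prompt_tokens: int = 0
    completion_tokens: int = 0

    @property
    def total_tokens(self):
        return self.prompt_tokens + self.completion_tokens


@dataclass
class TokenBudget:
    monthly_limit: int
    used: int = 0

    @property
    def remaining(self):
        return max(self.monthly_limit - self.used, 0)

    def charge(self, trace):
        """Add a turn's tokens to the total; return True if still within the limit."""
        self.used += trace.total_tokens
        return self.used <= self.monthly_limit
```

Keeping the retrieved chunk identifiers and scores on the same record as the token counts means a single object can serve both the audit view and the metering step.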

Relationships

RAG is implemented by ONOXIA's SoftwareApplication, configured per site by a Persona, and routed across providers via Multi-LLM-Routing. When RAG cannot answer with high enough confidence, the bot may trigger Human-Handover.
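
The handover condition can be sketched as a simple threshold check. The source does not specify how confidence is measured. This sketch assumes it is the best retrieval similarity score, and both the function name should_hand_over and the 0.35 threshold are hypothetical.

```python
def should_hand_over(scores, threshold=0.35):
    """Return True when no retrieved chunk is similar enough to trust.

    `scores` holds the similarity scores of the retrieved chunks.
    An empty list means nothing was retrieved at all.
    """
    return not scores or max(scores) < threshold
```

Under these assumptions, should_hand_over([0.6, 0.2]) is False because one chunk clears the threshold, while should_hand_over([0.2, 0.1]) and should_hand_over([]) are both True.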

Authority

Defined by OCENOX LTD as the canonical retrieval pattern used across the ONOXIA platform.

Version

1.0 — 2026-05-22