Ask the Oracle

How the in-site AI assistant works: RAG over the docs, served by your own Ollama instance.


The amber button in the corner of every page is the Nibiru Oracle, an AI assistant grounded in this very documentation. Ask it about routing, modules, the CLI, the Smarty layer, or the meaning of pageAction(). It cites its sources.

What’s powering it

By default, the Oracle runs entirely on your own infrastructure:

Layer	Backend	Default model
Chat (answer generation)	Ollama on `https://your-ollama-host.example`	`qwen2.5-coder:14b`
Embeddings (RAG retrieval)	Ollama on `https://your-ollama-host.example`	`nomic-embed-text`

No paid API keys. No data leaves your network. The 5-GPU Ollama cluster you already run handles the load.

If you’d rather use a paid provider — Claude for chat, OpenAI for embeddings — set LLM_PROVIDER=anthropic and/or EMBED_PROVIDER=openai plus the matching API keys. The code paths are identical.

How it works

flowchart LR
  A[User question] --> B[Embed via Ollama<br/>nomic-embed-text]
  B --> C[Cosine search<br/>against pre-computed<br/>doc-chunk index]
  C --> D[Top-K chunks]
  D --> E[Ollama chat<br/>qwen2.5-coder:14b<br/>system + retrieved context]
  E --> F[Answer + source list]
  F --> G[Render in chat UI]

At build time, the docs site walks every Markdown page, splits it into ~600-token chunks at H2/H3 boundaries, embeds each chunk with nomic-embed-text, and writes the result to public/oracle-index.json. No database needed.
At request time, the user’s question is embedded the same way, the closest chunks are retrieved by cosine similarity, and they’re stitched into a system prompt for the chat model.
The chat model answers in the user’s language, citing source chunks by URL.
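The request-time steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/pages/api/oracle.ts`: the `DocChunk` shape, the function names, and the prompt wording are all assumptions made for the example.

```typescript
// Hypothetical shape of one entry in public/oracle-index.json (an assumption).
interface DocChunk {
  url: string;         // page URL the chunk came from
  text: string;        // the chunk's Markdown text
  embedding: number[]; // vector produced by the embedding model
}

// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every chunk against the question vector and keep the k best.
function topKChunks(query: number[], index: DocChunk[], k: number): DocChunk[] {
  return index
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}

// Stitch the retrieved chunks into a system prompt that asks for URL citations.
function buildSystemPrompt(chunks: DocChunk[]): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] ${c.url}\n${c.text}`)
    .join("\n\n");
  return [
    "Answer using only the documentation excerpts below.",
    "Reply in the user's language and cite the source URLs you used.",
    "",
    context,
  ].join("\n");
}
```

A linear scan like this is usually fine for an index of a few hundred chunks, which is why a JSON file can stand in for a vector database here.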

Files

File	Purpose
`scripts/lib/providers.mjs`	Shared chat + embedding adapter (Ollama / Anthropic / OpenAI).
`scripts/build-oracle-index.mjs`	Builds `public/oracle-index.json` at build time.
`public/oracle-index.json`	The committed/build-output embedding index.
`src/pages/api/oracle.ts`	The SSR endpoint the chat widget POSTs to. Also serves a GET for diagnostics.
`src/components/CosmicHeader.astro`	The floating launcher + chat UI.

One-time setup on your Ollama instance

Pull the two models the Oracle uses:

curl https://your-ollama-host.example/api/pull -d '{"name":"qwen2.5-coder:14b"}'
curl https://your-ollama-host.example/api/pull -d '{"name":"nomic-embed-text"}'

If qwen2.5-coder:14b is already installed, nomic-embed-text is the only missing piece. Without it, the Oracle runs in chat-only (no-RAG) mode.
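To confirm both models are present before relying on RAG, one option is to compare the model list from Ollama's `GET /api/tags` endpoint against the two names above. The helper below is a sketch for that check; `missingModels` and `withDefaultTag` are hypothetical names, and the interface covers only the field the check reads.

```typescript
// Subset of the GET /api/tags response body that this check relies on.
interface OllamaTagsBody {
  models: { name: string }[];
}

// Ollama lists models with an explicit tag, so "nomic-embed-text"
// appears as "nomic-embed-text:latest". Normalise before comparing.
function withDefaultTag(model: string): string {
  return model.includes(":") ? model : `${model}:latest`;
}

// Return the required models that the server does not list yet.
function missingModels(tags: OllamaTagsBody, required: string[]): string[] {
  const installed = new Set(tags.models.map((m) => withDefaultTag(m.name)));
  return required.filter((model) => !installed.has(withDefaultTag(model)));
}
```

Feed it the parsed JSON body of `https://your-ollama-host.example/api/tags` together with `["qwen2.5-coder:14b", "nomic-embed-text"]`; an empty result means both pulls have completed.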

Configuring it

The Oracle reads its config from environment variables. Sensible defaults are baked in.

# Default mode (your own Ollama instance)
LLM_PROVIDER=ollama                                 # default
OLLAMA_BASE_URL=https://your-ollama-host.example    # default
OLLAMA_CHAT_MODEL=qwen2.5-coder:14b                 # default
OLLAMA_EMBED_MODEL=nomic-embed-text                 # default

# Optional paid providers (set these instead of the defaults above)
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-haiku-4-5-20251001

EMBED_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_EMBED_MODEL=text-embedding-3-small

# Behaviour
ORACLE_TOP_K=6
ORACLE_MAX_TOKENS=800
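A sketch of how these variables might be resolved into a settings object, with the defaults applied when a variable is unset. This is not the code in `scripts/lib/providers.mjs`: `resolveOracleSettings`, `OracleSettings`, and the `missingKeys` field are hypothetical, and treating 6 and 800 as the built-in defaults for `ORACLE_TOP_K` and `ORACLE_MAX_TOKENS` is an assumption based on the values shown above.

```typescript
// Environment as a plain map so the sketch runs anywhere (pass process.env in Node).
type EnvMap = Record<string, string | undefined>;

interface OracleSettings {
  llmProvider: string;
  embedProvider: string;
  ollamaBaseUrl: string;
  ollamaChatModel: string;
  ollamaEmbedModel: string;
  topK: number;
  maxTokens: number;
  missingKeys: string[]; // API keys the chosen providers need but env lacks
}

// Parse a positive integer, falling back when the value is unset or invalid.
function positiveInt(raw: string | undefined, fallback: number): number {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}

function resolveOracleSettings(env: EnvMap): OracleSettings {
  const llmProvider = env["LLM_PROVIDER"] ?? "ollama";
  const embedProvider = env["EMBED_PROVIDER"] ?? "ollama";

  // Paid providers only work with their matching API key.
  const missingKeys: string[] = [];
  if (llmProvider === "anthropic" && !env["ANTHROPIC_API_KEY"]) {
    missingKeys.push("ANTHROPIC_API_KEY");
  }
  if (embedProvider === "openai" && !env["OPENAI_API_KEY"]) {
    missingKeys.push("OPENAI_API_KEY");
  }

  return {
    llmProvider,
    embedProvider,
    ollamaBaseUrl: env["OLLAMA_BASE_URL"] ?? "https://your-ollama-host.example",
    ollamaChatModel: env["OLLAMA_CHAT_MODEL"] ?? "qwen2.5-coder:14b",
    ollamaEmbedModel: env["OLLAMA_EMBED_MODEL"] ?? "nomic-embed-text",
    topK: positiveInt(env["ORACLE_TOP_K"], 6),        // assumed default
    maxTokens: positiveInt(env["ORACLE_MAX_TOKENS"], 800), // assumed default
    missingKeys,
  };
}
```

Reporting missing keys up front, rather than failing on the first request, makes a misconfigured container easier to spot at startup.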

Diagnostics endpoint

GET /api/oracle returns the current config (no secrets):

curl https://nibiru-framework.com/api/oracle
{
  "status": "ok",
  "llm":   { "provider": "ollama", "ollamaUrl": "https://your-ollama-host.example",
             "model": "qwen2.5-coder:14b" },
  "embed": { "provider": "ollama", "ollamaUrl": "https://your-ollama-host.example",
             "model": "nomic-embed-text" },
  "index": { "present": true, "chunks": 177,
             "provider": "ollama", "model": "nomic-embed-text" }
}

Handy for verifying a freshly-deployed container is using the backend you expected.
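A deployment check could parse that response and flag the most common problems. The sketch below assumes the response shape shown above, and assumes the `index.provider` and `index.model` fields record which embedding model built the index; `diagnosticsProblems` is a hypothetical helper, not part of the site.

```typescript
// Fields of the GET /api/oracle response that the check reads.
interface OracleDiagnostics {
  status: string;
  llm: { provider: string; model: string; ollamaUrl?: string };
  embed: { provider: string; model: string; ollamaUrl?: string };
  index: { present: boolean; chunks: number; provider: string; model: string };
}

// Return a human-readable list of problems; empty means the deployment looks sane.
function diagnosticsProblems(d: OracleDiagnostics): string[] {
  const problems: string[] = [];
  if (d.status !== "ok") {
    problems.push(`status is "${d.status}"`);
  }
  if (!d.index.present || d.index.chunks === 0) {
    problems.push("embedding index is missing or empty");
  }
  // Query vectors and index vectors are only comparable when they come
  // from the same embedding model.
  if (d.index.provider !== d.embed.provider || d.index.model !== d.embed.model) {
    problems.push("index was built with a different embedding model than the one used for queries");
  }
  return problems;
}
```

The model comparison matters most after switching `EMBED_PROVIDER`: the index has to be rebuilt with the new model before cosine scores mean anything.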

Privacy

Questions and conversation history are sent to your Ollama server. The docs site does not store them, and in the default Ollama config nothing is sent to Anthropic or OpenAI.
The OpenAI key, if configured, is used only for embeddings.
No analytics or cookies are set by the Oracle widget itself.

Why a Nibiru-trained model?

The roadmap (see AI Roadmap) is to fine-tune a LoRA on the Training Corpus export so the chat model itself is Nibiru-native. When that’s ready, the Oracle’s OLLAMA_CHAT_MODEL flips to the fine-tuned model and the system prompt simplifies. Same code, smarter answers.

Try it

Open the Oracle (the amber planet, bottom-right) and try one of these:

“How do I create a new module?”
“What does pageAction do?”
“Show me how to handle a JSON endpoint.”
“Wie schreibe ich eine Migration?” (German works.)
“認証フローを教えて” (Japanese works.)