Nibiru docs v0.9.2

RAG plugin

Ingest text, embed it, retrieve top-K, and answer grounded questions — all in one PHP class.

Stable

The RAG plugin is the AI module’s killer feature for product builders. It turns any pile of text — your help docs, your error logs, your Stripe invoices, your customer-support tickets — into a queryable knowledge base in roughly four lines of PHP.

use Nibiru\Module\Ai\Ai;
$ai = new Ai();
$rag = $ai->rag('product-help'); // a named collection
$rag->ingestDir(__DIR__ . '/help/'); // walks .md/.txt/.php under help/
$rag->ingestText('FAQ entry…', ['source' => 'faq-12']);
echo $rag->ask('How do I cancel my subscription?');
// → grounded answer, citing chunks like [1] [2] [3]

That’s it. No vector DB. No SDK. No Python sidecar.

How it works:

ingestText / ingestFile / ingestDir
  → chunk → embed (Ollama nomic-embed-text)
  → pack vectors → JSON file at cache/rag/<collection>.json

ask(question)
  → embed question → cosine top-K → chat with chunks as context
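The "cosine top-K" step in the flow above amounts to scoring every stored vector against the question vector and keeping the best K. A minimal sketch in plain PHP, assuming chunks are held in memory as arrays with hypothetical 'text' and 'vector' keys; cosine() and topK() are illustrative names, not part of the plugin:

```php
<?php
// Sketch of brute-force cosine top-K retrieval. Assumption: each chunk is
// ['text' => string, 'vector' => float[]]; this is not the plugin's code.

function cosine(array $a, array $b): float
{
    $dot = $normA = $normB = 0.0;
    foreach ($a as $i => $x) {
        $y = $b[$i];
        $dot += $x * $y;
        $normA += $x * $x;
        $normB += $y * $y;
    }
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // avoid division by zero for all-zero vectors
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

/** Return the $k chunks most similar to $query, best first, each with a 'score'. */
function topK(array $query, array $chunks, int $k): array
{
    $scored = [];
    foreach ($chunks as $chunk) {
        $scored[] = ['score' => cosine($query, $chunk['vector']), 'text' => $chunk['text']];
    }
    // Sort by descending similarity, then keep the first $k
    usort($scored, fn($l, $r) => $r['score'] <=> $l['score']);
    return array_slice($scored, 0, $k);
}

$chunks = [
    ['text' => 'cancel subscription', 'vector' => [1.0, 0.0, 0.0]],
    ['text' => 'reset password',      'vector' => [0.0, 1.0, 0.0]],
    ['text' => 'billing cycle',       'vector' => [0.8, 0.6, 0.0]],
];
foreach (topK([1.0, 0.1, 0.0], $chunks, 2) as $hit) {
    printf("%.3f  %s\n", $hit['score'], $hit['text']);
}
// 0.995  cancel subscription
// 0.856  billing cycle
```

A linear scan like this is simple and fast enough at the corpus sizes this page targets; it is also why very large collections eventually call for a dedicated vector store.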

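Storage is one JSON file per collection. Each chunk is an object holding its text and metadata, with its vector stored as a base64-encoded float32 array. With nomic-embed-text (768 dimensions) that is 3 KB of raw floats per chunk, about 4 KB once base64-encoded, so ~10k chunks fit comfortably in memory.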
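To make the storage format concrete, here is a sketch of packing a vector into base64-encoded float32 and back. The helper names and the little-endian byte order are assumptions; the plugin's actual encoding may differ in detail.

```php
<?php
// Hypothetical helpers: pack a float vector as little-endian float32, then base64 it.

function packVector(array $vector): string
{
    // 'g' = 32-bit float, little-endian (available since PHP 7.0.15 / 7.1.1)
    return base64_encode(pack('g*', ...$vector));
}

function unpackVector(string $encoded): array
{
    // unpack() returns a 1-indexed array; array_values() re-indexes from 0
    return array_values(unpack('g*', base64_decode($encoded)));
}

$vector = array_fill(0, 768, 0.25); // stand-in for a 768-dim nomic-embed-text vector
$packed = packVector($vector);
echo strlen(base64_decode($packed)), " bytes raw, ", strlen($packed), " chars base64\n";
// prints: 3072 bytes raw, 4096 chars base64
```

Float32 halves the size of PHP's native doubles at a precision cost that is negligible for similarity ranking.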

You can have any number of collections in the same app, each with its own JSON file. All collections share the embedding model and chat model set in the [AI] config section.

$docs = $ai->rag('docs');
$tickets = $ai->rag('support-tickets');
$logs = $ai->rag('error-logs');
$docs->ingestDir(__DIR__ . '/help/');
$tickets->ingestText($ticket->body, ['ticket_id' => $ticket->id]);
$logs->ingestText($exception->__toString(), ['ts' => time()]);

$rag = $ai->rag('name'); // get or create a named collection
// --- Ingestion ---
$rag->ingestText($text, $metadata = []); // single chunk
$count = $rag->ingestFile('path'); // returns chunks added
$count = $rag->ingestDir('dir', ['md', 'txt', 'php']); // recursive
// --- Querying ---
$hits = $rag->search('query', $k = null); // [{score, text, metadata}, …]
$answer = $rag->ask('question', $k = null); // top-K chat call
// --- Maintenance ---
$rag->reset(); // forget everything (deletes file)
$n = $rag->size(); // number of chunks
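ask() combines a search with a chat call whose context is the retrieved chunks, numbered so the answer can cite [1], [2], and so on. The sketch below shows one way to build such a prompt from hits in the {score, text, metadata} shape listed above; buildGroundedPrompt() and the prompt wording are hypothetical, not the plugin's actual prompt.

```php
<?php
// Hypothetical sketch: turn search hits into a numbered context block so the
// model can cite [1], [2], ... Hit shape assumed:
// ['score' => float, 'text' => string, 'metadata' => array].

function buildGroundedPrompt(string $question, array $hits): string
{
    $context = [];
    foreach (array_values($hits) as $i => $hit) {
        $source = $hit['metadata']['source'] ?? 'unknown';
        // Number chunks from 1 so citations match the list order
        $context[] = sprintf("[%d] (source: %s)\n%s", $i + 1, $source, trim($hit['text']));
    }
    return "Answer using only the numbered context below. Cite chunks like [1].\n"
        . "If the context does not contain the answer, say so.\n\n"
        . implode("\n\n", $context)
        . "\n\nQuestion: " . $question;
}

$hits = [
    ['score' => 0.91, 'text' => 'Go to Settings > Billing and click Cancel.', 'metadata' => ['source' => 'faq-12']],
    ['score' => 0.74, 'text' => 'Cancellation takes effect at the end of the billing cycle.', 'metadata' => []],
];
echo buildGroundedPrompt('How do I cancel my subscription?', $hits);
```

Carrying a metadata field such as 'source' into the context lets the answer's citations be traced back to the original document.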

In application/module/ai/settings/ai.ini:

[AI]
embed.model = "nomic-embed-text" ; or mxbai-embed-large for higher quality
rag.top_k = 6 ; chunks injected into the chat call
rag.chunk_target = 600 ; tokens per chunk (target)
rag.chunk_min = 120 ; smaller chunks merged
rag.chunk_max = 900 ; larger paragraphs split on sentences
rag.storage_path = "/../../application/module/ai/cache/rag/"
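The three chunking settings interact roughly as follows: split the text into paragraphs, break any paragraph longer than chunk_max at sentence boundaries into pieces of at most chunk_target, then merge pieces below chunk_min into a neighbour. A sketch of that logic, assuming whitespace-separated words as a stand-in for tokens (the plugin's own token counting and rules may differ); chunkText() and approxTokens() are hypothetical names:

```php
<?php
// Hypothetical chunker illustrating rag.chunk_target / chunk_min / chunk_max.
// Assumption: words approximate tokens. A single sentence longer than $max is kept whole.

function approxTokens(string $s): int
{
    return count(preg_split('/\s+/', trim($s), -1, PREG_SPLIT_NO_EMPTY));
}

function chunkText(string $text, int $target = 600, int $min = 120, int $max = 900): array
{
    if (trim($text) === '') {
        return [];
    }
    // 1. Split on blank lines; break oversized paragraphs at sentence ends.
    $pieces = [];
    foreach (preg_split('/\n\s*\n/', trim($text)) as $para) {
        if (approxTokens($para) <= $max) {
            $pieces[] = $para;
            continue;
        }
        $buf = '';
        foreach (preg_split('/(?<=[.!?])\s+/', $para) as $sentence) {
            if ($buf !== '' && approxTokens($buf . ' ' . $sentence) > $target) {
                $pieces[] = $buf; // adding this sentence would pass the target: emit buffer
                $buf = $sentence;
            } else {
                $buf = $buf === '' ? $sentence : $buf . ' ' . $sentence;
            }
        }
        if ($buf !== '') {
            $pieces[] = $buf;
        }
    }

    // 2. Merge neighbours when either is below $min, as long as the result stays <= $max.
    $chunks = [];
    foreach ($pieces as $piece) {
        $last = count($chunks) - 1;
        if ($last >= 0
            && (approxTokens($chunks[$last]) < $min || approxTokens($piece) < $min)
            && approxTokens($chunks[$last]) + approxTokens($piece) <= $max) {
            $chunks[$last] .= "\n\n" . $piece;
        } else {
            $chunks[] = $piece;
        }
    }
    return $chunks;
}

print_r(chunkText("Short intro.\n\nA longer paragraph. With two sentences.", 5, 3, 8));
```

The merge step checks the combined size against chunk_max so that folding in a small piece never produces an oversized chunk.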

Good fits:
  • Help / FAQ chat: ingest your help articles and expose an /ask endpoint.
  • In-app code search: ingest application/module/ and ask “where do we calculate VAT?”
  • Internal docs assistant: ingest your team’s wiki dump.
  • Customer-history lookups: ingest tickets and ask “have we seen this error before?”

Poor fits:
  • Real-time, write-heavy data: RAG works on a snapshot of what you ingested. For live data, write a Tool the agent can call.
  • Massive corpora (> 100k chunks): JSON-file storage starts to become a bottleneck. Move to Qdrant / pgvector / Weaviate; we’ll publish an adapter once we need one ourselves.
  • Anything that needs exact answers, not probable ones: RAG is probabilistic. Don’t use it as a database query layer.

Common pitfalls:
  • nomic-embed-text not pulled: the first ingestText call fails with a clear error pointing you at the pull command (ollama pull nomic-embed-text).
  • Embedding model mismatch: don’t mix nomic-embed-text chunks with mxbai-embed-large queries, because they live in different vector spaces. If you change embed.model, run $rag->reset() first and re-ingest.
  • Stale collections: re-running ingestDir doesn’t dedupe. Use reset() and then re-ingest, or maintain a content-hash check yourself.
  • Tiny chunks: below ~80 tokens, embeddings get noisy. The default rag.chunk_min = 120 merges small adjacent chunks.
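For the stale-collection pitfall, a content-hash check can sit in front of ingestText() so re-runs skip chunks you have already stored. A minimal sketch, assuming you persist the set of hashes yourself between runs; SeenHashes is a hypothetical helper, and the $rag->ingestText() call is left as a comment.

```php
<?php
// Hypothetical dedupe guard. Assumption: the hash set is persisted by you
// (file, DB, cache) between ingestion runs; here it lives only in memory.

final class SeenHashes
{
    /** @var array<string, true> */
    private array $seen = [];

    /** Returns true the first time a given (whitespace-normalized) text is seen. */
    public function isNew(string $text): bool
    {
        // Collapse whitespace so trivial reformatting doesn't defeat the check
        $hash = hash('sha256', preg_replace('/\s+/', ' ', trim($text)));
        if (isset($this->seen[$hash])) {
            return false;
        }
        $this->seen[$hash] = true;
        return true;
    }
}

$seen = new SeenHashes();
foreach (['Refunds take 5 days.', 'Refunds  take 5 days.', 'VAT is 19%.'] as $text) {
    if ($seen->isNew($text)) {
        echo "ingest: $text\n"; // here you would call $rag->ingestText($text)
    }
}
```

This only prevents duplicates. Chunks whose source text has since changed or been deleted remain in the collection, so reset() followed by a full re-ingest is still the simple way to drop outdated content.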