RAG plugin

Ingest text, embed it, retrieve top-K, and answer grounded questions — all in one PHP class.

Stable

The RAG plugin is the AI module’s killer feature for product builders. It turns any pile of text — your help docs, your error logs, your Stripe invoices, your customer-support tickets — into a queryable knowledge base in roughly four lines of PHP.

Three minutes, end-to-end

use Nibiru\Module\Ai\Ai;

$ai  = new Ai();
$rag = $ai->rag('product-help');     // a named collection

$rag->ingestDir(__DIR__ . '/help/'); // walks .md/.txt/.php under help/
$rag->ingestText('FAQ entry…', ['source' => 'faq-12']);

echo $rag->ask('How do I cancel my subscription?');
// → grounded answer, citing chunks like [1] [2] [3]

That’s it. No vector DB. No SDK. No Python sidecar.

How it works

ingestText / ingestFile / ingestDir
        ↓
   chunk → embed (Ollama nomic-embed-text)
        ↓
   pack vectors → JSON file at cache/rag/<collection>.json
        ↓
ask(question) → embed question → cosine top-K → chat with chunks as context
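The "cosine top-K" step is a brute-force scan over every stored chunk. Below is a minimal PHP sketch of it. It assumes each chunk has already been decoded into an array with 'text', 'metadata' and a float 'vector'. cosine() and topK() are illustrative names, not the plugin's internals.

```php
<?php
// Illustrative sketch of "cosine top-K": score every chunk against the
// query vector and keep the K best. Assumes chunks are already decoded into
// ['text' => ..., 'metadata' => [...], 'vector' => [float, ...]].

function cosine(array $a, array $b): float
{
    // Assumes both vectors have the same length (same embedding model).
    $dot = 0.0;
    $na = 0.0;
    $nb = 0.0;
    foreach ($a as $i => $x) {
        $y = $b[$i];
        $dot += $x * $y;
        $na += $x * $x;
        $nb += $y * $y;
    }
    if ($na == 0.0 || $nb == 0.0) {
        return 0.0; // a zero vector has no direction; treat as "no similarity"
    }
    return $dot / (sqrt($na) * sqrt($nb));
}

// Returns hits in the [{score, text, metadata}] shape, highest score first.
function topK(array $queryVector, array $chunks, int $k): array
{
    $hits = [];
    foreach ($chunks as $chunk) {
        $hits[] = [
            'score'    => cosine($queryVector, $chunk['vector']),
            'text'     => $chunk['text'],
            'metadata' => $chunk['metadata'],
        ];
    }
    usort($hits, fn($x, $y) => $y['score'] <=> $x['score']);
    return array_slice($hits, 0, $k);
}
```

A linear scan costs one dot product per chunk per query. That is cheap at the collection sizes this page targets, and it is also why very large corpora are better served by a dedicated vector store.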

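Storage is one JSON file per collection. Each chunk is an object holding its text, metadata and vector. Vectors are stored as base64-packed Float32 arrays. With nomic-embed-text's 768 dimensions, that is about 3 KB of raw vector data per chunk, or roughly 4 KB after base64 encoding. Around 10k chunks fit comfortably in memory.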
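For reference, this is one way to produce a base64-packed float32 layout in plain PHP. It is a sketch: packVector() and unpackVector() are hypothetical helpers. The little-endian byte order chosen via pack('g*') (available since PHP 7.0.15) is an assumption, not a statement of what the plugin actually writes.

```php
<?php
// Hypothetical helpers: store a float vector as base64-encoded
// little-endian float32 ('g' format code) and read it back.

function packVector(array $vector): string
{
    return base64_encode(pack('g*', ...$vector));
}

function unpackVector(string $packed): array
{
    // unpack() returns a 1-indexed array; array_values() makes it 0-indexed.
    return array_values(unpack('g*', base64_decode($packed)));
}
```

A 768-dimension vector packs to 3,072 bytes, which base64 expands to 4,096 characters. Float32 also halves the size of PHP's native 64-bit floats, at the cost of some precision that similarity scoring does not need.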

Multiple collections

You can have any number of collections in the same app. Each has its own JSON file. They all share the embedding model and chat model configured in the [AI] section.

$docs    = $ai->rag('docs');
$tickets = $ai->rag('support-tickets');
$logs    = $ai->rag('error-logs');

$docs->ingestDir(__DIR__ . '/help/');
$tickets->ingestText($ticket->body, ['ticket_id' => $ticket->id]);
$logs->ingestText($exception->__toString(), ['ts' => time()]);

API reference

$rag = $ai->rag('name');                    // get/create a named collection

// --- Ingestion ---
$rag->ingestText($text, $metadata = []);    // single chunk
$count = $rag->ingestFile('path');          // returns chunks added
$count = $rag->ingestDir('dir', ['md','txt','php']); // recursive

// --- Querying ---
$hits = $rag->search('query', $k = null);   // [{score, text, metadata}, …]
$answer = $rag->ask('question', $k = null); // top-K → chat call

// --- Maintenance ---
$rag->reset();                              // forget everything (deletes file)
$n = $rag->size();                          // number of chunks
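ask() combines search() with a chat call. The hypothetical sketch below shows the middle step. It turns hits in the documented {score, text, metadata} shape into a prompt with numbered sources, so the model has something to cite as [1] [2] [3]. The function name and prompt wording are assumptions, not the plugin's actual template.

```php
<?php
// Hypothetical sketch: build a grounded prompt from search() hits.
// Each hit becomes "[n] (source) text" so the answer can cite [n].

function buildGroundedPrompt(string $question, array $hits): string
{
    $context = [];
    foreach (array_values($hits) as $i => $hit) {
        // Fall back to 'unknown' when a chunk was ingested without a source.
        $source = $hit['metadata']['source'] ?? 'unknown';
        $context[] = sprintf('[%d] (%s) %s', $i + 1, $source, trim($hit['text']));
    }

    return "Answer using only the numbered context below. "
        . "Cite sources as [n]. If the context does not contain the answer, say so.\n\n"
        . implode("\n\n", $context)
        . "\n\nQuestion: " . $question;
}
```

Numbering the chunks in the prompt is what lets an answer's [2] be traced back to hits[1] and its metadata.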

Tuning knobs

In application/module/ai/settings/ai.ini:

[AI]
embed.model        = "nomic-embed-text"   ; or mxbai-embed-large for higher quality
rag.top_k          = 6                    ; chunks injected into the chat call
rag.chunk_target   = 600                  ; target chunk size, in tokens
rag.chunk_min      = 120                  ; smaller chunks are merged with adjacent ones
rag.chunk_max      = 900                  ; larger paragraphs are split at sentence boundaries
rag.storage_path   = "/../../application/module/ai/cache/rag/"
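To make the three size knobs concrete, here is a hypothetical chunker sketch that applies them in order. First it splits on blank lines. Next it splits any paragraph over chunk_max at sentence boundaries, packing sentences toward chunk_target. Finally it merges a piece with the previous chunk when either one is under chunk_min, as long as the result stays within chunk_max. The sketch counts whitespace-separated words as a stand-in for tokens. That is an assumption: real tokenizers count differently, and the plugin's own rules may differ.

```php
<?php
// Hypothetical chunker driven by chunk_target / chunk_min / chunk_max.
// Tokens are approximated by whitespace-separated words (an assumption).

function approxTokens(string $text): int
{
    return count(preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY));
}

function chunkText(string $text, int $target = 600, int $min = 120, int $max = 900): array
{
    // 1. Split into paragraphs on blank lines.
    $paragraphs = preg_split('/\n\s*\n/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    // 2. Split oversize paragraphs on sentences, packing up to $target.
    //    A single sentence longer than $max is kept whole.
    $pieces = [];
    foreach ($paragraphs as $p) {
        $p = trim($p);
        if (approxTokens($p) <= $max) {
            $pieces[] = $p;
            continue;
        }
        $buf = '';
        foreach (preg_split('/(?<=[.!?])\s+/', $p, -1, PREG_SPLIT_NO_EMPTY) as $sentence) {
            if ($buf !== '' && approxTokens($buf . ' ' . $sentence) > $target) {
                $pieces[] = $buf;
                $buf = $sentence;
            } else {
                $buf = $buf === '' ? $sentence : $buf . ' ' . $sentence;
            }
        }
        if ($buf !== '') {
            $pieces[] = $buf;
        }
    }

    // 3. Merge small pieces into the previous chunk while staying within $max.
    $chunks = [];
    foreach ($pieces as $piece) {
        $last = count($chunks) - 1;
        $small = $last >= 0
            && (approxTokens($piece) < $min || approxTokens($chunks[$last]) < $min);
        if ($small && approxTokens($chunks[$last] . "\n\n" . $piece) <= $max) {
            $chunks[$last] .= "\n\n" . $piece;
        } else {
            $chunks[] = $piece;
        }
    }
    return $chunks;
}
```

The defaults mirror the ini values above. Merging in step 3 can push a chunk past chunk_target, but never past chunk_max.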

When to use it

Help / FAQ chat — ingest your help articles, expose a /ask endpoint.
In-app code search — ingest application/module/, ask “where do we calculate VAT?”
Internal docs assistant — ingest your team’s wiki dump.
Customer-history lookups — ingest tickets, ask “have we seen this error before?”

When NOT to use it

Real-time, write-heavy data — RAG is a snapshot. For live data, write a Tool the agent can call.
Massive corpora (> 100k chunks) — JSON-file storage starts to creak. Move to Qdrant / pgvector / Weaviate; we’ll publish an adapter once we need one ourselves.
Anything where you need exact answers, not probable ones. RAG is probabilistic. Don’t use it as a database query layer.

Common pitfalls

nomic-embed-text not pulled. Run ollama pull nomic-embed-text before ingesting anything; otherwise the first ingestText call fails with a clear error pointing you at the pull command.
Embedding model mismatch. Don’t mix nomic-embed-text chunks with mxbai-embed-large queries — different vector spaces. If you change embed.model, run $rag->reset() first and re-ingest.
Stale collections. Re-running ingestDir doesn’t dedupe. Use reset() then re-ingest, or maintain a content-hash check yourself.
Tiny chunks. Below ~80 tokens, embeddings get noisy. The default rag.chunk_min = 120 merges small adjacent chunks.
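One way to run the content-hash check yourself is a minimal sketch that records a SHA-256 of every ingested text in a sidecar JSON file and skips texts it has already seen. ingestOnce() and the sidecar file are hypothetical. The real ingestion is passed in as a callable, such as [$tickets, 'ingestText'].

```php
<?php
// Hypothetical dedupe wrapper: skip texts whose SHA-256 is already
// recorded in $seenFile, otherwise ingest them and record the hash.

function ingestOnce(string $text, array $metadata, string $seenFile, callable $ingest): bool
{
    $seen = [];
    if (is_file($seenFile)) {
        $seen = json_decode((string) file_get_contents($seenFile), true) ?: [];
    }

    $hash = hash('sha256', $text);
    if (isset($seen[$hash])) {
        return false; // identical text already ingested, skip it
    }

    $ingest($text, $metadata);
    $seen[$hash] = true;
    file_put_contents($seenFile, json_encode($seen));
    return true;
}
```

Delete the sidecar file whenever you call reset(). Otherwise the check will keep skipping texts the collection no longer holds.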

What’s next

Agent plugin → for tools, not retrieval.
Training nibiru-coder → to make the chat half answer in the framework’s voice.