RAG plugin
Ingest text, embed it, retrieve top-K, and answer grounded questions — all in one PHP class.
The RAG plugin is the AI module’s killer feature for product builders. It turns any pile of text — your help docs, your error logs, your Stripe invoices, your customer-support tickets — into a queryable knowledge base in roughly four lines of PHP.
Three minutes, end-to-end
    use Nibiru\Module\Ai\Ai;

    $ai  = new Ai();
    $rag = $ai->rag('product-help');        // a named collection

    $rag->ingestDir(__DIR__ . '/help/');    // walks .md/.txt/.php under help/
    $rag->ingestText('FAQ entry…', ['source' => 'faq-12']);

    echo $rag->ask('How do I cancel my subscription?');
    // → grounded answer, citing chunks like [1] [2] [3]

That’s it. No vector DB. No SDK. No Python sidecar.
How it works
    ingestText / ingestFile / ingestDir
        ↓
    chunk → embed (Ollama nomic-embed-text)
        ↓
    pack vectors → JSON file at cache/rag/<collection>.json
        ↓
    ask(question) → embed question → cosine top-K → chat with chunks as context

Storage is one JSON file per collection. Each chunk is an object with text and metadata; vectors are stored as base64-packed float32 arrays, about 3 KB of raw vector data per chunk (768 dimensions × 4 bytes with nomic-embed-text). At that size, ~10k chunks come to roughly 30 MB of vectors and fit comfortably in memory.
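The storage and retrieval steps above can be sketched in plain PHP. This is a minimal illustration, not the plugin's actual internals: the function names (`packVector`, `unpackVector`, `cosine`, `topK`) and the chunk array shape are assumptions, and toy 3-dimensional vectors stand in for real embeddings.

```php
<?php
// Sketch only: function names and array shapes are assumptions for
// illustration, not the plugin's real implementation.

/** Pack a list of floats into a base64 string of little-endian float32 values. */
function packVector(array $vector): string
{
    return base64_encode(pack('g*', ...$vector));
}

/** Reverse of packVector(): base64 string back to a zero-indexed list of floats. */
function unpackVector(string $packed): array
{
    return array_values(unpack('g*', base64_decode($packed)));
}

/** Cosine similarity of two equal-length vectors; 0.0 if either is all zeros. */
function cosine(array $a, array $b): float
{
    $dot = $normA = $normB = 0.0;
    foreach ($a as $i => $x) {
        $dot   += $x * $b[$i];
        $normA += $x * $x;
        $normB += $b[$i] * $b[$i];
    }
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0;
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

/** Score every chunk against the query vector and keep the $k best. */
function topK(array $queryVector, array $chunks, int $k): array
{
    $hits = [];
    foreach ($chunks as $chunk) {
        $hits[] = [
            'score'    => cosine($queryVector, unpackVector($chunk['vector'])),
            'text'     => $chunk['text'],
            'metadata' => $chunk['metadata'],
        ];
    }
    // Highest score first.
    usort($hits, fn(array $x, array $y): int => $y['score'] <=> $x['score']);
    return array_slice($hits, 0, $k);
}

// Toy 3-dimensional vectors stand in for real embeddings.
$chunks = [
    ['text' => 'Cancel from the billing page.', 'metadata' => ['source' => 'faq-12'],
     'vector' => packVector([1.0, 0.0, 0.0])],
    ['text' => 'Reset your password by email.', 'metadata' => ['source' => 'faq-03'],
     'vector' => packVector([0.0, 1.0, 0.0])],
    ['text' => 'Refunds take five days.', 'metadata' => ['source' => 'faq-20'],
     'vector' => packVector([0.5, 0.5, 0.0])],
];

$hits = topK([1.0, 0.0, 0.0], $chunks, 2);
foreach ($hits as $hit) {
    printf("%.3f  %s\n", $hit['score'], $hit['text']);
}
```

A brute-force scan like this is linear in the number of chunks, which is consistent with the JSON-file design being aimed at collections that fit in memory.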
Multiple collections
You can have any number of collections in the same app. Each has its own JSON file. They share the embedding model and chat model from the [AI] config.
    $docs    = $ai->rag('docs');
    $tickets = $ai->rag('support-tickets');
    $logs    = $ai->rag('error-logs');

    $docs->ingestDir(__DIR__ . '/help/');
    $tickets->ingestText($ticket->body, ['ticket_id' => $ticket->id]);
    $logs->ingestText($exception->__toString(), ['ts' => time()]);

API reference
    $rag = $ai->rag('name');                               // get/create a named collection

    // --- Ingestion ---
    $rag->ingestText($text, $metadata = []);               // single chunk
    $count = $rag->ingestFile('path');                     // returns chunks added
    $count = $rag->ingestDir('dir', ['md','txt','php']);   // recursive

    // --- Querying ---
    $hits   = $rag->search('query', $k = null);            // [{score, text, metadata}, …]
    $answer = $rag->ask('question', $k = null);            // top-K → chat call

    // --- Maintenance ---
    $rag->reset();                                         // forget everything (deletes file)
    $n = $rag->size();                                     // number of chunks

Tuning knobs
In application/module/ai/settings/ai.ini:

    [AI]
    embed.model      = "nomic-embed-text"   ; or mxbai-embed-large for higher quality
    rag.top_k        = 6                    ; chunks injected into the chat call
    rag.chunk_target = 600                  ; tokens per chunk (target)
    rag.chunk_min    = 120                  ; smaller chunks merged
    rag.chunk_max    = 900                  ; larger paragraphs split on sentences
    rag.storage_path = "/../../application/module/ai/cache/rag/"

When to use it
- Help / FAQ chat: ingest your help articles, expose a /ask endpoint.
- In-app code search: ingest application/module/, ask “where do we calculate VAT?”
- Internal docs assistant: ingest your team’s wiki dump.
- Customer-history lookups: ingest tickets, ask “have we seen this error before?”
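For lookup-style uses such as “have we seen this error before?”, the raw search() hits may be more useful than a generated answer. The sketch below filters hits by score and prints one line per match. It assumes hits arrive as associative arrays with score, text, and metadata keys (the API reference shows the fields but not whether they are arrays or objects); `formatHits()` is a hypothetical helper, and the 0.75 threshold is an arbitrary starting point rather than a recommended value.

```php
<?php
// Sketch only: formatHits() is a hypothetical helper, and the hit shape
// (associative arrays with 'score', 'text', 'metadata') is an assumption.

/**
 * Keep hits at or above $minScore and render one summary line per hit.
 */
function formatHits(array $hits, float $minScore = 0.75): array
{
    $lines = [];
    foreach ($hits as $hit) {
        if ($hit['score'] < $minScore) {
            continue; // too dissimilar to count as "seen before"
        }
        $id      = $hit['metadata']['ticket_id'] ?? 'unknown';
        $lines[] = sprintf('[%.2f] ticket %s: %s', $hit['score'], $id, $hit['text']);
    }
    return $lines;
}

// In the application this array would come from $tickets->search($errorMessage).
$hits = [
    ['score' => 0.91, 'text' => 'Checkout fails with "card_declined" after retry.',
     'metadata' => ['ticket_id' => 4812]],
    ['score' => 0.42, 'text' => 'How do I change my avatar?',
     'metadata' => ['ticket_id' => 977]],
];

$lines = formatHits($hits);
foreach ($lines as $line) {
    echo $line, PHP_EOL;
}
```

The ticket_id key mirrors the metadata passed to ingestText in the multiple-collections example; any other metadata key would work the same way.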
When NOT to use it
- Real-time, write-heavy data: RAG is a snapshot. For live data, write a Tool the agent can call.
- Massive corpora (> 100k chunks): JSON-file storage starts to creak. Move to Qdrant / pgvector / Weaviate; we’ll publish an adapter once we need one ourselves.
- Anything where you need exact answers, not probable ones. RAG is probabilistic. Don’t use it as a database query layer.
Common pitfalls
- nomic-embed-text not pulled. The first ingestText call will fail with a clear error pointing you at the pull command (ollama pull nomic-embed-text).
- Embedding model mismatch. Don’t mix nomic-embed-text chunks with mxbai-embed-large queries: the two models produce vectors in different spaces, so similarity scores between them are meaningless. If you change embed.model, run $rag->reset() first, then re-ingest.
- Stale collections. Re-running ingestDir doesn’t dedupe. Use reset() then re-ingest, or maintain a content-hash check yourself.
- Tiny chunks. Below ~80 tokens, embeddings get noisy. The default rag.chunk_min = 120 merges small adjacent chunks.
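One way to approach the content-hash check mentioned under stale collections is to fingerprint the whole directory and only reset and re-ingest when the fingerprint changes. The sketch below is an assumption-laden illustration: `corpusFingerprint()`, `corpusChanged()`, and the `.fingerprint` state file are hypothetical and not part of the plugin, and the calls to reset() and ingestDir() appear only as a comment.

```php
<?php
// Sketch only: corpusFingerprint(), corpusChanged(), and the state file are
// hypothetical helpers, not part of the plugin.

/**
 * Hash the relative path and contents of every matching file under $dir.
 * The result changes whenever a file is added, removed, renamed, or edited.
 */
function corpusFingerprint(string $dir, array $extensions = ['md', 'txt', 'php']): string
{
    $root     = rtrim($dir, '/\\');
    $files    = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile() && in_array(strtolower($file->getExtension()), $extensions, true)) {
            $relative         = substr($file->getPathname(), strlen($root) + 1);
            $files[$relative] = hash_file('sha256', $file->getPathname());
        }
    }
    ksort($files); // stable order regardless of filesystem traversal
    return hash('sha256', json_encode($files));
}

/** True when the corpus differs from the fingerprint saved in $stateFile. */
function corpusChanged(string $dir, string $stateFile): bool
{
    $previous = is_file($stateFile) ? trim(file_get_contents($stateFile)) : '';
    return $previous !== corpusFingerprint($dir);
}

// Demo on a throwaway directory.
$dir = sys_get_temp_dir() . '/rag-demo-' . uniqid();
mkdir($dir);
file_put_contents($dir . '/billing.md', 'Cancel from the billing page.');
$stateFile = $dir . '/.fingerprint';

if (corpusChanged($dir, $stateFile)) {
    // In the application: $rag->reset(); $rag->ingestDir($dir);
    file_put_contents($stateFile, corpusFingerprint($dir));
    echo "ingested\n";
}
echo corpusChanged($dir, $stateFile) ? "stale\n" : "up to date\n";
```

Fingerprinting the whole directory rather than individual files is a deliberate choice here: the API reference lists no way to remove the chunks of a single file, so a changed file still calls for reset() followed by a full re-ingest.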
What’s next
- Agent plugin → for tools, not retrieval.
- Training nibiru-coder → to make the chat half answer in the framework’s voice.