ELAYGENT supports reviewed knowledge injection into the live voice assistant and a bounded vector retrieval path over approved backend knowledge content. Uploaded files are stored privately, reviewed by operators, distilled into structured content, and then either pushed into the assistant system prompt or indexed for retrieval.
This distinction matters: retrieval is real, but unreviewed raw PDFs are not automatically parsed, searched, or used for training end to end.
Private tenant-scoped document storage
Shipped
Tenant knowledge files are stored in the private `kb-files` Supabase bucket. Tenant users access files through a short-lived signed URL route that validates that the requested path starts with their tenant ID.
Limits
Uploaded files are stored for review; storage alone does not mean the AI has learned from the file.
What it does NOT do
No public storage URLs are stored or served.
No customer-managed encryption keys for KB files today.
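The tenant-prefix check described for the signed URL route can be sketched as below. This is only an illustration: the function name `isTenantScopedPath` and the `<tenantId>/<file>` path layout are assumptions, not ELAYGENT's actual route code. The sketch compares the whole first path segment rather than a raw string prefix, so tenant `t1` cannot match a path under `t10/`.

```typescript
// Hypothetical helper: the name and the "<tenantId>/<file>" layout are assumptions.
function isTenantScopedPath(tenantId: string, objectPath: string): boolean {
  if (tenantId.length === 0) return false;
  const segments = objectPath.split("/");
  // Reject empty segments (leading, trailing, or doubled slashes) and traversal segments.
  if (segments.some((s) => s === "" || s === "." || s === "..")) return false;
  // Require at least "<tenantId>/<something>" and an exact match on the first segment.
  return segments.length > 1 && segments[0] === tenantId;
}
```

A route built this way would run the check first and only then ask storage for a short-lived signed URL, returning an error for any path that fails it.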
Admin-reviewed structured assistant knowledge
Shipped
ELAYGENT operators review uploaded content and activate structured `ai_content` records. The activation path can wrap plain notes into the structured KB schema and writes sync state back to `kb_items`.
Limits
Human review is required before content becomes assistant knowledge.
Thin or empty documents are excluded from AI injection and surfaced as needing content.
What it does NOT do
No automatic training or fine-tuning happens when a customer uploads a file.
No unreviewed upload is silently pushed to the live assistant.
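As a rough illustration of the activation step above, the following sketch wraps plain reviewer notes into a structured record and flags thin content instead of activating it. Everything here is hypothetical: the `StructuredKb` shape, the `MIN_ANSWER_CHARS` threshold, and the `needs_content` reason string are assumptions, since the real `ai_content` schema and cutoff are not shown in this document.

```typescript
// Hypothetical shapes: the real `ai_content` schema is not documented here.
interface StructuredKb {
  title: string;
  entries: { question: string; answer: string }[];
}

type Activation =
  | { ok: true; content: StructuredKb }
  | { ok: false; reason: "needs_content" };

// Assumed cutoff for "thin" content; the real threshold is unknown.
const MIN_ANSWER_CHARS = 20;

function wrapPlainNotes(title: string, notes: string): Activation {
  const text = notes.trim();
  // Thin or empty notes are surfaced as needing content rather than injected.
  if (text.length < MIN_ANSWER_CHARS) {
    return { ok: false, reason: "needs_content" };
  }
  // Wrap the plain notes as a single entry in the structured shape.
  return {
    ok: true,
    content: {
      title,
      entries: [{ question: `What should I know about ${title}?`, answer: text }],
    },
  };
}
```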
System-prompt Q&A injection into the live assistant
Shipped (bounded)
Active KB rows are transformed into Q&A pairs by `buildKbQaPairs()` and sent to backend assistant provisioning as `knowledge_base`. The live assistant then receives the pairs as system-prompt context.
Limits
This works best for short FAQs, policies, hours, pricing ranges, and service facts.
Large PDFs are not chunked or semantically searched; reviewers must distill them into concise Q&A content.
Sync is best-effort. `sync_status` records whether the document is live, needs content, is not linked, or failed to sync.
What it does NOT do
This is not vector retrieval.
This is not semantic search over full uploaded documents.
This is not model fine-tuning.
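To make the injection path concrete, here is a minimal sketch of turning active rows into Q&A pairs and rendering them as prompt context. It is not the real `buildKbQaPairs()`, whose implementation is not shown here. The `KbRow` shape, the function names `sketchQaPairs` and `renderPromptContext`, and the `Q:`/`A:` text format are all assumptions made for illustration.

```typescript
// Hypothetical row shape; the actual `kb_items` / `ai_content` columns are not documented here.
interface KbRow {
  id: string;
  active: boolean;
  entries: { question: string; answer: string }[];
}

interface QaPair {
  question: string;
  answer: string;
}

// Illustrative stand-in for the transformation step, not the real buildKbQaPairs().
function sketchQaPairs(rows: KbRow[]): { pairs: QaPair[]; needsContent: string[] } {
  const pairs: QaPair[] = [];
  const needsContent: string[] = [];
  for (const row of rows) {
    if (!row.active) continue; // inactive rows are skipped entirely
    const usable = row.entries
      .map((e) => ({ question: e.question.trim(), answer: e.answer.trim() }))
      .filter((e) => e.question.length > 0 && e.answer.length > 0);
    if (usable.length === 0) {
      needsContent.push(row.id); // thin rows are reported, not injected
      continue;
    }
    pairs.push(...usable);
  }
  return { pairs, needsContent };
}

// Assumed rendering: one "Q:/A:" block per pair, joined by blank lines.
function renderPromptContext(pairs: QaPair[]): string {
  return pairs.map((p) => `Q: ${p.question}\nA: ${p.answer}`).join("\n\n");
}
```

Because every pair is placed in the system prompt in full, the context grows with each added pair, which is why this path suits short FAQs rather than large documents.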
True vector RAG over uploaded docs
Shipped (bounded)
The backend can chunk approved active TenantDocument content, generate OpenAI embeddings, store vectors in pgvector, retrieve tenant/location-scoped snippets, and expose them to the live Vapi assistant through the `query_knowledge` tool.
Limits
This indexes the reviewed `processingNotes` field of backend `TenantDocument` records, not raw unreviewed portal uploads.
Operators must run the admin reindex route after approved knowledge changes.
Document-level RAG state is available through the backend admin status API, but a tenant-facing reindex UI is not shipped yet.
The legacy system-prompt Q&A injection path still exists and is separate from retrieval.
If the platform embedding provider is not configured, embeddings and retrieval are unavailable.
What it does NOT do
ELAYGENT does not train on raw documents.
Unreviewed uploads are not searchable by the AI.
Full PDFs are not parsed automatically end to end.
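The scoped retrieval step can be illustrated with a small in-memory sketch. It uses plain cosine similarity over arrays as a stand-in for pgvector, and hand-written vectors as a stand-in for OpenAI embeddings, so it shows only the scoping and ranking idea. The `Chunk` shape, the `retrieveSnippets` name, and the rule that both tenant and location must match exactly are assumptions, not the backend's actual behavior.

```typescript
// Hypothetical chunk shape; the real pgvector table layout is not documented here.
interface Chunk {
  tenantId: string;
  locationId: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors; returns 0 for zero vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Filter to the caller's tenant and location first, then rank by similarity and keep the top k.
function retrieveSnippets(
  chunks: Chunk[],
  queryEmbedding: number[],
  tenantId: string,
  locationId: string,
  k: number,
): Chunk[] {
  return chunks
    .filter(
      (c) =>
        c.tenantId === tenantId &&
        c.locationId === locationId &&
        c.embedding.length === queryEmbedding.length,
    )
    .map((c) => ({ chunk: c, score: cosine(c.embedding, queryEmbedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.chunk);
}
```

Applying the scope filter before ranking means another tenant's chunk can never appear in the results, however similar its embedding is to the query.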