Quiet intelligence in every loop.
Owl's Roost is not a feature checklist with AI bolted on — the platform was built around an intelligent layer that watches, coaches, and writes. Every member sees one AI button. Every operator sees a ledger of every call.
Eight layers, one coherent intelligence.
Best-in-class components, wired together so the seams disappear behind a single AI button for the member and a single observability dashboard for the operator.
LLMs
OpenRouter — Llama 3.3 70B for chat, Llama 3.1 8B for fast routing and eval, Llama 3.2 11B Vision for images and video frames. Swappable per platform via Super Admin.
Embeddings
Cohere text embeddings, with rerank-v3.5 running a second pass over retrieved chunks for relevance scoring before injection into the prompt.
Vector DB
Qdrant Cloud with per-tenant namespaces, per-user namespaces, and content-type filtering — every document indexed on save, retrieved semantically per turn.
Voice
Deepgram speech-to-text for chat voice input, attachment audio, and meeting transcripts (when the leader connects Zoom or Fathom).
Working memory
Valkey (Redis-compatible) holds the most recent turns with TTL-based eviction so the coach replies with millisecond context lookup.
Long-term memory
`AiChatSession.messages` stores the full history in Postgres — every tool call, every system-prompt revision, every assistant turn — resumable across devices.
Observability
AiTrace records latency, token counts, retrieval scores, faithfulness scores, routing decisions, tool failures, and cost on every single call.
Web context
Firecrawl ingests URLs into the knowledge base on demand — paste a link in chat and the coach reads the page, indexes it, and grounds future answers.
Five layers per turn. Every change versioned.
Every coach response is built from a deterministic five-layer cascade audited via `PromptVersion`. Pin a tenant to a specific revision; roll back instantly when something regresses.
Platform
Global guardrails — safety, tone, privacy. Set by Owl's Roost, immutable per tenant.
Tenant
Leader-specified brand voice, mission, prohibited topics.
Coach
Per-coach system prompt — admin-editable, versioned via PromptVersion.
Member
Profile, level, recent activity, current goal.
Turn
Last N turns + retrieved knowledge chunks, reranked by Cohere v3.5.
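As a sketch, the five-layer cascade amounts to an ordered, versioned assembly. The layer shape and field names below are illustrative assumptions, not the platform's actual schema:

```typescript
// Hypothetical sketch of the five-layer prompt cascade.
// Layer order is fixed; each layer carries its own version,
// standing in for a PromptVersion-style record.
type PromptLayer = {
  name: "platform" | "tenant" | "coach" | "member" | "turn";
  version: number;
  content: string;
};

function assemblePrompt(layers: PromptLayer[]): string {
  const order = ["platform", "tenant", "coach", "member", "turn"];
  return [...layers]
    .sort((a, b) => order.indexOf(a.name) - order.indexOf(b.name))
    .map((l) => `[${l.name} v${l.version}]\n${l.content}`)
    .join("\n\n");
}
```

Because each layer is versioned independently, pinning a tenant to a revision or rolling back a coach prompt only swaps one layer's content.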
One AI button. Seven specialists.
Members never choose a coach. A fast classifier (Llama 3.1 8B) reads each message and routes to the right specialist based on level, profile, and intent — all logged on `AiTrace.delegationDecision`.
Community Coach
General Q&A — networking, connections, default fallback.
Onboarding Coach
Profiles new members, drafts welcome posts, gates the first week.
Level 1 Coach
Foundations — ideal client, offer clarity, deal flow.
Level 2 Coach
Scaling — positioning, connection systems, speaking.
Level 3 Coach
Mastery — leadership, high-value partnerships.
Offer Doc Coach
Crafting the one-sentence referrable offer.
Special Gift Strategist
Identifying and articulating unique strengths and personal brand.
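Routing to a specialist can be sketched as a small decision function. In the real system the intent comes from the fast classifier (Llama 3.1 8B) and the decision is logged on `AiTrace.delegationDecision`; here the intent is a precomputed string, and all names are assumptions for illustration:

```typescript
// Illustrative routing sketch: pick a specialist per turn,
// with the Community Coach as the default fallback.
type Specialist =
  | "community" | "onboarding" | "level1" | "level2" | "level3"
  | "offerDoc" | "specialGift";

type RoutingInput = { memberLevel: 1 | 2 | 3; isNew: boolean; intent: string };

function routeTurn(input: RoutingInput): Specialist {
  if (input.isNew) return "onboarding"; // gate the first week
  if (input.intent === "offer") return "offerDoc";
  if (input.intent === "strengths") return "specialGift";
  if (input.intent === "curriculum") {
    // Route curriculum questions to the member's current level.
    return (["level1", "level2", "level3"] as const)[input.memberLevel - 1];
  }
  return "community"; // default fallback
}
```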
The coach doesn't just answer — it does the work.
A typed tool catalog the LLM can invoke during chat — write a document, scrape a URL, seed an announcement, draft an intro, configure a coach. Every call permission-checked, every result typed, every invocation logged.
49+ tools, all gated behind feature flags. Every call writes an AiTrace row.
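A minimal sketch of what a typed, gated tool invocation might look like — the shapes below are assumptions, with a plain array standing in for AiTrace rows:

```typescript
// Hypothetical shape for a typed tool: feature-flagged,
// permission-checked, and logged on every invocation.
type ToolResult = { ok: boolean; output?: unknown; error?: string };

type Tool<Args> = {
  name: string;
  featureFlag: string;
  permitted: (role: string) => boolean;
  run: (args: Args) => ToolResult;
};

function invoke<Args>(
  tool: Tool<Args>,
  role: string,
  args: Args,
  flags: Set<string>,
  trace: object[], // stands in for AiTrace rows
): ToolResult {
  let result: ToolResult;
  if (!flags.has(tool.featureFlag)) result = { ok: false, error: "flag off" };
  else if (!tool.permitted(role)) result = { ok: false, error: "forbidden" };
  else result = tool.run(args);
  // Every invocation is logged, including denials.
  trace.push({ tool: tool.name, ok: result.ok, error: result.error });
  return result;
}
```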
Grounded in your content. Reranked for relevance.
Knowledge base documents and lesson content are embedded with Cohere on save and indexed into Qdrant. Every chat turn that benefits from grounding triggers a retrieval, then Cohere `rerank-v3.5` runs a second pass to score the top chunks for relevance — dropping low-score noise before injection. Faithfulness is scored after the response and stored on the AiTrace row, so we can detect hallucination drift over time and route around models or prompts that regress on grounding.
- Embed on save (Cohere)
- Index into per-tenant Qdrant namespace
- Retrieve top-K chunks per turn
- Rerank with Cohere v3.5
- Inject only high-score chunks
- Score faithfulness post-generation
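The rerank-and-filter step in the pipeline above can be sketched as follows. The scores are illustrative; in the real pipeline Qdrant returns the candidates and Cohere `rerank-v3.5` assigns the relevance scores:

```typescript
// Sketch of the rerank → filter → inject step: keep only
// high-scoring chunks, best first, capped at topK.
type Chunk = { text: string; score: number };

function selectContext(
  reranked: Chunk[], // chunks already scored by the reranker
  minScore: number,  // drop low-score noise before injection
  topK: number,
): string[] {
  return reranked
    .filter((c) => c.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((c) => c.text);
}
```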
Three tiers of memory. One coherent thread.
When a session grows past a turn threshold, an LLM-driven compaction folds older turns into a running summary, so the coach maintains continuity without paying for runaway context.
Working memory
The most recent turns held in Valkey for instant retrieval — the coach knows what you just said without reaching into Postgres.
Episodic memory
Full session history persisted in Postgres on every turn so the coach can resume across devices, summarise older context, and answer "what did we agree last week?" in plain language.
Long-term memory
Knowledge documents, member profile, and past sessions semantically retrievable from the per-user Qdrant namespace — the coach remembers you the way a thoughtful mentor would.
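The compaction step can be sketched as below, with a stub function standing in for the LLM summarisation call — the shapes and threshold are assumptions:

```typescript
// Illustrative compaction: once a session exceeds maxTurns, older
// turns are folded into the summary so context stays bounded.
type Turn = { role: "user" | "assistant"; text: string };
type Session = { summary: string; turns: Turn[] };

function compact(
  session: Session,
  maxTurns: number,
  summarise: (older: Turn[], priorSummary: string) => string, // LLM stub
): Session {
  if (session.turns.length <= maxTurns) return session;
  const recent = session.turns.slice(-maxTurns);  // kept verbatim
  const older = session.turns.slice(0, -maxTurns); // folded into summary
  return { summary: summarise(older, session.summary), turns: recent };
}
```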
Every call traced. Every regression detectable.
Every AI call writes an `AiTrace` row with the full input and output (PII-redacted), latency at every step, token counts and cost, retrieval scores from Qdrant and Cohere, faithfulness score, the routing decision (which coach handled this turn, why), and any errors or tool failures. The Super Admin observability dashboard surfaces traces filtered by tenant, by coach, by error rate, and by faithfulness drop. Every turn is searchable. Every regression is detectable.
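A plausible shape for an `AiTrace` row, mirroring the fields listed above — this is a hypothetical sketch, not the platform's actual schema:

```typescript
// Hypothetical AiTrace row shape, assembled from the fields
// described above; field names are assumptions.
type AiTrace = {
  tenantId: string;
  coach: string;                     // which specialist handled the turn
  delegationDecision: string;        // why the router chose that coach
  latencyMs: Record<string, number>; // latency at every step
  tokens: { input: number; output: number };
  costUsd: number;
  retrievalScores: number[];         // Qdrant + Cohere rerank scores
  faithfulness: number;              // 0..1, scored post-generation
  error?: string;                    // call errors and tool failures
  input: string;                     // PII-redacted
  output: string;                    // PII-redacted
};
```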
Replace your back office with a fleet of AI coaches.
Apply to launch your community on Owl's Roost. Bring your own LLM keys, or run on the default Llama 3.3 70B stack from day one.