Quiet intelligence in every loop.
Owl's Roost is not a feature checklist with AI bolted on — the platform was built around an intelligent layer that watches, coaches, and writes. Every member sees one AI button. Every operator sees a ledger of every call.
Eight layers, one coherent intelligence.
Best-in-class components, wired together so the seams disappear behind a single AI button for the member and a single observability dashboard for the operator.
LLMs
OpenRouter — Llama 3.3 70B for chat, Llama 3.1 8B for fast routing and eval, Llama 3.2 11B Vision for images and video frames. Swappable per platform via Super Admin.
Embeddings
Cohere text embeddings, with rerank-v3.5 running a second pass over retrieved chunks for relevance scoring before injection into the prompt.
Vector DB
Qdrant Cloud with per-tenant namespaces, per-user namespaces, and content-type filtering — every document indexed on save, retrieved semantically per turn.
Voice
Deepgram speech-to-text for chat voice input, attachment audio, and meeting transcripts (when the leader connects Zoom or Fathom).
Working memory
Valkey (Redis-compatible) holds the most recent turns with TTL-based eviction so the coach replies with millisecond context lookup.
Long-term memory
`AiChatSession.messages` stores the full history in Postgres — every tool call, every system-prompt revision, every assistant turn — resumable across devices.
Observability
AiTrace records latency, token counts, retrieval scores, faithfulness scores, routing decisions, tool failures, and cost on every single call.
Web context
Firecrawl ingests URLs into the knowledge base on demand — paste a link in chat and the coach reads the page, indexes it, and grounds future answers.
Five layers per turn. Every change versioned.
Every coach response is built from a deterministic five-layer cascade audited via `PromptVersion`. Pin a tenant to a specific revision; roll back instantly when something regresses.
Platform
Global guardrails — safety, tone, privacy. Set by Owl's Roost, immutable per tenant.
Tenant
Leader-specified brand voice, mission, prohibited topics.
Coach
Per-coach system prompt — admin-editable, versioned via PromptVersion.
Member
Profile, level, recent activity, current goal.
Turn
Last N turns + retrieved knowledge chunks, reranked by Cohere v3.5.
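As a sketch, the five-layer cascade amounts to an ordered, versioned assembly. The layer shape and field names below are illustrative assumptions, not the platform's actual schema:

```typescript
// Hypothetical sketch of the five-layer prompt cascade.
// Layer order is fixed; each layer carries its own version,
// standing in for a PromptVersion-style record.
type PromptLayer = {
  name: "platform" | "tenant" | "coach" | "member" | "turn";
  version: number;
  content: string;
};

function assemblePrompt(layers: PromptLayer[]): string {
  const order = ["platform", "tenant", "coach", "member", "turn"];
  return [...layers]
    .sort((a, b) => order.indexOf(a.name) - order.indexOf(b.name))
    .map((l) => `[${l.name} v${l.version}]\n${l.content}`)
    .join("\n\n");
}
```

Because each layer is versioned independently, pinning a tenant to a revision or rolling back a coach prompt only swaps one layer's content.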
One AI button. Seven specialists.
Members never choose a coach. A fast classifier (Llama 3.1 8B) reads each message and routes to the right specialist based on level, profile, and intent — all logged on `AiTrace.delegationDecision`.
Community Coach
General Q&A — networking, connections, default fallback.
Onboarding Coach
Profiles new members, drafts welcome posts, gates the first week.
Level 1 Coach
Foundations — ideal client, offer clarity, deal flow.
Level 2 Coach
Scaling — positioning, connection systems, speaking.
Level 3 Coach
Mastery — leadership, high-value partnerships.
Offer Doc Coach
Crafting the one-sentence referrable offer.
Special Gift Strategist
Identifying and articulating unique strengths and personal brand.
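Routing to a specialist can be sketched as a small decision function. In the real system the intent comes from the fast classifier (Llama 3.1 8B) and the decision is logged on `AiTrace.delegationDecision`; here the intent is a precomputed string, and all names are assumptions for illustration:

```typescript
// Illustrative routing sketch: pick a specialist per turn,
// with the Community Coach as the default fallback.
type Specialist =
  | "community" | "onboarding" | "level1" | "level2" | "level3"
  | "offerDoc" | "specialGift";

type RoutingInput = { memberLevel: 1 | 2 | 3; isNew: boolean; intent: string };

function routeTurn(input: RoutingInput): Specialist {
  if (input.isNew) return "onboarding"; // gate the first week
  if (input.intent === "offer") return "offerDoc";
  if (input.intent === "strengths") return "specialGift";
  if (input.intent === "curriculum") {
    // Route curriculum questions to the member's current level.
    return (["level1", "level2", "level3"] as const)[input.memberLevel - 1];
  }
  return "community"; // default fallback
}
```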
The coach doesn't just answer — it does the work.
A typed tool catalog the LLM can invoke during chat — write a document, scrape a URL, seed an announcement, draft an intro, configure a coach. Every call permission-checked, every result typed, every invocation logged.
49+ tools, all gated behind feature flags. Every call writes an AiTrace row.
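A minimal sketch of what a typed, gated tool invocation might look like — the shapes below are assumptions, with a plain array standing in for AiTrace rows:

```typescript
// Hypothetical shape for a typed tool: feature-flagged,
// permission-checked, and logged on every invocation.
type ToolResult = { ok: boolean; output?: unknown; error?: string };

type Tool<Args> = {
  name: string;
  featureFlag: string;
  permitted: (role: string) => boolean;
  run: (args: Args) => ToolResult;
};

function invoke<Args>(
  tool: Tool<Args>,
  role: string,
  args: Args,
  flags: Set<string>,
  trace: object[], // stands in for AiTrace rows
): ToolResult {
  let result: ToolResult;
  if (!flags.has(tool.featureFlag)) result = { ok: false, error: "flag off" };
  else if (!tool.permitted(role)) result = { ok: false, error: "forbidden" };
  else result = tool.run(args);
  // Every invocation is logged, including denials.
  trace.push({ tool: tool.name, ok: result.ok, error: result.error });
  return result;
}
```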
Grounded in your content. Reranked for relevance.
Knowledge base documents and lesson content are embedded with Cohere on save and indexed into Qdrant. Every chat turn that benefits from grounding triggers a retrieval, then Cohere `rerank-v3.5` runs a second pass to score the top chunks for relevance — dropping low-score noise before injection. Faithfulness is scored after the response and stored on the AiTrace row, so we can detect hallucination drift over time and route around models or prompts that regress on grounding.
- Embed on save (Cohere)
- Index into per-tenant Qdrant namespace
- Retrieve top-K chunks per turn
- Rerank with Cohere v3.5
- Inject only high-score chunks
- Score faithfulness post-generation
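The rerank-and-filter step in the pipeline above can be sketched as follows. The scores are illustrative; in the real pipeline Qdrant returns the candidates and Cohere `rerank-v3.5` assigns the relevance scores:

```typescript
// Sketch of the rerank → filter → inject step: keep only
// high-scoring chunks, best first, capped at topK.
type Chunk = { text: string; score: number };

function selectContext(
  reranked: Chunk[], // chunks already scored by the reranker
  minScore: number,  // drop low-score noise before injection
  topK: number,
): string[] {
  return reranked
    .filter((c) => c.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((c) => c.text);
}
```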
Three tiers of memory. One coherent thread.
When a session grows past a turn threshold, an LLM-driven compaction folds older turns into a running summary, so the coach maintains continuity without paying for runaway context.
Working memory
The most recent turns held in Valkey for instant retrieval — the coach knows what you just said without reaching into Postgres.
Episodic memory
Full session history persisted in Postgres on every turn so the coach can resume across devices, summarise older context, and answer "what did we agree last week?" in plain language.
Long-term memory
Knowledge documents, member profile, and past sessions semantically retrievable from the per-user Qdrant namespace — the coach remembers you the way a thoughtful mentor would.
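The compaction step can be sketched as below, with a stub function standing in for the LLM summarisation call — the shapes and threshold are assumptions:

```typescript
// Illustrative compaction: once a session exceeds maxTurns, older
// turns are folded into the summary so context stays bounded.
type Turn = { role: "user" | "assistant"; text: string };
type Session = { summary: string; turns: Turn[] };

function compact(
  session: Session,
  maxTurns: number,
  summarise: (older: Turn[], priorSummary: string) => string, // LLM stub
): Session {
  if (session.turns.length <= maxTurns) return session;
  const recent = session.turns.slice(-maxTurns);  // kept verbatim
  const older = session.turns.slice(0, -maxTurns); // folded into summary
  return { summary: summarise(older, session.summary), turns: recent };
}
```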
Every call traced. Every regression detectable.
Every AI call writes an `AiTrace` row with the full input and output (PII-redacted), latency at every step, token counts and cost, retrieval scores from Qdrant and Cohere, faithfulness score, the routing decision (which coach handled this turn, why), and any errors or tool failures. The Super Admin observability dashboard surfaces traces filtered by tenant, by coach, by error rate, and by faithfulness drop. Every turn is searchable. Every regression is detectable.
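A plausible shape for an `AiTrace` row, mirroring the fields listed above — this is a hypothetical sketch, not the platform's actual schema:

```typescript
// Hypothetical AiTrace row shape, assembled from the fields
// described above; field names are assumptions.
type AiTrace = {
  tenantId: string;
  coach: string;                     // which specialist handled the turn
  delegationDecision: string;        // why the router chose that coach
  latencyMs: Record<string, number>; // latency at every step
  tokens: { input: number; output: number };
  costUsd: number;
  retrievalScores: number[];         // Qdrant + Cohere rerank scores
  faithfulness: number;              // 0..1, scored post-generation
  error?: string;                    // call errors and tool failures
  input: string;                     // PII-redacted
  output: string;                    // PII-redacted
};
```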
Replace your back office with a fleet of AI coaches.
Apply to launch your community on Owl's Roost. Bring your own LLM keys, or run on the default Llama 3.3 70B stack from day one.