Memory system
Three memory tiers, five storage layers. Strict role separation is what prevents a long-running agentic system from collapsing under its own weight.
Why three tiers
An agent with only one memory — its context window — cannot hold a conversation beyond a few hours. An agent with a single external memory (a vector store, for example) cannot distinguish what just happened from what we discussed last week.
Nika OS posits three tiers for this reason:
- Working memory: the context window of the current instance. Bounded (~200k tokens). Automatically compacted at 61% fill.
- Episodic memory: the operational history (sessions, jobs, tournaments, decisions, pod events). Stored in Qdrant + JSONL.
- Semantic memory: general knowledge (ingested documents, patterns, code snippets, articles, answers to questions). Stored in Qdrant (nika_vault).
The five storage layers
| Layer | Store | Content | Good for | Bad for |
|---|---|---|---|---|
| Semantic | Qdrant | Summaries, decisions, docs, patterns, preferences | Semantic recall, similarity | Exact state, locks, ownership |
| Workflow | hierarchy.py + YAML | Entity status, parent/child, deadlines, assignment | Execution source of truth | Semantic search |
| IPC | Redis Streams + Hashes | Entity feed, working memory, signaling, contracts | Real-time inter-pod comms | Long-term history |
| Transactional | Redis | Semaphore, heartbeat, metrics, TTL | Real-time state, concurrency | Long-term history |
| Comms | JSONL bus | Inter-agent messages, review requests, dispatches | Audit trail, coordination | Complex search |
The golden rule: semantic RAG is never the source of truth for execution state. Qdrant recalls what has been. The YAML hierarchy says what is now. Redis IPC carries what is happening at time T.
Mixing these roles produces subtle bugs: a pod that believes a job is done
because Qdrant holds a summary of it, while the YAML still says in_progress,
will make the wrong decisions.
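The golden rule can be sketched in a few lines. This is a minimal illustration, not Nika OS code: the `hierarchy` dict stands in for the parsed YAML hierarchy, and `RecalledSummary`, `job_status`, `is_done`, and the `"done"` status value are hypothetical names introduced for the example (only `in_progress` appears in the text above).

```python
from dataclasses import dataclass


@dataclass
class RecalledSummary:
    """A hit from semantic memory: it describes the past, not the present."""
    entity_id: str
    text: str


def job_status(entity_id: str, hierarchy: dict[str, dict]) -> str:
    """Read the execution status from the workflow layer only."""
    return hierarchy[entity_id]["status"]


def is_done(entity_id: str, hierarchy: dict[str, dict],
            recalled: list[RecalledSummary]) -> bool:
    # Recalled summaries are deliberately ignored here: they may be used
    # for context, but never to decide what state an entity is in.
    return job_status(entity_id, hierarchy) == "done"  # "done" is an assumed value
```

With a hierarchy entry still marked in_progress, `is_done` returns False even when a recalled summary claims the job has finished, which is the behavior the rule asks for.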
The NIKA_META envelope
Every Qdrant point, every bus message, and every memory chunk carries a
standardized NIKA_META envelope. The envelope answers six questions:
who? where? when? what? why? and what is it linked to?
| Group | Fields | Examples |
|---|---|---|
| Identity (WHO) | session_id, tmux, agent_type, trigger | session_id=abc, agent_type=kernel, trigger=user |
| Hierarchy (WHERE) | project_id, job_id, task_id | PROJ-NIKA-CORE, JOB-HOOKS-V3, TASK-xxx |
| Temporal (WHEN) | timestamp, duration_s | 2026-05-22T10:48:25Z, 120 |
| Classification (WHAT/WHY) | action_type, domain, intent | code, hooks, feature |
| Graph (LINKS) | entity_ids[], files[], tools[], parent_id, produces[] | [JOB-xxx], [on_stop.py], [Edit, Bash] |
Values are constrained by controlled enumerations:
- action_type: code, research, debug, deploy, config, review, doc, system, comms
- domain: hooks, infra, rag, hierarchy, mcp, ui, finance, browser, agent, memory
- intent: feature, fix, refactor, optimize, explore, maintain, migrate, test
- trigger: user, cron, autonomous, hook, pod, daemon, system
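As an illustration of how the enumerations could be enforced, here is a minimal sketch of the envelope as a Python dataclass. The class name `NikaMeta` and the validation approach are assumptions, and only a subset of the fields from the table is modeled; the enumeration values are copied from the lists above.

```python
from dataclasses import dataclass, field

# Controlled enumerations, copied from the lists above.
ACTION_TYPES = {"code", "research", "debug", "deploy", "config",
                "review", "doc", "system", "comms"}
DOMAINS = {"hooks", "infra", "rag", "hierarchy", "mcp", "ui",
           "finance", "browser", "agent", "memory"}
INTENTS = {"feature", "fix", "refactor", "optimize", "explore",
           "maintain", "migrate", "test"}
TRIGGERS = {"user", "cron", "autonomous", "hook", "pod", "daemon", "system"}


@dataclass
class NikaMeta:
    """Hypothetical model of the envelope (subset of fields)."""
    # Identity (WHO)
    session_id: str
    agent_type: str
    trigger: str
    # Hierarchy (WHERE)
    project_id: str
    job_id: str
    # Temporal (WHEN)
    timestamp: str
    # Classification (WHAT/WHY)
    action_type: str
    domain: str
    intent: str
    # Graph (LINKS)
    entity_ids: list[str] = field(default_factory=list)
    files: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject any value outside its controlled enumeration.
        constrained = {
            "action_type": ACTION_TYPES,
            "domain": DOMAINS,
            "intent": INTENTS,
            "trigger": TRIGGERS,
        }
        for name, allowed in constrained.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
```

Validating at construction time means a malformed envelope fails before it reaches Qdrant or the bus, rather than surfacing later as a filter that silently matches nothing.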
This envelope discipline enables three things:
- Filtering a RAG search by project, domain, or action type.
- Reconstructing a graph of related events (file edited by which pod, stemming from which job).
- Auditing retrospectively why the system made a given decision.
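The first two uses can be sketched over plain payload dictionaries. This is an in-memory illustration under stated assumptions: the helper names `filter_by_meta` and `edits_of_file` are hypothetical, and a real deployment would presumably push the filter down into the Qdrant query rather than scan a Python list.

```python
def filter_by_meta(payloads: list[dict], **criteria: str) -> list[dict]:
    """Keep only payloads whose envelope fields match every criterion."""
    return [
        p for p in payloads
        if all(p.get(key) == value for key, value in criteria.items())
    ]


def edits_of_file(payloads: list[dict], filename: str) -> list[tuple[str, str]]:
    """List (session_id, job_id) pairs for events whose files[] include filename."""
    return [
        (p["session_id"], p["job_id"])
        for p in payloads
        if filename in p.get("files", [])
    ]


# Sample envelopes using the example values from the table above.
events = [
    {"session_id": "abc", "job_id": "JOB-HOOKS-V3", "project_id": "PROJ-NIKA-CORE",
     "domain": "hooks", "action_type": "code", "files": ["on_stop.py"]},
    {"session_id": "def", "job_id": "JOB-xxx", "project_id": "PROJ-NIKA-CORE",
     "domain": "rag", "action_type": "research", "files": []},
]

hooks_events = filter_by_meta(events, project_id="PROJ-NIKA-CORE", domain="hooks")
who_touched = edits_of_file(events, "on_stop.py")
```

Because every record carries the same envelope, both helpers work unchanged on Qdrant payloads, bus messages, or memory chunks.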
Automatic compaction
Nika OS automatically compacts the context window at 61% fill (an
arbitrary threshold, chosen to leave headroom). Compaction triggers
the PreCompact hook, which produces a handoff packet:
PreCompact handoff packet
├── Decisions taken (with timestamps)
├── Modified files (with summarized lines/diff)
├── Pending tasks (with minimal context to resume)
├── RAG state (recent queries, results)
└── Link to fired lifecycle hooks
This packet is ingested into Qdrant. The pod restarts with a clean context and can recover the packet through a single RAG search at boot.
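The threshold check and the packet shape can be sketched as follows. This is an assumption-laden illustration: `should_compact`, `HandoffPacket`, and its field names are hypothetical, chosen to mirror the five branches of the tree above, and the sketch treats "at 61% fill" as "at or above 61%".

```python
import json
from dataclasses import asdict, dataclass, field

COMPACTION_THRESHOLD_PCT = 61  # threshold stated in the text


def should_compact(used_tokens: int, window_tokens: int) -> bool:
    """True once the context window is at or above the threshold."""
    # Integer arithmetic avoids floating-point edge cases at the boundary.
    return used_tokens * 100 >= window_tokens * COMPACTION_THRESHOLD_PCT


@dataclass
class HandoffPacket:
    """Hypothetical shape of the packet produced by the PreCompact hook."""
    decisions: list[dict] = field(default_factory=list)       # with timestamps
    modified_files: list[dict] = field(default_factory=list)  # with summarized diffs
    pending_tasks: list[dict] = field(default_factory=list)   # minimal resume context
    rag_state: dict = field(default_factory=dict)             # recent queries, results
    lifecycle_hooks: list[str] = field(default_factory=list)  # hooks that fired

    def to_json(self) -> str:
        """Serialize the packet for ingestion."""
        return json.dumps(asdict(self), sort_keys=True)
```

With a window of 200,000 tokens, `should_compact` starts returning True at 122,000 tokens used.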
WATERFALL RETRIEVAL: the factual hard gate
Before answering a factual question about a client, a project, a past
decision, or a remembered entity, the system must first run a
wide→narrow retrieval cascade: semantic RAG search (nika_rag_search or
qdrant-find) → naming-key/metadata filter → dependency graph (graphify) →
exact grep + structured state (Postgres) → targeted hits.
If the cascade returns no relevant hits, the correct answer is: “no memory in RAG for {X}”. Not a fabrication. Not an approximation. Not a “general” answer.
This rule is called WATERFALL RETRIEVAL (formerly PALACE PROTOCOL), after the cascade that starts from broad meaning and converges on exact hits. The underlying principle is unchanged: retrieval is the only source of truth about the past. The loop closes through well-named deliverables, which every stage of the cascade can find immediately.
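The hard gate can be sketched as a chain of stages with an explicit empty-result answer. This is a simplified model, not the actual implementation: `Stage`, `waterfall_retrieve`, and `answer` are hypothetical names, and the assumption that each stage receives the previous stage's candidates and narrows them is one possible reading of the wide→narrow cascade.

```python
from typing import Callable, Iterable

# A stage takes the query and the candidates found so far, and returns
# the refined candidate list. The first stage ignores the empty input.
Stage = Callable[[str, list[str]], list[str]]


def waterfall_retrieve(query: str, stages: Iterable[Stage]) -> list[str]:
    """Run the stages in order, wide to narrow, threading candidates through."""
    candidates: list[str] = []
    for stage in stages:
        candidates = stage(query, candidates)
    return candidates


def answer(query: str, stages: Iterable[Stage]) -> str:
    """Answer from retrieved hits only; never fall back to a guess."""
    hits = waterfall_retrieve(query, stages)
    if not hits:
        # The hard gate: an empty cascade yields this exact answer.
        return f"no memory in RAG for {query}"
    return "\n".join(hits)
```

In practice each stage would wrap one step of the cascade (semantic search, metadata filter, dependency graph, exact grep and structured state); here they are left as plain callables so the gate itself stays visible.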
Auto-memory
Alongside RAG, the kernel agent leverages the CLI agent’s native auto-memory
when it is available: a persistent directory (resolved at install,
$NIKA_CONFIG/memory/) that contains:
- an index file, MEMORY.md (always loaded, max 200 lines);
- topic-specific files (one-line description in frontmatter).
Entries are saved when the user provides non-trivial feedback, describes a
current project, or corrects an approach. Each entry is triaged into one of
four types: user, feedback, project, reference. The system does not save
anything that can be derived from the current code or from git log.
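The loading and triage rules can be sketched as below. This is a hedged illustration, not the CLI agent's actual behavior: `load_index` and `should_save` are hypothetical helpers, and reading "max 200 lines" as "only the first 200 lines are loaded" is an assumption.

```python
from pathlib import Path

MEMORY_TYPES = {"user", "feedback", "project", "reference"}
MAX_INDEX_LINES = 200  # cap stated in the text


def load_index(memory_dir: Path) -> list[str]:
    """Return the lines of MEMORY.md, capped at MAX_INDEX_LINES."""
    index = memory_dir / "MEMORY.md"
    if not index.exists():
        return []
    return index.read_text(encoding="utf-8").splitlines()[:MAX_INDEX_LINES]


def should_save(entry_type: str, derivable_from_repo: bool) -> bool:
    """Save only known entry types that cannot be rebuilt from code or git log."""
    return entry_type in MEMORY_TYPES and not derivable_from_repo
```

The caller is expected to decide `derivable_from_repo` itself; the sketch only encodes the rule that derivable facts are never stored.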