Memory system
Three memory tiers, five storage layers. Strict role separation is what prevents a long-running agentic system from collapsing under its own weight.
Why three tiers
An agent with only one memory — its context window — cannot hold a conversation beyond a few hours. An agent with a single external memory (a vector store, for example) cannot distinguish what just happened from what we discussed last week.
Nika OS posits three tiers for this reason:
- Working memory: the context window of the current instance. Bounded (~200k tokens). Automatically compacted at 61 % fill.
- Episodic memory: the operational history (sessions, jobs, tournaments, decisions, pod events). Stored in Qdrant + JSONL.
- Semantic memory: general knowledge (ingested documents, patterns, code snippets, articles, answers to customer questions). Stored in Qdrant (nika_vault, 413,000+ points).
The five storage layers
| Layer | Store | Content | Good for | Bad for |
|---|---|---|---|---|
| Semantic | Qdrant | Summaries, decisions, docs, patterns, preferences | Semantic recall, similarity | Exact state, locks, ownership |
| Workflow | hierarchy.py + YAML | Entity status, parent/child, deadlines, assignment | Execution source of truth | Semantic search |
| IPC | Redis Streams + Hashes | Entity feed, working memory, signaling, contracts | Real-time inter-pod comms | Long-term history |
| Transactional | Redis | Semaphore, heartbeat, metrics, TTL | Real-time state, concurrency | Long-term history |
| Comms | JSONL bus | Inter-agent messages, review requests, dispatches | Audit trail, coordination | Complex search |
The golden rule: semantic RAG is never the source of truth for execution state. Qdrant recalls what has been. The YAML hierarchy says what is now. Redis IPC carries what is happening at time T.
Mixing these roles produces subtle bugs: a pod that believes a job is done
because Qdrant has a summary, while the YAML still says in_progress, will
make the wrong decisions.
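The golden rule can be sketched as a guard that reads execution state only from the workflow layer. This is a minimal illustration, not the actual hierarchy.py API: `job_is_done`, the `hierarchy` mapping (a stand-in for the parsed YAML), the `semantic_hits` argument, and the `"done"` status value are all assumptions made for the example.

```python
from typing import Mapping, Sequence


def job_is_done(
    job_id: str,
    hierarchy: Mapping[str, Mapping[str, str]],
    semantic_hits: Sequence[str] = (),
) -> bool:
    """Return True only if the workflow layer marks the job as done.

    `hierarchy` stands in for the parsed YAML hierarchy (assumption).
    `semantic_hits` is accepted only to show that it is ignored: a Qdrant
    summary claiming completion is recall, not execution state.
    """
    entity = hierarchy.get(job_id)
    if entity is None:
        return False  # unknown entity: never infer state from summaries
    return entity.get("status") == "done"  # "done" is a hypothetical status value


# The YAML still says in_progress, even though a summary suggests otherwise.
hierarchy = {"JOB-HOOKS-V3": {"status": "in_progress"}}
summaries = ["JOB-HOOKS-V3 finished, hooks v3 shipped"]
print(job_is_done("JOB-HOOKS-V3", hierarchy, summaries))  # prints False
```

The semantic hits never influence the result, which is the point: the summary can inform a report about the job, but not a decision that depends on its current status.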
The NIKA_META envelope
Every Qdrant point, every bus message, every memory chunk carries a
standardized NIKA_META envelope. This envelope answers the questions:
who? where? when? what? why? how is it linked to what?
| Group | Fields | Examples |
|---|---|---|
| Identity (WHO) | session_id, tmux, agent_type, trigger | session_id=abc, agent=alpha, trigger=user |
| Hierarchy (WHERE) | project_id, job_id, task_id | PROJ-NIKA-CORE, JOB-HOOKS-V3, TASK-xxx |
| Temporal (WHEN) | timestamp, duration_s | 2026-05-22T10:48:25Z, 120 |
| Classification (WHAT/WHY) | action_type, domain, intent | code, hooks, feature |
| Graph (LINKS) | entity_ids[], files[], tools[], parent_id, produces[] | [JOB-xxx], [on_stop.py], [Edit, Bash] |
Values are constrained by controlled enumerations:
- action_type: code, research, debug, deploy, config, review, doc, system, comms
- domain: hooks, infra, rag, hierarchy, mcp, ui, finance, browser, agent, memory
- intent: feature, fix, refactor, optimize, explore, maintain, migrate, test
- trigger: user, cron, autonomous, hook, pod, daemon, system
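One way to picture the envelope and its controlled enumerations is a small validated record. The sketch below is an assumption about shape only: the `NikaMeta` class name and its validation logic are hypothetical, and it covers a subset of the fields from the table (tmux, task_id, duration_s, parent_id and produces are left out for brevity).

```python
from dataclasses import dataclass, field

# Controlled enumerations, copied from the lists above.
ACTION_TYPES = {"code", "research", "debug", "deploy", "config",
                "review", "doc", "system", "comms"}
DOMAINS = {"hooks", "infra", "rag", "hierarchy", "mcp", "ui",
           "finance", "browser", "agent", "memory"}
INTENTS = {"feature", "fix", "refactor", "optimize", "explore",
           "maintain", "migrate", "test"}
TRIGGERS = {"user", "cron", "autonomous", "hook", "pod", "daemon", "system"}


@dataclass
class NikaMeta:
    """Hypothetical in-memory form of the NIKA_META envelope (subset of fields)."""
    session_id: str      # Identity (WHO)
    agent_type: str
    trigger: str
    project_id: str      # Hierarchy (WHERE)
    job_id: str
    timestamp: str       # Temporal (WHEN), ISO 8601 string
    action_type: str     # Classification (WHAT/WHY)
    domain: str
    intent: str
    entity_ids: list[str] = field(default_factory=list)  # Graph (LINKS)
    files: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject any value that falls outside its controlled enumeration.
        checks = {
            "trigger": (self.trigger, TRIGGERS),
            "action_type": (self.action_type, ACTION_TYPES),
            "domain": (self.domain, DOMAINS),
            "intent": (self.intent, INTENTS),
        }
        for name, (value, allowed) in checks.items():
            if value not in allowed:
                raise ValueError(f"{name}={value!r} not in {sorted(allowed)}")


meta = NikaMeta(
    session_id="abc", agent_type="alpha", trigger="user",
    project_id="PROJ-NIKA-CORE", job_id="JOB-HOOKS-V3",
    timestamp="2026-05-22T10:48:25Z",
    action_type="code", domain="hooks", intent="feature",
    files=["on_stop.py"], tools=["Edit", "Bash"],
)
```

Validating at construction time means a malformed envelope fails before it reaches Qdrant or the bus, instead of silently producing points that no filter will ever match.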
This envelope discipline enables three things:
- Filtering a RAG search by project, domain, or action type.
- Reconstructing a graph of related events (file edited by which pod, stemming from which job).
- Auditing retrospectively why the system made a given decision.
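The first two uses can be illustrated over plain payload dictionaries. This is a sketch under stated assumptions: `filter_by_meta` and `links_by_file` are hypothetical helpers, the sample payloads are invented for illustration (including the "beta" agent and JOB-RAG-TUNE), and a real deployment would presumably push the same criteria down to Qdrant as a payload filter rather than filter in Python.

```python
from collections import defaultdict
from typing import Any, Iterable


def filter_by_meta(points: Iterable[dict[str, Any]], **criteria: str) -> list[dict[str, Any]]:
    """Keep the payloads whose envelope fields equal every given criterion."""
    return [p for p in points if all(p.get(k) == v for k, v in criteria.items())]


def links_by_file(points: Iterable[dict[str, Any]]) -> dict[str, set[tuple[str, str]]]:
    """Map each file to the (agent_type, job_id) pairs that touched it."""
    graph: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for p in points:
        for path in p.get("files", []):
            graph[path].add((p["agent_type"], p["job_id"]))
    return dict(graph)


# Illustrative payloads only; the second entry is invented for the example.
points = [
    {"agent_type": "alpha", "job_id": "JOB-HOOKS-V3", "project_id": "PROJ-NIKA-CORE",
     "domain": "hooks", "action_type": "code", "files": ["on_stop.py"]},
    {"agent_type": "beta", "job_id": "JOB-RAG-TUNE", "project_id": "PROJ-NIKA-CORE",
     "domain": "rag", "action_type": "research", "files": []},
]

hook_edits = filter_by_meta(points, domain="hooks", action_type="code")
print([p["job_id"] for p in hook_edits])  # prints ['JOB-HOOKS-V3']
print(links_by_file(points))              # prints {'on_stop.py': {('alpha', 'JOB-HOOKS-V3')}}
```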
Automatic compaction
Nika OS automatically compacts the context window at 61 % fill (an
arbitrary threshold chosen to preserve maneuvering room). Compaction
triggers the PreCompact hook, which produces a handoff packet:
PreCompact handoff packet
├── Decisions taken (with timestamps)
├── Modified files (with summarized lines/diff)
├── Pending tasks (with minimal context to resume)
├── RAG state (recent queries, results)
└── Link to fired lifecycle hooks
This packet is ingested into Qdrant. The pod restarts with a clean context and can retrieve the packet through a single RAG search at boot.
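The trigger and the packet structure can be sketched as follows. The 0.61 threshold and the five packet sections come from the text; the names `should_compact` and `HandoffPacket`, the field names, and the 200,000-token window used in the example are assumptions, and the actual PreCompact hook may be organized differently.

```python
from dataclasses import dataclass, field, asdict

COMPACTION_THRESHOLD = 0.61  # fill ratio at which compaction starts


def should_compact(used_tokens: int, window_tokens: int) -> bool:
    """Return True once the context window is at least 61 % full."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    return used_tokens / window_tokens >= COMPACTION_THRESHOLD


@dataclass
class HandoffPacket:
    """Hypothetical container mirroring the five sections of the packet."""
    decisions: list[dict] = field(default_factory=list)       # with timestamps
    modified_files: list[dict] = field(default_factory=list)  # path + summarized diff
    pending_tasks: list[dict] = field(default_factory=list)   # minimal context to resume
    rag_state: list[dict] = field(default_factory=list)       # recent queries and results
    hook_links: list[str] = field(default_factory=list)       # fired lifecycle hooks


packet = HandoffPacket(
    decisions=[{"at": "2026-05-22T10:48:25Z", "what": "keep on_stop.py as a single hook"}],
    modified_files=[{"path": "on_stop.py", "summary": "added handoff export"}],
    hook_links=["PreCompact"],
)

if should_compact(used_tokens=130_000, window_tokens=200_000):
    payload = asdict(packet)  # plain dict, ready to be ingested as a point payload
    print(sorted(payload))
```

Serializing to a plain dictionary keeps the packet independent of any particular client library, so the same payload could be written to Qdrant or appended to a JSONL file.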
PALACE PROTOCOL: the factual hard gate
Before answering a factual question about a client, a project, a past
decision, or a remembered entity, the system must first query RAG
(nika_rag_search or qdrant-find).
If the retrieval returns no relevant hits, the correct answer is: “no memory in RAG for {X}”. Not a fabrication. Not an approximation. Not a “general” answer.
This rule is called PALACE PROTOCOL, after the mempalace project (the highest-scoring memory system on public benchmarks, freely available under the MIT license). The underlying principle: RAG is the only source of truth about the past.
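The gate itself is simple enough to sketch. In the example below, `answer_factual` is a hypothetical wrapper and `search` is a stand-in callable for nika_rag_search or qdrant-find, whose real signatures are not given here; it is assumed to return only the hits already judged relevant.

```python
from typing import Callable, Sequence


def answer_factual(subject: str, search: Callable[[str], Sequence[str]]) -> str:
    """Answer a factual question strictly from retrieved memory.

    `search` is a placeholder for the real retrieval tool (assumption).
    With no relevant hits, the fixed refusal string is returned instead
    of a fabricated, approximate, or "general" answer.
    """
    hits = search(subject)
    if not hits:
        return f"no memory in RAG for {subject}"
    return "\n".join(hits)


def empty_search(query: str) -> list[str]:
    """Simulates a retrieval that finds nothing relevant."""
    return []


print(answer_factual("client ACME", empty_search))  # prints: no memory in RAG for client ACME
```

Because the retrieval call sits in front of any answer generation, there is no code path that produces a factual answer without a hit to back it.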
Auto-memory in Claude Code
Alongside RAG, the Alpha pod leverages Claude Code’s native auto-memory:
a persistent directory ~/.claude/projects/-home-nika-vault/memory/ that
contains:
- an index file MEMORY.md (always loaded, max 200 lines);
- topic-specific files (one-line description in frontmatter).
Entries are saved when the user provides non-trivial feedback, shares a
current project, or corrects an approach. Each entry is triaged into one of
four types: user, feedback, project, reference. The system does not save
anything that can be derived from the current code or from git log.