Multi-kernel federation

Beyond a single tenant, one kernel is no longer enough. Federation runs N Nika OS kernels, one per /goal: prompts are routed to the right kernel by a semantic RAG triage, and kernels exchange signals over Redis pub/sub.

Why a single kernel is not enough

A well-tuned Nika OS kernel can absorb sustained usage for several months. But once past the single-tenant threshold — several distinct business perimeters, several teams, several sets of doctrines — the intrinsic limits of a single instance appear:

  1. Context window saturation — even with handover at 41%, a kernel shared between R&D, customer operations, and public procurement monitoring accumulates too much heterogeneous signal. RAG compression cannot keep up.
  2. Doctrine conflicts — two perimeters can require contradictory invariants (e.g. tolerance for external writes in R&D vs. systematic validation on a customer engagement). A single kernel policy cannot honor both at once.
  3. Credential isolation — a BCUB3-IT kernel and a customer Pulsa kernel must not be able to read each other’s credentials, even accidentally.
  4. Decoupled mutation cycle — GEPA mutations useful to one perimeter are not necessarily useful to another. A unified tournament dilutes signals.

The architectural answer is to spawn N kernels, each with its own /goal, its own primitives, and its own isolated memory — then federate them through a shared layer.

The pattern: one kernel per /goal

Each kernel is an isolated Nika OS instance:

| Field | Content proper to the kernel |
|---|---|
| /goal | Distinct strategic mission (short string) |
| Pod space | tmux nika-os-{kernel_id} or dedicated K3s namespace |
| Memory MD | Separate ~/.claude/projects/{kernel_id}/ directory |
| settings.json | Skills, hooks and permissions tailored |
| Local autonomy | Internal decisions without consulting other kernels |

A few typical /goals:

| Kernel | Perimeter |
|---|---|
| kernel-alpha-bcub3-it | BCUB3 IT operator (daily reference) |
| kernel-beta-bcub3-lab | BCUB3 Lab R&D: KTW, sWELU patents, benchmarks |
| kernel-gamma-client-mission | Customer engagements: mails, reports, deliverables |
| kernel-delta-iot-edge | Embedded controller + sensor data |
| kernel-epsilon-tender-watch | Public procurement watch and sectoral benchmarks |

Each kernel has its own tmux session, its own context, and its own tailored skills. None has direct access to another’s context; they communicate only via the federation layer.
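As an illustration, the per-kernel fields above can be derived from a single identifier. This is a minimal sketch: the `KernelSpec` class and its property names are hypothetical, and it assumes that `kernel_id` is the kernel name shown in the table of typical /goals.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class KernelSpec:
    """Hypothetical descriptor for one isolated kernel (not a Nika OS API)."""
    kernel_id: str
    goal: str  # short string describing the strategic mission

    @property
    def tmux_session(self) -> str:
        # Naming pattern taken from the table: nika-os-{kernel_id}
        return f"nika-os-{self.kernel_id}"

    @property
    def memory_dir(self) -> PurePosixPath:
        # Separate memory directory per kernel: ~/.claude/projects/{kernel_id}/
        return PurePosixPath("~/.claude/projects") / self.kernel_id


alpha = KernelSpec("kernel-alpha-bcub3-it", "BCUB3 IT operator (daily reference)")
print(alpha.tmux_session)
print(alpha.memory_dir)
```

Deriving every path from `kernel_id` keeps two kernels from sharing a tmux session or a memory directory by accident.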

Semantic RAG triage

A user prompt arriving at the federation must be routed to the right kernel. That is the role of the triage layer:

  1. The prompt is embedded (bge-small or equivalent model, 384 dimensions).
  2. A cosine search is run on the nika_federation_goals collection (1 chunk = 1 kernel /goal + its semantic tags).
  3. The top-1 is selected if its score exceeds a confidence threshold.
  4. Otherwise, the prompt is routed to the default generalist kernel, which can request clarification or create a new kernel via the kernel_spawn primitive.
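The routing rule in steps 2 to 4 can be sketched in a few lines. This is an illustration under stated assumptions, not the actual triage code: the `route` function, the 0.75 threshold, and the `kernel-default` name are hypothetical, and the toy 3-dimensional vectors stand in for real 384-dimensional embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(prompt_vec: list[float],
          goal_vecs: dict[str, list[float]],
          threshold: float = 0.75,                # assumed value
          default_kernel: str = "kernel-default"  # assumed name
          ) -> tuple[str, float]:
    """Return (kernel_id, score): top-1 if above threshold, else the default."""
    best_id, best_score = None, -1.0
    for kernel_id, goal_vec in goal_vecs.items():
        score = cosine(prompt_vec, goal_vec)
        if score > best_score:
            best_id, best_score = kernel_id, score
    if best_id is not None and best_score > threshold:
        return best_id, best_score
    return default_kernel, best_score


goals = {
    "kernel-alpha-bcub3-it": [1.0, 0.0, 0.0],
    "kernel-beta-bcub3-lab": [0.0, 1.0, 0.0],
}
print(route([0.9, 0.1, 0.0], goals)[0])  # confident match
print(route([1.0, 1.0, 0.0], goals)[0])  # ambiguous, falls back to the default
```

In a real deployment the linear scan would be replaced by a vector search on the nika_federation_goals collection; the threshold logic stays the same.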

Triage is fast (< 100 ms) and deterministic on unambiguous prompts. For hybrid prompts (e.g. “generate an ISO technical drawing for customer X”), it can split the prompt into several sub-prompts, each addressed to a different kernel.

Federation via Redis pub/sub

Kernels exchange weak signals through a shared Redis bus:

| Channel | Direction | Usage |
|---|---|---|
| nika:federation:directives | any kernel | Signal kernel A → kernel B: here is an event that concerns you |
| nika:federation:work_stealing | any kernel | Announces that a kernel has free capacity and can take pending work |
| nika:federation:health | any kernel | Heartbeat + aggregated metrics (TTFL, error_rate, KTW Y) |
| nika:federation:goals | triage | Update of the registry of active /goals |

No kernel shares its context window directly with another. Pub/sub carries only signals (entity references, task IDs, scores, alerts). To transfer content, we go through the shared RAG.
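One way to enforce the "signals only" rule is to validate every message before it reaches the bus. The sketch below is hypothetical: the `Signal` fields, the `encode_signal` helper, and the 1 KiB size cap are assumptions chosen for illustration; only the four channel names come from the table above.

```python
import json
from dataclasses import dataclass, asdict

ALLOWED_CHANNELS = {
    "nika:federation:directives",
    "nika:federation:work_stealing",
    "nika:federation:health",
    "nika:federation:goals",
}
MAX_SIGNAL_BYTES = 1024  # assumed cap: signals are references, not content


@dataclass(frozen=True)
class Signal:
    sender: str        # kernel_id of the publisher
    kind: str          # e.g. "directive", "task_available", "heartbeat"
    ref: str           # entity reference or task ID, never the content itself
    score: float = 0.0


def encode_signal(channel: str, signal: Signal) -> str:
    """Serialize a signal to JSON, rejecting unknown channels and large payloads."""
    if channel not in ALLOWED_CHANNELS:
        raise ValueError(f"unknown federation channel: {channel}")
    payload = json.dumps(asdict(signal), sort_keys=True)
    if len(payload.encode("utf-8")) > MAX_SIGNAL_BYTES:
        raise ValueError("payload too large for a signal; use the shared RAG")
    return payload


payload = encode_signal(
    "nika:federation:directives",
    Signal(sender="kernel-alpha-bcub3-it", kind="directive", ref="task:42", score=0.8),
)
print(payload)
```

With the redis-py client, the resulting string could then be sent with `Redis.publish(channel, payload)`; the size cap makes it awkward to smuggle transcript content through the bus.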

Shared RAG and triage layer

The Qdrant nika_vault store is shared between all kernels, with a kernel_id field filtered by default in every query. A kernel does not read another’s memory by default. But the triage maintains two complementary collections:

  • nika_federation_summaries — a compact summary per kernel and per day (1 chunk / kernel / 24h). Lets a kernel understand what the others are doing without reading their transcripts.
  • nika_federation_goals — the knowledge base that drives the triage. Each entry describes a kernel by its /goal + its domains + its canonical prompt examples.

Continuous triage keeps the RAG actionable in three ways: it deduplicates (cosine similarity > 0.95 = drop), summarizes (frequent chunks → super-chunk), and organizes chunks into a hierarchy (strategy → project → job → task → atomic).
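The deduplication rule can be illustrated with a greedy pass over chunk embeddings. This is a sketch, not the actual triage implementation: the `deduplicate` function is hypothetical, it keeps the first chunk of each near-duplicate group, and its quadratic scan is only suitable for small batches.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def deduplicate(chunks: list[tuple[str, list[float]]],
                threshold: float = 0.95) -> list[tuple[str, list[float]]]:
    """Drop any chunk whose similarity to an already kept chunk exceeds threshold."""
    kept: list[tuple[str, list[float]]] = []
    for chunk_id, vec in chunks:
        if all(cosine(vec, kept_vec) <= threshold for _, kept_vec in kept):
            kept.append((chunk_id, vec))
    return kept


chunks = [
    ("a", [1.0, 0.0]),
    ("b", [0.999, 0.01]),  # near-duplicate of "a"
    ("c", [0.0, 1.0]),
]
print([chunk_id for chunk_id, _ in deduplicate(chunks)])
```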

Work-stealing between kernels

When kernel A is idle (no current prompt, empty queue), it can subscribe to nika:federation:work_stealing and offer its resources:

  1. Kernel B publishes a task_available signal with a priority score.
  2. Kernel A (free capacity) replies task_claim with its aptitude score (computed on the similarity of its /goal with the task).
  3. If A is the only candidate and its similarity exceeds a threshold, it takes the task.
  4. Otherwise, B keeps the task in its own queue.

This pattern absorbs load peaks without having to pre-allocate capacity on every kernel.
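The claim rule in steps 3 and 4 can be sketched as a small pure function. The `Claim` type, the `resolve_claims` name, and the 0.6 aptitude threshold are assumptions made for illustration; the source only states that a threshold exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    kernel_id: str
    aptitude: float  # similarity between the claimant's /goal and the task


def resolve_claims(owner_id: str, claims: list[Claim],
                   threshold: float = 0.6) -> str:
    """Return the kernel that runs the task.

    The task moves only if there is exactly one candidate and its aptitude
    exceeds the threshold; in every other case the owner keeps it.
    """
    if len(claims) == 1 and claims[0].aptitude > threshold:
        return claims[0].kernel_id
    return owner_id


print(resolve_claims("kernel-b", [Claim("kernel-a", 0.8)]))
print(resolve_claims("kernel-b", [Claim("kernel-a", 0.3)]))
```

Keeping the task with its owner whenever the outcome is ambiguous means a task is never lost or executed twice because of a contested claim.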

Architecture diagram

```mermaid
flowchart TB
    U["User prompt"] --> T["RAG triage<br/>cosine on /goal"]
    T -->|"BCUB3-IT match"| KA["Kernel Alpha<br/>BCUB3-IT"]
    T -->|"BCUB3-LAB match"| KB["Kernel Beta<br/>BCUB3-LAB"]
    T -->|"client match"| KC["Kernel Gamma<br/>Customer mission"]

    KA <-->|"directives + signals"| BUS[("Redis pub/sub<br/>nika:federation:*")]
    KB <-->|"directives + signals"| BUS
    KC <-->|"directives + signals"| BUS

    KA -->|"scoped read/write"| RAG[("Qdrant nika_vault<br/>+ federation_summaries")]
    KB -->|"scoped read/write"| RAG
    KC -->|"scoped read/write"| RAG

    T --> RAG

    classDef prompt fill:#F5F1E8,color:#2C3E42,stroke:#7DB5A5,stroke-width:2px;
    classDef triage fill:#E99971,color:#FDFBF8,stroke:#C97A55,stroke-width:2px;
    classDef kernel fill:#7DB5A5,color:#FDFBF8,stroke:#5E9384,stroke-width:2px;
    classDef bus fill:#2C3E42,color:#F5F1E8,stroke:#1A262A;
    classDef store fill:#F5F1E8,color:#2C3E42,stroke:#A86640;

    class U prompt;
    class T triage;
    class KA,KB,KC kernel;
    class BUS bus;
    class RAG store;
```

Arrows from the triage and into the Qdrant store carry content (prompts, embeddings, vectors). The bidirectional arrows to the Redis bus carry weak signals (alerts, scores, references).

Auto-handover via session_registry

When a kernel reaches its context threshold (41% by default), it triggers an automatic handover to a successor kernel:

  1. The current kernel serializes its essential state into nika_federation_summaries (summary) + session_registry (table of active sessions ↔ entities).
  2. A successor kernel is spawned with an identical /goal and the parent entity injected at boot.
  3. The new kernel’s session_id is linked to the root entity_id via session_registry, which lets child pods find their lineage without reading transcripts.

This mechanism is described in more detail in the session ↔ hierarchy doctrine — the registry is the source of truth.
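The three handover steps can be sketched with in-memory stand-ins for the two stores. Everything here is hypothetical: the `Session` fields, the `hand_over` and `lineage` helpers, and the use of a dict and a list in place of the real session_registry and nika_federation_summaries. Only the 41% default and the idea of linking a successor to the root entity come from the text.

```python
from dataclasses import dataclass
from typing import Optional

HANDOVER_THRESHOLD = 0.41  # default context threshold from the text


@dataclass
class Session:
    session_id: str
    goal: str
    root_entity_id: str
    parent_session_id: Optional[str] = None


def needs_handover(used_tokens: int, window_tokens: int,
                   threshold: float = HANDOVER_THRESHOLD) -> bool:
    """True once the share of the context window in use reaches the threshold."""
    return used_tokens / window_tokens >= threshold


def hand_over(registry: dict[str, Session], summaries: list[dict],
              current: Session, summary: str, successor_id: str) -> Session:
    """Store the summary, then register a successor with the same /goal."""
    summaries.append({"session_id": current.session_id, "summary": summary})
    successor = Session(successor_id, current.goal,
                        current.root_entity_id, current.session_id)
    registry[successor_id] = successor
    return successor


def lineage(registry: dict[str, Session], session_id: str) -> list[str]:
    """Walk parent links back to the first session, without reading transcripts."""
    chain: list[str] = []
    cursor: Optional[str] = session_id
    while cursor is not None:
        chain.append(cursor)
        cursor = registry[cursor].parent_session_id
    return chain


registry = {"s1": Session("s1", "BCUB3 IT operator", "entity-root")}
summaries: list[dict] = []
if needs_handover(used_tokens=90_000, window_tokens=200_000):
    hand_over(registry, summaries, registry["s1"], "state so far", "s2")
print(lineage(registry, "s2"))
```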

Conditions to spawn a new kernel

Four algorithmic triggers can cause the creation of an additional kernel:

| Trigger | Measurement | Default threshold |
|---|---|---|
| Context saturation | Average % context used over 24h | > 70% |
| Topic diversity | Number of disjoint clusters in the kernel’s RAG | > 5 |
| Doctrine conflict | Detection of two mutually contradictory active doctrines | binary |
| Explicit request | User command /kernel new <goal> | n/a |

None of these triggers spawns a kernel without explicit human validation — the cost (RAM, embeddings, mutation tournaments) is high and deserves arbitration.
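The trigger table and the human-validation rule combine into a simple gate. This sketch is illustrative: the `KernelMetrics` fields and function names are hypothetical, while the thresholds (70%, 5 clusters) are the defaults listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelMetrics:
    avg_context_used_24h: float   # fraction of the window, 0.0 to 1.0
    disjoint_clusters: int        # clusters found in the kernel's RAG
    doctrine_conflict: bool       # two contradictory active doctrines detected
    explicit_request: bool = False  # user issued /kernel new <goal>


def fired_triggers(m: KernelMetrics) -> list[str]:
    """List every trigger whose default threshold is exceeded."""
    fired = []
    if m.avg_context_used_24h > 0.70:
        fired.append("context_saturation")
    if m.disjoint_clusters > 5:
        fired.append("topic_diversity")
    if m.doctrine_conflict:
        fired.append("doctrine_conflict")
    if m.explicit_request:
        fired.append("explicit_request")
    return fired


def should_spawn(m: KernelMetrics, human_approved: bool) -> bool:
    """A trigger only proposes a spawn; a human must approve it."""
    return bool(fired_triggers(m)) and human_approved


metrics = KernelMetrics(avg_context_used_24h=0.82, disjoint_clusters=3,
                        doctrine_conflict=False)
print(fired_triggers(metrics))
print(should_spawn(metrics, human_approved=False))
```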

Lifecycle: five verbs

Like pods, a kernel is driven through five verbs:

```
spawn       → kernel_spawn(goal, scope, resource_limits)
invoke      → kernel_invoke(kernel_id, prompt)
readjust    → kernel_readjust(kernel_id, new_directive)
observe     → kernel_observe(kernel_id, since, query?)
kill        → kernel_kill(kernel_id, reason)
```

readjust lets us inject a directive mid-session without killing or respawning the kernel — the equivalent of a patch on the context. The verb is useful when the business scope drifts slightly without changing the /goal.
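To make the five verbs concrete, here is a toy in-memory stand-in that mirrors the signatures listed above. It is not the Nika OS implementation: the `InMemoryKernels` class, the event log, and the reading of `since` as an index into that log are all assumptions made for illustration.

```python
from typing import Optional


class InMemoryKernels:
    """Toy registry that records what each verb would do to a kernel."""

    def __init__(self) -> None:
        self._kernels: dict[str, dict] = {}
        self._next = 0

    def kernel_spawn(self, goal: str, scope: str, resource_limits: dict) -> str:
        kernel_id = f"kernel-{self._next}"
        self._next += 1
        self._kernels[kernel_id] = {
            "goal": goal, "scope": scope, "limits": dict(resource_limits),
            "directives": [], "events": [("spawn", goal)], "alive": True,
        }
        return kernel_id

    def _live(self, kernel_id: str) -> dict:
        kernel = self._kernels[kernel_id]
        if not kernel["alive"]:
            raise RuntimeError(f"{kernel_id} has been killed")
        return kernel

    def kernel_invoke(self, kernel_id: str, prompt: str) -> None:
        self._live(kernel_id)["events"].append(("invoke", prompt))

    def kernel_readjust(self, kernel_id: str, new_directive: str) -> None:
        # Patch the running kernel: the /goal stays untouched.
        kernel = self._live(kernel_id)
        kernel["directives"].append(new_directive)
        kernel["events"].append(("readjust", new_directive))

    def kernel_observe(self, kernel_id: str, since: int = 0,
                       query: Optional[str] = None) -> list[tuple[str, str]]:
        events = self._kernels[kernel_id]["events"][since:]
        if query is not None:
            events = [e for e in events if query in e[1]]
        return events

    def kernel_kill(self, kernel_id: str, reason: str) -> None:
        kernel = self._live(kernel_id)
        kernel["alive"] = False
        kernel["events"].append(("kill", reason))


kernels = InMemoryKernels()
kid = kernels.kernel_spawn("tender watch", "public", {"ram_mb": 512})
kernels.kernel_invoke(kid, "summarize new tenders")
kernels.kernel_readjust(kid, "ignore tenders below 10k EUR")
print(kernels.kernel_observe(kid, since=1))
kernels.kernel_kill(kid, "burst finished")
```

Note that `kernel_observe` still works after `kernel_kill` in this sketch, so a post-mortem remains possible.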

When to orchestrate several kernels in parallel

Three canonical use cases:

  • Load burst: massive ingestion (hundreds of PDFs, videos, data sets). Spawn a temporary worker kernel for the duration of the burst, then decommission it.
  • Doctrine diversification: an R&D kernel can tolerate aggressive exploration, while a customer kernel requires systematic validation. Separating them lets us optimize each for its business.
  • Data region: one kernel hosted in France for sensitive customer data, another in a neutral zone for public watch. The triage routes according to the nature of the prompt.

Cost and bottlenecks

Each kernel has a fixed cost (RAM ~512 MB – 2 GB depending on loaded skills, CPU during embedding bursts) and a variable cost (API or GPU tokens per decision). Observed bottlenecks on a shared VPS:

  • RAM: the primary limit. A 29 GB VPS fits 14 to 58 parallel kernels depending on the per-kernel envelope (2 GB down to 512 MB), before any headroom for the OS and Qdrant.
  • CPU: mainly Qdrant bursts and heavy Python hooks.
  • Network: provider rate limits (Anthropic, OpenRouter, Mistral).
  • Disk: Qdrant growth. The 24h TTL retention on nika:pod_streams mitigates this.

K3s pod or namespace isolation lets us enforce hard limits (Linux cgroups) and avoid one kernel saturating the whole machine.
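The 14 to 58 range quoted for RAM follows from dividing the machine's memory by the per-kernel envelope. The helper below reproduces that arithmetic; `max_kernels` and its `reserved_mb` parameter are hypothetical names, and the 4 GB reserve in the last lines is an assumed figure, not a measurement.

```python
def max_kernels(total_ram_mb: int, per_kernel_mb: int, reserved_mb: int = 0) -> int:
    """Number of kernels that fit once a fixed reserve is set aside."""
    return max(0, (total_ram_mb - reserved_mb) // per_kernel_mb)


VPS_RAM_MB = 29 * 1024  # 29 GB VPS from the text

print(max_kernels(VPS_RAM_MB, 2048))  # 2 GB envelope   -> 14
print(max_kernels(VPS_RAM_MB, 512))   # 512 MB envelope -> 58

# With an assumed 4 GB reserve for the OS and Qdrant, the range shrinks.
print(max_kernels(VPS_RAM_MB, 2048, reserved_mb=4096))  # -> 12
print(max_kernels(VPS_RAM_MB, 512, reserved_mb=4096))   # -> 50
```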
