Meta-curator and evolution

How Nika OS evolves its harness (prompts, skills, hooks, tools and success measures) per task type: continuous supervision by a meta-curator, and consolidation by matches during the dream.

Two separate roles: kernel and meta-curator

The kernel and the meta-curator are both part of the system, but they run as two separate sessions, each with its own context and its own space.

  • The kernel is user-facing: it receives the request, orchestrates (spawns pods + routes jobs), and owns the memory substrate (RAG ingest + retrieve).
  • The meta-curator operates at the meta level: it analyzes, it does not orchestrate spawning. It acts as an LLM-as-judge inside the process and as a guardian of meaning: it watches coherence and detects drift. Spawning and routing stay the kernel’s job.

The separation is deliberate: analyzing and executing are two different jobs. Mixing them in a single loop dilutes both.

The harness that evolves — every primitive, not just prompts

A common mistake is to think only prompt wording evolves. In Nika OS, the entire harness of a task is subject to evolution, per task type:

Primitive        | What evolves
-----------------|-------------------------------------------------------------------------------------
Prompts          | Rephrasings, added examples, removed redundancies
Skills           | The markdown body, the triggers, composition across skills
Hooks            | Which lifecycle events are wired, their logic, their thresholds
Tools / MCP      | Which tools are loaded for a task class, their loadout
Success measures | The very definition of success: SQP criteria, judgment rubrics, controller thresholds

The key point: we don’t only optimize how we ask, we also evolve how we measure whether it worked. A harness that evolves without revising its own success measures ends up over-optimizing an obsolete metric. Each task type (research, writing, code generation, data analysis…) has its own harness, which evolves independently.

The kernel itself never mutates. Only the harness evolves. This boundary protects the security invariants and the output contracts.
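The per-task-type harness described above can be pictured as a small versioned record. The following is only a minimal sketch: the names `Harness`, `HarnessRegistry` and `evolve`, and the choice of tuples of strings for each primitive, are hypothetical illustrations and not Nika OS's actual data model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Harness:
    """One harness version for one task type (all field names are illustrative)."""
    task_type: str
    version: int = 1
    prompts: tuple[str, ...] = ()
    skills: tuple[str, ...] = ()
    hooks: tuple[str, ...] = ()
    tools: tuple[str, ...] = ()
    success_measures: tuple[str, ...] = ()


class HarnessRegistry:
    """Maps each task type to its current harness; task types evolve independently."""

    def __init__(self) -> None:
        self._current: dict[str, Harness] = {}

    def register(self, harness: Harness) -> None:
        self._current[harness.task_type] = harness

    def get(self, task_type: str) -> Harness:
        return self._current[task_type]

    def evolve(self, task_type: str, **changes) -> Harness:
        # Identity fields are managed by the registry, not by the caller.
        if "task_type" in changes or "version" in changes:
            raise ValueError("task_type and version cannot be changed directly")
        old = self._current[task_type]
        # Build a new immutable version instead of mutating the old one.
        new = replace(old, version=old.version + 1, **changes)
        self._current[task_type] = new
        return new
```

In this sketch, evolving the success measures of the "research" harness leaves the "writing" harness untouched, which mirrors the independence between task types. Success measures sit in the same record as prompts and tools, so they are versioned together.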

Continuous supervision — real time

The meta-curator observes the system continuously, while pods are working:

  • Streams everything happening across all pods (observability).
  • Sorts through the prompts and primitives used: were the right primitives activated, are there duplicates, is the quality sufficient?
  • Sends feedback to the pod’s launcher — the kernel, or (depending on recursion depth) the parent pod that launched it — so it can adjust its declarations and configs.
  • Pilots learning: its tools include launching fine-tuning and managing GPU/CPU compute for embedding. It analyzes, decides, then triggers a retrain of the retrieval model and allocates the compute it needs.

It does not orchestrate spawning: it analyzes and advises. The kernel acts on that advice.

The dream — consolidation by matches

When activity drops (a nightly cycle), the kernel and the meta-curator work together on a consolidation by matches (tournaments, consistent with the ELO system):

  1. Material — the real episodes of the period (conversation turns, bus, transcripts) are replayed.
  2. Challengers — the meta-curator generates variants (prompts, primitives, strategies) on the task types observed.
  3. Match — a duel between the champion (the primitive in place) and the challenger, replayed by the kernel on a sample of episodes.
  4. Arbitration — the meta-curator judges (SQP criteria + the episode’s Definition of Done), looking at effect size, not just the p-value.
  5. ELO + promotion — the winner gains ELO; it dethrones the champion only on measured improvement against a held-out baseline — never on a “feels better”.
  6. Lessons learned — keep the lesson (why it wins) and drop the noise (the anti-Funes principle: remembering everything means being unable to think). The promoted patch joins the primitives; the lesson joins the RAG.
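Steps 3 to 5 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the system's implementation: the Elo update uses the standard formula, while the K-factor of 32, the paired-difference effect size, the percentile bootstrap and the thresholds in `should_promote` are all hypothetical choices made for the example.

```python
import random
import statistics


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after one match."""
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta


def should_promote(champion_scores, challenger_scores,
                   min_effect: float = 0.3, n_boot: int = 2000, seed: int = 0) -> bool:
    """Promotion gate on held-out episodes (assumed: paired scores per episode).

    Requires both a minimum standardized effect size and a bootstrap 95%
    interval on the mean improvement whose lower bound is above zero.
    """
    diffs = [c - b for c, b in zip(challenger_scores, champion_scores)]
    if len(diffs) < 2:
        return False
    mean_diff = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    if sd > 0:
        effect = mean_diff / sd
    else:
        # Zero spread: the effect is unbounded if positive, null otherwise.
        effect = float("inf") if mean_diff > 0 else 0.0
    rng = random.Random(seed)
    boot_means = sorted(
        statistics.fmean(rng.choices(diffs, k=len(diffs))) for _ in range(n_boot)
    )
    lower = boot_means[int(0.025 * n_boot)]  # 2.5th percentile
    return effect >= min_effect and lower > 0.0
```

Keeping the Elo update separate from the promotion gate reflects the rule in step 5: a challenger can gain rating from won matches and still not dethrone the champion until the held-out comparison shows a measured improvement.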

The distinction in one sentence: continuous supervision gives immediate, per-pod feedback; the dream consolidates in batches and through tournaments. The two are complementary.

The online controller

Not every parameter is best optimized by tournament. For those with a reward measurable at decision time (model choice, pod concurrency, retry policy, number of RAG chunks), Nika OS uses an online controller.

The controller decides how much to intervene on a parameter, from three ingredients:

  1. an estimate of whether an observed change is noise or a real drift;
  2. the cost of the deviation from target;
  3. an allocation that grows with confidence — intervene little when the signal is uncertain, firmly when it is clear.

Three zones emerge:

  • Noise — do nothing.
  • Doubt — soft correction, wait for confirmation.
  • Drift — firm and immediate correction.

This behavior makes the system antifragile: instead of oscillating with every noisy reading, it waits until it has enough statistical evidence before acting. We do not apply the controller where the reward is subjective (the quality of a mail draft, say): that case belongs to the LLM-as-judge tournament.
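The three ingredients and the three zones can be illustrated with a small sketch. Everything specific here is an assumption made for the example: the z-score against a known noise standard deviation as the noise-versus-drift estimate, the thresholds `z_doubt` and `z_drift`, the `soft_gain` value, and the names `Decision` and `decide`. The source does not specify the estimator or the thresholds the real controller uses.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class Decision:
    zone: str          # "noise", "doubt" or "drift"
    correction: float  # signed adjustment to apply to the parameter


def decide(observations, target: float, noise_sd: float,
           cost_weight: float = 1.0, z_doubt: float = 1.0,
           z_drift: float = 3.0, soft_gain: float = 0.25) -> Decision:
    """Pick a zone and a correction from recent observations of one metric."""
    n = len(observations)
    if n == 0 or noise_sd <= 0:
        raise ValueError("need at least one observation and a positive noise_sd")
    deviation = statistics.fmean(observations) - target
    # Ingredient 1: how many standard errors away from target is the window mean?
    z = abs(deviation) / (noise_sd / math.sqrt(n))
    if z < z_doubt:
        return Decision("noise", 0.0)
    if z < z_drift:
        # Ingredient 3: the allocation grows with confidence inside the doubt zone.
        confidence = (z - z_doubt) / (z_drift - z_doubt)
        # Ingredient 2: the cost of deviation scales the gain, capped at full correction.
        gain = min(1.0, cost_weight * soft_gain * confidence)
        return Decision("doubt", -gain * deviation)
    gain = min(1.0, cost_weight)
    return Decision("drift", -gain * deviation)
```

For instance, with a target of 100 and a noise standard deviation of 10, a window averaging 100.5 falls in the noise zone and produces no correction, while a window averaging 130 falls in the drift zone and is corrected in full.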

The three evolution tracks

Nika OS distinguishes three evolution mechanisms, chosen by what can be measured:

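Track             | Targets                                          | Cadence                              | Evaluation
------------------|--------------------------------------------------|--------------------------------------|------------------------------------------------
Tournament (GEPA) | Skills, prompts, doctrines                       | Continuous, via LLM-judge tournament | Pareto multi-objective
Ablation + DOE    | Hooks, MCP configs, routing logic                | Discrete, batched                    | Non-parametric test + effect size + bootstrap CI
Online controller | Online params (retry, model, concurrency, RAG k) | Per-decision                         | Real-time reward signal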

The track is chosen by the nature of the signal: subjective reward → tournament; controlled comparison → DOE; reward measurable online → controller.
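The selection rule above reads naturally as a small decision function. This is a sketch only: the `Track` enum, the two boolean inputs, and the order of precedence (subjective first, then online-measurable, otherwise DOE) are assumptions made for illustration, since the source does not say how ties between criteria are resolved.

```python
from enum import Enum


class Track(Enum):
    TOURNAMENT = "tournament"
    ABLATION_DOE = "ablation_doe"
    ONLINE_CONTROLLER = "online_controller"


def choose_track(reward_is_subjective: bool, reward_measurable_online: bool) -> Track:
    """Pick an evolution track from the nature of the reward signal."""
    if reward_is_subjective:
        # Quality can only be judged, e.g. a mail draft: LLM-judge tournament.
        return Track.TOURNAMENT
    if reward_measurable_online:
        # Reward is available at decision time, e.g. retry policy or RAG k.
        return Track.ONLINE_CONTROLLER
    # Objective reward that needs a controlled, batched comparison.
    return Track.ABLATION_DOE
```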

Why this architecture

Separating analysis (meta-curator) from execution (kernel) lets each be optimized for its job, without an analysis loop slowing down user-facing work. Promotion only on measured proof avoids the risk of shipping a change that “feels better” without being better. And the anti-Funes principle — keep the lesson, drop the noise — prevents the system from collapsing under the weight of its own history.

See also

  • Observability and controllers — how the system observes its pods without saturating its context, and the loss score that feeds the controller.
  • Doctrines — antifragility and the algos as tools doctrine explain why the controller is an invoked tool, not an LLM.
  • References — external work that inspired the architecture.