Hypernym × Forge · Round 6 · 2026-05-07

The research auto-loop is built for a 1-model-per-role world. The panel we actually run is multi-model, multi-round, mode-aware.

Forge's research FSM advances past STRATEGY_REVIEW on a single Grok attestation plus one cross-model check (count:1). Its VERDICT is single-model. Its inner loop dispatches one model per role per call across three fixed pairs. There is no Pivot/Grind awareness, no convergence-audit gate, no parallel categorical review, no outlier preservation. The convergence-audit code already exists in forge-core (S44l) — it is wired only into the sprint FSM, not the research FSM. This page is the upgrade map.

00

The framing

Asked of all four models in parallel: where are the concrete gaps between how the cross-model panel has actually been working in Rounds 1–5 (Pivot ↔ Grind, parallel categorical review, convergence-audit, 5-reviewer panel) and what the research auto-loop FSM enforces today? Pivot mode — diverge, surface outliers.

Convergent answer · Grok + Gemini + Gemma

Yes. The research-loop side needs significant upgrades — and they're tractable.

All three voices identified the same top three gaps and proposed compatible primitives. The upgrade is not a rewrite; it is wiring existing patterns (panel dispatch · Pivot/Grind mode · convergence-audit · outlier preservation) into the research FSM and the inner loop. Codex's silence is the operator's correction: the convergence-audit machinery already exists in forge-core — built in S44l — and it just needs to be wired into research.yaml and swarma.ts.

01

FSM — today vs proposed

The current research FSM is a linear chain with single-model gates. The proposed FSM is a mode-aware graph with panel quorum gates, a self-looping convergence-audit state, and outlier preservation as a first-class artifact requirement.

Research FSM · today (red) vs proposed (green)
Legend · red: today, weak gates · green: proposed, panel-aware · yellow: new states
flowchart TB
  subgraph TODAY ["TODAY — research.yaml (linear, weak gates)"]
    direction TB
    T_draft["STRATEGY_DRAFT"]
    T_review["STRATEGY_REVIEW
guard: grok + count:1"]
    T_run["EXPERIMENT_RUNNING"]
    T_score["EXPERIMENT_SCORING"]
    T_verdict["VERDICT
guard: grok-only"]
    T_update["STRATEGY_UPDATE"]
    T_draft --> T_review --> T_run --> T_score --> T_verdict --> T_update --> T_review
  end
  subgraph PROPOSED ["PROPOSED — panel-aware, mode-routed, convergence-gated"]
    direction TB
    P_draft["STRATEGY_DRAFT"]
    P_router{{"mode router
pivot · grind"}}
    P_pivot["PIVOT_EXPLORATION
guard: panel_quorum:3 + outlier_preserved"]
    P_grind["GRIND_EXECUTION"]
    P_run["EXPERIMENT_RUNNING
panel.execute · categorical"]
    P_score["EXPERIMENT_SCORING"]
    P_audit["CONVERGENCE_AUDIT
guard: panel_quorum:4 + categorical_review_complete
+ convergence_attested over 2 rounds"]
    P_verdict["VERDICT
guard: panel_quorum:5"]
    P_update["STRATEGY_UPDATE"]
    P_draft --> P_router
    P_router -->|pivot| P_pivot
    P_router -->|grind| P_grind
    P_pivot -->|accept_strategy| P_grind
    P_pivot -.->|outliers archived| P_artifacts[("outlier artifacts
.forge/artifacts/outliers/")]
    P_grind --> P_run --> P_score --> P_audit
    P_audit -->|revise · new findings| P_run
    P_audit -->|converged| P_verdict
    P_verdict --> P_update --> P_draft
  end
  classDef weak fill:#3b3036,stroke:#bf616a,color:#eceff4,font-weight:600;
  classDef strong fill:#2e3a30,stroke:#a3be8c,color:#eceff4,font-weight:600;
  classDef new fill:#3a352a,stroke:#ebcb8b,color:#eceff4,font-weight:600;
  classDef artifact fill:#352e3a,stroke:#b48ead,color:#eceff4;
  class T_draft,T_review,T_run,T_score,T_verdict,T_update weak;
  class P_draft,P_grind,P_run,P_score,P_verdict,P_update strong;
  class P_pivot,P_audit,P_router new;
  class P_artifacts artifact;
02

Dispatch shape — today vs proposed

The shape of model invocation today vs what the panel pattern actually requires. The transformation is not adding more models — the manifest already lists them — it is moving from per-call single-model dispatch to per-call N-model panel dispatch with mode-aware aggregation.

Today · loop.ts + swarma.ts

Single-model per role · 3 fixed pairs · no convergence-audit

// research/harness/src/loop.ts
const role = manifest.roles.review; // 'qwen3:8b'
const result = await dispatch(role, prompt);
ledger.record({ model: role, score, ... });

// One model. One round. One verdict.
// No categorical steering. No mode awareness.
// No outlier preservation. No N+1 audit.

// swarma.ts rotation:
[
  { draft: 'qwen3-coder:30b', executor: 'codex' },
  { draft: 'gemma4:26b',      executor: 'gemini' },
  { draft: 'qwen3:8b',        executor: 'codex' },
]
// Pairs, not panels. Round-robin, not parallel.
Proposed · panel.execute · mode + categorical

N-model parallel · mode-aware · convergence-audited

// research/harness/src/loop.ts
const config = manifest.roles.review; // PanelConfig
const responses = await panel.execute(config, prompt);

if (config.mode === 'pivot') {
  await artifact.preserveOutliers(responses);
} else {
  const findings = steerByCategory(responses, axes);
  context.audit_history.push(findings);
  if (!detectConvergence(context.audit_history, 2).attested) {
    await fsm.transition('REVISE');
  }
}

// manifest:
roles: {
  review: {
    models: ['grok', 'codex', 'gemini',
             'qwen3:32b', 'gemma4:26b'],
    policy: 'converge', min_quorum: 4,
    categories: ['security','concurrency',
                  'idempotence','persistence'],
  }
}
03

The three highest-leverage gaps

All three model voices independently arrived at the same top three. The order is the same. The component naming overlaps. This is the strongest convergence signal of the round.

Rank Gap File / component What's missing Cost of leaving as-is
G1 Single-model per-role dispatch loop.ts
swarma.ts
Concurrent N-model fan-out/fan-in for role execution. panel.execute(role) primitive. Categorical steering. Aggregation across responses. Systemic blindness. Loop trapped in the local optimum of one model's reasoning path. Missing entire classes of bugs and architectural alternatives that another model would surface immediately. Sub-optimal final product by design.
G2 Mode-blind FSM (no Pivot vs Grind) research.yaml
swarma.ts
Conditional state transitions based on mode. PIVOT_EXPLORATION and GRIND_EXECUTION as distinct paths. Pivot maximizes alternatives + preserves outliers. Grind converges on a verified implementation. Wrong tool for every job. Loop converges when it should diverge (kills innovative ideas) and diverges when it should converge (creates churn during implementation). Maximally inefficient — neither mode is served.
G3 Absent convergence-audit gate research.yaml
loop.ts
Self-transitioning CONVERGENCE_AUDIT state. convergence_attested guard: passes only if N consecutive panel reviews produce no new high-severity findings. Logic to re-run scoring until guard is met. Shipping latent critical defects. The FSM advances after a single review pass, ignoring the observed reality that initial fixes introduce new subtle vulnerabilities. Security and reliability failure by design. Codex iterates 2–4 rounds and finds bypass classes adjacent to each fix — this is not optional.
04

Primitives the panel converged on

New guards, states, and dispatch primitives. Every primitive listed is named by at least two of the three voices. The implementation cost is small — most are 50–200 LOC in TypeScript.

Guard · panel_quorum

panel_quorum(models, min)

Requires N models from a specified panel to successfully attest. Replaces today's count:1 cross-model rule. Pivot uses 3-of-5; Grind audit uses 4-of-5; final Verdict requires 5-of-5.

named by · Grok · Gemini · Gemma
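As a sketch, the guard reduces to counting distinct attesting models against a threshold. The Attestation shape and function name below are illustrative assumptions, not forge-core's actual API:

```typescript
// Illustrative sketch of a panel_quorum guard; shapes are assumptions.
interface Attestation {
  model: string;
  attested: boolean;
}

// Passes when at least `min` distinct panel models have attested.
// Counting distinct models keeps a retried model from double-voting.
function panelQuorum(attestations: Attestation[], min: number): boolean {
  const attested = new Set(
    attestations.filter(a => a.attested).map(a => a.model),
  );
  return attested.size >= min;
}
```

With min: 3 for Pivot, min: 4 for the Grind audit, and min: 5 for the final Verdict, the same guard covers all three gates.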
Guard · convergence_attested

convergence_attested(history, rounds)

Passes only when the set of critical findings has not changed across N consecutive rounds. Uses fingerprint diff (already implemented in S44m). Wires forge-core's convergence-audit into the research FSM.

named by · Grok · Gemini · Gemma
Guard · outlier_preserved

outlier_preserved

Confirms outlier responses (high embedding distance from centroid) have been written to .forge/artifacts/outliers/{run_id}-{model}.json with full prompt, response, and distance metric. Required for Pivot transitions.

named by · Grok · Gemini · Gemma
Guard · categorical_review_complete

categorical_review_complete(axes)

Requires per-category attestation (security, concurrency, idempotence, persistence, contract). Each axis gets its own pass/fail signoff. Categorical steering caught bypass classes that generic review missed in S44b.

named by · Grok · Gemini · Gemma
State · PIVOT_EXPLORATION

PIVOT_EXPLORATION

Divergent state. Goal: generate and preserve options. Panel returns all responses; outliers archived. Exit only via explicit ACCEPT_STRATEGY or revision back to STRATEGY_DRAFT.

named by · Grok · Gemini · Gemma
State · GRIND_EXECUTION

GRIND_EXECUTION (sub-state machine)

Convergent state. Wraps EXPERIMENT_RUNNING → EXPERIMENT_SCORING → CONVERGENCE_AUDIT and self-loops on REVISE when new findings surface. Exits to VERDICT only when audit converges.

named by · Gemini · Gemma
State · CONVERGENCE_AUDIT

CONVERGENCE_AUDIT

Self-looping audit state. Re-dispatches panel until convergence_attested + categorical_review_complete + panel_quorum:4. Cap at 4–5 rounds, fall back to human gate.

named by · Grok · Gemini · Gemma
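The self-loop with its round cap can be sketched as follows; runRound and isConverged are hypothetical callbacks standing in for panel re-dispatch and the combined exit guard:

```typescript
// Minimal sketch of the CONVERGENCE_AUDIT self-loop: re-run the panel
// until the guard passes, capped at maxRounds, then fall back to a
// human gate. Names are assumptions, not the actual state handler.
async function convergenceAudit(
  runRound: () => Promise<void>,
  isConverged: () => boolean,
  maxRounds = 5,
): Promise<'converged' | 'human_gate'> {
  for (let round = 1; round <= maxRounds; round++) {
    await runRound();
    if (isConverged()) return 'converged';
  }
  // Cap reached without convergence: escalate instead of looping forever.
  return 'human_gate';
}
```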
Dispatch · panel.execute

panel.execute(config, prompt)

Core primitive. Reads PanelConfig from manifest, dispatches N concurrent requests via Promise.allSettled, returns array of {model, response, error}. Replaces 1-in-1-out single dispatch.

named by · Grok · Gemini · Gemma
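A hedged sketch of the primitive: PanelConfig, the response shape, and passing dispatch in as a parameter are all assumptions made so the fan-out logic stays self-contained:

```typescript
// Sketch of panel.execute: N concurrent dispatches, fan-in via
// Promise.allSettled so one failed model never rejects the round.
interface PanelConfig {
  models: string[];
}

interface PanelResponse {
  model: string;
  response?: string;
  error?: string;
}

type Dispatch = (model: string, prompt: string) => Promise<string>;

async function panelExecute(
  config: PanelConfig,
  prompt: string,
  dispatch: Dispatch,
): Promise<PanelResponse[]> {
  const settled = await Promise.allSettled(
    config.models.map(m => dispatch(m, prompt)),
  );
  // Preserve model order; record errors instead of throwing.
  return settled.map((s, i) =>
    s.status === 'fulfilled'
      ? { model: config.models[i], response: s.value }
      : { model: config.models[i], error: String(s.reason) },
  );
}
```

Promise.allSettled, unlike Promise.all, guarantees the aggregation step always sees one entry per panel model, which is what quorum counting needs.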
Dispatch · steerByCategory

steerByCategory(responses, axes)

Tags raw panel output by category via cheap classification call. Transforms unstructured text into CategorizedOutput for the categorical_review_complete guard.

named by · Gemini · Grok
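A toy stand-in for the primitive: the real version would make a cheap classification call, but a keyword heuristic is enough to show the CategorizedOutput shape the categorical_review_complete guard would consume. Everything below is an assumption:

```typescript
// Toy sketch of steerByCategory: tag each raw panel response with every
// review axis it touches. A keyword match stands in for the cheap
// classification call described in the text.
interface CategorizedOutput {
  category: string;
  excerpt: string;
}

function steerByCategory(
  responses: string[],
  axes: string[],
): CategorizedOutput[] {
  const out: CategorizedOutput[] = [];
  for (const r of responses) {
    for (const axis of axes) {
      if (r.toLowerCase().includes(axis)) {
        out.push({ category: axis, excerpt: r });
      }
    }
  }
  return out;
}
```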
Dispatch · detectConvergence

detectConvergence(history, stable_rounds)

Deep set comparison of structured findings across last N rounds. Returns {attested: bool, findings_diff, round}. Implementation already exists in forge-core/src/convergence/fingerprint.ts.

named by · Grok · Gemini
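The real implementation lives in forge-core/src/convergence/fingerprint.ts; the sketch below only illustrates the shape of the check, with the Finding and result types assumed:

```typescript
// Illustrative sketch of detectConvergence: attest only when the last
// `stableRounds` rounds introduced no critical-finding fingerprint the
// base round lacked. Shapes are assumptions, not forge-core's types.
interface Finding {
  fingerprint: string;
  severity: string;
}

interface ConvergenceResult {
  attested: boolean;
  findings_diff: string[];
  round: number;
}

function detectConvergence(
  history: Finding[][],
  stableRounds: number,
): ConvergenceResult {
  const round = history.length;
  if (round < stableRounds + 1) {
    return { attested: false, findings_diff: [], round };
  }
  // Fingerprint sets of the critical findings in the comparison window.
  const window = history.slice(-(stableRounds + 1)).map(findings =>
    new Set(
      findings
        .filter(f => f.severity === 'critical')
        .map(f => f.fingerprint),
    ),
  );
  const base = window[0];
  const diff: string[] = [];
  for (const set of window.slice(1)) {
    set.forEach(fp => {
      if (!base.has(fp)) diff.push(fp);
    });
  }
  return { attested: diff.length === 0, findings_diff: diff, round };
}
```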
Dispatch · routeByMode

routeByMode(prompt, mode)

Prompt pre-processor. pivot → "Propose three competing alternatives. Emphasize unconventional. Do not seek consensus." grind → "Provide a single optimal solution. List failure modes. Prioritize correctness."

named by · Gemini · Gemma
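The steering text below is lifted from the panel's own wording; the function shape itself is an assumption:

```typescript
// Sketch of the routeByMode prompt pre-processor: prepend mode-specific
// steering text before dispatch.
type Mode = 'pivot' | 'grind';

function routeByMode(prompt: string, mode: Mode): string {
  const steering: Record<Mode, string> = {
    pivot:
      'Propose three competing alternatives. Emphasize unconventional. ' +
      'Do not seek consensus.',
    grind:
      'Provide a single optimal solution. List failure modes. ' +
      'Prioritize correctness.',
  };
  return `${steering[mode]}\n\n${prompt}`;
}
```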
Dispatch · preserveOutliers

artifact.preserveOutliers(responses)

Compute embeddings → centroid → cosine distance per response → write outliers (>2σ) to .forge/artifacts/outliers/ with prompt, response, distance. Updates FSM context for the outlier_preserved guard.

named by · Gemini · Grok
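The selection step can be sketched as follows, assuming embeddings are already computed and leaving the artifact write itself out; names are illustrative:

```typescript
// Outlier selection sketch for preserveOutliers: cosine distance from
// the centroid embedding, flagging anything beyond mean + 2σ.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Indices of responses whose embedding sits more than 2σ above the mean
// centroid distance.
function outlierIndices(embeddings: number[][]): number[] {
  const dim = embeddings[0].length;
  const centroid = Array.from({ length: dim }, (_, d) =>
    embeddings.reduce((sum, e) => sum + e[d], 0) / embeddings.length,
  );
  const dists = embeddings.map(e => cosineDistance(e, centroid));
  const mean = dists.reduce((s, d) => s + d, 0) / dists.length;
  const sigma = Math.sqrt(
    dists.reduce((s, d) => s + (d - mean) ** 2, 0) / dists.length,
  );
  return dists
    .map((d, i) => (d > mean + 2 * sigma ? i : -1))
    .filter(i => i >= 0);
}
```

One caveat worth noting: with only five responses, the maximum possible z-score is (n-1)/sqrt(n) ≈ 1.79, so a strict 2σ cut can never fire on a single round's panel; in practice the threshold would be applied across a larger pool of responses or relaxed.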
05

Three voices · condensed

The headline finding from each voice. Full transcripts at .forge/proposals/hypernym-forge-track/08-r6-{grok,gemini,gemma}.txt.

Grok · CTO / adversarial

"Forces sequential 1-model bottlenecks; auto-loop converges prematurely on weak single-model outputs"

Argues the cost of inaction is the escape of latent defects (~30% by its estimate). Pushes for full panel dispatch + categorical steering as parallel primitives.

  • Replace per-call single-model dispatch with panel_dispatch(roles, models, mode)
  • Manifest roles become composite: panel_review with categories
  • Pivot threshold: if semantic diff > 0.2 between models → archive outlier branch
  • Multi-round protocol: max_rounds:4 with disambiguation prompts
Gemini · synthesizer / FSM specialist

"Mode blindness uses the wrong tool for every job. Convergence audit is non-optional."

Provides a complete research.yaml rewrite with sub-state machines and panel quorum guards. Strong on the FSM as a graph, not a chain.

  • Sub-state machine: GRIND_EXECUTION contains RUNNING → SCORING → CONVERGENCE_AUDIT
  • Audit self-loops on REVISE when new findings; exits on CONVERGED
  • Verdict no longer single-model — it's the result of converged audit
  • Outlier preservation = embedding distance from centroid > 2σ → JSON artifact
Gemma · formalist · minimum-viable-physics

"Pivot blindness lobotomizes the research capability. Convergence loops are the cure."

Same top-3 framing. Names "Bypass Class" and "Ghost Bug" as the two failure modes the current FSM actively produces.

  • PanelDispatcher(role, mode, axes) — Pivot fans N seeds, Grind fans N personas
  • Manifest evolves: { role: review, mode: PIVOT, axes: [concurrency, idempotence] }
  • Convergence wiring: emit FSM event audit_complete from loop callback
  • Verdict guard: convergence_attested AND mode == GRIND
06

Codex · operator's correction

Codex spent the round inspecting forge-core/src/convergence/ instead of producing structured output. Same Pivot signal as Round 5 — and this time it points at the load-bearing primitive.

Codex · operator-class signal

The convergence-audit machinery already exists. It's just wired to the wrong FSM.

Codex read forge-core/src/convergence/fingerprint.ts, forge-core/src/convergence/orchestrator.ts, the round-prompt builder, the canonical-source verification rules, the final-verdict resolver. None of this came up because Codex was generating ideas — it came up because this code is the answer.

What S44l shipped: a convergence-audit skill with fingerprinted findings, multi-round panel attestation, canonical-source-verification requirements, finding fix-status tracking, codex-tiebreaker for split panel verdicts. The skill is enforced by an FSM guard (convergence_attested) — but only on the sprint FSM, in CODE_REVIEW.

The research FSM — research.yaml, swarma.ts, loop.ts — does not call any of this. The same convergence-audit pattern that makes sprint code review trustworthy is invisible to the research auto-loop. The R6 upgrade is wiring, not invention.

Operator translation
  • forge-core/src/convergence/fingerprint.ts already implements detectConvergence(history)
  • forge-core/src/convergence/orchestrator.ts already runs the multi-round protocol
  • buildRoundPrompt() already does categorical steering and canonical-source verification
  • The R6 upgrade: 1) Add panel_quorum + outlier_preserved + categorical_review_complete + convergence_attested guards to research.yaml. 2) Add panel.execute + routeByMode + preserveOutliers to swarma.ts. 3) Replace per-call dispatch in loop.ts with panel calls reading PanelConfig from the manifest. 4) Wire the existing orchestrator.ts as the CONVERGENCE_AUDIT state handler.
07

New outliers · Pivot mode preserved

One zero-to-one upgrade idea per voice — preserved per Pivot mode, not voted away.

Grok · outlier

Eureka sidechannel as first-class Pivot primitive

On outlier detection, auto-inject preserved artifacts into Eureka's hypothesis graph (.forge/eureka/hypotheses.yaml). Dispatch solo Eureka solver for zero-shot refinement without panel convergence.

2nd-order

Federates Pivot divergences across loops — Round 6 alt from Codex refines Round 3 persistence hypothesis. Cross-sprint synthesis becomes the substrate for stack-defining primitives.

Falsify

If Eureka injection yields <10% refined outliers adopted in Grind, revert to side-channel.

Gemini · outlier

Antithetical Sub-Loop Forking

When loop.ts archives an outlier, it forks the FSM into a parallel research sub-loop seeded with: "Assume the antithetical thesis is correct and the main thesis is flawed. Generate a competing strategy from first principles."

2nd-order

Single-threaded hill-climb → parallel competitive portfolio. Real machine resources dedicated to contrarian hypotheses. Forces the system to actively prove its own main thread wrong — a "Red Team" at the machine level.

Falsify

Track fork success rate. If >90% of antithetical forks are pruned within 2 states without producing novel high-value artifacts, signal-to-noise too low.

Gemma · outlier

Entropy-driven structural morphing

Compute panel-output entropy each round; freeze stable model fragments, melt unstable ones for re-dispatch. Training-style annealing applied to the panel itself. (Echoes Gemma's R5 "Algorithmic Thermodynamics" outlier — same primitive, applied to dispatch.)

2nd-order

Panel becomes self-regulating — high-entropy roles get more rounds automatically; low-entropy roles get one round and ship. No more hand-tuned max_rounds.

Falsify

If the system fails to stabilize its own loop count without human intervention within 3 sprints of telemetry in a noisy environment, revert to hand-tuned max_rounds.

08

The single answer to the question

"Does the research auto-loop need upgrades to match how we've been working in panels?"

Round 6 · binding answer

Yes — and the upgrade is mostly wiring, not invention.

The research auto-loop today enforces a 1-model-per-role world. The panel pattern we've actually been running is multi-model, mode-aware, categorical, convergence-audited, outlier-preserving. All four primitives the panel converged on already exist somewhere in the codebase — convergence-audit in forge-core/src/convergence/, panel dispatch implicit in the convergence skill, categorical steering in buildRoundPrompt. They are wired into the sprint FSM (CODE_REVIEW guard convergence_attested) but invisible to the research FSM. The R6 upgrade is roughly a 2-sprint job: rewrite research.yaml with mode-routed states + four new guard types, replace per-call dispatch in loop.ts with panel.execute, add preserveOutliers + routeByMode to swarma.ts, and call the existing orchestrator as the CONVERGENCE_AUDIT state handler. Once shipped, the research track converges on the same intelligence floor as code review — and the cross-track product roadmap (Hypernym × Forge) inherits it for free.