# Phase 5 — AlphaEvolve → OpenEvolve
Status: scaffold landed (pccx-evolve trait scaffolds + speculative); implementation kicks off in Phase 5 proper.
Scope: roadmap Weeks 19-30 (±4 weeks uncertainty band); milestones 5A, 5B, 5C + user-requested 5D, 5E.
Thesis: Fix the weaknesses of existing AlphaEvolve-style systems by combining LLM + Reinforcement Learning + Formal Methods + Surrogate Models.
## 1. Architecture overview
             ┌─────────────────────────────────────────────────┐
             │                   pccx-evolve                   │
             │                                                 │
User spec ──▶│  ┌──────────┐   ┌──────────┐   ┌──────────┐     │── accepted
             │  │ LLM prop.│──▶│ PRM gate │──▶│ Surrogate│     │   candidate
             │  │ (Sonnet) │   │  (fast)  │   │  (GNN)   │     │──▶
             │  └──────────┘   └──────────┘   └──────────┘     │
             │        ▲                                        │
             │        │  evolutionary loop (mutate+cross)      │
             │        └────────────────────────────────────────┤
             │                                                 │
             │  Sail refinement check (pccx-verification)      │
             │  Formal property check (Lean 4)                 │
             └─────────────────────────────────────────────────┘
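The staged filtering above — cheap gates first, expensive checks only for winners — can be sketched as a generate → gate → score → select loop. This is an illustrative sketch, not the pccx-evolve API; all function names and parameters here are assumptions.

```python
import random

def evolve(seed_pop, propose, prm_gate, surrogate, formal_check,
           generations=50, pop_size=32):
    """Illustrative generate -> gate -> score -> select loop.

    propose      : LLM mutation/crossover operator (candidate -> candidate)
    prm_gate     : cheap static checks (elaborate, lint, timing sanity)
    surrogate    : fast fitness estimate -- no synthesis inside the loop
    formal_check : expensive refinement/property check, run only on the winner
    """
    population = list(seed_pop)
    for _ in range(generations):
        children = [propose(random.choice(population)) for _ in range(pop_size)]
        survivors = [c for c in children if prm_gate(c)]        # fast reject
        population = sorted(survivors + population,             # elitist select
                            key=surrogate, reverse=True)[:pop_size]
    best = max(population, key=surrogate)
    return best if formal_check(best) else None                 # gate promotion
```

Note the cost ordering: the PRM gate and surrogate run thousands of times per day, while the formal check runs only on candidates about to be promoted.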
Five lanes:
| Lane | Input | Output | Audience |
|---|---|---|---|
| 5A Chip DSE | RTL + target | RTL variants (Pareto front) | HW engineer |
| 5B Compiler DSE | High-level code | LLVM pass order + RL’d alloc | SW engineer |
| 5C OS/Kernel formal | Kernel C | Proven kernel module | Systems engineer |
| 5D Model → API | HF model + spec | Target-specific driver code | AI researcher |
| 5E Model → RTL | HF model + spec | Custom NPU RTL + proof | Chip architect |
5D and 5E were requested by the user on 2026-04-24 and build on top of 5A and 5C.
## 2. Milestones
### 5A — Chip Design Space Exploration (Weeks 19-22)
Problem: RTL design space is enormous (instruction width, opcode encoding, DSP cluster size, pipeline depth).
Solution:
- Surrogate — GNN on the RTL AST predicts area / power / delay / fmax without synthesis. Trained on ~10 K historical Vivado runs from pccx-FPGA-NPU-LLM-kv260. Target latency: < 10 ms / query.
- Evolutionary loop — population = RTL variants; fitness = surrogate prediction + Verilator pass + verible-lint pass + timing-sanity check.
- PRM gate — deep cloud LLM proposes RTL → Verilator elaborates → verible lints → a static timing sanity check runs → survivors go to the surrogate.
- Formal diff — promoted variants must pass pccx-verification::GoldenDiffGate + a Sail refinement check.
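Fitness in 5A is multi-objective, so "best" means the non-dominated set rather than a single winner. A minimal Pareto-front filter over (area, power, delay) tuples — lower is better on every axis; the helper names are illustrative — might look like:

```python
def dominates(a, b):
    """a dominates b if a is no worse on every metric and strictly better
    on at least one. Metrics are (area, power, delay): lower is better."""
    return (all(x <= y for x, y in zip(a, b)) and
            any(x < y for x, y in zip(a, b)))

def pareto_front(candidates):
    """Keep only the non-dominated candidates (the Pareto-optimal set)."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other != c)]
```

This is the set the what-if engine (Phase 4 M4.5) would visualise for the user.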
Deliverable: “design an NPU for Gemma-3N E4B decoding at 20 tok/s on KV260” in < 1 day wall-clock.
### 5B — Compiler Superoptimization (Weeks 22-24)
Problem: -O3 leaves performance on the table. Register allocation + instruction scheduling are NP-hard, so compilers use heuristics.
Solution:
- MCTS over LLVM pass orderings. Reward = measured runtime on the target (or cycle-accurate sim on TinyNPU).
- GNN + RL for register allocation and instruction scheduling. The policy network ingests the data-flow graph; actions are “assign physical register R to virtual register V”. Reward = −(pipeline stalls) − (register pressure).
- Compiler explainer — post-run, Sonnet narrates why the discovered pass order beats `-O3` in terms the developer understands.
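As a simpler stand-in for the MCTS, the pass-ordering search can be sketched as hill-climbing over orderings, where `measure_runtime` would run the compiled binary on the target (or the TinyNPU cycle-accurate sim). Everything here is illustrative; pass names are placeholders, not a real LLVM pipeline.

```python
import random

def search_pass_order(passes, measure_runtime, iters=200, rng=None):
    """Hill-climb over pass orderings by swapping pairs of passes.
    A toy stand-in for the MCTS described above: accept a swap only if
    the measured runtime strictly improves."""
    rng = rng or random.Random(0)
    best = list(passes)
    best_t = measure_runtime(best)
    for _ in range(iters):
        cand = best[:]
        i, j = rng.sample(range(len(cand)), 2)
        cand[i], cand[j] = cand[j], cand[i]      # mutate the ordering
        t = measure_runtime(cand)
        if t < best_t:                           # keep strictly better orders
            best, best_t = cand, t
    return best, best_t
```

The MCTS version replaces the blind swap with a tree policy, which matters once the pass list is long enough that random mutation stops finding improvements.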
Deliverable: AI-compiled kernel beats hand-tuned expert kernel on ≥ 3 benchmarks (matmul, attention, layer-norm).
### 5C — OS / Kernel Formal Co-Design (Weeks 24-27)
Non-negotiable: stability > everything.
Hybrid architecture:
1. LLM drafts the kernel module / driver / scheduler.
2. Feed it to the Lean 4 theorem prover (proof obligations are extracted automatically).
3. Prove: no memory leaks, no deadlocks, mutex correctness, scheduler starvation-freedom.
4. On failure: return the counter-example trace to the LLM → propose a fix → re-prove. Iterate until mathematically correct.

Start narrow: reuse seL4-style libraries; don’t reinvent formal primitives.
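The draft → prove → counter-example → repair iteration above can be sketched as a small driver loop. The `prover` and `repair_llm` interfaces are hypothetical placeholders for the Lean 4 toolchain and the LLM call, respectively.

```python
def prove_or_repair(draft, prover, repair_llm, max_iters=5):
    """Illustrative 5C loop: prove the draft; on failure, feed the
    counter-example trace back to the LLM and re-prove.

    prover(code)            -> (ok, counterexample_or_None)   [hypothetical]
    repair_llm(code, trace) -> revised draft                  [hypothetical]
    """
    for _ in range(max_iters):
        ok, trace = prover(draft)
        if ok:
            return draft                      # mathematically correct: ship it
        draft = repair_llm(draft, trace)      # targeted fix from the trace
    raise RuntimeError("proof did not converge; escalate to a human")
```

The bounded iteration count matters for the token budget: each repair round is a 5-20 K token Sonnet/Opus call (see section 5).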
Deliverable: pccx-NPU driver with signed Lean 4 correctness proof bundled.
### 5D — Model → ISA-API Compiler (Weeks 27-30) USER BIG BET
Input: a HuggingFace model (.safetensors + config.json + tokenizer) + pccx ISA spec.
Pipeline:
1. Parse the model’s computation graph → tensor-op sequence.
2. Map each op to pccx ISA opcodes (the Sail spec is the ground truth).
3. A deep cloud LLM generates Rust/C driver code that issues those opcodes in order.
4. Run the pccx-lab simulator against a PyTorch reference trace → bit-exact check (or pccx-verification::GoldenDiffGate).
5. Emit the signed driver + a verification report.
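Step 2 — op-to-opcode lowering — can be sketched as a table lookup that fails loudly on anything the ISA cannot express (this is also where the "model shapes outside ISA capacity" risk surfaces as a compile-time error). The op table and `UCA_*` opcode names below are hypothetical; the real mapping comes from the pccx Sail spec.

```python
# Hypothetical op table -- real mappings are derived from the pccx Sail spec.
OP_TABLE = {
    "matmul":    "UCA_MMA",
    "add":       "UCA_VADD",
    "softmax":   "UCA_SOFTMAX",
    "layernorm": "UCA_LNORM",
}

def lower_graph(ops):
    """Map a traced tensor-op sequence to pccx ISA opcodes, raising a
    compile-time error on any op the ISA cannot express."""
    program = []
    for op in ops:
        if op not in OP_TABLE:
            raise ValueError(f"op '{op}' not expressible in pccx v002 ISA")
        program.append(OP_TABLE[op])
    return program
```

The emitted opcode list is what the generated Rust/C driver replays, and what the Sail oracle checks against the PyTorch reference trace.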
Deliverable: drop a model file in → get a uca_run_<model> function out, validated by the Sail oracle.
Depends on: Phase 4 M4.8-M4.10 Sail completion (for reliable refinement check).
### 5E — Generative Chip Design (Weeks 30+) USER ULTIMATE GOAL
Input: same model file + target silicon family (KV260 / ASIC-22nm).
Pipeline:
1. Run 5D, inspect the resulting .pccx trace, and identify the bottleneck (compute-bound vs memory-bound).
2. Feed the bottleneck + model structure to 5A’s evolutionary loop.
3. Candidates must pass:
   - Verilator + verible-lint (PRM gate),
   - surrogate Pareto threshold (area / power / fmax),
   - Sail refinement (every ISA op behaves equivalently to the spec),
   - formal property check (every pccx invariant holds — e.g. “no MAC overflow”).
4. Synthesize the top-K survivors in parallel via 5C-authorised Vivado runners.
5. Pick the Pareto front; the user selects the final design.
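The top-K parallel synthesis step can be sketched with a thread pool standing in for the sandboxed Vivado runners. `surrogate_score` and `run_synth` are assumed interfaces, not real pccx-lab APIs.

```python
from concurrent.futures import ThreadPoolExecutor

def synthesize_top_k(candidates, surrogate_score, run_synth, k=4):
    """Pick the top-K candidates by surrogate score and synthesize them in
    parallel (run_synth stands in for a sandboxed Vivado run). Returns
    (candidate, actual_metrics) pairs for final Pareto selection."""
    top = sorted(candidates, key=surrogate_score, reverse=True)[:k]
    with ThreadPoolExecutor(max_workers=k) as pool:
        results = list(pool.map(run_synth, top))
    return list(zip(top, results))
```

The actual-metrics side of each pair also feeds the surrogate's escape valve: a > 20% divergence from the prediction triggers retraining (see the risk register).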
Deliverable: feed gemma-3n-e4b.safetensors + “KV260” → receive a tailor-made NPU RTL + bitstream + correctness proof in < 48 hours.
## 3. Risk register
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Surrogate accuracy poor on out-of-distribution designs | Medium | High | Keep the “truth” escape valve — any variant whose predicted metrics diverge > 20% from actual synth gets the surrogate retrained on it. |
| Lean 4 proof-obligation auto-extraction brittle | Medium | High | Start with known-provable kernel modules; expand only as tooling matures. |
| Model input shapes outside pccx v002 ISA capacity | Low | High | Surface as a compile-time error in 5D; fall back to the CPU reference path. |
| Sonnet RTL proposals are linter-clean but timing-broken | High | Medium | PRM gate runs a static timing sanity check (critical-path estimate) before the surrogate. |
| 5E wall-clock target (48 h) unachievable on KV260 workstations | High | Medium | Offload synthesis to a cloud Vivado cluster; document the trade-off. |
## 4. Decision — internal first, open later
Phase 5.0 Gate: use the engine on pccx-lab’s own RTL and kernel for the first 3 months before exposing it publicly. Rationale:
- Proves value on code we understand.
- Surfaces infra bugs before customer exposure.
- Generates training data for the surrogate.
- Establishes a credible launch story (“we used it to build ourselves”).
Open to external users once:
- Surrogate accuracy ≥ 90% on pccx-lab’s internal benchmarks.
- Formal gate signs off ≥ 3 non-trivial kernel modules.
- 5D succeeds on ≥ 3 third-party models (Gemma 3N, Llama-2, BERT).
Target public release: pccx-lab v0.5 (roughly Q1-2027 at current cadence).
## 5. Token budget
- Surrogate queries: 0 LLM tokens (pure inference).
- PRM gate: 0 LLM tokens (static analysis only).
- LLM mutation proposals: Haiku (500-1 K tokens/mutation; thousands/day).
- LLM final-round refinement: Sonnet (2-5 K tokens/candidate; tens/day).
- LLM Lean 4 repair (5C): Sonnet/Opus (5-20 K tokens/iteration; hundreds/week).
- LLM concept-to-RTL narration (5D/5E): Opus (10-50 K tokens/session; dozens/week).
- Cache all narrations by (input hash, prompt template hash) — 60% hit-rate target in steady state.
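The narration cache key can be sketched as a pair of content hashes, so a narration is reused only when both the artifact and the prompt template are unchanged. A minimal sketch; the function name is illustrative.

```python
import hashlib

def narration_cache_key(input_blob: bytes, prompt_template: str) -> str:
    """Cache key = (input hash, prompt template hash). Either side
    changing invalidates the cached narration."""
    h_in = hashlib.sha256(input_blob).hexdigest()
    h_tpl = hashlib.sha256(prompt_template.encode("utf-8")).hexdigest()
    return f"{h_in}:{h_tpl}"
```

Hashing the template (not just its name) means a prompt edit silently invalidates stale narrations instead of serving them.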
## 6. Dependencies on earlier phases
- Phase 1 scaffold — done (pccx-evolve traits landed).
- Phase 2 M2.6 (target-aware suggestions) — feeds FPGA presets into 5A.
- Phase 3 M3.4 (sandboxed sessions) — runs Vivado in isolation.
- Phase 4 M4.5 (what-if engine) — visualises 5A’s Pareto front.
- Phase 4 M4.8-4.10 (Sail finale) — refinement oracle for 5D/5E.
Don’t start 5E before all of the above land.