Phase 4 — Insane-Level Reports & Concept-to-RTL

Status: scaffold landed (pccx-reports trait + MarkdownFormat); implementation kicks off in Phase 4 proper. Scope: roadmap Weeks 14-18; milestones M4.1 - M4.7 (+ M4.8-4.10 Sail finale). Target experience: reports that no other tool can approach — single source of truth for the whole hardware-design lifecycle.

1. What “insane-level” means

Not “prettier markdown”. Every report section is:

  1. Generative — built from a live data model (pccx-core trace + synth + verification result), NOT copy-pasted.

  2. Interactive — waveforms, heatmaps, proof trees respond to user input (zoom, filter, replay counter-example).

  3. AI-annotated — the LLM narrates glitches, hotspots, what-ifs at just enough length to matter.

  4. Comparable — every report links its numbers to the benchmark database so a reviewer sees “17% faster than Llama-2 at same area” at a glance.

  5. Reproducible — PDF / HTML / Jupyter bundle includes the input trace hash + pccx-lab commit SHA so anyone can regenerate.

2. Architecture

pccx-core (trace / synth / roofline / bottleneck)
    │
    ├──▶ pccx-verification (golden-diff / sail-refine / formal)
    │          │
    │          ▼
    └──▶ pccx-reports  [engine]
              │
              ├── MarkdownFormat  (ships in M1.2)
              ├── HtmlFormat      (M4.1)
              ├── PdfFormat       (M4.1)
              ├── JupyterFormat   (M4.1)
              └── WavedromFormat  (M4.2 — interactive waveforms)

All formats implement the same ReportFormat trait. The engine composes a Report document tree; each format walks the tree and emits its target bytes.

2.1 Document tree

struct Report {
    sections: Vec<Section>,
    metadata: ReportMeta,   // trace hash, pccx-lab SHA, benchmarks seen
}

enum Section {
    Summary(AiNarration),                    // LLM-written exec summary
    Waveform(WaveformRef),                    // M4.2
    Heatmap { kind: HeatmapKind, data:  },   // M4.3
    FormalProof(ProofTreeRef),                // M4.4
    WhatIf(ScenarioRef),                      // M4.5
    ConceptToRtl(DesignSpecRef),              // M4.6
    BenchmarkCompare(BenchmarkRef),           // M4.7
    Raw(String),                              // escape hatch
}

3. Milestones

M4.1 — Template engine (Week 14)

  • Data-driven rendering: the Report tree is the only source of truth; formats are pure functions of it.

  • HtmlFormat uses Askama (compile-time templates, no runtime interpreter).

  • PdfFormat uses WeasyPrint (CSS -> PDF, Python dep) OR pdf-writer (Rust-native). Pick after benchmarking.

  • JupyterFormat emits a notebook whose cells embed the live trace fetcher so readers can re-run the analysis interactively.

M4.2 — Interactive waveform viewer (Week 15)

  • WaveDrom extended with AI annotations: hover a glitch, see Sonnet’s one-sentence explanation (“likely caused by clock-crossing on CDC_A”).

  • Annotations stored in the trace itself (new .pccx payload field) so they’re first-class, not ephemeral.

M4.3 — Power / area / timing heatmaps (Week 15)

  • Vivado / OpenROAD report ingestion via pccx-core::synth_report.

  • D3.js heatmap in pccx-ide; hover a tile for LLM hotspot narration (“DSP48E2 cluster 3,7 is 92% utilised — consider dual-pumping”).

  • Export: SVG (static) + JSON (interactive source).

M4.4 — Formal proof visualiser (Week 16)

  • Coq / Lean 4 proof trees (from pccx-verification sail-refine gate) rendered as expandable trees.

  • Counter-example replay: if a proof fails, the visualiser animates the failing input through the Sail model so the user sees exactly where the RTL and spec diverge.

M4.5 — “What if?” scenario engine (Week 17)

  • Take the current trace + synth report, ask “what if we raised fmax to 450 MHz?” or “what if we halved URAM?”.

  • pccx-evolve surrogate model predicts new area/power/delay in < 10 ms.

  • Side-by-side diff report against the baseline.

M4.6 — Concept-to-RTL flow (Week 17-18, flagship demo)

  • User types natural-language spec (“I want a 16-lane INT8 GEMV that fits in a KV260”).

  • Agent team (research + doc drafting subagents) proposes an architecture.

  • pccx-authoring emits a first-pass ISA + RTL skeleton.

  • User reviews, tweaks, rebuilds.

  • Total latency goal: spec in, buildable RTL out, < 30 minutes.

M4.7 — Benchmark database (Week 18)

  • Track Gemma-3N E4B, Llama-2 7B, BERT-base as comparison baselines.

  • Nightly CI re-runs them on the current commit so the database stays honest.

  • Report’s “vs peers” section auto-pulls from this database.

M4.8 - M4.10 — Sail finale (Week 18, bonus)

  • M4.8: Sail execute semantics 2nd increment — concrete MAC / DMA / SFU effects.

  • M4.9: Sail → .pccx trace emitter (replace record_event stub).

  • M4.10: Sail ↔ RTL refinement diff plugged into pccx-verification as a first-class VerificationGate.

4. Quality bar

Reports must hold up next to:

  • Synopsys DSO.ai’s design insight dashboard

  • Vivado Design Hub’s post-implementation report

  • NVIDIA Nsight Compute’s kernel profile

Specifically: every number has a source link; every claim has an explanation; every “insane” AI narration is < 80 words and anchored to a concrete data point.

5. Token budget

  • Summary narrations (Haiku): 500 tokens/report.

  • What-if scenarios (Sonnet, batched): 1.5 K tokens per 10 scenarios.

  • Concept-to-RTL (Opus): 4-8 K tokens per session; amortised across the generated artefact.

  • Report re-renders are 0-token (format is pure function of tree).

  • Cache AI narrations by (pccx-lab SHA, trace hash, section kind).

6. Non-goals

  • Live collaborative editing of the report (Google-Docs-style). Reports are snapshots, not documents.

  • 3D chip floor-plan visualisation. 2D heatmaps are sufficient for the Phase 4 audience.

  • Marketing-grade PDF typography. IEEE-paper tone (see pccx-plotting-rules skill for chart style).

7. Open questions

  • PdfFormat backend: WeasyPrint (Python dep) vs pdf-writer (pure Rust, fewer features). Decide at M4.1 based on HTML → PDF fidelity.

  • Benchmark database storage: in-tree (growing JSON) vs separate pccx-bench repo. Recommend separate repo to avoid pccx-lab bloat.

  • Concept-to-RTL UX: chat interface vs forms + preview. Prototype both during Week 17, user test before Week 18 lock-in.