Phase 4 — Insane-Level Reports & Concept-to-RTL
Status: scaffold landed (pccx-reports trait + MarkdownFormat); implementation kicks off in Phase 4 proper.
Scope: roadmap Weeks 14-18; milestones M4.1 - M4.7 (+ M4.8-4.10 Sail finale).
Target experience: reports that no other tool can approach — single source of truth for the whole hardware-design lifecycle.
1. What “insane-level” means
Not “prettier markdown”. Every report section is:
Generative — built from a live data model (pccx-core trace + synth + verification result), NOT copy-pasted.
Interactive — waveforms, heatmaps, proof trees respond to user input (zoom, filter, replay counter-example).
AI-annotated — the LLM narrates glitches, hotspots, what-ifs at just enough length to matter.
Comparable — every report links its numbers to the benchmark database so a reviewer sees “17% faster than Llama-2 at same area” at a glance.
Reproducible — PDF / HTML / Jupyter bundle includes the input trace hash + pccx-lab commit SHA so anyone can regenerate.
2. Architecture
pccx-core (trace / synth / roofline / bottleneck)
  │
  ├──▶ pccx-verification (golden-diff / sail-refine / formal)
  │        │
  │        ▼
  └──▶ pccx-reports [engine]
           │
           ├── MarkdownFormat (ships in M1.2)
           ├── HtmlFormat (M4.1)
           ├── PdfFormat (M4.1)
           ├── JupyterFormat (M4.1)
           └── WavedromFormat (M4.2 — interactive waveforms)
All formats implement the same ReportFormat trait. The engine
composes a Report document tree; each format walks the tree and
emits its target bytes.
2.1 Document tree
struct Report {
    sections: Vec<Section>,
    metadata: ReportMeta, // trace hash, pccx-lab SHA, benchmarks seen
}
enum Section {
    Summary(AiNarration),                   // LLM-written exec summary
    Waveform(WaveformRef),                  // M4.2
    Heatmap { kind: HeatmapKind, data: … }, // M4.3
    FormalProof(ProofTreeRef),              // M4.4
    WhatIf(ScenarioRef),                    // M4.5
    ConceptToRtl(DesignSpecRef),            // M4.6
    BenchmarkCompare(BenchmarkRef),         // M4.7
    Raw(String),                            // escape hatch
}
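As a rough illustration of "each format walks the tree and emits its target bytes", here is a minimal sketch. The trait signature (`render`), the trimmed-down `Section` enum, and the `ReportMeta` field names are assumptions made for this example; the actual pccx-reports trait may look different.

```rust
// Hypothetical, simplified stand-ins for the document tree above. The real
// Section variants carry richer payloads (WaveformRef, ProofTreeRef, ...).
#[derive(Debug, Clone)]
pub struct ReportMeta {
    pub trace_hash: String, // assumed field name
    pub lab_sha: String,    // assumed field name
}

#[derive(Debug, Clone)]
pub enum Section {
    Summary(String),
    Raw(String),
}

#[derive(Debug, Clone)]
pub struct Report {
    pub sections: Vec<Section>,
    pub metadata: ReportMeta,
}

/// Assumed trait shape: one method that walks the tree and returns bytes.
pub trait ReportFormat {
    fn render(&self, report: &Report) -> Vec<u8>;
}

pub struct MarkdownFormat;

impl ReportFormat for MarkdownFormat {
    fn render(&self, report: &Report) -> Vec<u8> {
        let mut out = String::new();
        // Walk sections in order; every variant maps to a Markdown fragment.
        for section in &report.sections {
            match section {
                Section::Summary(text) => {
                    out.push_str("## Summary\n\n");
                    out.push_str(text);
                    out.push_str("\n\n");
                }
                Section::Raw(text) => {
                    out.push_str(text);
                    out.push_str("\n\n");
                }
            }
        }
        // Footer carries the reproducibility metadata.
        out.push_str(&format!(
            "trace: {} | pccx-lab: {}\n",
            report.metadata.trace_hash, report.metadata.lab_sha
        ));
        out.into_bytes()
    }
}
```

Because `render` only reads the tree and holds no state, rendering the same `Report` twice yields identical bytes, which is the property the 0-token re-render claim relies on.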
3. Milestones
M4.1 — Template engine (Week 14)
Data-driven rendering: the Report tree is the only source of truth; formats are pure functions of it.
HtmlFormat uses Askama (compile-time templates, no runtime interpreter).
PdfFormat uses WeasyPrint (CSS -> PDF, Python dep) or pdf-writer (Rust-native). Pick after benchmarking.
JupyterFormat emits a notebook whose cells embed the live trace fetcher so readers can re-run the analysis interactively.
M4.2 — Interactive waveform viewer (Week 15)
WaveDrom extended with AI annotations: hover a glitch, see Sonnet’s one-sentence explanation (“likely caused by clock-crossing on CDC_A”).
Annotations stored in the trace itself (new .pccx payload field) so they’re first-class, not ephemeral.
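One way to picture annotations living alongside the trace is sketched below. The `.pccx` payload layout is not specified in this plan, so `TracePayload`, `Annotation`, and the `hover` lookup are hypothetical names used only to show the idea: an annotation is keyed by signal and cycle range, and a hover resolves to the notes covering that point.

```rust
/// Hypothetical annotation record; the real .pccx field layout is not defined here.
#[derive(Debug, Clone, PartialEq)]
pub struct Annotation {
    pub signal: String,
    pub start_cycle: u64,
    pub end_cycle: u64, // inclusive
    pub note: String,   // one-sentence LLM explanation
}

/// Hypothetical in-memory view of the annotation part of a trace payload.
#[derive(Debug, Default)]
pub struct TracePayload {
    pub annotations: Vec<Annotation>,
}

impl TracePayload {
    /// Store a note on `signal` for the inclusive cycle range.
    pub fn annotate(&mut self, signal: &str, start_cycle: u64, end_cycle: u64, note: &str) {
        self.annotations.push(Annotation {
            signal: signal.to_string(),
            start_cycle,
            end_cycle,
            note: note.to_string(),
        });
    }

    /// Return every annotation covering `cycle` on `signal` (what a hover would show).
    pub fn hover(&self, signal: &str, cycle: u64) -> Vec<&Annotation> {
        self.annotations
            .iter()
            .filter(|a| a.signal == signal && a.start_cycle <= cycle && cycle <= a.end_cycle)
            .collect()
    }
}
```

Keeping the notes in the payload rather than in viewer state means any format that reads the trace can surface them.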
M4.3 — Power / area / timing heatmaps (Week 15)
Vivado / OpenROAD report ingestion via pccx-core::synth_report.
D3.js heatmap in pccx-ide; hover a tile for LLM hotspot narration (“DSP48E2 cluster 3,7 is 92% utilised — consider dual-pumping”).
Export: SVG (static) + JSON (interactive source).
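Hotspot narration presumably starts from the tiles whose utilisation crosses some threshold. The sketch below assumes a hypothetical `Tile` type and a caller-chosen threshold; neither is defined by this plan, and the real ingestion path goes through pccx-core::synth_report.

```rust
/// Hypothetical heatmap tile: grid position plus utilisation in the range 0.0..=1.0.
#[derive(Debug, Clone, PartialEq)]
pub struct Tile {
    pub x: u32,
    pub y: u32,
    pub utilisation: f64,
}

/// Return the tiles at or above `threshold`, most utilised first.
/// These are the candidates one might hand to the LLM for narration.
pub fn hotspots(tiles: &[Tile], threshold: f64) -> Vec<Tile> {
    let mut hot: Vec<Tile> = tiles
        .iter()
        .filter(|t| t.utilisation >= threshold)
        .cloned()
        .collect();
    // total_cmp gives a total order on f64, so sorting cannot panic on NaN.
    hot.sort_by(|a, b| b.utilisation.total_cmp(&a.utilisation));
    hot
}
```

Filtering before narration keeps the token spend proportional to the number of hotspots rather than the number of tiles.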
M4.4 — Formal proof visualiser (Week 16)
Coq / Lean 4 proof trees (from pccx-verification sail-refine gate) rendered as expandable trees.
Counter-example replay: if a proof fails, the visualiser animates the failing input through the Sail model so the user sees exactly where the RTL and spec diverge.
M4.5 — “What if?” scenario engine (Week 17)
Take the current trace + synth report, ask “what if we raised fmax to 450 MHz?” or “what if we halved URAM?”.
pccx-evolve surrogate model predicts new area/power/delay in < 10 ms.
Side-by-side diff report against the baseline.
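The side-by-side diff can be thought of as a per-metric relative change between the baseline and the surrogate's prediction. The sketch below is illustrative only: `Ppa`, `PpaDelta`, and `diff` are hypothetical names, the metrics are unitless placeholders, and the surrogate model itself is out of scope.

```rust
/// Hypothetical area / power / delay triple (units left to the caller).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Ppa {
    pub area: f64,
    pub power: f64,
    pub delay: f64,
}

/// Relative change per metric, in percent; negative means the scenario is lower.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PpaDelta {
    pub area_pct: f64,
    pub power_pct: f64,
    pub delay_pct: f64,
}

/// Percent change from `baseline` to `scenario`.
/// Assumes a non-zero baseline; a zero baseline yields inf or NaN.
fn pct_change(baseline: f64, scenario: f64) -> f64 {
    (scenario - baseline) / baseline * 100.0
}

/// Compare a predicted scenario against the baseline, metric by metric.
pub fn diff(baseline: &Ppa, scenario: &Ppa) -> PpaDelta {
    PpaDelta {
        area_pct: pct_change(baseline.area, scenario.area),
        power_pct: pct_change(baseline.power, scenario.power),
        delay_pct: pct_change(baseline.delay, scenario.delay),
    }
}
```

Reporting percentages instead of raw values lets one diff table cover metrics with very different magnitudes.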
M4.6 — Concept-to-RTL flow (Week 17-18, flagship demo)
User types natural-language spec (“I want a 16-lane INT8 GEMV that fits in a KV260”).
Agent team (research + doc drafting subagents) proposes an architecture.
pccx-authoring emits a first-pass ISA + RTL skeleton.
User reviews, tweaks, rebuilds.
Total latency goal: spec in, buildable RTL out, < 30 minutes.
M4.7 — Benchmark database (Week 18)
Track Gemma-3N E4B, Llama-2 7B, BERT-base as comparison baselines.
Nightly CI re-runs them on the current commit so the database stays honest.
Report’s “vs peers” section auto-pulls from this database.
M4.8 - M4.10 — Sail finale (Week 18, bonus)
M4.8: Sail execute semantics 2nd increment — concrete MAC / DMA / SFU effects.
M4.9: Sail → .pccx trace emitter (replace record_event stub).
M4.10: Sail ↔ RTL refinement diff plugged into pccx-verification as a first-class VerificationGate.
4. Quality bar
Reports must hold up next to:
Synopsys DSO.ai’s design insight dashboard
Vivado Design Hub’s post-implementation report
NVIDIA Nsight Compute’s kernel profile
Specifically: every number has a source link; every claim has an explanation; every “insane” AI narration is < 80 words and anchored to a concrete data point.
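The narration rule (under 80 words, anchored to a data point) is mechanical enough to check before a report ships. A minimal sketch follows, assuming a hypothetical `Narration` type whose `source_link` field stands in for "anchored to a concrete data point"; the plan does not say how anchoring is represented.

```rust
/// Hypothetical narration record: the text plus a link to the data point it explains.
pub struct Narration {
    pub text: String,
    pub source_link: Option<String>,
}

#[derive(Debug, PartialEq)]
pub enum NarrationError {
    TooLong { words: usize },
    MissingSource,
}

/// The quality bar asks for strictly fewer than 80 words.
pub const MAX_WORDS_EXCLUSIVE: usize = 80;

/// Reject narrations that are too long or not anchored to a source.
pub fn check(narration: &Narration) -> Result<(), NarrationError> {
    // Words are counted as whitespace-separated tokens.
    let words = narration.text.split_whitespace().count();
    if words >= MAX_WORDS_EXCLUSIVE {
        return Err(NarrationError::TooLong { words });
    }
    match &narration.source_link {
        Some(link) if !link.trim().is_empty() => Ok(()),
        _ => Err(NarrationError::MissingSource),
    }
}
```

A check like this could run when the report tree is assembled, so no format ever renders a narration that misses the bar.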
5. Token budget
Summary narrations (Haiku): 500 tokens/report.
What-if scenarios (Sonnet, batched): 1.5 K tokens per 10 scenarios.
Concept-to-RTL (Opus): 4-8 K tokens per session; amortised across the generated artefact.
Report re-renders are 0-token (format is pure function of tree).
Cache AI narrations by (pccx-lab SHA, trace hash, section kind).
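A narration cache keyed this way might look like the sketch below. `NarrationKey`, `NarrationCache`, and `get_or_generate` are hypothetical names, the cache is in-memory only, and the `generate` closure stands in for the actual LLM call.

```rust
use std::collections::HashMap;

/// Mirrors the Section variants that can carry an AI narration (assumed set).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum SectionKind {
    Summary,
    Waveform,
    Heatmap,
    FormalProof,
    WhatIf,
    ConceptToRtl,
    BenchmarkCompare,
}

/// Cache key: (pccx-lab SHA, trace hash, section kind).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct NarrationKey {
    pub lab_sha: String,
    pub trace_hash: String,
    pub kind: SectionKind,
}

#[derive(Default)]
pub struct NarrationCache {
    entries: HashMap<NarrationKey, String>,
    /// Number of cache misses, i.e. times the generator actually ran.
    pub llm_calls: usize,
}

impl NarrationCache {
    /// Return the cached narration, or run `generate` once and store its result.
    pub fn get_or_generate<F>(&mut self, key: NarrationKey, generate: F) -> String
    where
        F: FnOnce() -> String,
    {
        if let Some(hit) = self.entries.get(&key) {
            return hit.clone();
        }
        self.llm_calls += 1;
        let text = generate();
        self.entries.insert(key, text.clone());
        text
    }
}
```

With this key, re-rendering an unchanged trace at an unchanged commit never reaches the model, while changing any one of the three components forces a fresh narration.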
6. Non-goals
Live collaborative editing of the report (Google-Docs-style). Reports are snapshots, not documents.
3D chip floor-plan visualisation. 2D heatmaps are sufficient for the Phase 4 audience.
Marketing-grade PDF typography. Reports aim for an IEEE-paper tone instead (see the pccx-plotting-rules skill for chart style).
7. Open questions
PdfFormat backend: WeasyPrint (Python dep) vs pdf-writer (pure Rust, fewer features). Decide at M4.1 based on HTML → PDF fidelity.
Benchmark database storage: in-tree (growing JSON) vs separate pccx-bench repo. Recommend separate repo to avoid pccx-lab bloat.
Concept-to-RTL UX: chat interface vs forms + preview. Prototype both during Week 17, user test before Week 18 lock-in.