Phase 6: Development-phase Documentation Generator

Status: design (2026-04-24), a future TODO captured from the scope-expansion conversation.

Scope: roadmap Weeks 18-24; milestones M6.1 - M6.9.

Mission shift: pccx-lab stops being “a tool that runs at the end of a verification cycle” and becomes “a tool the developer runs continuously throughout the dev cycle”. Writing the SV should be enough; the docs write themselves.

1. Why: the expanded mandate
Phases 1-5 positioned pccx-lab as a verification / analysis / evolve tool — the thing you run after the RTL is written to check it, profile it, explain it, or refine it. Phase 6 expands the mandate in two directions:
- Into the dev phase. The developer writes SystemVerilog + inline doc comments; pccx-lab emits documentation at the quality level of the pccx main site (Sphinx / Furo / MyST) without further editorial work. The developer stays in the editor. Docs are a side effect of writing clean, commented code.
- Universal ingest. The pipeline is not pccx-specific. Any well-commented SystemVerilog codebase can be fed in and a pccx-style docs site comes out: block diagrams, FSM state charts, ISA PDFs, register maps, all auto-generated. “Port this SV project to a documented site” reduces to `pccx-lab docs emit <path>`.
The scale target for the analytics side expands in the same breath: raw-mode traces can legitimately reach tens to hundreds of GiB (full cycle-by-cycle + waveform + bus + cache state dump), so the reader / streaming story has to handle that without loading the whole thing into RAM.
2. Architecture

    SV source + docstrings + ISA TOML + API TOML
                  │
                  ▼
       ┌─────────────────────────────┐
       │ pccx-authoring (extended)   │  ← tree-sitter-verilog AST + doc
       │ SvIngest / CommentExtract   │    comment block extraction
       └──────────┬──────────────────┘
                  │
                  │  (typed IR: Module, Port, Param, FSM, Opcode, …)
                  │
        ┌─────────┼─────────────────────────────────────────┐
        ▼         ▼                                         ▼
     Sphinx /   TikZ / Mermaid diagram                xelatex ISA PDF
     MyST  ─▶   generator (block, FSM, bus topology)  (mirrors pccx
     pages        │                                   main ISA.pdf)
        │         │
        └───┬─────┴── Chrome-trace plots / Jupyter cells ───┐
            │                                               │
            ▼                                               ▼
       Static site                                 `pccx-isa-vXXX.pdf`
       (/en + /ko)                                 (ready to publish)

Doc-comment convention — the ingest layer reads block comments formatted as (proposal):

    /*! \module ctrl_npu_frontend
     * AXI-Lite command decoder for the NPU top-level.
     *
     * \diagram block
     * \fsm decode_state
     * \cycle 1 per AXIL beat
     */
    module ctrl_npu_frontend (...);

`\diagram block` / `\fsm name` / `\cycle ...` / `\port` / `\param` are the seed of the doc-comment vocabulary. The exact schema lands at M6.1.
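
To make the proposed convention concrete, here is a minimal Python sketch of the comment-extraction step. It is an illustration only: it uses regular expressions rather than the tree-sitter AST the design calls for, and the names `DocBlock` and `parse_doc_blocks` are hypothetical, not part of pccx-lab.

```python
import re
from dataclasses import dataclass, field

# Assumption: doc blocks open with "/*!" and close with "*/", as in the proposal.
DOC_BLOCK = re.compile(r"/\*!(.*?)\*/", re.DOTALL)
# Assumption: a tag is a backslash, a word, then an optional argument.
TAG_LINE = re.compile(r"\\(\w+)\s*(.*)")


@dataclass
class DocBlock:
    tags: list = field(default_factory=list)     # (tag, argument) pairs in source order
    summary: list = field(default_factory=list)  # free-text lines


def parse_doc_blocks(source):
    """Extract every /*! ... */ block and split it into tags and summary text."""
    blocks = []
    for match in DOC_BLOCK.finditer(source):
        block = DocBlock()
        for raw in match.group(1).splitlines():
            # Drop the leading " * " decoration, if any.
            line = raw.strip().lstrip("*").strip()
            if not line:
                continue
            tag = TAG_LINE.match(line)
            if tag:
                block.tags.append((tag.group(1), tag.group(2).strip()))
            else:
                block.summary.append(line)
        blocks.append(block)
    return blocks


EXAMPLE = r"""
/*! \module ctrl_npu_frontend
 * AXI-Lite command decoder for the NPU top-level.
 *
 * \diagram block
 * \fsm decode_state
 * \cycle 1 per AXIL beat
 */
module ctrl_npu_frontend (...);
"""

blocks = parse_doc_blocks(EXAMPLE)
for tag, argument in blocks[0].tags:
    print(tag, "->", argument)
print(blocks[0].summary)
```

Keeping tags as ordered pairs rather than a dictionary leaves room for repeated tags such as several `\port` entries, should the M6.1 schema allow them.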
3. Renderer trio
Three output backends share the same typed IR so the same code documents three surfaces:
| Backend | Output | Lives in |
|---|---|---|
| Sphinx / MyST | Static HTML site (EN + KO) | |
| TikZ / Mermaid | | new: |
| xelatex | Standalone | new: |
Isolating the PDF backend in its own crate keeps the TeX toolchain dependency optional: users on systems without xelatex get HTML + diagrams only, and the PDF target skips silently.
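
One way to get that skip behaviour is to probe for the external tool before scheduling a backend. The Python sketch below assumes backend labels `html`, `diagrams` and `pdf`, which are hypothetical names chosen for illustration; only the dependency of the PDF target on xelatex comes from the design.

```python
import shutil


def select_backends(requested, which=shutil.which):
    """Split requested backends into (runnable, skipped) based on tool availability.

    `which` is injectable so the probe can be exercised without touching PATH.
    """
    # Hypothetical mapping from backend label to the external tool it needs.
    required_tool = {"pdf": "xelatex"}
    runnable, skipped = [], []
    for name in requested:
        tool = required_tool.get(name)
        if tool is not None and which(tool) is None:
            skipped.append(name)  # tool missing: skip instead of failing the run
        else:
            runnable.append(name)
    return runnable, skipped


runnable, skipped = select_backends(["html", "diagrams", "pdf"])
print("run:", runnable, "skipped:", skipped)
```

Returning the skipped list lets a caller decide whether "silently" means no output at all or a single debug-level log line.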
4. Milestones
M6.1: SV parser + docstring extraction (Week 18)
- Add `tree-sitter-verilog` dep to `pccx-authoring`.
- Extract module / port / param / always_ff / FSM declarations into typed IR.
- Define the docstring grammar: `\module` / `\port` / `\param` / `\diagram` / `\fsm` / `\cycle` / `\see`.
- Emit a `pccx-lab docs check <file.sv>` CLI that validates docstrings and reports missing / malformed blocks.
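
The validation step can be pictured as a set comparison between declared and documented modules. This Python sketch is a stand-in under stated assumptions: it matches `module` declarations with a regular expression instead of walking a tree-sitter AST, and the diagnostic wording and the function name `check_docstrings` are invented for the example.

```python
import re

MODULE_DECL = re.compile(r"^\s*module\s+(\w+)", re.MULTILINE)
DOC_BLOCK = re.compile(r"/\*!(.*?)\*/", re.DOTALL)
MODULE_TAG = re.compile(r"\\module\s+(\w+)")


def check_docstrings(source):
    """Return diagnostics for modules without a doc block and doc blocks without a module."""
    declared = set(MODULE_DECL.findall(source))
    documented = set()
    for body in DOC_BLOCK.findall(source):
        documented.update(MODULE_TAG.findall(body))
    problems = [f"missing doc block for module '{name}'"
                for name in sorted(declared - documented)]
    problems += [f"doc block names unknown module '{name}'"
                 for name in sorted(documented - declared)]
    return problems


SOURCE = r"""
/*! \module alpha
 * Documented module.
 */
module alpha (input logic clk);
endmodule

module beta (input logic clk);
endmodule
"""

for problem in check_docstrings(SOURCE):
    print(problem)
```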
M6.2: Block diagrams (Week 19)
- Walk the module hierarchy, emit a TikZ or Mermaid block diagram per `\diagram block` annotation.
- Default layout via a minimal placer (rectangle-packing on boundary counts); manual layout hints via `\layout left | right | …`.
- Respect the `pccx-diagram-rules` skill: CUDA / Intel-SDM monochrome palette, no gradients, fixed block-type colour classes.
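
As a rough picture of the hierarchy walk, the Python sketch below turns a parent-to-children map into Mermaid flowchart text. The map shape, the function name and every module name other than `ctrl_npu_frontend` are assumptions for illustration; layout hints and the palette rules are left out.

```python
from collections import deque


def mermaid_block_diagram(hierarchy, top):
    """Render a {parent: [children]} module hierarchy as a Mermaid flowchart, breadth-first."""
    lines = ["flowchart TD"]
    seen = set()
    queue = deque([top])
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue  # a module instantiated in several places is expanded once
        seen.add(parent)
        for child in hierarchy.get(parent, []):
            lines.append(f"    {parent} --> {child}")
            queue.append(child)
    return "\n".join(lines)


# Hypothetical hierarchy; only ctrl_npu_frontend appears in the design text.
HIERARCHY = {
    "npu_top": ["ctrl_npu_frontend", "mac_array"],
    "mac_array": ["pe"],
}

print(mermaid_block_diagram(HIERARCHY, "npu_top"))
```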
M6.3: FSM state diagrams (Week 19)
- Visitor over `always_ff @ (posedge clk)` bodies that detects the case-over-state-enum pattern.
- Render each `\fsm <name>` as a Mermaid stateDiagram-v2 or a native TikZ `dot2tex` chain.
- Auto-detect dead states, flag them in the doc-check output.
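
A small Python sketch of the two FSM steps, assuming the visitor has already produced a `{state: [next states]}` map. It reads "dead state" as "unreachable from the reset state", which is one plausible interpretation rather than a definition taken from the design; the state names are made up.

```python
def unreachable_states(transitions, reset_state):
    """Return states that cannot be reached from reset_state, sorted by name."""
    reachable = set()
    frontier = [reset_state]
    while frontier:
        state = frontier.pop()
        if state in reachable:
            continue
        reachable.add(state)
        frontier.extend(transitions.get(state, []))
    all_states = set(transitions)
    for targets in transitions.values():
        all_states.update(targets)
    return sorted(all_states - reachable)


def mermaid_state_diagram(transitions, reset_state):
    """Emit Mermaid stateDiagram-v2 text for the transition map."""
    lines = ["stateDiagram-v2", f"    [*] --> {reset_state}"]
    for source in sorted(transitions):
        for target in transitions[source]:
            lines.append(f"    {source} --> {target}")
    return "\n".join(lines)


# Hypothetical decode FSM; DEBUG has an outgoing edge but nothing leads into it.
FSM = {
    "IDLE": ["DECODE"],
    "DECODE": ["ISSUE", "IDLE"],
    "ISSUE": ["IDLE"],
    "DEBUG": ["IDLE"],
}

print(unreachable_states(FSM, "IDLE"))
print(mermaid_state_diagram(FSM, "IDLE"))
```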
M6.4: ISA PDF pipeline (Week 20)
- Consume the `pccx-authoring` ISA TOML compiler’s output.
- Drive xelatex against a mirror of the pccx main `main.tex` layout (shared skill: `pccx-isa-preprint`).
- Output: `_build/pdf/<project>-isa-v<N>.pdf`.
- CI target: `make isa-pdf` in the repo’s `docs/Makefile`.
M6.5: Universal SV porting (Week 21)
- `pccx-lab docs port <existing-sv-project>`: a migration helper that scans an unannotated codebase, generates a `docstrings.toml` skeleton with empty `\module` / `\port` blocks for the developer to fill in, and emits a baseline docs tree so the first `docs emit` run produces something coherent even before annotation.
- Ships with a `pccx-lab` self-port as the canonical example.
M6.6: Raw analytics mode (Week 22)
- `--raw` flag on the analytics CLI emits full cycle-by-cycle + waveform + bus + cache state into a streamable container (Zstd-compressed columnar format).
- Expected output size: tens to hundreds of GiB on realistic transformer-decode traces. The raw container MUST be mmap-readable so downstream tools (Jupyter, polars, duckdb) can query it without loading the whole file into RAM.
- A companion `pccx-raw-reader` binary streams rows matching a cycle / core / event filter without materialising the whole file.
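
The streaming-filter idea can be shown with a lazy generator that holds one row in memory at a time. This Python sketch deliberately uses a plain CSV stream as a stand-in for the Zstd columnar container, whose format is not yet specified; the column names `cycle`, `core` and `event` mirror the filter dimensions named above, and `stream_rows` is a hypothetical helper.

```python
import csv
import io


def stream_rows(lines, core=None, cycle_range=None, event=None):
    """Lazily yield trace rows that match every filter that is not None."""
    for row in csv.DictReader(lines):
        if core is not None and int(row["core"]) != core:
            continue
        if cycle_range is not None and int(row["cycle"]) not in cycle_range:
            continue
        if event is not None and row["event"] != event:
            continue
        yield row  # the caller consumes rows one by one; nothing is accumulated here


# Tiny in-memory trace; a real input would be an open file handle.
TRACE = "cycle,core,event\n0,0,issue\n1,1,stall\n2,0,stall\n3,0,issue\n"

for row in stream_rows(io.StringIO(TRACE), core=0, event="stall"):
    print(row["cycle"], row["core"], row["event"])
```

Because the function accepts any iterable of lines, the same filter works over a file object or a decompression stream without changing the call site.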
M6.7: CLI / GUI parity (Week 23)
- Guarantee: every CLI subcommand has a GUI affordance, and every GUI action dispatches the same underlying Tauri command that the CLI invokes (shared `pccx_cli::cmd` dispatcher in `pccx-core`).
- Contract test in CI: a `parity_check` integration test that enumerates both surfaces and fails on any asymmetry.
- Command palette in pccx-ide surfaces CLI names directly.
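
The contract test boils down to a symmetric difference between two command sets. The Python sketch below shows that core check only; how the CLI and GUI surfaces are actually enumerated is not specified yet, so the two input lists here are hand-written assumptions, and `parity_report` / `assert_parity` are hypothetical names.

```python
def parity_report(cli_commands, gui_commands):
    """Return the commands that exist on only one of the two surfaces."""
    cli, gui = set(cli_commands), set(gui_commands)
    return {"cli_only": sorted(cli - gui), "gui_only": sorted(gui - cli)}


def assert_parity(cli_commands, gui_commands):
    """Raise AssertionError on any asymmetry, so a CI run fails loudly."""
    report = parity_report(cli_commands, gui_commands)
    if report["cli_only"] or report["gui_only"]:
        raise AssertionError(f"CLI/GUI parity broken: {report}")


# Subcommand names taken from this page; the GUI list is an assumed snapshot.
CLI = ["docs emit", "docs check", "docs port"]
GUI = ["docs emit", "docs check"]

print(parity_report(CLI, GUI))
```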
M6.8: AI-augmented prose pass (Week 23)
- After ingest, a Sonnet pass proposes prose around the extracted structure (for example, “this module is the AXI-Lite command decoder feeding the NPU top-level; stalls are expected when the command queue is full because …”), gated on the developer accepting each proposal inside pccx-ide (no silent prose injection).
- Prose proposals are diffed against existing docstrings so nothing is ever overwritten without consent.
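
The consent gate can be sketched as "show a diff, apply only on an explicit yes". This Python example uses the standard-library `difflib`; the function names and the sample docstrings are hypothetical, and no model call is involved.

```python
import difflib


def proposal_diff(existing, proposed):
    """Return a unified diff between the current docstring and the proposed prose."""
    diff = difflib.unified_diff(
        existing.splitlines(),
        proposed.splitlines(),
        fromfile="existing",
        tofile="proposed",
        lineterm="",
    )
    return "\n".join(diff)


def resolve(existing, proposed, accepted):
    """Keep the existing text unless the developer explicitly accepted the proposal."""
    return proposed if accepted else existing


EXISTING = "AXI-Lite command decoder for the NPU top-level."
PROPOSED = "AXI-Lite command decoder for the NPU top-level.\nStalls when the command queue is full."

print(proposal_diff(EXISTING, PROPOSED))
print(resolve(EXISTING, PROPOSED, accepted=False))
```

An empty diff means the proposal adds nothing, so the IDE would have no reason to prompt at all.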
M6.9: Release of the v0.6 dev-phase bundle (Week 24)
- Ship `pccx-lab docs emit` / `docs check` / `docs port` as first-class binary commands.
- Ship the Phase 6 handbook page describing the docstring schema and a worked example.
- Example: re-run the whole pipeline against pccx-FPGA-NPU-LLM-kv260 and pccx-lab itself, publish the outputs, use them as the v0.6 launch artefact.
5. Scale / resource notes
- Raw analytics output: tens to hundreds of GiB per long-running trace. Zstd columnar compression typically gives ~10-20x, so the stored output is still GiB-scale (for example, 100 GiB raw shrinks to roughly 5-10 GiB). Storage is user-provisioned; pccx-lab never assumes disk beyond what the user points at.
- AI prose passes draw on the Sonnet budget: expect ~1-5 K tokens per module for non-trivial bodies; cache by AST hash.
- Sphinx full-site rebuild: target < 30 s on a 100-module codebase via incremental build + sphinx-autobuild.
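
Caching by AST hash means the prose pass runs once per distinct module body. The Python sketch below assumes the AST is available as a canonical string (for instance an s-expression dump) and uses SHA-256 as the key; both choices, and the `ProseCache` class itself, are illustrative assumptions. The generator is a stub, not a model call.

```python
import hashlib


class ProseCache:
    """Memoise an expensive prose generator, keyed by a hash of the module's AST text."""

    def __init__(self, generate):
        self._generate = generate
        self._store = {}
        self.misses = 0  # counts how often the generator actually ran

    def get(self, ast_text):
        key = hashlib.sha256(ast_text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._generate(ast_text)
        return self._store[key]


def stub_generator(ast_text):
    # Stand-in for the real prose pass.
    return f"prose for a {len(ast_text)}-character AST"


cache = ProseCache(stub_generator)
cache.get("(module ctrl_npu_frontend)")
cache.get("(module ctrl_npu_frontend)")  # unchanged AST: served from the cache
cache.get("(module ctrl_npu_frontend (port clk))")  # edited module: regenerated
print(cache.misses)
```

Hashing the AST rather than the raw source means whitespace-only or comment-only edits need not invalidate the entry, provided the AST dump excludes them.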
6. CLI / GUI parity principle
The parity target is stronger than “the GUI can do everything the CLI can”. It is bidirectional:
- Any CLI invocation is a single-line equivalent of a GUI flow, with deterministic output suitable for scripting.
- Any GUI interaction prints (and copies to clipboard) the CLI invocation that would replay it exactly, so “what did I just do?” is always answerable with a one-liner.
This is the affordance that makes pccx-lab trustworthy as a dev tool: everything the human does in the IDE, the CI can repeat.
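
The "replay it exactly" property depends on the printed one-liner surviving shell quoting. A short Python sketch with the standard-library `shlex` shows the round trip; the project path is an invented example and `cli_equivalent` / `replay_args` are hypothetical helper names.

```python
import shlex


def cli_equivalent(argv):
    """Render an argument vector as a copy-pasteable shell one-liner."""
    return shlex.join(argv)


def replay_args(command_line):
    """Parse a one-liner back into the argument vector a shell would pass on."""
    return shlex.split(command_line)


# A GUI action expressed as the argv it dispatches; the path contains a space on purpose.
ACTION = ["pccx-lab", "docs", "emit", "rtl/my project"]

one_liner = cli_equivalent(ACTION)
print(one_liner)
print(replay_args(one_liner) == ACTION)
```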
7. Out of scope
- Synthesis. Compile / elab / place-and-route stays in the RTL repo; Phase 6 reads SV but does not synthesise it.
- pccx-main site editorial control. Phase 6 generates pccx-style docs for the user’s own project; pccx’s editorial site keeps its hand-authored pages.
- PDF typography beyond xelatex’s native capabilities. No custom InDesign-grade layout.
- Live HDL simulation inside the docs generator. Traces come in through the existing `.pccx` format; Phase 6 does not run a sim.
8. Dependencies
Existing (from earlier phases):
- `pccx-authoring`: ISA / API TOML compilers; extended here for SV ingest.
- `pccx-reports`: MD / HTML / PDF rendering plumbing extended for Sphinx-driven output.
- `pccx-lsp`: comment-extraction seam reuses the provider trait pipeline for incremental doc-gen as the developer types.
- `pccx-ide`: GUI affordance surface; CLI parity contract lives here.

New:
- `tree-sitter-verilog`: fast incremental SV parser.
- Either `typst` or an extended `xelatex` toolchain (via the `pccx-isa-preprint` skill’s pipeline).
- `zstd` / columnar writer (`arrow-rs` or a minimal bespoke format) for raw analytics.
9. Open questions
- Docstring vocabulary: Doxygen-style `\module` vs Rust-style `/// #[module]` vs a fresh DSL. Decide at M6.1 review.
- Mermaid vs TikZ primary. Mermaid is friendlier for HTML / onboarding; TikZ is better for print. Emit both from the same IR and let the renderer pick per-output.
- ISA TOML vs directly parsed SV `enum`. Today `pccx-authoring` is TOML-driven; some projects define opcodes inline in SV. M6.4 needs to read both.
- Auto-emit cadence. Run on every save (expensive) vs on-demand (staler) vs git pre-commit hook (middle ground). Ship all three behind a config flag.

Drafted 2026-04-24 from scope-expansion conversation. See
docs/design/rationale.md for the module-boundary implications: this
phase extends existing crates rather than adding new workspace members,
so the 10-crate layout is unchanged. Only crates that gain surface
are pccx-authoring (SV ingest), pccx-reports (Sphinx driver) and
pccx-ide (CLI/GUI parity contract).