Compile-Priority Packages
The v002 RTL organises packages and headers into a compile-priority hierarchy
(Constants/compilePriority_Order/). Each tier depends only on the tiers above
it, so xvlog and the Vivado compiler can process the files in order without
forward references. The canonical ordering is recorded in
hw/vivado/filelist.f.
Compile Priority
Compilation proceeds in tiers A → B → C → D.
A (A_const_svh/): pure `define headers. They are added to the include search
path via the xvlog -i flag and are not compiled directly.
  - NUMBERS.svh: primitive bit-width constants. `N_SIZEOF_INT4 = 4,
    `N_BF16_SIZE = 16, `N_FP32_SIZE = 32.
  - npu_arch.svh: NPU architectural macros. Holds values that must be `define
    rather than localparam because they appear in port declarations and
    generate ranges. Key constants: `ARRAY_SIZE_H = 32, `ARRAY_SIZE_V = 32,
    `ISA_WIDTH = 64, `ISA_BODY_WIDTH = 60, `SYSTOLIC_TOTAL_LATENCY = 64,
    `FMAP_CACHE_DEPTH = 2048, `FIXED_MANT_WIDTH = 27.
  - kv260_device.svh: physical hardware constants for the Kria KV260 target.
    HP port count 4, HP single-port width 128 bits, DSP48E2 A/B/P port widths
    30/18/48 bits, XPM FIFO depths 512/16.
  - GLOBAL_CONST.svh: deprecated shim. Re-includes NUMBERS.svh,
    kv260_device.svh, and npu_arch.svh in sequence, then adds legacy aliases
    (HP_PORT_MAX_WIDTH, DSP48E2_POUT_SIZE, etc.). New code must include the
    source headers directly rather than using this file.
  - DEVICE_INFO.svh: deprecated shim. Re-includes kv260_device.svh and
    provides legacy aliases (DEVICE_HP_SINGLE_LANE_MAX_IN_BIT, DEVICE_HP_CNT).
    Do not use in new code.
B (B_device_pkg/): device_pkg.sv. Depends on the A headers. The first
SystemVerilog package; holds algorithm-level type choices.
C (C_type_pkg/): dtype_pkg.sv, mem_pkg.sv. Depend on B. Separate
numeric-type constants from memory-architecture parameters.
D (D_pipeline_pkg/): vec_core_pkg.sv. Depends on A + B + C. Defines
the Vector Core configuration struct and default values.
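The tier rule can be illustrated with a single-file sketch that declares a stand-in for each tier in compile order. The `_sketch` package names and the derived constants are hypothetical and exist only for illustration; only the two macro values and the direction of the dependencies come from the description above.

```systemverilog
// Tier A stand-in: in the real tree these macros come from NUMBERS.svh,
// which is found through the include search path, not compiled directly.
`ifndef N_BF16_SIZE
`define N_BF16_SIZE 16
`endif
`ifndef N_SIZEOF_INT4
`define N_SIZEOF_INT4 4
`endif

// Tier B: depends only on tier A macros.
package tier_b_sketch;
  localparam int FmapType   = `N_BF16_SIZE;
  localparam int WeightType = `N_SIZEOF_INT4;
endpackage

// Tier C: may reference tier B because tier B is already compiled.
package tier_c_sketch;
  // Hypothetical derived value: 16 / 4 = 4
  localparam int WeightsPerFmapWord =
      tier_b_sketch::FmapType / tier_b_sketch::WeightType;
endpackage

// Tier D: may reference A, B and C.
package tier_d_sketch;
  // Hypothetical derived value: 4 * 4 = 16
  localparam int LaneBits =
      tier_c_sketch::WeightsPerFmapWord * tier_b_sketch::WeightType;
endpackage
```

Moving tier_c_sketch ahead of tier_b_sketch would leave the tier C reference unresolved at compile time, which is the situation the A → B → C → D ordering is meant to avoid.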
Core Packages
device_pkg (B_device_pkg/device_pkg.sv)
Fixes the feature-map (activation) port precision at BF16 (16 bits), the
internal accumulation precision at FP32 (32 bits), and the weight precision
at INT4 (4 bits). Declares pipeline instance counts VecPipelineCnt = 4 and
MatPipelineCnt = 1. All downstream packages reference these values, making
this file the first edit point for any architecture-level change.
```systemverilog
package device_pkg;
  // ===| Feature Map (Activation) Type |========================================
  // FmapType              : port-level precision, BF16 (16-bit)
  // FmapTypeMixedPrecision: internal accumulation, FP32 (32-bit)
  localparam int FmapType               = `N_BF16_SIZE;
  localparam int FmapTypeMixedPrecision = `N_FP32_SIZE;

  // ===| Weight Type |===========================================================
  // INT4: 4-bit quantized weight, streamed from HP ports
  localparam int WeightType = `N_SIZEOF_INT4;

  // ===| Pipeline Instance Counts |==============================================
  localparam int VecPipelineCnt = 4;  // 4 x muV-Core (Vector Core)
  localparam int MatPipelineCnt = 1;  // 1 x Matrix Core (32x32 systolic)

  // ===| Legacy aliases (snake_case), keep until all RTL refs updated |==========
  localparam int GemvPipelineCnt = VecPipelineCnt;
  localparam int GemmPipelineCnt = MatPipelineCnt;
endpackage : device_pkg
```
dtype_pkg (C_type_pkg/dtype_pkg.sv)
Exposes BF16 (Bf16Width = 16, Bf16ExpWidth = 8, Bf16MantWidth = 7),
fixed-point mantissa width FixedMantWidth = 27, INT4 (Int4Width = 4),
INT8 (Int8Width = 8), FP32 (Fp32Width = 32), and DSP48E2 P-register width
DspPWidth = 48 as localparam constants. No units or semantics are embedded,
so the file remains valid across synthesis targets.
mem_pkg (C_type_pkg/mem_pkg.sv)
Consolidates memory parameters derived from device_pkg and kv260_device.svh.
HpPortCnt = 4, HpSingleWidthBit = 128, HpTotalWidthBit = 512,
HpSingleWeightCnt = 32, FmapL2CacheOutCnt = 32 (= ARRAY_SIZE_H),
FmapCacheDepth = 2048, XpmFifoDepth = 512. No magic numbers — all values
are derived from upstream headers.
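A minimal sketch of how the listed values could follow from upstream constants. The `SKETCH_*` macros are hypothetical stand-ins for whatever kv260_device.svh, NUMBERS.svh, and npu_arch.svh actually define, and the expressions are assumptions chosen to reproduce the documented numbers; the real mem_pkg may be written differently.

```systemverilog
// Hypothetical stand-ins for the upstream header macros.
`define SKETCH_HP_PORT_CNT     4    // HP port count on the KV260
`define SKETCH_HP_SINGLE_WIDTH 128  // bits per HP port
`define SKETCH_INT4_SIZE       4    // mirrors `N_SIZEOF_INT4
`define SKETCH_ARRAY_SIZE_H    32   // mirrors `ARRAY_SIZE_H

package mem_pkg_sketch;
  localparam int HpPortCnt        = `SKETCH_HP_PORT_CNT;
  localparam int HpSingleWidthBit = `SKETCH_HP_SINGLE_WIDTH;

  // Assumed derivation: all HP ports side by side, 4 * 128 = 512 bits.
  localparam int HpTotalWidthBit = HpPortCnt * HpSingleWidthBit;

  // Assumed derivation: INT4 weights per HP beat, 128 / 4 = 32.
  localparam int HpSingleWeightCnt = HpSingleWidthBit / `SKETCH_INT4_SIZE;

  localparam int FmapL2CacheOutCnt = `SKETCH_ARRAY_SIZE_H;
endpackage

// Elaboration-time check that the sketch reproduces the documented values.
module mem_pkg_sketch_check;
  initial begin
    assert (mem_pkg_sketch::HpTotalWidthBit   == 512);
    assert (mem_pkg_sketch::HpSingleWeightCnt == 32);
    assert (mem_pkg_sketch::FmapL2CacheOutCnt == 32);
  end
endmodule
```

Writing each value as an expression over upstream constants means a change to the HP port width propagates automatically instead of requiring a second manual edit.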
vec_core_pkg (D_pipeline_pkg/vec_core_pkg.sv)
Defines the packed struct vec_cfg_t describing the Vector Core topology, and
the KV260 default VecCoreDefaultCfg. Key defaults: GemvBatch = 512,
GemvCycle = 512, GemvLineCnt = 32. The legacy alias gemv_cfg_t is retained
until all GEMV_*.sv modules migrate to the renamed type.
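As a rough illustration of the pattern, the sketch below declares a packed configuration struct, a default value, and a legacy typedef alias. The field names, field types, and every `sketch` identifier are hypothetical; the source states only the type names and the three default values.

```systemverilog
package vec_core_pkg_sketch;
  // Packed so the whole configuration can travel as a single bit vector.
  typedef struct packed {
    int unsigned batch;     // documented default: GemvBatch   = 512
    int unsigned cycle;     // documented default: GemvCycle   = 512
    int unsigned line_cnt;  // documented default: GemvLineCnt = 32
  } vec_cfg_sketch_t;

  // Legacy alias: old code keeps compiling while modules migrate.
  typedef vec_cfg_sketch_t gemv_cfg_sketch_t;

  localparam vec_cfg_sketch_t VecCoreDefaultCfgSketch = '{
    batch:    512,
    cycle:    512,
    line_cnt: 32
  };
endpackage
```

A module could then take the struct as a parameter, for example `parameter vec_core_pkg_sketch::vec_cfg_sketch_t CFG = vec_core_pkg_sketch::VecCoreDefaultCfgSketch`, and read `CFG.line_cnt` when sizing its lanes.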
Interface Headers
npu_interfaces.svh (NPU_Controller/npu_interfaces.svh)
Includes GLOBAL_CONST.svh, then defines two SystemVerilog interfaces.
axis_if #(DATA_WIDTH = 128): AXI4-Stream bus. The slave modport takes
tdata, tvalid, tlast, tkeep as inputs and drives tready. The master
modport reverses each direction.
axil_if #(ADDR_W = 12, DATA_W = 64): AXI4-Lite control bus. The interface
itself takes clk and rst_n as ports. The slave modport takes the AW/W/AR
request signals plus bready and rready as inputs, and drives awready, wready,
arready and the B/R response signals. The master modport reverses each
direction.
```systemverilog
interface axis_if #(
  parameter DATA_WIDTH = 128
) ();
  logic [    DATA_WIDTH-1:0] tdata;
  logic                      tvalid;
  logic                      tready;
  logic                      tlast;
  logic [(DATA_WIDTH/8)-1:0] tkeep;

  // Slave Side (NPU Perspective: Input)
  modport slave(input tdata, tvalid, tlast, tkeep, output tready);

  // Master Side (NPU Perspective: Output)
  modport master(output tdata, tvalid, tlast, tkeep, input tready);
endinterface

// axil_if.sv
interface axil_if #(
  parameter int ADDR_W = 12,
  parameter int DATA_W = 64
) (
  input logic clk,
  input logic rst_n
);
  // AW Channel
  logic [ADDR_W-1:0] awaddr;
  logic [       2:0] awprot;
  logic              awvalid, awready;

  // W Channel
  logic [    DATA_W-1:0] wdata;
  logic [(DATA_W/8)-1:0] wstrb;
  logic                  wvalid, wready;

  // B Channel
  logic [1:0] bresp;
  logic       bvalid, bready;

  // AR Channel
  logic [ADDR_W-1:0] araddr;
  logic [       2:0] arprot;
  logic              arvalid, arready;

  // R Channel
  logic [DATA_W-1:0] rdata;
  logic [       1:0] rresp;
  logic              rvalid, rready;

  modport slave(
    input  awaddr, awprot, awvalid, wdata, wstrb, wvalid, bready,
    input  araddr, arprot, arvalid, rready,
    output awready, wready, bresp, bvalid, arready, rdata, rresp, rvalid
  );

  modport master(
    output awaddr, awprot, awvalid, wdata, wstrb, wvalid, bready,
    output araddr, arprot, arvalid, rready,
    input  awready, wready, bresp, bvalid, arready, rdata, rresp, rvalid
  );
endinterface
```
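To show how the two axis_if modports pair up, here is a minimal pass-through sketch. The module name axis_passthrough_sketch is hypothetical, and the `include line assumes npu_interfaces.svh is reachable through the include search path; the module simply forwards every signal in the direction its modport allows.

```systemverilog
// Assumption: npu_interfaces.svh is on the include search path and
// provides axis_if as documented above.
`include "npu_interfaces.svh"

module axis_passthrough_sketch (
  axis_if.slave  s_axis,  // upstream side: this module receives data
  axis_if.master m_axis   // downstream side: this module sends data
);
  // Payload and valid flow from the slave port to the master port.
  assign m_axis.tdata  = s_axis.tdata;
  assign m_axis.tvalid = s_axis.tvalid;
  assign m_axis.tlast  = s_axis.tlast;
  assign m_axis.tkeep  = s_axis.tkeep;

  // Back-pressure flows the opposite way.
  assign s_axis.tready = m_axis.tready;
endmodule
```

Because the modports fix the direction of every signal, driving s_axis.tdata from inside this module would be rejected by the compiler rather than producing a silent multi-driver conflict.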
GEMM_Array.svh (MAT_CORE/GEMM_Array.svh)
A compatibility shim that re-includes npu_arch.svh to single-source
ARRAY_SIZE_H and ARRAY_SIZE_V, eliminating redefinition warnings. Adds two
MAT_CORE-scoped constants: MINIMUM_DELAY_LINE_LENGTH = 1 and
gemm_instruction_dispatcher_CLOCK_CONSUMPTION = 1. New MAT_CORE modules
should include npu_arch.svh directly.
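A small sketch of a MAT_CORE-style module that includes npu_arch.svh directly and uses `ARRAY_SIZE_H in both a port declaration and a generate range. The module name, the 16-bit element width, and the assumption that npu_arch.svh is on the include search path are illustrative only.

```systemverilog
// Assumption: npu_arch.svh is on the include search path and defines
// `ARRAY_SIZE_H as described in the tier A list.
`include "npu_arch.svh"

module mat_row_reg_sketch (
  input  logic        clk,
  input  logic [15:0] d [0:`ARRAY_SIZE_H-1],  // one element per array column
  output logic [15:0] q [0:`ARRAY_SIZE_H-1]
);
  // One register per column; the generate range uses the same macro as
  // the ports, so the two cannot drift apart.
  for (genvar i = 0; i < `ARRAY_SIZE_H; i++) begin : g_col
    always_ff @(posedge clk) q[i] <= d[i];
  end
endmodule
```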
Usage Matrix
The table reflects import statements and `include directives observed
directly in each source file.
| Module (core) |  |  |  |  |  | Header (include) |
|---|---|---|---|---|---|---|
|  | - | - | - | - | - |  |
|  | - | o | - | o | - |  |
|  | - | - | - | - | o |  |
|  | - | - | - | - | - |  |
|  | - | - | - | - | - |  |

o = import confirmed in source. - = no import in that file.
GEMM_systolic_top does not import any package; it uses only `define
macros from the headers. GEMV_top declares import vec_core_pkg::*; and
references dtype_pkg::Bf16ExpWidth in its port list. CVO_top declares
import isa_pkg::*; and import bf16_math_pkg::*; (bf16_math_pkg is a
library package documented on the Shared Library page). ctrl_npu_dispatcher
is currently an entirely commented-out stub and is excluded from the table.
Last verified against
Commit 8c09e5e @ pccxai/pccx-FPGA-NPU-LLM-kv260 (2026-04-29).