NPU Frontend Modules
RTL source on GitHub
SystemVerilog sources documented on this page:
hw/rtl/NPU_Controller/NPU_frontend/AXIL_CMD_IN.sv
hw/rtl/NPU_Controller/NPU_frontend/AXIL_STAT_OUT.sv
hw/rtl/NPU_Controller/NPU_frontend/ctrl_npu_frontend.sv
hw/rtl/NPU_Controller/npu_interfaces.svh
Role
The NPU frontend is the AXI-Lite entry point at the PS↔PL boundary on the NPU side,
complementing the host driver HAL.
The host driver writes ISA words to address 0x000 and a kick token to 0x008,
then polls completion status over the same interface’s read channel.
AXIL_CMD_IN receives host write transactions and enqueues command words in a FIFO.
AXIL_STAT_OUT returns engine completion status to the host through a FIFO-backed
read path.
In the hierarchy, ctrl_npu_frontend sits between the AXI-Lite bus and the
controller’s decoder and dispatcher (see NPU Controller Modules).
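The host-side sequence described above (ISA words, then a kick, then polling) can be sketched in Python against an in-memory stand-in for the bus. This is an illustration only, not the driver HAL: `FakeFrontend`, `submit_batch`, `write64`, and `read64` are hypothetical names, the kick payload of 0 is an assumption, and the status read address is not given in the text, so the model's read takes none.

```python
ADDR_INST = 0x000  # raw ISA word register (from the text above)
ADDR_KICK = 0x008  # kick token register (from the text above)
MASK64 = (1 << 64) - 1


class FakeFrontend:
    """Hypothetical in-memory stand-in for the AXI-Lite frontend."""

    def __init__(self):
        self.writes = []        # (address, data) pairs in arrival order
        self.status_words = []  # completion words waiting to be read

    def write64(self, addr, data):
        self.writes.append((addr, data & MASK64))

    def read64(self):
        # Assumption: None models "no status word available yet".
        return self.status_words.pop(0) if self.status_words else None


def submit_batch(bus, isa_words, max_polls=100):
    """Write every ISA word, then one kick, then poll for a status word."""
    for word in isa_words:
        bus.write64(ADDR_INST, word)
    bus.write64(ADDR_KICK, 0)  # assumption: the kick payload is ignored
    for _ in range(max_polls):
        status = bus.read64()
        if status is not None:
            return status
    return None  # gave up; no completion word arrived
```

The bounded poll loop is a design choice of the sketch: it keeps a stalled engine from hanging the caller indefinitely.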
AXIL_CMD_IN
AXIL_CMD_IN accepts AXI-Lite write transactions from the host and stores command
words in a synchronous FIFO.
The parameter FIFO_DEPTH (default 8) sets the FIFO capacity in command-word entries.
When the FIFO is full, s_awready deasserts, stalling the host at the AW channel.
Two register addresses are decoded: a write to 0x000 inserts the raw ISA word
directly into the FIFO; a write to 0x008 inserts a kick marker with bit 63 set to 1.
The kick marker signals a batch boundary to the downstream dispatcher.
A pop occurs each cycle that both OUT_valid and IN_decoder_ready are asserted.
/*─────────────────────────────────────────────
Register Address Map
───────────────────────────────────────────────*/
localparam ADDR_INST = 12'h000;
localparam ADDR_KICK = 12'h008;
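The decode, back-pressure, and pop rules above can be captured in a small Python behavioural model. It is a sketch, not the RTL, and not cycle-accurate: `CmdInModel` is a hypothetical name, the marker is assumed to have only bit 63 set, and writes to undecoded addresses are assumed to be accepted and discarded (the text does not say).

```python
from collections import deque

ADDR_INST = 0x000
ADDR_KICK = 0x008
KICK_BIT = 1 << 63
MASK64 = (1 << 64) - 1


class CmdInModel:
    """Behavioural sketch of the command FIFO; not cycle-accurate."""

    def __init__(self, fifo_depth=8):
        self.fifo_depth = fifo_depth
        self.fifo = deque()

    @property
    def awready(self):
        # Per the text, s_awready deasserts while the FIFO is full.
        return len(self.fifo) < self.fifo_depth

    def write(self, addr, data):
        """Return True if the write is accepted, False if the host stalls."""
        if not self.awready:
            return False
        if addr == ADDR_INST:
            self.fifo.append(data & MASK64)
        elif addr == ADDR_KICK:
            # Assumption: the marker carries bit 63 and nothing else.
            self.fifo.append(KICK_BIT)
        # Assumption: other addresses are accepted and discarded.
        return True

    def pop(self, decoder_ready):
        """Pop one word when OUT_valid and IN_decoder_ready are both high."""
        out_valid = len(self.fifo) > 0
        if out_valid and decoder_ready:
            return self.fifo.popleft()
        return None
```

With `fifo_depth=2`, a third write returns `False` until the decoder side pops an entry, which mirrors the AW-channel stall.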
AXIL_STAT_OUT
The upper module asserts IN_valid alongside IN_data to push a status word into
the STAT_OUT FIFO after each engine completion.
The parameter FIFO_DEPTH (default 8) sets the FIFO capacity in status-word entries.
When the FIFO is full, incoming pushes are dropped silently.
Completion words carry idempotent information, so a dropped entry is recovered by
the host’s next poll cycle.
On an AR-channel handshake the FIFO head is latched into rdata_r and s_rvalid is
asserted. The response is held until the host acknowledges with s_rready.
s_arready deasserts when the FIFO is empty or when an R response is still in progress.
/*─────────────────────────────────────────────
  FIFO (simple synchronous, FIFO_DEPTH entries)
  Push : IN_valid from upper module
  Pop  : AXI4-Lite read handshake with CPU
───────────────────────────────────────────────*/
localparam PTR_W = $clog2(FIFO_DEPTH);
logic [`ISA_WIDTH-1:0] mem[0:FIFO_DEPTH-1];
logic [PTR_W:0] wr_ptr, rd_ptr;
logic fifo_empty, fifo_full;
assign fifo_empty = (wr_ptr == rd_ptr);
assign fifo_full  = (wr_ptr[PTR_W] != rd_ptr[PTR_W]) && (wr_ptr[PTR_W-1:0] == rd_ptr[PTR_W-1:0]);
logic fifo_ren;
always_ff @(posedge clk) begin
  if (!rst_n || IN_clear) begin
    wr_ptr <= '0;
    rd_ptr <= '0;
  end else begin
    // push : upper module feeds status continuously
    if (IN_valid && !fifo_full) begin
      mem[wr_ptr[PTR_W-1:0]] <= IN_data;
      wr_ptr <= wr_ptr + 1'b1;
    end
    // pop : CPU consumed the data
    if (fifo_ren && !fifo_empty) rd_ptr <= rd_ptr + 1'b1;
  end
end
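The pointers in the snippet above are one bit wider than the memory index. The extra MSB acts as a wrap flag, so "full" (index bits equal, wrap bits differ) can be told apart from "empty" (pointers identical). A Python sketch of the same arithmetic follows; `PtrFifo` and `clog2` are hypothetical helper names, and the model assumes a power-of-two depth.

```python
def clog2(n):
    """Ceiling of log2(n) for n >= 1, matching $clog2 for these inputs."""
    return (n - 1).bit_length()


class PtrFifo:
    """Sketch of the wrap-bit pointer scheme; assumes depth is a power of two."""

    def __init__(self, depth=8):
        self.ptr_w = clog2(depth)
        self.index_mask = (1 << self.ptr_w) - 1   # wr_ptr[PTR_W-1:0]
        self.wrap_bit = 1 << self.ptr_w           # wr_ptr[PTR_W]
        self.ptr_mod = 1 << (self.ptr_w + 1)      # pointers are PTR_W+1 bits
        self.mem = [0] * depth
        self.wr_ptr = 0
        self.rd_ptr = 0

    @property
    def empty(self):
        return self.wr_ptr == self.rd_ptr

    @property
    def full(self):
        wrap_differs = (self.wr_ptr & self.wrap_bit) != (self.rd_ptr & self.wrap_bit)
        index_equal = (self.wr_ptr & self.index_mask) == (self.rd_ptr & self.index_mask)
        return wrap_differs and index_equal

    def push(self, data):
        if self.full:
            return False  # dropped, as in the RTL push guard
        self.mem[self.wr_ptr & self.index_mask] = data
        self.wr_ptr = (self.wr_ptr + 1) % self.ptr_mod
        return True

    def pop(self):
        if self.empty:
            return None
        data = self.mem[self.rd_ptr & self.index_mask]
        self.rd_ptr = (self.rd_ptr + 1) % self.ptr_mod
        return data
```

Unlike the RTL, the model performs push and pop as separate calls rather than in the same clock cycle; only the full/empty arithmetic is meant to match.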
/*─────────────────────────────────────────────
  AXI4-Lite Read Path
  Wait for AR, then pop one entry from FIFO and return it.
  Hold rvalid until CPU acknowledges with rready.
───────────────────────────────────────────────*/
logic [`ISA_WIDTH-1:0] rdata_r;
logic rvalid_r;
assign s_rdata   = rdata_r;
assign s_rresp   = 2'b00;
assign s_rvalid  = rvalid_r;
assign s_arready = ~rvalid_r && ~fifo_empty;  // ready only when FIFO has data
assign fifo_ren  = s_arvalid && s_arready;    // pop on AR handshake
always_ff @(posedge clk) begin
  if (!rst_n || IN_clear) begin
    rdata_r  <= '0;
    rvalid_r <= 1'b0;
  end else begin
    // AR handshake → latch FIFO head and assert rvalid
    if (s_arvalid && s_arready) begin
      rdata_r  <= mem[rd_ptr[PTR_W-1:0]];
      rvalid_r <= 1'b1;
    end
    // R handshake → CPU consumed data, release
    if (rvalid_r && s_rready) rvalid_r <= 1'b0;
  end
end
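To see how the drop-on-full push and the AR/R handshakes interact, the read path can be stepped one clock at a time in a Python model. This is a sketch derived from the snippets above, not a replacement for simulation. `StatOutModel` and its `dropped` counter are hypothetical (the RTL exposes no drop indication), and the FIFO is modelled as a queue rather than with pointers.

```python
from collections import deque


class StatOutModel:
    """Cycle-level behavioural sketch of the status read path."""

    def __init__(self, fifo_depth=8):
        self.fifo_depth = fifo_depth
        self.fifo = deque()
        self.rdata = 0
        self.rvalid = False
        self.dropped = 0  # model-only counter; the RTL has no such signal

    def step(self, in_valid=False, in_data=0, arvalid=False, rready=False):
        """Advance one clock edge.

        Returns (s_arready, s_rvalid, s_rdata) as seen before the edge.
        """
        empty = len(self.fifo) == 0
        full = len(self.fifo) >= self.fifo_depth
        arready = (not self.rvalid) and (not empty)
        outputs = (arready, self.rvalid, self.rdata)

        ar_handshake = arvalid and arready
        r_handshake = self.rvalid and rready

        # Pop: the AR handshake latches the FIFO head into rdata.
        if ar_handshake:
            self.rdata = self.fifo.popleft()
            self.rvalid = True
        # Release: the host accepted the pending response.
        if r_handshake:
            self.rvalid = False
        # Push: uses the pre-edge full flag, so a push into a full FIFO
        # is dropped even if a pop happens in the same cycle.
        if in_valid:
            if full:
                self.dropped += 1
            else:
                self.fifo.append(in_data)
        return outputs
```

Because `s_arready` requires `rvalid_r` to be low, the AR and R handshakes can never land in the same cycle, which is why the two updates to `rvalid` need no priority rule.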
ctrl_npu_frontend
ctrl_npu_frontend is a wrapper that routes the axil_if.slave signals to
AXIL_CMD_IN (write channels) and AXIL_STAT_OUT (read channels).
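The CMD_IN output cmd_data is forwarded to the downstream controller as
OUT_RAW_instruction.
OUT_kick is combinational: cmd_valid & IN_fetch_ready. Because IN_fetch_ready also
drives IN_decoder_ready, OUT_kick is high exactly in the cycles in which a command
word is popped from the FIFO.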
The status path accepts IN_enc_stat and IN_enc_valid directly from the encoder FSM.
ctrl_npu_interface.sv is currently a placeholder for future per-core interface
aggregation.
AXIL_CMD_IN #(
    .FIFO_DEPTH(8)
) u_cmd_in (
    .clk (clk),
    .rst_n (rst_n),
    .IN_clear(IN_clear),  // FIXED: Typo i_clear -> IN_clear
    // AXI4-Lite Write channels directly routed from the interface
    .s_awaddr (S_AXIL_CTRL.awaddr),
    .s_awvalid(S_AXIL_CTRL.awvalid),
    .s_awready(S_AXIL_CTRL.awready),
    .s_wdata (S_AXIL_CTRL.wdata),
    .s_wvalid (S_AXIL_CTRL.wvalid),
    .s_wready (S_AXIL_CTRL.wready),
    .s_bresp (S_AXIL_CTRL.bresp),
    .s_bvalid (S_AXIL_CTRL.bvalid),
    .s_bready (S_AXIL_CTRL.bready),
    .OUT_data(cmd_data),
    .OUT_valid(cmd_valid),
    .IN_decoder_ready(IN_fetch_ready)
);
/*─────────────────────────────────────────────
  [1-2] Communication OUT : NPU -> CPU (Using Read Channels)
───────────────────────────────────────────────*/
AXIL_STAT_OUT #(
    .FIFO_DEPTH(8)
) u_stat_out (
    .clk (clk),
    .rst_n (rst_n),
    .IN_clear(IN_clear),  // FIXED: Typo i_clear -> IN_clear
    .IN_data (IN_enc_stat),  // FIXED: Typo i_enc_stat -> IN_enc_stat
    .IN_valid(IN_enc_valid),  // FIXED: Typo i_enc_valid -> IN_enc_valid
    // AXI4-Lite Read channels directly routed from the interface
    .s_araddr (S_AXIL_CTRL.araddr),
    .s_arvalid(S_AXIL_CTRL.arvalid),
    .s_arready(S_AXIL_CTRL.arready),
    .s_rdata (S_AXIL_CTRL.rdata),
    .s_rresp (S_AXIL_CTRL.rresp),
    .s_rvalid (S_AXIL_CTRL.rvalid),
    .s_rready (S_AXIL_CTRL.rready)
);
Interface Specification
npu_interfaces.svh defines two interfaces: axil_if and axis_if.
axil_if defaults to ADDR_W=12 and DATA_W=64, covering all five AXI-Lite channels
(AW, W, B, AR, R), with slave and master modports.
ctrl_npu_frontend binds the axil_if.slave modport.
axis_if provides an AXI-Stream interface carrying tdata, tvalid, tready, tlast,
and tkeep; it is used by data-path modules that transfer activation and weight streams.
interface axis_if #(
    parameter DATA_WIDTH = 128
) ();
  logic [    DATA_WIDTH-1:0] tdata;
  logic                      tvalid;
  logic                      tready;
  logic                      tlast;
  logic [(DATA_WIDTH/8)-1:0] tkeep;
  // Slave Side (NPU Perspective: Input)
  modport slave(input tdata, tvalid, tlast, tkeep, output tready);
  // Master Side (NPU Perspective: Output)
  modport master(output tdata, tvalid, tlast, tkeep, input tready);
endinterface
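With DATA_WIDTH = 128, tkeep is 16 bits wide, one bit per byte lane of tdata. The Python sketch below shows how a byte payload might map onto (tdata, tkeep, tlast) beats. It follows the usual AXI-Stream convention that tkeep bit i qualifies byte lane i. The little-endian packing, the contiguous low-lane fill of the final beat, and the helper names `tkeep_mask` and `split_into_beats` are assumptions, not something the source specifies for the NPU data path.

```python
def tkeep_mask(valid_bytes, data_width=128):
    """Return a tkeep value marking the lowest `valid_bytes` lanes as valid."""
    lanes = data_width // 8
    if not 0 <= valid_bytes <= lanes:
        raise ValueError(f"valid_bytes must be in 0..{lanes}")
    return (1 << valid_bytes) - 1


def split_into_beats(payload, data_width=128):
    """Return a list of (tdata, tkeep, tlast) tuples for a bytes payload."""
    lanes = data_width // 8
    beats = []
    for offset in range(0, len(payload), lanes):
        chunk = payload[offset:offset + lanes]
        # Assumption: byte 0 of the chunk travels in tdata[7:0].
        tdata = int.from_bytes(chunk, "little")
        tlast = offset + lanes >= len(payload)
        beats.append((tdata, tkeep_mask(len(chunk), data_width), tlast))
    return beats
```

For a 20-byte payload this yields one full beat with all 16 lanes kept, followed by a final beat that keeps only the lowest four lanes and sets tlast.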
// axil_if.sv
interface axil_if #(
    parameter int ADDR_W = 12,
    parameter int DATA_W = 64
) (
    input logic clk,
    input logic rst_n
);
  // AW Channel
  logic [ADDR_W-1:0] awaddr;
  logic [       2:0] awprot;
  logic awvalid, awready;
  // W Channel
  logic [    DATA_W-1:0] wdata;
  logic [(DATA_W/8)-1:0] wstrb;
  logic wvalid, wready;
  // B Channel
  logic [1:0] bresp;
  logic bvalid, bready;
  // AR Channel
  logic [ADDR_W-1:0] araddr;
  logic [       2:0] arprot;
  logic arvalid, arready;
  // R Channel
  logic [DATA_W-1:0] rdata;
  logic [       1:0] rresp;
  logic rvalid, rready;
  modport slave(
      input awaddr, awprot, awvalid, wdata, wstrb, wvalid, bready,
      input araddr, arprot, arvalid, rready,
      output awready, wready, bresp, bvalid, arready, rdata, rresp, rvalid
  );
  modport master(
      output awaddr, awprot, awvalid, wdata, wstrb, wvalid, bready,
      output araddr, arprot, arvalid, rready,
      input awready, wready, bresp, bvalid, arready, rdata, rresp, rvalid
  );
endinterface
Last verified against
Commit 8c09e5e @ pccxai/pccx-FPGA-NPU-LLM-kv260 (2026-04-29).