NPU Frontend Modules

RTL source on GitHub

SystemVerilog sources documented on this page:

Role

The NPU frontend is the AXI-Lite entry point on the NPU side of the PS↔PL boundary and the hardware counterpart of the host driver HAL. The host driver writes ISA words to address 0x000 and a kick token to 0x008, then polls completion status over the read channel of the same interface. AXIL_CMD_IN receives the host's write transactions and enqueues command words in a FIFO. AXIL_STAT_OUT returns engine completion status to the host through a FIFO-backed read path. In the hierarchy, ctrl_npu_frontend sits between the AXI-Lite bus and the controller's decoder and dispatcher (NPU Controller Modules).
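The host-side sequence above can be sketched as a list of AXI-Lite write transactions. This is a minimal Python model, not driver code. build_batch_writes is a hypothetical helper. The data value sent with the kick (0) is a placeholder, because this page does not say whether AXIL_CMD_IN uses wdata on a kick write.

```python
from typing import List, Tuple

ADDR_INST = 0x000          # raw ISA word -> CMD_IN FIFO (from the register map)
ADDR_KICK = 0x008          # kick token -> kick marker in the CMD_IN FIFO
WORD_MASK = (1 << 64) - 1  # DATA_W = 64


def build_batch_writes(isa_words: List[int]) -> List[Tuple[int, int]]:
    """Hypothetical helper: one write per ISA word, then a single kick write."""
    writes: List[Tuple[int, int]] = []
    for word in isa_words:
        if not 0 <= word <= WORD_MASK:
            raise ValueError(f"ISA word {word:#x} does not fit in 64 bits")
        writes.append((ADDR_INST, word))
    # Placeholder payload: assumed not to matter for the kick register.
    writes.append((ADDR_KICK, 0))
    return writes


if __name__ == "__main__":
    for addr, data in build_batch_writes([0x1111, 0x2222]):
        print(f"AXI-Lite write addr={addr:#05x} data={data:#018x}")
```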

AXIL_CMD_IN

AXIL_CMD_IN accepts AXI-Lite write transactions from the host and stores command words in a synchronous FIFO. The parameter FIFO_DEPTH (default 8) sets the FIFO capacity in command-word entries. While the FIFO is full, s_awready is deasserted, so the host stalls on the AW channel. Two register addresses are decoded. A write to 0x000 pushes the raw ISA word into the FIFO. A write to 0x008 pushes a kick marker with bit 63 set to 1, which signals a batch boundary to the downstream dispatcher. A word is popped in every cycle in which OUT_valid and IN_decoder_ready are both asserted.

Listing 18 hw/rtl/NPU_Controller/NPU_frontend/AXIL_CMD_IN.sv (AXI4-Lite Write Path)
  /*─────────────────────────────────────────────
  Register Address Map
  ───────────────────────────────────────────────*/
  localparam ADDR_INST = 12'h000;
  localparam ADDR_KICK = 12'h008;
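The write-path behavior described above can be modeled at transaction level in Python. AxilCmdInModel is a hypothetical name. Two details are assumptions that the listing does not show:

- The kick marker is modeled as a word with only bit 63 set.
- Writes to unmapped addresses are accepted but not enqueued.

```python
from collections import deque
from typing import Optional

ADDR_INST = 0x000
ADDR_KICK = 0x008
KICK_BIT = 1 << 63
WORD_MASK = (1 << 64) - 1


class AxilCmdInModel:
    """Transaction-level sketch of AXIL_CMD_IN: address decode plus a bounded FIFO."""

    def __init__(self, fifo_depth: int = 8) -> None:
        self.fifo_depth = fifo_depth
        self.fifo: deque = deque()

    @property
    def awready(self) -> bool:
        # Documented behavior: AW stalls while the FIFO is full.
        return len(self.fifo) < self.fifo_depth

    def write(self, addr: int, wdata: int) -> bool:
        """Attempt one write; False means the host would be stalled on AW."""
        if not self.awready:
            return False
        if addr == ADDR_INST:
            self.fifo.append(wdata & WORD_MASK)
        elif addr == ADDR_KICK:
            self.fifo.append(KICK_BIT)  # assumption: bits other than 63 are zero
        # assumption: other addresses are accepted but nothing is enqueued
        return True

    def pop(self, decoder_ready: bool) -> Optional[int]:
        """One output-side cycle: pop only when OUT_valid and IN_decoder_ready."""
        out_valid = bool(self.fifo)
        if out_valid and decoder_ready:
            return self.fifo.popleft()
        return None
```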

AXIL_STAT_OUT

The upper module pushes a status word into the STAT_OUT FIFO after each engine completion by asserting IN_valid alongside IN_data. FIFO_DEPTH (default 8) sets the capacity here as well. Pushes that arrive while the FIFO is full are dropped silently. Because completion words carry idempotent information, the host recovers a dropped entry on its next poll. On an AR-channel handshake, the FIFO head is latched into rdata_r, the entry is popped, and s_rvalid is asserted. The response is held until the host acknowledges it with s_rready. s_arready deasserts while the FIFO is empty or while an R response is still pending. As a result, a host read issued against an empty FIFO waits on the AR channel until a status word arrives.
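A transaction-level Python sketch of the drop-on-full policy and the AR/R handshake follows. AxilStatOutModel and its dropped counter are hypothetical; the RTL listing has no drop counter. The sketch ignores cycle timing, so same-cycle push/pop interactions are not modeled.

```python
from collections import deque
from typing import Optional


class AxilStatOutModel:
    """Transaction-level sketch of AXIL_STAT_OUT's FIFO and read path."""

    def __init__(self, fifo_depth: int = 8) -> None:
        self.fifo_depth = fifo_depth
        self.fifo: deque = deque()
        self.rvalid = False
        self.rdata = 0
        self.dropped = 0  # sketch-only bookkeeping, not an RTL register

    def push(self, status: int) -> None:
        # IN_valid while the FIFO is full: the word is discarded silently.
        if len(self.fifo) < self.fifo_depth:
            self.fifo.append(status)
        else:
            self.dropped += 1

    @property
    def arready(self) -> bool:
        # s_arready = ~rvalid_r && ~fifo_empty
        return not self.rvalid and bool(self.fifo)

    def ar_handshake(self) -> bool:
        """Host presents ARVALID; True if the address is accepted this time."""
        if not self.arready:
            return False  # host keeps ARVALID asserted and waits
        self.rdata = self.fifo.popleft()  # latch FIFO head and pop
        self.rvalid = True
        return True

    def r_handshake(self) -> Optional[int]:
        """Host asserts RREADY; returns the data if a response was pending."""
        if not self.rvalid:
            return None
        self.rvalid = False
        return self.rdata
```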

Listing 19 hw/rtl/NPU_Controller/NPU_frontend/AXIL_STAT_OUT.sv (AXI4-Lite Read Path)
  /*─────────────────────────────────────────────
  FIFO  (simple synchronous, FIFO_DEPTH entries)
  Push : IN_valid from upper module
  Pop  : AXI4-Lite read handshake with CPU
  ───────────────────────────────────────────────*/
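  localparam PTR_W = $clog2(FIFO_DEPTH);  // full/empty logic assumes FIFO_DEPTH is a power of two

  logic [`ISA_WIDTH-1:0] mem[0:FIFO_DEPTH-1];
  logic [PTR_W:0] wr_ptr, rd_ptr;  // extra MSB flips on each wrap-around
  logic fifo_empty, fifo_full;

  assign fifo_empty = (wr_ptr == rd_ptr);
  // full: pointers differ only in the wrap bit (FIFO_DEPTH entries outstanding)
  assign fifo_full  = (wr_ptr[PTR_W] != rd_ptr[PTR_W]) && (wr_ptr[PTR_W-1:0] == rd_ptr[PTR_W-1:0]);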

  logic fifo_ren;

  always_ff @(posedge clk) begin
    if (!rst_n || IN_clear) begin
      wr_ptr <= '0;
      rd_ptr <= '0;
    end else begin
      // push: one status word per IN_valid; silently dropped while the FIFO is full
      if (IN_valid && !fifo_full) begin
        mem[wr_ptr[PTR_W-1:0]] <= IN_data;
        wr_ptr <= wr_ptr + 1'b1;
      end
      // pop: on the AR handshake (fifo_ren), when the head is latched into rdata_r
      if (fifo_ren && !fifo_empty) rd_ptr <= rd_ptr + 1'b1;
    end
  end

  /*─────────────────────────────────────────────
  AXI4-Lite Read Path
  Wait for AR, then pop one entry from FIFO and return it.
  Hold rvalid until CPU acknowledges with rready.
  ───────────────────────────────────────────────*/
  logic [`ISA_WIDTH-1:0] rdata_r;
  logic                  rvalid_r;

  assign s_rdata   = rdata_r;
  assign s_rresp   = 2'b00;
  assign s_rvalid  = rvalid_r;
  assign s_arready = ~rvalid_r && ~fifo_empty;  // ready only when FIFO has data
  assign fifo_ren  = s_arvalid && s_arready;  // pop on AR handshake

  always_ff @(posedge clk) begin
    if (!rst_n || IN_clear) begin
      rdata_r  <= '0;
      rvalid_r <= 1'b0;
    end else begin
      // AR handshake → latch FIFO head and assert rvalid
      if (s_arvalid && s_arready) begin
        rdata_r  <= mem[rd_ptr[PTR_W-1:0]];
        rvalid_r <= 1'b1;
      end
      // R handshake → CPU consumed data, release
      if (rvalid_r && s_rready) rvalid_r <= 1'b0;
    end
  end
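The extra pointer MSB lets fifo_empty and fifo_full be derived from the same pair of pointers. The Python function below re-evaluates those two assign statements for given pointer values; fifo_flags and count_until_full are hypothetical helpers.

The sketch also exposes a consequence of the listing as written: the full test matches FIFO_DEPTH only when FIFO_DEPTH is a power of two. With FIFO_DEPTH = 6, PTR_W is 3, so six stored entries are not flagged as full, and further pushes would index past mem[0:5]. The default of 8 is unaffected.

```python
from typing import Tuple


def fifo_flags(wr_ptr: int, rd_ptr: int, fifo_depth: int = 8) -> Tuple[bool, bool]:
    """Evaluate (fifo_empty, fifo_full) the way AXIL_STAT_OUT's assigns do."""
    ptr_w = (fifo_depth - 1).bit_length()  # same value as $clog2(FIFO_DEPTH)
    ptr_mask = (1 << (ptr_w + 1)) - 1      # pointers are logic [PTR_W:0]
    wr, rd = wr_ptr & ptr_mask, rd_ptr & ptr_mask
    msb = 1 << ptr_w                       # the wrap bit, ptr[PTR_W]
    low = msb - 1                          # index bits, ptr[PTR_W-1:0]
    empty = wr == rd
    full = (wr & msb) != (rd & msb) and (wr & low) == (rd & low)
    return empty, full


def count_until_full(fifo_depth: int) -> int:
    """Push into an empty FIFO until fifo_full asserts; return how many pushes it took."""
    wr = rd = 0
    pushes = 0
    while not fifo_flags(wr, rd, fifo_depth)[1]:
        wr += 1
        pushes += 1
    return pushes
```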

ctrl_npu_frontend

ctrl_npu_frontend is a wrapper that routes the axil_if.slave signals to AXIL_CMD_IN (write channels) and AXIL_STAT_OUT (read channels). The cmd_data output of CMD_IN drives OUT_RAW_instruction toward the downstream controller, and its IN_decoder_ready input is tied to IN_fetch_ready. OUT_kick is combinational (cmd_valid & IN_fetch_ready), so it is high in exactly the cycles in which CMD_IN pops a word. It therefore fires for every popped word, not only for kick markers; the batch boundary itself is carried by bit 63 of the word. The status path takes IN_enc_stat and IN_enc_valid directly from the encoder FSM. ctrl_npu_interface.sv is currently a placeholder for future per-core interface aggregation.
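The relationship between OUT_kick and the CMD_IN pop can be traced cycle by cycle. In this Python sketch, run_frontend is hypothetical, and two assumptions are built in:

- cmd_valid equals "CMD_IN FIFO not empty".
- cmd_data presents the FIFO head in the same cycle (first-word fall-through).

The trace shows OUT_kick firing for raw ISA words as well as for the kick marker, which is identified by bit 63.

```python
from typing import List, Tuple

KICK_BIT = 1 << 63  # kick marker flag inserted by AXIL_CMD_IN on a 0x008 write


def run_frontend(words: List[int], fetch_ready: List[bool]) -> List[Tuple[int, int, bool]]:
    """Return (cycle, OUT_RAW_instruction, is_batch_boundary) for each cycle with OUT_kick = 1."""
    fifo = list(words)
    handed_off = []
    for cycle, ready in enumerate(fetch_ready):
        cmd_valid = bool(fifo)          # assumption: OUT_valid == FIFO not empty
        out_kick = cmd_valid and ready  # OUT_kick = cmd_valid & IN_fetch_ready
        if out_kick:                    # the same condition pops CMD_IN's FIFO
            word = fifo.pop(0)
            handed_off.append((cycle, word, bool(word & KICK_BIT)))
    return handed_off
```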

Listing 20 hw/rtl/NPU_Controller/NPU_frontend/ctrl_npu_frontend.sv (instance wiring)
  AXIL_CMD_IN #(
      .FIFO_DEPTH(8)
  ) u_cmd_in (
      .clk     (clk),
      .rst_n   (rst_n),
      .IN_clear(IN_clear),

      // AXI4-Lite Write channels directly routed from the interface
      .s_awaddr (S_AXIL_CTRL.awaddr),
      .s_awvalid(S_AXIL_CTRL.awvalid),
      .s_awready(S_AXIL_CTRL.awready),
      .s_wdata  (S_AXIL_CTRL.wdata),
      .s_wvalid (S_AXIL_CTRL.wvalid),
      .s_wready (S_AXIL_CTRL.wready),
      .s_bresp  (S_AXIL_CTRL.bresp),
      .s_bvalid (S_AXIL_CTRL.bvalid),
      .s_bready (S_AXIL_CTRL.bready),

      .OUT_data(cmd_data),
      .OUT_valid(cmd_valid),
      .IN_decoder_ready(IN_fetch_ready)
  );

  /*─────────────────────────────────────────────
  [1-2] Communication OUT : NPU -> CPU (Using Read Channels)
  ───────────────────────────────────────────────*/
  AXIL_STAT_OUT #(
      .FIFO_DEPTH(8)
  ) u_stat_out (
      .clk     (clk),
      .rst_n   (rst_n),
      .IN_clear(IN_clear),

      .IN_data (IN_enc_stat),   // status word from the encoder FSM
      .IN_valid(IN_enc_valid),  // push strobe from the encoder FSM

      // AXI4-Lite Read channels directly routed from the interface
      .s_araddr (S_AXIL_CTRL.araddr),
      .s_arvalid(S_AXIL_CTRL.arvalid),
      .s_arready(S_AXIL_CTRL.arready),
      .s_rdata  (S_AXIL_CTRL.rdata),
      .s_rresp  (S_AXIL_CTRL.rresp),
      .s_rvalid (S_AXIL_CTRL.rvalid),
      .s_rready (S_AXIL_CTRL.rready)
  );

Interface Specification

npu_interfaces.svh defines two interfaces, axil_if and axis_if. axil_if covers all five AXI-Lite channels (AW, W, B, AR, R), defaults to ADDR_W=12 and DATA_W=64, and provides slave and master modports; ctrl_npu_frontend binds the axil_if.slave modport. axis_if is an AXI-Stream interface (DATA_WIDTH defaults to 128) that carries tdata, tvalid, tready, tlast, and tkeep, with one tkeep bit per tdata byte. Data-path modules use it to transfer activation and weight streams.
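The parameter defaults determine the derived signal widths, which the Python helpers below compute (axil_widths and tkeep_mask are hypothetical names):

- With DATA_W = 64, wstrb is 8 bits, and 0x000 and 0x008 are adjacent 8-byte words in a 4 KiB (2^12-byte) register space.
- With DATA_WIDTH = 128, tkeep is 16 bits.

In AXI4-Stream, tkeep bit i qualifies tdata byte i. Placing the valid bytes of a short final beat in the low byte lanes is an assumption of this sketch; this page does not specify it.

```python
def axil_widths(addr_w: int = 12, data_w: int = 64) -> dict:
    """Derived axil_if sizes for the given ADDR_W / DATA_W."""
    if data_w % 8:
        raise ValueError("DATA_W must be a whole number of bytes")
    return {
        "wstrb_bits": data_w // 8,           # logic [(DATA_W/8)-1:0] wstrb
        "address_space_bytes": 1 << addr_w,  # logic [ADDR_W-1:0] awaddr/araddr
        "bytes_per_word": data_w // 8,       # one data beat
    }


def tkeep_mask(valid_bytes: int, data_width: int = 128) -> int:
    """tkeep for a beat whose low `valid_bytes` byte lanes carry data (bit i -> tdata[8i+7:8i])."""
    lanes = data_width // 8  # logic [(DATA_WIDTH/8)-1:0] tkeep
    if not 1 <= valid_bytes <= lanes:
        raise ValueError(f"valid_bytes must be in 1..{lanes}")
    return (1 << valid_bytes) - 1
```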

Listing 21 hw/rtl/NPU_Controller/npu_interfaces.svh (axis_if)
interface axis_if #(
    parameter DATA_WIDTH = 128
) ();
  logic [    DATA_WIDTH-1:0] tdata;
  logic                      tvalid;
  logic                      tready;
  logic                      tlast;
  logic [(DATA_WIDTH/8)-1:0] tkeep;

  // Slave Side (NPU Perspective: Input)
  modport slave(input tdata, tvalid, tlast, tkeep, output tready);

  // Master Side (NPU Perspective: Output)
  modport master(output tdata, tvalid, tlast, tkeep, input tready);
endinterface

Listing 22 hw/rtl/NPU_Controller/npu_interfaces.svh (axil_if)
// axil_if: AXI4-Lite interface (all five channels)
interface axil_if #(
    parameter int ADDR_W = 12,
    parameter int DATA_W = 64
) (
    input logic clk,
    input logic rst_n
);
  // AW Channel
  logic [ADDR_W-1:0] awaddr;
  logic [       2:0] awprot;
  logic awvalid, awready;

  // W Channel
  logic [    DATA_W-1:0] wdata;
  logic [(DATA_W/8)-1:0] wstrb;
  logic wvalid, wready;

  // B Channel
  logic [1:0] bresp;
  logic bvalid, bready;

  // AR Channel
  logic [ADDR_W-1:0] araddr;
  logic [       2:0] arprot;
  logic arvalid, arready;

  // R Channel
  logic [DATA_W-1:0] rdata;
  logic [       1:0] rresp;
  logic rvalid, rready;

  modport slave(
      input awaddr, awprot, awvalid, wdata, wstrb, wvalid, bready,
      input araddr, arprot, arvalid, rready,
      output awready, wready, bresp, bvalid, arready, rdata, rresp, rvalid
  );

  modport master(
      output awaddr, awprot, awvalid, wdata, wstrb, wvalid, bready,
      output araddr, arprot, arvalid, rready,
      input awready, wready, bresp, bvalid, arready, rdata, rresp, rvalid
  );
endinterface

Last verified against

Commit 8c09e5e @ pccxai/pccx-FPGA-NPU-LLM-kv260 (2026-04-29).