HAL: AXI-Lite MMIO Layer
The uca_hal_* function set is the AXI-Lite MMIO layer that sits
between the public C API (uca_*, see :doc:api) and the NPU hardware.
No code above this layer accesses physical addresses or register offsets
directly.
The implementation lives in
codes/v002/sw/driver/uCA_v1_hal.c / uCA_v1_hal.h.
HAL Position
The driver stack is organized into two layers.
| Layer | Symbol prefix | Role |
|---|---|---|
| Public API | uca_* | Compute and memory primitives. Assembles 64-bit VLIW instructions and passes them through the HAL. See :doc:api. |
| HAL | uca_hal_* | AXI-Lite register reads and writes, 64-bit instruction latching, status polling. Depends directly on KV260 bare-metal pointer MMIO. |
The HAL stores all state in a single file-scope singleton,
g_mmio_base (volatile uint32_t *). No context pointer is used;
a single process is expected to communicate with one NPU instance.
static volatile uint32_t *g_mmio_base = NULL;
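The bodies of uca_hal_write32 and uca_hal_read32 are not reproduced on this page. The following is a minimal sketch of how accessors over a single base pointer might look, assuming byte offsets are converted to 32-bit word indices. Every sketch_ name is hypothetical, and a small array stands in for the real register window so the example runs on any host.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for the NPU register window: three 32-bit words. */
static uint32_t sketch_fake_regs[3];

/* Mirrors the HAL's single file-scope base pointer. */
static volatile uint32_t *sketch_mmio_base = NULL;

/* Point the base at the simulated window.
 * (Assumption: the real HAL points at UCA_MMIO_BASE_ADDR instead.) */
static void sketch_attach(void) {
    sketch_mmio_base = sketch_fake_regs;
}

/* Convert a byte offset to a 32-bit word index and store the value.
 * Offsets are assumed to be 4-byte aligned. */
static void sketch_write32(uint32_t byte_off, uint32_t val) {
    sketch_mmio_base[byte_off / 4U] = val;
}

/* Convert a byte offset to a 32-bit word index and load the value. */
static uint32_t sketch_read32(uint32_t byte_off) {
    return sketch_mmio_base[byte_off / 4U];
}
```

The volatile qualifier on the base pointer keeps the compiler from caching or eliding register accesses, which matters once the pointer targets real hardware.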
Register Map
The MMIO base address is UCA_MMIO_BASE_ADDR = 0xA0000000. This value
must match the AXI-Lite slave address assigned in the Vivado block
design.
// ===| MMIO Base Address |=======================================================
// Must match the AXI-Lite slave address assigned in the Vivado block design.
#define UCA_MMIO_BASE_ADDR 0xA0000000UL
// ===| Register Offsets |========================================================
// All offsets are byte offsets from UCA_MMIO_BASE_ADDR.
// The 64-bit instruction register is split into two 32-bit words.
// Write LO first; writing HI triggers the NPU instruction latch.
#define UCA_REG_INSTR_LO 0x00 // [31:0] lower 32 bits of 64-bit VLIW instruction
#define UCA_REG_INSTR_HI 0x04 // [63:32] upper 32 bits; writing this latches the instruction
#define UCA_REG_STATUS 0x08 // [31:0] NPU status (read-only)
| Name | Offset | Access | Description |
|---|---|---|---|
| UCA_REG_INSTR_LO | 0x00 | Write | Lower 32 bits of the 64-bit VLIW instruction. Written first. |
| UCA_REG_INSTR_HI | 0x04 | Write | Upper 32 bits of the 64-bit VLIW instruction. Writing this register triggers the NPU instruction latch. |
| UCA_REG_STATUS | 0x08 | Read | NPU status register (read-only). Contains the UCA_STAT_BUSY and UCA_STAT_DONE bits. |
A 64-bit instruction is written as a pair, LO first, HI second. The HI write triggers the controller’s instruction latch.
void uca_hal_issue_instr(uint64_t instr) {
    // Write lower word first.
    // Writing the upper word triggers the NPU instruction latch (ISA §8).
    uca_hal_write32(UCA_REG_INSTR_LO, (uint32_t)(instr & 0xFFFFFFFFULL));
    uca_hal_write32(UCA_REG_INSTR_HI, (uint32_t)(instr >> 32));
}
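To make the LO-then-HI ordering concrete, the sketch below records the two writes in a small log instead of touching hardware. The log, the sketch_ names, and the example instruction value are all hypothetical; only the split and the write order follow the function above.

```c
#include <stdint.h>

#define SKETCH_REG_INSTR_LO 0x00u
#define SKETCH_REG_INSTR_HI 0x04u

/* Hypothetical write log standing in for the AXI-Lite bus. */
static uint32_t sketch_log_off[2];
static uint32_t sketch_log_val[2];
static int sketch_log_len = 0;

/* Record (offset, value) pairs in the order they are written. */
static void sketch_bus_write32(uint32_t off, uint32_t val) {
    if (sketch_log_len < 2) {
        sketch_log_off[sketch_log_len] = off;
        sketch_log_val[sketch_log_len] = val;
        sketch_log_len++;
    }
}

/* Same split as the HAL function above: low word first, high word second. */
static void sketch_issue_instr(uint64_t instr) {
    sketch_bus_write32(SKETCH_REG_INSTR_LO, (uint32_t)(instr & 0xFFFFFFFFULL));
    sketch_bus_write32(SKETCH_REG_INSTR_HI, (uint32_t)(instr >> 32));
}
```

For an arbitrary value such as 0x1122334455667788, the first logged write is 0x55667788 to offset 0x00 and the second is 0x11223344 to offset 0x04.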
CMD_IN / STAT_OUT Mechanics
uca_hal_issue_instr submits a 64-bit instruction to the NPU’s CMD_IN
path by writing the register pair. The call returns immediately; the
NPU controller executes the instruction independently inside its
pipeline.
Status register UCA_REG_STATUS bit fields:
// ===| Status Register Bit Fields |==============================================
#define UCA_STAT_BUSY (1U << 0) // NPU is executing — do not issue new instruction
#define UCA_STAT_DONE (1U << 1) // Last operation completed successfully
- UCA_STAT_BUSY (bit 0): NPU is executing an instruction. Do not issue a new instruction while this bit is set.
- UCA_STAT_DONE (bit 1): Last operation completed successfully.
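As an illustration of how a caller might interpret a raw status word, the sketch below wraps the two bit tests in small predicates. The helper names are hypothetical and are not part of the HAL; the bit positions are the ones defined above.

```c
#include <stdint.h>
#include <stdbool.h>

#define SKETCH_STAT_BUSY (1U << 0)
#define SKETCH_STAT_DONE (1U << 1)

/* True when BUSY is clear, i.e. a new instruction may be issued. */
static bool sketch_can_issue(uint32_t status) {
    return (status & SKETCH_STAT_BUSY) == 0U;
}

/* True when DONE is set, i.e. the last operation completed successfully. */
static bool sketch_last_op_done(uint32_t status) {
    return (status & SKETCH_STAT_DONE) != 0U;
}
```

Testing each bit with a mask, rather than comparing the whole word, keeps the checks valid if other status bits are defined later.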
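Polling is performed by uca_hal_wait_idle. Because no hardware timer
driver is yet available on the bare-metal KV260, the current
implementation uses a busy-wait loop. The iteration count is
timeout_us × 400, a rough estimate that assumes about one loop
iteration per clock cycle at the 400 MHz core rate; the timeout is
therefore approximate, not calibrated.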
int uca_hal_wait_idle(uint32_t timeout_us) {
    // Bare-metal busy-wait.
    // TODO: replace with a hardware timer once a timer driver is available.
    // ~400 iterations per us, assuming ~1 iteration per cycle at 400 MHz.
    // 64-bit count avoids overflow of timeout_us * 400 for large timeouts.
    uint64_t count = (uint64_t)timeout_us * 400U;
    while (count--) {
        if (!(uca_hal_read_status() & UCA_STAT_BUSY)) {
            return 0; // Idle
        }
    }
    return -1; // Timeout
}
When the iteration counter reaches zero while UCA_STAT_BUSY is still
set, -1 is returned. The NPU state is not force-reset on timeout; the
caller is responsible for error recovery.
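Since recovery is left to the caller, one possible calling pattern is to wait for idle before issuing, issue the instruction, then wait again and report which step failed. The sketch below is an assumption about how a caller might structure this, not code from the driver. The stubs, the error codes, and sketch_run_instr are all hypothetical, and the stubbed wait returns scripted results so the example runs without hardware.

```c
#include <stdint.h>

/* Hypothetical caller-level result codes. */
enum sketch_status {
    SKETCH_OK = 0,
    SKETCH_ERR_BUSY = -1,    /* NPU never became idle before issue */
    SKETCH_ERR_TIMEOUT = -2  /* instruction issued but did not finish in time */
};

/* Scripted results for the stubbed wait: [0] = pre-issue, [1] = post-issue. */
static int sketch_wait_script[2] = {0, 0};
static int sketch_wait_calls = 0;
static int sketch_issued_count = 0;

/* Stub standing in for uca_hal_wait_idle: 0 = idle, -1 = timeout. */
static int sketch_hal_wait_idle(uint32_t timeout_us) {
    (void)timeout_us;
    return sketch_wait_script[sketch_wait_calls++ % 2];
}

/* Stub standing in for uca_hal_issue_instr: only counts issued instructions. */
static void sketch_hal_issue_instr(uint64_t instr) {
    (void)instr;
    sketch_issued_count++;
}

/* Issue one instruction only if the NPU is idle, then wait for completion. */
static int sketch_run_instr(uint64_t instr, uint32_t timeout_us) {
    if (sketch_hal_wait_idle(timeout_us) != 0) {
        return SKETCH_ERR_BUSY;      /* still busy: do not issue */
    }
    sketch_hal_issue_instr(instr);
    if (sketch_hal_wait_idle(timeout_us) != 0) {
        return SKETCH_ERR_TIMEOUT;   /* caller decides how to recover */
    }
    return SKETCH_OK;
}
```

Separating the two failure cases lets the caller tell "nothing was submitted" apart from "an instruction may still be in flight", which call for different recovery steps.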
uca_init Flow
uca_hal_init performs three operations in sequence.
1. Sets g_mmio_base to (volatile uint32_t *)UCA_MMIO_BASE_ADDR. Physical addresses are directly accessible in the KV260 bare-metal environment.
2. Calls uca_hal_read32(UCA_REG_STATUS) to read the status register.
3. If the return value is 0xFFFFFFFF, the AXI bus is not responding; returns -1. Otherwise returns 0.
int uca_hal_init(void) {
    // On bare-metal KV260, physical addresses are directly accessible.
    g_mmio_base = (volatile uint32_t *)UCA_MMIO_BASE_ADDR;
    // Sanity check: status register reads all-ones on an unconnected AXI bus.
    uint32_t stat = uca_hal_read32(UCA_REG_STATUS);
    if (stat == 0xFFFFFFFFU) {
        return -1; // Hardware not responding
    }
    return 0;
}
uca_hal_deinit sets g_mmio_base to NULL. Any subsequent
uca_hal_write32 or uca_hal_read32 call will dereference a null
pointer; the caller must ensure no HAL calls follow uca_hal_deinit.
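If a caller cannot guarantee this ordering, one defensive option is an accessor that checks the base pointer before use. The sketch below shows the idea under stated assumptions: the sketch_ names are hypothetical, a small array stands in for the register window, and the HAL itself performs no such check.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for the NPU register window. */
static uint32_t sketch_fake_regs[3];
static volatile uint32_t *sketch_mmio_base = NULL;

/* Simulated init and deinit: attach to, or detach from, the fake window. */
static void sketch_init(void)   { sketch_mmio_base = sketch_fake_regs; }
static void sketch_deinit(void) { sketch_mmio_base = NULL; }

/* Returns 0 and stores the register value in *out,
 * or -1 if the base pointer is NULL or out is NULL. */
static int sketch_read32_checked(uint32_t byte_off, uint32_t *out) {
    if (sketch_mmio_base == NULL || out == NULL) {
        return -1;
    }
    *out = sketch_mmio_base[byte_off / 4U];
    return 0;
}
```

The check costs one comparison per access, which is small next to an MMIO read but still a design trade-off against the unguarded accessors.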
See also
- Public API primitives: :doc:api
- AXI-Lite command path architecture: :doc:../Architecture/top_level
- ISA instruction encoding: :doc:../ISA/encoding
Last verified against
Commit 8c09e5e @ pccxai/pccx-FPGA-NPU-LLM-kv260 (2026-04-29)