L2 Cache (URAM)¶
Three modules form the v002 1.75 MiB UltraRAM L2 cache:
mem_GLOBAL_cache (top wrapper), mem_L2_cache_fmap (URAM array), and
mem_BUFFER (ACP CDC bridge). mem_GLOBAL_cache drives the state machines
for both the ACP port and the NPU port, and instantiates the other two
internally.
Role¶
The L2 cache is a True Dual-Port URAM block resident in the NPU clock domain
(400 MHz). Port A performs DDR4 ↔ L2 DMA over the PS ACP path. Port B is
accessed directly by the compute engines (GEMM, GEMV, CVO). Both fmap and
weight data occupy the same address space; the address ranges assigned to each
are determined by software convention and the shape constant RAM managed by
mem_dispatcher. The hardware makes no distinction between the two regions.
Storage Layout¶
mem_L2_cache_fmap is instantiated with Depth = 114_688 and a data width
of 128 bits.
The address unit is a 128-bit word. Word address \(N\) corresponds to byte offset \([16N,\ 16N+15]\). The 17-bit address bus can address up to 131,072 words; the current capacity of 114,688 words falls within that range.
URAM read latency is 3 clock cycles. The Port A read path tracks data validity
with a 3-bit shift register acp_rd_valid_pipe; OUT_acp_rdata is valid 3 cycles
after the address is presented, and core_acp_tx_bus.tvalid is asserted
simultaneously. Port B reads carry the same 3-cycle latency characteristic.
Port Interface¶
The Port A (ACP) and Port B (NPU) control signals are:
// ===| Port A — ACP DMA control |============================================
input logic IN_acp_write_en, // 1=write (DDR→L2), 0=read (L2→DDR)
input logic [16:0] IN_acp_base_addr,
input logic IN_acp_rx_start, // start ACP transfer
input logic [16:0] IN_acp_end_addr,
output logic OUT_acp_is_busy,
// ===| Port B — NPU compute direct access |==================================
input logic IN_npu_write_en,
input logic [16:0] IN_npu_base_addr,
input logic IN_npu_rx_start,
input logic [16:0] IN_npu_end_addr,
output logic OUT_npu_is_busy,
input logic [127:0] IN_npu_wdata,
output logic [127:0] OUT_npu_rdata
Port contract:
Asserting
IN_acp_rx_startorIN_npu_rx_startstarts a sequential transfer frombase_addrthroughend_addr - 1on the respective port.write_en = 1transfers data into L2 (external source to L2);write_en = 0transfers data out of L2 (L2 to external sink).While
OUT_acp_is_busyorOUT_npu_is_busyis asserted, the port does not accept a new command. Command queuing is handled bymem_u_operation_queue.
The two ports have independent state machines and can operate concurrently.
Port B arbitration is performed by mem_dispatcher; when mem_CVO_stream_bridge
is active it preempts Port B. NPU DMA commands hold in the operation queue
during preemption and resume after the bridge completes.
Submodules¶
Module |
Role |
|---|---|
|
114,688 × 128-bit URAM array. Provides Port A and Port B read/write interfaces. |
|
ACP CDC bridge: AXI 250 MHz ↔ Core 400 MHz. Two |
|
Top wrapper containing both modules. Drives Port A and Port B state machines. |
Last verified against
Commit 8c09e5e @ pccxai/pccx-FPGA-NPU-LLM-kv260 (2026-04-29)