AMBA AXI4 Interconnect | Akshay Prasad

Overview

I built a 2×2 non-blocking crossbar in SystemVerilog from first principles: two AXI4 masters, two synchronous SRAM slaves. The goal was to cover what actually shows up in production silicon: protocol correctness, burst handling, multiple outstanding transactions, and system-level arbitration.

The implementation is fully synthesizable and parameterized, and it closes timing at 250 MHz on Xilinx UltraScale+ FPGAs.

Repository: akshay-b-prasad/amba-axi4-interconnect

Architecture

The crossbar connects two masters to two slaves, each slave backed by 64 KB of synchronous SRAM. When the masters target different slaves, both transactions proceed concurrently with no blocking between them; arbitration is needed only when both target the same slave. That non-blocking property is fundamental to the design.

Slave	Base Address	Capacity
S0	`0x0000_0000`	64 KB
S1	`0x0001_0000`	64 KB
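Because both windows are 64 KB, aligned, and adjacent, address decode reduces to a few bit tests. The sketch below is a Python behavioral model, not the project's RTL. It compares a window-based reference decoder with the bit-slice form hardware would use. The names are hypothetical, and how the RTL responds to unmapped addresses is not described here, so the model simply returns `None` for them.

```python
from typing import Optional

SLAVE_SIZE = 64 * 1024                     # 64 KB window per slave
SLAVE_BASES = (0x0000_0000, 0x0001_0000)   # S0, S1 base addresses


def decode_by_range(addr: int) -> Optional[int]:
    """Reference decoder: test each slave's [base, base + size) window."""
    for index, base in enumerate(SLAVE_BASES):
        if base <= addr < base + SLAVE_SIZE:
            return index
    return None  # unmapped; the RTL's behavior here is an open assumption


def decode_by_bits(addr: int) -> Optional[int]:
    """Hardware-style decoder: both windows are 64 KB aligned and adjacent,
    so bit 16 selects the slave once bits [31:17] are known to be zero."""
    if addr >> 17:
        return None
    return (addr >> 16) & 1


if __name__ == "__main__":
    for a in (0x0000_0000, 0x0000_FFFC, 0x0001_0000, 0x0001_FFFC, 0x0002_0000):
        print(f"0x{a:08X} -> {decode_by_bits(a)}")
```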

Key Design Decisions

Round-robin arbitration. When both masters contend for the same slave, a round-robin arbiter resolves priority without starvation. The arbiter state advances after each granted transaction, not each beat, preserving burst atomicity.
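The rule that the pointer advances per transaction rather than per beat can be sketched as a small Python model of one slave port's arbiter. It is a hypothetical illustration, not the RTL. The grant stays locked to its owner until `complete()` is called on the last beat of the burst, and only then does priority rotate to the other master.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Behavioral sketch of a round-robin arbiter for one slave port.

    The priority pointer moves only when a granted transaction completes,
    so a burst is never split between masters."""

    def __init__(self, num_masters: int = 2) -> None:
        self.num_masters = num_masters
        self.next_first = 0                  # master checked first at the next arbitration
        self.owner: Optional[int] = None     # master currently holding the slave

    def arbitrate(self, requests: List[bool]) -> Optional[int]:
        """Return the granted master for this cycle, or None if nobody requests."""
        if self.owner is not None:
            return self.owner                # grant held until the burst finishes
        for offset in range(self.num_masters):
            m = (self.next_first + offset) % self.num_masters
            if requests[m]:
                self.owner = m
                return m
        return None

    def complete(self) -> None:
        """Call on the last-beat handshake (e.g. WLAST or RLAST) of the granted burst."""
        if self.owner is None:
            raise RuntimeError("complete() called with no active grant")
        self.next_first = (self.owner + 1) % self.num_masters
        self.owner = None
```

With both masters requesting continuously, grants alternate 0, 1, 0, 1, and a mid-burst call to `arbitrate()` keeps returning the current owner.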

ID extension. Each master’s AXI IDs are prepended with a 1-bit master index before being forwarded to the slaves, which track and return this extended ID. The crossbar uses the prefix to route each response back to the correct master without ambiguity, then strips it before returning the response, following the ARM CoreLink pattern.
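The prefix-and-strip step is plain bit arithmetic. The Python sketch below illustrates it, assuming a hypothetical 4-bit master ID width. Both masters may issue ID 5, yet the slave sees two distinct IDs (0x05 and 0x15), and the high bit of the returned BID/RID identifies which master gets the response.

```python
from typing import Tuple

MASTER_INDEX_BITS = 1  # two masters -> $clog2(2) = 1 prefix bit


def extend_id(master: int, axid: int, id_width: int) -> int:
    """Prepend the master index above the master's own AWID/ARID bits."""
    if not 0 <= axid < (1 << id_width):
        raise ValueError("ID does not fit in id_width bits")
    if not 0 <= master < (1 << MASTER_INDEX_BITS):
        raise ValueError("master index out of range")
    return (master << id_width) | axid


def split_id(ext_id: int, id_width: int) -> Tuple[int, int]:
    """Split a slave's BID/RID into (master index for routing, original ID)."""
    return ext_id >> id_width, ext_id & ((1 << id_width) - 1)


if __name__ == "__main__":
    for m in (0, 1):
        ext = extend_id(m, 5, id_width=4)
        print(f"master {m}, ID 5 -> slave sees 0x{ext:02X} -> {split_id(ext, 4)}")
```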

Per-master W-route FIFOs. AXI4 decouples the AW (write address) and W (write data) channels, so write data can arrive before the slave is selected. A FIFO per master records which slave claimed each AW transaction, ensuring write beats are delivered to the correct destination even under back-pressure.
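The FIFO discipline can be modeled in a few lines of Python. This is a hypothetical sketch: the depth of 4 and the choice to stall W beats that arrive before any AW has been routed are assumptions, not details taken from the RTL. Each accepted AW pushes its decoded slave, W beats follow the head entry, and the WLAST beat pops it, so bursts are delivered in AW order.

```python
from collections import deque
from typing import Deque, List, Optional


class WriteRouter:
    """Per-master W-route FIFOs (behavioral sketch)."""

    def __init__(self, num_masters: int = 2, depth: int = 4) -> None:
        self.depth = depth  # hypothetical FIFO depth
        self.fifos: List[Deque[int]] = [deque() for _ in range(num_masters)]

    def try_push_aw(self, master: int, slave: int) -> bool:
        """Record the decoded slave for an AW; False means full (hold AWREADY low)."""
        fifo = self.fifos[master]
        if len(fifo) >= self.depth:
            return False
        fifo.append(slave)
        return True

    def route_w_beat(self, master: int, wlast: bool) -> Optional[int]:
        """Return the target slave for this W beat, or None to stall it
        (assumed: WREADY low) because no AW has been routed for it yet."""
        fifo = self.fifos[master]
        if not fifo:
            return None
        slave = fifo[0]
        if wlast:
            fifo.popleft()  # burst finished: the next beats follow the next AW
        return slave
```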

Read serialization. The current implementation processes one read burst per slave at a time, for protocol clarity and simpler verification. The read data path is a documented extension point for pipelining.

Protocol Compliance

The interconnect implements the following AXI4 features from the AMBA AXI specification (ARM IHI 0022H):

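Burst lengths up to 256 beats for INCR bursts (AWLEN/ARLEN are 8-bit fields encoding the beat count minus one); as the specification requires, FIXED bursts are limited to 16 beats and WRAP bursts to 2, 4, 8, or 16 beats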
Burst types: FIXED, INCR, and WRAP (all address calculation modes)
Byte strobes (WSTRB) for byte-granular write masking across all paths
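Outstanding transactions: up to 4 outstanding writes and 4 outstanding reads per slave (accepted but not yet completed); read data bursts are still returned one at a time, as described under Read serialization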
Correct VALID/READY handshaking on all five AXI channels (AW, W, B, AR, R)
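The three burst types differ only in how each beat's address is derived. The Python sketch below transcribes the address equations from the AXI specification and could serve as a golden model for checking the RTL. The function and constant names are hypothetical, and the 4 KB boundary rule that masters must also respect is not checked.

```python
from typing import List

FIXED, INCR, WRAP = 0b00, 0b01, 0b10  # AxBURST encodings


def burst_addresses(start: int, axsize: int, axlen: int, axburst: int) -> List[int]:
    """Byte address of every beat, following the AXI4 address equations."""
    nbytes = 1 << axsize                  # bytes per beat
    beats = axlen + 1                     # AxLEN encodes beats minus one
    aligned = (start // nbytes) * nbytes
    if axburst == FIXED:
        if beats > 16:
            raise ValueError("FIXED bursts are limited to 16 beats")
        return [start] * beats
    if axburst == INCR:
        if beats > 256:
            raise ValueError("INCR bursts are limited to 256 beats")
        # First beat may be unaligned; later beats step from the aligned address.
        return [start] + [aligned + i * nbytes for i in range(1, beats)]
    if axburst == WRAP:
        if beats not in (2, 4, 8, 16):
            raise ValueError("WRAP length must be 2, 4, 8 or 16 beats")
        if start != aligned:
            raise ValueError("WRAP start address must be aligned to the beat size")
        total = nbytes * beats            # size of the wrap window in bytes
        lower = (start // total) * total  # wrap boundary
        return [lower + (start - lower + i * nbytes) % total for i in range(beats)]
    raise ValueError("reserved AxBURST encoding")


if __name__ == "__main__":
    print([hex(a) for a in burst_addresses(0x38, 2, 3, WRAP)])
```

For a 4-beat, 4-byte WRAP burst starting at 0x38, the window is 0x30 to 0x3F, so the beats go to 0x38, 0x3C, 0x30, and 0x34.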

RTL Quality

The design is written to modern SystemVerilog coding standards:

always_ff for all sequential logic; always_comb for combinational
$clog2-parameterized widths, no magic numbers or architecture-specific constants
Module-level parameters for data width, address width, ID width, and burst depth
No latches; all outputs registered or explicitly driven combinationally
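Deriving widths with `$clog2` means a single parameter change propagates everywhere. The Python sketch below works through the arithmetic for the default 32-bit, 64 KB configuration, using a helper that matches `$clog2` for n ≥ 1. The parameter names and the 4-bit master ID width are hypothetical, chosen for illustration.

```python
def clog2(n: int) -> int:
    """Ceiling of log2(n) for n >= 1, matching SystemVerilog $clog2 (clog2(1) == 0)."""
    if n < 1:
        raise ValueError("clog2 defined here for n >= 1 only")
    return (n - 1).bit_length()


DATA_WIDTH = 32          # bits per beat (default configuration)
NUM_MASTERS = 2
SRAM_BYTES = 64 * 1024   # per slave
ID_WIDTH = 4             # hypothetical master-side AXI ID width

bytes_per_word = DATA_WIDTH // 8                          # 4 bytes, also the WSTRB width
byte_offset_bits = clog2(bytes_per_word)                  # 2
word_addr_bits = clog2(SRAM_BYTES // bytes_per_word)      # 16384 words -> 14 bits
slave_id_width = ID_WIDTH + clog2(NUM_MASTERS)            # 4 + 1 = 5

if __name__ == "__main__":
    print(bytes_per_word, byte_offset_bits, word_addr_bits, slave_id_width)
```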

Verification

An 8-scenario self-checking testbench exercises the main features:

#	Scenario
1	Single-beat read and write
2	Multi-beat INCR bursts
3	Address WRAP burst with boundary crossing
4	Byte-strobe selective write
5	Back-pressure on READY de-assertion
6	Cross-slave concurrency (M0→S0, M1→S1 simultaneously)
7	Arbiter contention (both masters targeting same slave)
8	Maximum-length bursts (256 beats)
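A self-checking testbench needs a golden model to compare read data against. The Python sketch below is a minimal byte-addressed reference memory with WSTRB masking, the check that scenario 4 relies on. It is not the project's SystemVerilog testbench. It assumes a 32-bit data bus with full-width beats and treats unwritten bytes as zero. Strobe bit n enables WDATA[8n+7:8n], as AXI defines.

```python
from typing import Dict


class ReferenceMemory:
    """Golden model: sparse byte-addressed memory with WSTRB masking."""

    def __init__(self, data_bytes: int = 4) -> None:
        self.data_bytes = data_bytes
        self.mem: Dict[int, int] = {}  # byte address -> byte value

    def write(self, addr: int, data: int, wstrb: int) -> None:
        """Apply one W beat: write only the byte lanes whose strobe bit is set."""
        base = addr - addr % self.data_bytes
        for lane in range(self.data_bytes):
            if (wstrb >> lane) & 1:
                self.mem[base + lane] = (data >> (8 * lane)) & 0xFF

    def read(self, addr: int) -> int:
        """Expected RDATA for one beat (unwritten bytes assumed to read as 0)."""
        base = addr - addr % self.data_bytes
        return sum(self.mem.get(base + lane, 0) << (8 * lane)
                   for lane in range(self.data_bytes))
```

A scoreboard would replay every completed write into this model and compare each RDATA beat with `read()` at the corresponding beat address.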

The testbench supports four simulators via a single Makefile:

```sh
make SIM=icarus    # Icarus Verilog (default)
make SIM=modelsim  # ModelSim / QuestaSim
make SIM=xrun      # Cadence Xcelium
make SIM=vivado    # Vivado Simulator
```

FPGA Results

Targeting Xilinx UltraScale+ at 250 MHz with default parameters (32-bit data, 2×64 KB SRAM):

Resource	Utilization
LUTs	~800
Flip-Flops	~600
BRAM 18K	4 blocks

The SRAM model is inferred as Block RAM automatically because it uses a synchronous read/write pattern. Timing constraints are included in constraints/ for out-of-the-box Vivado implementation.

File Structure

rtl/           - axi4_pkg.sv, interfaces, SRAM model, slave, master, crossbar, top
tb/            - axi4_tb.sv (self-checking, 8 scenarios)
sim/           - Makefile with multi-simulator support
constraints/   - Xilinx XDC for 250 MHz timing closure