SystemVerilog AXI4 AMBA RTL FPGA ASIC Crossbar

AMBA AXI4 Interconnect

Fully synthesizable, parameterized 2×2 non-blocking crossbar in SystemVerilog connecting two AXI4 masters to dual synchronous SRAM slaves, targeting 250 MHz on Xilinx UltraScale+.

April 1, 2025

Overview

This project implements a production-grade AXI4 interconnect in SystemVerilog from first principles. The design is a 2×2 non-blocking crossbar — two AXI4 masters connecting to two synchronous SRAM slaves — built to demonstrate professional RTL engineering: protocol correctness, burst handling, multiple outstanding transactions, and system-level arbitration.

The implementation is fully synthesizable and parameterized, with timing closure verified at 250 MHz on Xilinx UltraScale+ FPGAs.

Repository: akshay-b-prasad/amba-axi4-interconnect


Architecture

The crossbar connects two masters to two slaves, each backed by 64 KB of synchronous SRAM. When the masters target different slaves, both transactions proceed in parallel and neither stalls the other; contention arises only when both masters address the same slave. This non-blocking property is fundamental to the design.

Slave   Base Address   Capacity
S0      0x0000_0000    64 KB
S1      0x0001_0000    64 KB
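
With two 64 KB windows placed back to back, the slave select reduces to a single address bit. The following is a minimal sketch of such a decoder, assuming a 32-bit address; the package name, the `decode_t` type and the `hit` flag are illustrative and not taken from the repository.

```systemverilog
// Hypothetical package: names and widths are illustrative, not the repository's.
package xbar_decode_pkg;
  localparam int unsigned ADDR_W = 32;  // assumed address width

  // 64 KB windows: S0 at 0x0000_0000, S1 at 0x0001_0000.
  // Bits [15:0] index into the window, bit 16 selects the slave,
  // and bits [31:17] must be zero for the address to be mapped.
  typedef struct packed {
    logic hit;  // 1 when the address falls inside S0 or S1
    logic sel;  // 0 = S0, 1 = S1 (meaningful only when hit is set)
  } decode_t;

  function automatic decode_t decode_addr(input logic [ADDR_W-1:0] addr);
    decode_t d;
    d.hit = (addr[ADDR_W-1:17] == '0);
    d.sel = addr[16];
    return d;
  endfunction
endpackage
```

Under this sketch, 0x0001_0004 decodes to S1 and 0x0002_0000 is unmapped. How the actual design responds to an unmapped address is not stated here, so the `hit` flag is only a placeholder for that decision.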

Key Design Decisions

Round-robin arbitration. When both masters contend for the same slave, a round-robin arbiter resolves priority without starvation. The arbiter state advances after each granted transaction, not each beat, preserving burst atomicity.
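
One way to picture a transaction-granular arbiter is the two-requester sketch below. It is an illustration under stated assumptions, not the repository's arbiter: the module name, the ports and the `txn_done` strobe (assumed to pulse once when the granted burst has fully completed) are all hypothetical.

```systemverilog
// Hypothetical two-requester round-robin arbiter; port names are illustrative.
module rr_arb2 (
  input  logic       clk,
  input  logic       rst_n,
  input  logic [1:0] req,       // req[i]: master i has a pending transaction
  input  logic       txn_done,  // pulses when the granted transaction completes
  output logic [1:0] grant      // one-hot, held for the whole transaction
);
  logic       busy;  // a transaction is in flight
  logic       last;  // index of the most recently served master
  logic [1:0] pick;  // combinational choice among the current requesters

  always_comb begin
    pick = 2'b00;
    if (last == 1'b0) begin
      // M0 was served last, so M1 has priority.
      if      (req[1]) pick = 2'b10;
      else if (req[0]) pick = 2'b01;
    end else begin
      // M1 was served last, so M0 has priority.
      if      (req[0]) pick = 2'b01;
      else if (req[1]) pick = 2'b10;
    end
  end

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      busy  <= 1'b0;
      last  <= 1'b1;  // M0 wins the first tie after reset
      grant <= 2'b00;
    end else if (!busy) begin
      if (pick != 2'b00) begin
        busy  <= 1'b1;
        grant <= pick;
      end
    end else if (txn_done) begin
      // State advances once per transaction, never per beat.
      busy  <= 1'b0;
      last  <= grant[1];
      grant <= 2'b00;
    end
  end
endmodule
```

Because `grant` is registered and cleared on completion, this sketch spends one idle cycle between back-to-back transactions. That keeps the example short; a real arbiter may avoid the bubble.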

ID extension. Each master’s AXI IDs are prepended with a 1-bit master index before being forwarded to slaves. Slaves track and return this extended ID; the crossbar strips the prefix before returning responses — following the ARM CoreLink pattern. This allows the crossbar to route read/write responses back to the correct master unambiguously.
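
The prepend-and-strip scheme can be sketched as three small functions. The master-side ID width `ID_W = 4` and all names below are assumptions made for illustration.

```systemverilog
// Hypothetical helper package; ID_W and the function names are illustrative.
package xbar_id_pkg;
  localparam int unsigned ID_W = 4;  // assumed master-side ID width

  typedef logic [ID_W-1:0] mst_id_t;  // ID as seen by a master
  typedef logic [ID_W:0]   slv_id_t;  // one bit wider on the slave side

  // Request path (AW/AR): place the master index in the MSB.
  function automatic slv_id_t extend_id(input logic mst_idx, input mst_id_t id);
    return {mst_idx, id};
  endfunction

  // Response path (B/R): the MSB names the master to route back to.
  function automatic logic resp_master(input slv_id_t id);
    return id[ID_W];
  endfunction

  // Response path (B/R): drop the prefix to recover the original ID.
  function automatic mst_id_t strip_id(input slv_id_t id);
    return id[ID_W-1:0];
  endfunction
endpackage
```

For example, master 1 issuing ID 4'h3 would appear at the slave as 5'h13; on the response, `resp_master` returns 1 and `strip_id` returns 4'h3.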

Per-master W-route FIFOs. AXI4 decouples the AW (write address) and W (write data) channels, so write data can arrive before the slave is selected. A FIFO per master records which slave claimed each AW transaction, ensuring write beats are delivered to the correct destination even under back-pressure.
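
A route FIFO of this kind only needs to store one bit per outstanding write. The sketch below assumes a push on each accepted AW handshake and a pop on the W beat that carries WLAST; the module name, the `DEPTH` parameter and the port names are hypothetical.

```systemverilog
// Hypothetical per-master W-route FIFO; DEPTH and port names are illustrative.
module w_route_fifo #(
  parameter int unsigned DEPTH = 4  // assumed; power of two, at least 2
) (
  input  logic clk,
  input  logic rst_n,
  input  logic push,      // AW handshake accepted for this master
  input  logic push_sel,  // slave chosen by the address decoder (0 = S0, 1 = S1)
  input  logic pop,       // W handshake with WLAST: the burst is finished
  output logic sel,       // slave that should receive the current W beats
  output logic empty,     // no route recorded yet: hold W beats back
  output logic full       // stall further AW requests
);
  localparam int unsigned PTR_W = $clog2(DEPTH);

  logic [DEPTH-1:0] mem;
  logic [PTR_W:0]   wr_ptr, rd_ptr;  // extra MSB distinguishes full from empty

  assign empty = (wr_ptr == rd_ptr);
  assign full  = (wr_ptr[PTR_W] != rd_ptr[PTR_W]) &&
                 (wr_ptr[PTR_W-1:0] == rd_ptr[PTR_W-1:0]);
  assign sel   = mem[rd_ptr[PTR_W-1:0]];

  // Storage has no reset; only the pointers decide which entries are valid.
  always_ff @(posedge clk) begin
    if (push && !full) mem[wr_ptr[PTR_W-1:0]] <= push_sel;
  end

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      wr_ptr <= '0;
      rd_ptr <= '0;
    end else begin
      if (push && !full)  wr_ptr <= wr_ptr + 1'b1;
      if (pop  && !empty) rd_ptr <= rd_ptr + 1'b1;
    end
  end
endmodule
```

In this arrangement W beats that arrive while `empty` is high simply wait, which covers the case where write data shows up before its address has been routed.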

Read serialization. The current implementation processes one read burst per slave at a time, trading read throughput for protocol clarity and simpler verification. The read data path is a documented extension point for pipelining.


Protocol Compliance

The interconnect handles the full AXI4 feature set required by AMBA IHI0022H:


RTL Quality

The design is written to modern SystemVerilog coding standards:


Verification

An 8-scenario self-checking testbench covers the full feature surface:

#   Scenario
1   Single-beat read and write
2   Multi-beat INCR bursts
3   Address WRAP burst with boundary crossing
4   Byte-strobe selective write
5   Back-pressure on READY de-assertion
6   Cross-slave concurrency (M0→S0, M1→S1 simultaneously)
7   Arbiter contention (both masters targeting same slave)
8   Maximum-length bursts (256 beats)
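
For the WRAP scenario, a self-checking testbench needs the expected address of every beat. The function below is a sketch of that calculation following the AXI4 wrapping rules (the burst's total size is a power of two and the start address is aligned to the beat size); the package and function names are hypothetical and are not taken from axi4_tb.sv.

```systemverilog
// Hypothetical reference function for a testbench; names are illustrative.
package axi_wrap_pkg;
  // Next beat address of an AXI4 WRAP burst.
  //   addr : address of the current beat (aligned to the beat size)
  //   size : AxSIZE encoding, bytes per beat = 2**size
  //   len  : AxLEN encoding, beats = len + 1 (2, 4, 8 or 16 for WRAP)
  function automatic logic [31:0] wrap_next_addr(
    input logic [31:0] addr,
    input logic [2:0]  size,
    input logic [7:0]  len
  );
    logic [31:0] beat_bytes, total_bytes, mask;
    beat_bytes  = 32'd1 << size;
    total_bytes = beat_bytes * ({24'd0, len} + 32'd1);
    mask        = total_bytes - 32'd1;  // valid because total_bytes is a power of two
    // Upper bits stay fixed; lower bits increment and wrap.
    return (addr & ~mask) | ((addr + beat_bytes) & mask);
  endfunction
endpackage
```

A 4-beat burst of 4-byte transfers starting at 0x38 has a 16-byte wrap window, so successive calls give 0x38, 0x3C, 0x30, 0x34: the address wraps at 0x40 back to 0x30.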

The testbench supports four simulators via a single Makefile:

make SIM=icarus    # Icarus Verilog (default)
make SIM=modelsim  # ModelSim / QuestaSim
make SIM=xrun      # Cadence Xcelium
make SIM=vivado    # Vivado Simulator

FPGA Results

Targeting Xilinx UltraScale+ at 250 MHz with default parameters (32-bit data, 2×64 KB SRAM):

Resource     Utilization
LUTs         ~800
Flip-Flops   ~600
BRAM 18K     4 blocks

SRAM infers as Block RAM automatically from the synchronous read/write pattern. Timing constraints are included in constraints/ for out-of-the-box Vivado implementation.
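
The kind of synchronous read/write pattern that synthesis tools commonly map to Block RAM looks roughly like the sketch below. It is a generic template, not the repository's SRAM model; the module name, the ports and the read-first behaviour are assumptions.

```systemverilog
// Hypothetical single-port RAM written in a Block-RAM-friendly style.
module sync_sram #(
  parameter int unsigned DATA_W = 32,
  parameter int unsigned DEPTH  = 16384  // 16384 words x 32 bits = 64 KB
) (
  input  logic                     clk,
  input  logic                     en,
  input  logic [DATA_W/8-1:0]      we,     // one write-enable bit per byte lane
  input  logic [$clog2(DEPTH)-1:0] addr,   // word address
  input  logic [DATA_W-1:0]        wdata,
  output logic [DATA_W-1:0]        rdata
);
  logic [DATA_W-1:0] mem [DEPTH];

  always_ff @(posedge clk) begin
    if (en) begin
      // Byte-lane writes, driven by the AXI write strobes.
      for (int b = 0; b < DATA_W/8; b++) begin
        if (we[b]) mem[addr][8*b +: 8] <= wdata[8*b +: 8];
      end
      // Registered read: returns the old word on a same-cycle write.
      rdata <= mem[addr];
    end
  end
endmodule
```

The important properties are the clocked read with no asynchronous path and the absence of a reset on the memory array; either an asynchronous read or a reset on `mem` typically pushes the tool toward distributed RAM or registers instead.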


File Structure

rtl/           - axi4_pkg.sv, interfaces, SRAM model, slave, master, crossbar, top
tb/            - axi4_tb.sv (self-checking, 8 scenarios)
sim/           - Makefile with multi-simulator support
constraints/   - Xilinx XDC for 250 MHz timing closure