# How to Design Fast Asynchronous Routers for Asynchronous On-chip Networks

Wei Song (Supervisor: Doug Edwards), Advanced Processor Technologies Group

Advanced Processor Technology Group The School of Computer Science

- What is an asynchronous circuit?
- Why use an on-chip network?
- Why is an asynchronous on-chip network slow?
- How can we improve it?
- So, what's next?

MANCHESTER



# Synchronous Circuits

- Pipeline style
- Strict timing assumptions
- A global clock distributed by a balanced clock tree
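A common way to write the strict timing assumption of a clocked pipeline is the setup constraint below. Here $T_{clk}$ is the clock period, $t_{cq}$ the register clock-to-output delay, $t_{logic,\max}$ the worst-case logic delay between two registers, $t_{setup}$ the register setup time and $t_{skew}$ the worst-case clock skew. These symbols are introduced for illustration and do not appear in the original slides.

$$T_{clk} \ge t_{cq} + t_{logic,\max} + t_{setup} + t_{skew}$$

A balanced clock tree exists mainly to keep $t_{skew}$ small, so that little of every clock period is lost to skew.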

# Asynchronous Circuits – C-element



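| A | B | Q' |
|---|---|----|
| 0 | 0 | 0  |
| 0 | 1 | Q  |
| 1 | 0 | Q  |
| 1 | 1 | 1  |

Q is the current output and Q' is the next output: the C-element changes its output only when both inputs agree and otherwise holds its previous value.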
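The truth table can be captured in a few lines of behavioural code. The sketch below models a C-element with any number of inputs (the function name `c_element` is our own) and regenerates the 2-input table, printing `Q` for the rows where the output holds its previous value. It models logic behaviour only, not gate delays.

```python
from typing import Sequence


def c_element(inputs: Sequence[int], q: int) -> int:
    """Next output of a Muller C-element with any number of inputs.

    If every input has the same value, the output takes that value;
    otherwise the output holds its current value q.
    """
    first = inputs[0]
    if all(x == first for x in inputs):
        return first
    return q


# Rebuild the 2-input truth table, evaluating each row for both values of Q.
for a in (0, 1):
    for b in (0, 1):
        nxt = [c_element((a, b), q) for q in (0, 1)]
        label = "Q" if nxt == [0, 1] else str(nxt[0])
        print(a, b, label)
```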

Advanced Processor Technology Group The School of Computer Science



- Handshake-based communication
- Nearly delay insensitive (minimal timing assumptions)
- Power efficient (no global clock)
- Complicated (larger area)
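The slides do not say which handshake protocol is used. As an illustration only, the sketch below assumes the common four-phase (return-to-zero) protocol and steps one channel through its `req`/`ack` sequence. The `Channel` class and `four_phase_transfer` function are hypothetical names, and the phases run sequentially here, whereas real sender and receiver circuits act concurrently, each waiting on the other's signal.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Channel:
    """Wires of one handshake channel (hypothetical model)."""
    req: int = 0
    ack: int = 0
    data: Optional[int] = None
    trace: List[str] = field(default_factory=list)


def four_phase_transfer(ch: Channel, value: int) -> int:
    """Move one data item across `ch` using a four-phase handshake."""
    assert ch.req == 0 and ch.ack == 0, "channel must start idle"
    # 1. Sender drives the data and raises req.
    ch.data = value
    ch.req = 1
    ch.trace.append("req+")
    # 2. Receiver sees req high, latches the data and raises ack.
    received = ch.data
    ch.ack = 1
    ch.trace.append("ack+")
    # 3. Sender sees ack high, releases the data and lowers req.
    ch.data = None
    ch.req = 0
    ch.trace.append("req-")
    # 4. Receiver sees req low and lowers ack: the channel is idle again.
    ch.ack = 0
    ch.trace.append("ack-")
    return received


ch = Channel()
results = [four_phase_transfer(ch, v) for v in (3, 7)]
print(results, ch.trace)
```

No clock appears anywhere: each transfer completes as soon as both parties have responded. The price is the extra `req`/`ack` wires and the control logic that generates them, which is one source of the larger area.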

- What is an *asynchronous circuit*?
- Why use an *on-chip network*?
- Why is an asynchronous on-chip network slow?
- How can we improve it?
- So, what's next?

MANCHESTER



# Bus Based Multiprocessor System



- A shared communication fabric
- Only one master can use the bus at a time
- Bandwidth constrained
- Fixed communication latency

# A Mesh Network-on-Chip (NoC)



- Distributed communication resource
- Scalable bandwidth
- Multiple master/slave pairs can communicate at the same time
- Variable communication latency
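As a rough illustration of why mesh bandwidth scales, the sketch below counts the router-to-router links in a k × k mesh, whereas a bus offers one shared medium however many nodes are attached. It counts links only: link width, clock rate and the router-to-core ports are deliberately left out, and `mesh_links` is a hypothetical helper.

```python
def mesh_links(k: int) -> int:
    """Router-to-router links in a k x k mesh (each link is one duplex channel).

    Each of the k rows has k - 1 horizontal links and each of the
    k columns has k - 1 vertical links.
    """
    return 2 * k * (k - 1)


for k in (2, 4, 8):
    print(f"{k * k:2d} nodes: 1 shared bus vs {mesh_links(k):3d} mesh links")
```

The number of links grows roughly in proportion to the number of nodes, so adding cores also adds communication resources, while a bus's single medium must be shared by every core.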

# The Router for NoC



MANCHESTER

- 5 ports (four neighbouring routers plus the local processing element)
- Duplex channels
- Input buffers
- Arbiter
- Crossbar (built from multiplexers)
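The arbiter and crossbar can be sketched behaviourally. The model below assumes port names `N`, `E`, `S`, `W` and `L` (local), one round-robin arbiter per output, and a requested output per input that is computed elsewhere by a routing algorithm, which the slides do not specify. All names are hypothetical. Winning flits are not removed from the buffers, so a second call shows the arbiter rotating priority.

```python
from typing import Dict, List, Optional

# Assumed port names: four neighbouring routers plus the local core.
PORTS = ["N", "E", "S", "W", "L"]


class RoundRobinArbiter:
    """Grants one requester per call and rotates priority after each grant."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.last = n - 1  # input 0 has the highest priority initially

    def grant(self, requests: List[bool]) -> Optional[int]:
        # Search from just after the last winner so no input starves.
        for offset in range(1, self.n + 1):
            i = (self.last + offset) % self.n
            if requests[i]:
                self.last = i
                return i
        return None


def switch_step(flits: Dict[str, Optional[str]],
                routes: Dict[str, str],
                arbiters: Dict[str, RoundRobinArbiter]) -> Dict[str, Optional[str]]:
    """One allocation step of a 5-port router.

    flits:  head of each input buffer (None if empty)
    routes: requested output port for each input that holds a flit
    """
    outputs: Dict[str, Optional[str]] = {}
    for out in PORTS:
        # The arbiter of each output sees which inputs want that output.
        requests = [flits.get(inp) is not None and routes.get(inp) == out
                    for inp in PORTS]
        winner = arbiters[out].grant(requests)
        # The crossbar column for `out` acts as a mux selected by the winner.
        outputs[out] = flits[PORTS[winner]] if winner is not None else None
    return outputs


arbiters = {p: RoundRobinArbiter(len(PORTS)) for p in PORTS}
flits = {"N": "a", "E": "b", "S": None, "W": "c", "L": None}
routes = {"N": "L", "E": "L", "W": "S"}  # N and E both want the local port
first = switch_step(flits, routes, arbiters)
second = switch_step(flits, routes, arbiters)
print(first)
print(second)
```

In the first step `N` wins the local output; in the second, round-robin priority hands it to `E`, while the uncontested `W` to `S` path is granted both times.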



# Data Path of a NoC





Advanced Processor Technology Group The School of Computer Science

- What is an *asynchronous circuit*?
- Why use an *on-chip network*?
- Why is an asynchronous on-chip network slow?
- How can we improve it?
- So, what's next?

MANCHESTER



# A 4-bit Synchronous Pipeline



- Data are synchronised by the global clock
- No significant speed difference from a 1-bit pipeline, since all bits are latched on the same clock edge



# A 4-bit Asynchronous Pipeline



Advanced Processor Technology Group The School of Computer Science



# Reasons for the Low Speed

- Asynchronous pipelines must explicitly detect the arrival of every data bit (completion detection)
- A big C-element tree sits in the handshake loop!
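Completion detection merges one "done" signal per bit (or per code group) into a single acknowledge, and that tree sits in every handshake cycle. The sketch counts the C-elements and logic levels the tree needs: its depth grows roughly as log2 of the width, and the long wires spanning a wide channel add delay this count ignores. The 32-signal width and the slice widths are hypothetical numbers. They show how giving each slice its own acknowledge, in the spirit of channel slicing, shortens the tree each handshake waits for.

```python
from typing import Tuple


def c_tree(signals: int, fan_in: int = 2) -> Tuple[int, int]:
    """(number of C-elements, depth in gate levels) of a tree that merges
    `signals` completion signals using C-elements of at most `fan_in` inputs."""
    elements = depth = 0
    while signals > 1:
        full, rest = divmod(signals, fan_in)
        # A leftover group of 2+ signals still needs a (smaller) C-element;
        # a single leftover signal passes straight to the next level.
        elements += full + (1 if rest > 1 else 0)
        signals = full + (1 if rest > 0 else 0)
        depth += 1
    return elements, depth


width = 32  # hypothetical channel width, in completion signals
print("whole channel:", c_tree(width))
for slice_width in (8, 4):
    n_slices = width // slice_width
    print(f"{n_slices} slices of {slice_width}:", c_tree(slice_width), "per slice")
```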

- What is an *asynchronous circuit*?
- Why use an *on-chip network*?
- Why is an asynchronous on-chip network slow?
- How can we improve it?
- So, what's next?

MANCHESTER



## Channel Slicing









## Re-Synchronisation (2)





## Re-Synchronisation (3)



Advanced Processor Technology Group The School of Computer Science

# Hardware Implementation



- Verilog HDL + STG (Petrify)
- Layout implementation
- Faraday 130 nm technology
- 12.6K gates (50,000 µm²)
- 0.3 × 0.3 mm²
- Channel-sliced: 450 MHz
- Synchronised: 360 MHz
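The figures above allow a quick comparison. This assumes both frequencies were measured on the same router design, and that 50,000 µm² is the standard-cell area inside the 0.3 × 0.3 mm² layout:

$$\frac{450\ \text{MHz}}{360\ \text{MHz}} = 1.25, \qquad \frac{50{,}000\ \mu\text{m}^2}{0.3\ \text{mm} \times 0.3\ \text{mm}} = \frac{50{,}000\ \mu\text{m}^2}{90{,}000\ \mu\text{m}^2} \approx 0.56$$

Under those assumptions, the channel-sliced version runs about 25% faster than the synchronised one, and the cells occupy a little over half of the layout area.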



# Performance



- What is an *asynchronous circuit*?
- Why use an *on-chip network*?
- Why is an asynchronous on-chip network slow?
- How can we improve it?
- So, what's next?

MANCHESTEF

# Spatial Division Multiplexing

- Frequent re-synchronisation compromises speed
- Sub-channels should run independently
- Sub-channels can transmit different messages
- Multiple messages can be transmitted over the same channel, each on a different sub-channel
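The sharing described above can be sketched as a simple allocation model. In this hypothetical model (`SDMChannel` and its methods are our own names), a physical channel is split into `m` sub-channels. A message reserves a free sub-channel, holds it until it has been fully transmitted, and then releases it. The real SDM router's allocation logic is not described on these slides.

```python
from typing import List, Optional


class SDMChannel:
    """Hypothetical model of one SDM channel split into `m` sub-channels.

    Each sub-channel carries at most one message at a time, so up to
    `m` messages share the physical channel concurrently.
    """

    def __init__(self, m: int) -> None:
        self.owner: List[Optional[str]] = [None] * m

    def allocate(self, msg: str) -> Optional[int]:
        """Reserve a free sub-channel for `msg`; None means all are busy."""
        for i, current in enumerate(self.owner):
            if current is None:
                self.owner[i] = msg
                return i
        return None

    def release(self, i: int) -> None:
        """Free sub-channel `i` once its message has been fully transmitted."""
        self.owner[i] = None


ch = SDMChannel(2)
a = ch.allocate("A")   # sub-channel 0
b = ch.allocate("B")   # sub-channel 1: A and B now share the channel
c = ch.allocate("C")   # None: C must wait for a free sub-channel
ch.release(a)
c = ch.allocate("C")   # C reuses sub-channel 0
print(a, b, c, ch.owner)
```

In this model, each message is carried on only a fraction of the channel's wires, but one message occupying the link no longer blocks every other message that wants to use it.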

MANCHEST



# Spatial Division Multiplexing (cont.)



Advanced Processor Technology Group The School of Computer Science

# Conclusion

- Asynchronous circuits
  - Delay insensitive, low power
- On-chip networks
  - Distributed communication fabric, scalable bandwidth
- Asynchronous on-chip networks
  - The C-element tree used for completion detection compromises speed
- Channel slicing
  - Lets sub-channels run independently, which increases speed
- SDM
  - Lets more messages share the fabric simultaneously

The University of Manchester



# Thanks!

Advanced Processor Technology Group The School of Computer Science