

# ASYNCHRONOUS ON-CHIP NETWORKS AND FAULT-TOLERANT TECHNIQUES

Wei Song and Guangda Zhang






CRC Press is an imprint of the Taylor & Francis Group, an **informa** business

First edition published 2022 by CRC Press, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742

and by CRC Press, 4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

CRC Press is an imprint of Taylor & Francis Group, LLC

© 2022 Wei Song and Guangda Zhang

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk

*Trademark notice*: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

ISBN: 978-1-032-25575-0 (hbk)

ISBN: 978-1-032-25741-9 (pbk)

ISBN: 978-1-003-28478-9 (ebk)

DOI: 10.1201/9781003284789

Typeset in Latin Modern by KnowledgeWorks Global Ltd.

To Chunzhen and Chao for their patience and love. To Mingyi and Jersey for bringing joy into our life.



# Contents

- Preface
- Chapter 1 Introduction
  - 1.1 ASYNCHRONOUS CIRCUITS
  - 1.2 ASYNCHRONOUS ON-CHIP NETWORKS
  - 1.3 FAULT-TOLERANT ASYNCHRONOUS ON-CHIP NETWORKS
    - 1.3.1 Protection for QDI Links
    - 1.3.2 Deadlock Detection
    - 1.3.3 Network Recovery
- Chapter 2 Asynchronous Circuits
  - 2.1 CIRCUIT CLASSIFICATION
    - 2.1.1 Delay-Insensitive
    - 2.1.2 Quasi-Delay-Insensitive
    - 2.1.3 Speed-Independent
    - 2.1.4 Relaxed QDI
    - 2.1.5 Self-Timed
  - 2.2 HANDSHAKE PROTOCOLS
    - 2.2.1 Return-to-Zero
    - 2.2.2 Non-Return-to-Zero
  - 2.3 DATA ENCODING
    - 2.3.1 Non-Delay-Insensitive Codes
    - 2.3.2 Delay-Insensitive Codes
      - 2.3.2.1 1-of-n Encoding
      - 2.3.2.2 m-of-n Encoding
      - 2.3.2.3 Other DI Encoding
    - 2.3.3 Code Evaluation
  - 2.4 ASYNCHRONOUS PIPELINES
    - 2.4.1 Bundled-Data Pipeline
    - 2.4.2 Multi-Rail Pipeline
    - 2.4.3 Performance Comparison
      - 2.4.3.1 Pipeline Delay
      - 2.4.3.2 Pipeline Throughput
      - 2.4.3.3 Area and Power Consumption
  - 2.5 … OF ASYNCHRONOUS CIRCUITS
    - 2.5.1 Functional Analysis
    - 2.5.2 Common Circuit Components
      - 2.5.2.1 Basic Components
      - 2.5.2.2 Arbiters
      - 2.5.2.3 Allocators
    - 2.5.3 Metastability and Synchronization
    - 2.5.4 Optimization with Traditional EDA Tools
      - 2.5.4.1 Loop Elimination
      - 2.5.4.2 Speed Optimization
- Chapter 3 Asynchronous Networks-on-Chip
  - 3.1 CONCEPTS OF NETWORKS-ON-CHIP
    - 3.1.1 Network Layer Model
    - 3.1.2 Network Topology
    - 3.1.3 Switching Techniques
      - 3.1.3.1 Circuit Switching and Packet Switching
      - 3.1.3.2 Virtual Channel
      - 3.1.3.3 Other Flow Control Methods
      - 3.1.3.4 Quality of Service
    - 3.1.4 Routing Algorithms
      - 3.1.4.1 Deterministic and Non-Deterministic
      - 3.1.4.2 Deadlock and Livelock
  - 3.2 ASYNCHRONOUS NETWORKS-ON-CHIP
    - 3.2.1 Taxonomy of Asynchronous On-Chip Networks
    - 3.2.2 Previous Asynchronous NoCs
      - 3.2.2.1 SpiNNaker
      - 3.2.2.2 ASPIN
      - 3.2.2.3 QoS NoC
      - 3.2.2.4 ANOC
      - 3.2.2.5 MANGO
      - 3.2.2.6 QNoC
- Chapter 4 Optimizing Asynchronous On-Chip Networks
  - 4.1 CHANNEL SLICING
    - 4.1.1 Synchronization Overhead
    - 4.1.2 Channel …
    - 4.1.3 … Pipeline
    - 4.1.4 … Sliced Wormhole Router
      - 4.1.4.1 Router Structure
      - 4.1.4.2 Performance Evaluation
  - 4.2 SPATIAL DIVISION MULTIPLEXING
    - 4.2.1 … of the Virtual Channel Flow Control
      - 4.2.1.1 Slow Switch Allocation
      - 4.2.1.2 Large Area Overhead
      - 4.2.1.3 Long Pipeline Synchronization Latency
    - 4.2.2 Spatial Division Multiplexing
    - 4.2.3 SDM Router
      - 4.2.3.1 Router Structure
      - 4.2.3.2 Performance Evaluation
    - 4.2.4 Comparison between SDM and VC
      - 4.2.4.1 Area Model
      - 4.2.4.2 Latency Model
      - 4.2.4.3 Model for VC Routers
      - 4.2.4.4 Performance Analysis
  - 4.3 AREA REDUCTION USING CLOS NETWORKS
    - 4.3.1 Clos Switching Networks
    - 4.3.2 Dispatching Algorithm
      - 4.3.2.1 Concurrent Round-Robin Dispatching
      - 4.3.2.2 …
      - 4.3.2.3 …
    - 4.3.3 Asynchronous Clos Scheduler
      - 4.3.3.1 Implementation
      - 4.3.3.2 Performance
    - 4.3.4 … Router Using 2-Stage Clos Switch
      - 4.3.4.1 Asynchronous 2-Stage Clos Switch
      - 4.3.4.2 Router Implementation
      - 4.3.4.3 Performance Evaluation
- Chapter 5 Fault-Tolerant Asynchronous Circuits
  - 5.1 FAULT CLASSIFICATION
    - 5.1.1 Transient Faults
    - 5.1.2 Permanent Faults
    - 5.1.3 Intermittent Faults
  - 5.2 FAULT-TOLERANT TECHNIQUES
    - 5.2.1 Masking Factors
    - 5.2.2 Redundancy Techniques
  - 5.3 … TRANSIENT FAULTS ON QDI PIPELINES
    - 5.3.1 Faults on Synchronous and QDI Pipelines
    - 5.3.2 Impact Modeling of Transient Faults
      - 5.3.2.1 Faults on Data with Positive Ack
      - 5.3.2.2 Faults on Data with Negative Ack
      - 5.3.2.3 Faults on Completion Detector and Ack
      - 5.3.2.4 Physical-Layer Deadlock
  - 5.4 DEADLOCK MODELING
    - 5.4.1 Deadlock Caused by Permanent Faults
    - 5.4.2 Deadlock Caused by Transient Faults
    - 5.4.3 Deadlock Analysis
  - 5.5 RELATED WORK
    - 5.5.1 Tolerating Transient Faults
      - 5.5.1.1 Information Redundancy
      - 5.5.1.2 Physical and Other Redundancy
    - 5.5.2 Management for Permanent Faults and Deadlocks
      - 5.5.2.1 Conventional Techniques
      - 5.5.2.2 Fault-Caused Physical-Layer Deadlocks
  - 5.6 STRATEGY
- Chapter 6 Fault-Tolerant Coding
  - 6.1 COMPARISON WITH RELATED WORK
    - 6.1.1 Non-QDI Designs
    - 6.1.2 QDI Designs
    - 6.1.3 Unordered and Systematic Codes
  - 6.2 DIRC CODING SCHEME
    - 6.2.1 Arithmetic Rules
      - 6.2.1.1 Rules for 1-of-n Codes
      - 6.2.1.2 Rules for m-of-n Codes
    - 6.2.2 Delay-Insensitive Redundant Check Codes
    - 6.2.3 Check Generation and Error Correction
    - 6.2.4 Error Filtering
    - 6.2.5 Code Evaluation
  - 6.3 IMPLEMENTATION OF DIRC PIPELINES
    - 6.3.1 1-of-n Adders and Error Filters
    - 6.3.2 Generation of Check Words
    - 6.3.3 Redundant Protection of Acknowledge Wires
    - 6.3.4 Variants of DIRC Pipelines
      - 6.3.4.1 Latency and Area
      - 6.3.4.2 Different Construction Patterns
      - 6.3.4.3 DIRC in Asynchronous NoCs
  - 6.4 LATENCY AND AREA MODELS
    - 6.4.1 Latency Analysis
    - 6.4.2 Area Model for One Stage
    - 6.4.3 Models for Different Constructions
  - 6.5 EXPERIMENTAL RESULTS
    - 6.5.1 Performance Evaluation
    - 6.5.2 Fault-Tolerance Evaluation
    - 6.5.3 Comparison with Related Work
  - 6.6 SUMMARY
- Chapter 7 Deadlock Detection
  - 7.1 BASELINE QDI NOC
    - 7.1.1 Network Principles
    - 7.1.2 Asynchronous Protocols
  - 7.2 FAULT IMPACT ON DATA PATH
    - 7.2.1 Fault Classifications
    - 7.2.2 General Fault Impact
  - 7.3 DETECTING PERMANENT FAULT ON DATA PATH
    - 7.3.1 Data Path Partition
    - 7.3.2 Deadlock Caused by Permanent Link Fault
    - 7.3.3 Deadlock Patterns Due to Permanent Link Fault
    - 7.3.4 Time-Out Detection Mechanism
    - 7.3.5 Detection of Permanent Router Fault
  - 7.4 … DEADLOCKS CAUSED BY … FAULTS
    - 7.4.1 Fault Diagnosis
    - 7.4.2 Modified Time-Out Mechanism
  - 7.5 SUMMARY
- Chapter 8 Deadlock Recovery
  - 8.1 DEADLOCK REMOVAL BY DRAIN AND RELEASE
    - 8.1.1 The Drain Operation
    - 8.1.2 Buffer Controller at Router Input
    - 8.1.3 The Release Operation
  - 8.2 FAULT …
    - 8.2.1 Spatial Division Multiplexing
    - 8.2.2 Switch Allocator Reconfiguration
  - 8.3 RECOVERY FROM INTERMITTENT AND TRANSIENT FAULTS
  - 8.4 …
  - 8.5 SUMMARY
- Chapter 9 Summary
  - 9.1 OVERALL REMARKS
  - 9.2 FUTURE WORK
- Bibliography

### Preface

We have entered the era of multicore processors, as single-core performance has reached its ceiling along with the slowing of Moore's Law. Current mainstream commercial processors, such as Intel Core and Xeon, and AMD Ryzen and Epyc, are all multicore processors containing up to 64 processing cores. In the foreseeable future, the number of cores in a processor will continue to increase.

When the number of cores reaches tens to hundreds, a significant portion of the total design effort will be dedicated to making core-to-core communication fast and energy efficient. Although almost all current processors use on-chip networks built with synchronous circuits, asynchronous on-chip networks may become useful or even necessary in the near future. In a synchronous on-chip network, the global clock must be distributed over long distances with little clock skew, which becomes increasingly challenging as the network scales. It is estimated that the clock tree can consume 20% to 50% of the total power of a synchronous circuit, while a synchronous on-chip network can consume 33% to 36% of the total power. To reduce total power consumption, it is common to implement individual processing cores in their own clock and power domains, each running at its own clock frequency, dynamically tuned according to the real-time workload. In this scenario, an asynchronous on-chip network might be a better candidate than a synchronous one.

This is a book about how to design high-throughput and fault-tolerant asynchronous on-chip networks for multicore and manycore processors. The state-of-the-art approach to designing and optimizing asynchronous on-chip networks is to mimic the structure of synchronous on-chip networks. However, the time division multiplexing (TDM) techniques extensively used in synchronous networks introduce extra synchronization and substantially increase the speed penalty in asynchronous on-chip networks. Instead of TDM, we introduce spatial parallelism into asynchronous networks to improve their throughput without incurring these synchronization penalties.

There is one annoying problem with asynchronous on-chip networks built with quasi-delay-insensitive (QDI) circuits: they are sensitive to faults. A fault not only corrupts a data packet but also obstructs the handshake protocol essential for correct data transmission, disrupts the normal data flow and may eventually produce a deadlock that paralyzes the whole network. The second half of this book is dedicated to this issue. A fault-tolerant coding method is proposed to tolerate transient faults. When a fault causes a deadlock, the fault is first accurately located using a fault detection circuit, and the network is then restored to functional operation by isolating the faulty components.
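To make the fault-to-deadlock mechanism concrete, the following Python sketch models a four-phase (return-to-zero) request/acknowledge link step by step and lets the acknowledge wire be forced to a permanent stuck-at value. It is a behavioural illustration under simplifying assumptions, not the detection or recovery circuitry described later in the book: the names `four_phase_link`, `ack_stuck_at` and `max_steps` are hypothetical, and the step budget merely stands in for "waiting long enough to conclude the link is stuck".

```python
from typing import List, Optional, Tuple


def four_phase_link(tokens: List[int],
                    ack_stuck_at: Optional[int] = None,
                    max_steps: int = 100) -> Tuple[List[int], bool]:
    """Step-based model of a four-phase (return-to-zero) req/ack link.

    ack_stuck_at: None for a fault-free link, or 0/1 to model a permanent
    stuck-at fault on the acknowledge wire as seen by the sender.
    Returns (tokens latched by the receiver, True if the link stalled).
    """
    pending = list(tokens)
    received: List[int] = []
    req, ack, data = 0, 0, None
    s_state, r_state = "idle", "idle"

    for _ in range(max_steps):
        # The sender observes ack through the (possibly faulty) wire.
        ack_seen = ack if ack_stuck_at is None else ack_stuck_at

        # Sender: drive data and raise req, wait ack+, drop req, wait ack-.
        if s_state == "idle" and pending and ack_seen == 0:
            data, req, s_state = pending[0], 1, "wait_ack_hi"
        elif s_state == "wait_ack_hi" and ack_seen == 1:
            req, s_state = 0, "wait_ack_lo"
        elif s_state == "wait_ack_lo" and ack_seen == 0:
            pending.pop(0)  # handshake for this token is complete
            s_state = "idle"

        # Receiver: latch data on req+ and raise ack; drop ack on req-.
        if r_state == "idle" and req == 1:
            received.append(data)
            ack, r_state = 1, "wait_req_lo"
        elif r_state == "wait_req_lo" and req == 0:
            ack, r_state = 0, "idle"

        if not pending and s_state == "idle":
            return received, False
    # Step budget exhausted: neither side can make progress.
    return received, True
```

With a fault-free link every token gets through. With `ack_stuck_at=0`, the receiver latches the first token, but the sender never observes the rising acknowledge, so it never withdraws its request and the receiver never withdraws its acknowledge: both sides wait forever, which is the handshake-level deadlock described above. With `ack_stuck_at=1`, the sender never even starts.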

This book is intended for researchers, engineers and students working on QDI and speed-independent (SI) circuits, asynchronous on-chip networks and switching networks built from QDI and SI circuits, fault-tolerant QDI circuits and, finally, fault-tolerant asynchronous on-chip networks.

The book is organized to be self-contained. Chapters are ordered so that the necessary background knowledge and related topics are introduced and discussed before an advanced technique is described. Readers can read through the book without resorting to related research papers and books, although these are listed in the bibliography for further reference.

*Introduction* provides the context for the topics described in this book, including our motivation for this research, its applications in current and future computer systems and the state of the art in related areas.

*Asynchronous Circuits* introduces the concept of asynchronous circuits, the timing assumptions used in different types of asynchronous circuits and the implementation of asynchronous circuits.

*Asynchronous Networks-on-Chip* describes all the general concepts of on-chip interconnects necessary for understanding this book. This chapter also introduces asynchronous on-chip networks.

*Optimizing Asynchronous On-Chip Networks* improves the throughput performance of asynchronous on-chip networks by introducing spatial parallelism into the router design.

*Fault-Tolerant Asynchronous Circuits* begins by analyzing the effect of faults on asynchronous circuits and presents state-of-the-art fault-tolerant techniques for asynchronous circuits. It shows that faults not only corrupt data but can also bring down the whole asynchronous network.

*Fault-Tolerant Coding* introduces fault-tolerant encoding for asynchronous circuits and proposes a fault-tolerant delay-insensitive redundant check (DIRC) code for QDI interconnects that can tolerate transient faults.

*Deadlock Detection* describes how to detect a fault-caused deadlock in an asynchronous on-chip network and how to locate the faulty link. This is a prerequisite for a network to recover from a fault-caused deadlock.

*Deadlock Recovery* presents deadlock recovery techniques, including asynchronous router and on-chip network designs that can recover from fault-caused deadlocks.

*Summary* concludes the book and outlines future work.

This book is based on our Ph.D. research carried out in the Advanced Processor Technologies (APT) group in the School of Computer Science at the University of Manchester. We are greatly indebted to our supervisors, Dr. Doug Edwards and Dr. Jim Garside. They brought us into the world of asynchronous circuit design, carefully guided us with their wide knowledge and insight, and constantly encouraged us with their deep passion for research. We would also like to thank our colleagues in the APT group for their direct and indirect help with this research.

> Wei Song and Guangda Zhang
>
> October 2021



# Introduction

Advancing semiconductor technology makes it possible to integrate more and more processing cores on a single chip to achieve continuously increasing chip performance, posing a growing demand for scalable and efficient interconnection. On-chip networks (OCNs), or Networks-on-Chip (NoCs), have emerged as a promising candidate to support large-scale on-chip communication. Most existing NoCs are synchronous, and they may be limited by issues arising from clock distribution, which grows harder as the network scales. As an alternative, NoCs can be implemented with event-driven asynchronous circuits, which are controlled by handshake protocols rather than global clocks. Without a global clock, asynchronous NoCs have many attractive advantages over synchronous ones.

In the deep sub-micron era, reliability has become a major challenge for scaled electronics. Along with shrinking device dimensions, factors such as lower supply voltages, higher clock frequencies and growing chip density have a negative impact on chip reliability. Electronic systems are more susceptible to *faults*. *Fault tolerance* has become an essential objective for critical digital systems.

Fault tolerance has been systematically studied in traditional synchronous NoCs, but rarely in asynchronous ones. When a NoC is implemented with quasi-delay-insensitive (QDI) circuits, a timing-robust class of asynchronous circuits, it naturally tolerates delay variation, which is attractive for large-scale NoCs. However, faults have a more complicated and devastating impact on QDI NoCs than on synchronous NoCs, which is a challenging issue to resolve. This book discusses fault-tolerant on-chip networks implemented with asynchronous circuits and aims to provide holistic, efficient, resilient and cost-effective fault-tolerant solutions for asynchronous NoCs.

### 1.1 ASYNCHRONOUS CIRCUITS

Asynchronous circuits work in a clockless, self-timed manner. They are designed under certain timing assumptions, which describe their tolerance to delay variation in gates and wires. This book concentrates on one specific timing-robust type of asynchronous circuit, the quasi-delay-insensitive (QDI) circuit, which tolerates arbitrary positive delays on all gates and wires except for some forks that are assumed to be isochronic (forks whose branches have equal delay to all fanouts). Owing to its strong tolerance to delay variation, a QDI circuit keeps functioning under extreme working conditions, such as sub-threshold supply voltage and very low or high temperature; it naturally tolerates process variation, which is increasingly troublesome for synchronous circuits; and it requires less static timing analysis than any other type of asynchronous circuit, let alone synchronous ones. In addition, QDI circuits are presumably low power because they waste no power on a clock tree and consume nearly zero power when not actively in use.
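The tolerance to arbitrary wire delays can be illustrated with a small behavioural sketch. It assumes dual-rail (1-of-2) encoding, one of the delay-insensitive codes, with an all-zero spacer between data words; the function names are hypothetical. The rising rail transitions of a word are applied one wire at a time in a random order to mimic arbitrary wire delays, and the receiver samples the word only when its completion detector sees a valid code word on every bit. This is not a gate-level QDI circuit: it ignores isochronic forks and the acknowledge path.

```python
import random
from typing import List, Tuple

Rail = Tuple[int, int]  # (true rail, false rail) of one dual-rail bit


def encode(value: int, width: int) -> List[Rail]:
    """Dual-rail (1-of-2) encoding: bit 1 -> (1, 0), bit 0 -> (0, 1)."""
    return [((value >> i) & 1, 1 - ((value >> i) & 1)) for i in range(width)]


def complete(rails: List[Rail]) -> bool:
    """Completion detection: every bit has exactly one rail raised."""
    return all(t + f == 1 for t, f in rails)


def decode(rails: List[Rail]) -> int:
    """Rebuild the integer from the true rails."""
    return sum(t << i for i, (t, _) in enumerate(rails))


def receive_with_random_skew(value: int, width: int, seed: int) -> Tuple[int, int]:
    """Raise the rails of `value` in a random order (arbitrary wire delays)
    and sample as soon as completion is detected.
    Returns (decoded value, number of wire transitions seen)."""
    target = encode(value, width)
    rails = [[0, 0] for _ in range(width)]  # spacer: all rails low
    # Only rails that must rise carry a transition (one per bit).
    events = [(i, r) for i, bit in enumerate(target) for r in (0, 1) if bit[r]]
    random.Random(seed).shuffle(events)
    for n, (i, r) in enumerate(events, start=1):
        rails[i][r] = 1
        snapshot = [(t, f) for t, f in rails]
        if complete(snapshot):
            return decode(snapshot), n
    raise RuntimeError("completion never detected")
```

Whatever arrival order the seed produces, completion fires only after the last of the `width` transitions, so the receiver never samples a partially arrived word; this is why the data path itself needs no timing margin.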

Although asynchronous circuits have a history of over 50 years [163], most very large-scale integration (VLSI) circuits are synchronous owing to mature electronic design automation (EDA) support. Since the registers and latches in synchronous circuits are synchronized by the global clock, they form natural timing boundaries that divide a circuit into paths. All these paths are driven by the same clock and operate concurrently and independently. EDA tools, especially synthesis tools, can therefore improve speed by optimizing these paths individually. In contrast, the latches in asynchronous circuits are driven by handshake protocols (implemented as handshake circuits). The operation of one latch is normally triggered by events generated by other latches. Asynchronous circuits are therefore difficult to optimize for speed, because they lack the clear timing boundaries that break large synchronous circuits into small, analyzable pieces. Some asynchronous synthesis tools, such as Petrify [57] and Balsa [73], have been proposed to translate behavioral hardware descriptions into low-level netlists. However, high-speed asynchronous circuits are almost always designed manually [182, 212, 219].

Shrinking transistor geometry brings opportunities for asynchronous circuits. As the number of transistors on a single die increases, as predicted by Moore's Law, the area and power overhead of synchronizing the whole chip with one global clock becomes unacceptable and beyond the control of current EDA tools. Future multicore processors are expected to be globally asynchronous, locally synchronous (GALS) designs, in which synchronous intellectual property (IP) blocks communicate with each other through an asynchronous communication infrastructure. It has been predicted that 49% of global signals will be driven by asynchronous circuits by 2024 [104]. Variation is another problem. Decreasing transistor size increases power density, which leads to temperature and power variations [98]. Process variation worsens the situation by making cell latency non-deterministic. Worst-case timing analysis in synchronous circuits therefore produces over-pessimistic speed estimates [31]. Asynchronous circuits tolerate these variations and deliver average-case rather than worst-case speed.
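The difference between worst-case and average-case timing can be shown with a toy model. The Python sketch below assumes that each operation's delay varies uniformly within plus or minus `variation` of a `nominal` value; a clocked design must set its period to the worst case, whereas a self-timed design finishes each operation in its actual delay. The function name and parameters are hypothetical, and the model deliberately ignores handshake overhead, which in a real asynchronous pipeline eats into the gain; it illustrates the argument and is not a performance result.

```python
import random
from typing import Tuple


def completion_times(n_ops: int, nominal: float, variation: float,
                     seed: int = 0) -> Tuple[float, float]:
    """Toy comparison of worst-case (clocked) and average-case (self-timed)
    timing for n_ops operations with delays uniform in
    [nominal * (1 - variation), nominal * (1 + variation)].

    Assumptions: the clock period equals the worst-case delay, and the
    self-timed version adds no handshake overhead.
    Returns (clocked total time, self-timed total time).
    """
    rng = random.Random(seed)
    delays = [nominal * rng.uniform(1 - variation, 1 + variation)
              for _ in range(n_ops)]
    clocked = n_ops * nominal * (1 + variation)  # every cycle pays the worst case
    self_timed = sum(delays)                     # each operation pays its own delay
    return clocked, self_timed
```

For a large `n_ops`, the self-timed total approaches `n_ops * nominal` (the mean delay), while the clocked total is fixed at `n_ops * nominal * (1 + variation)`, so the gap grows with the spread of the delays.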

Designing asynchronous circuits is harder than designing their synchronous counterparts. Without mature support from commercial EDA tools, asynchronous circuits are usually fully or partially hand-crafted. For this reason, this book demonstrates how to design QDI circuits from scratch, describing all implementations in gate-level Verilog HDL using ordinary gates available in any standard cell library.

### 1.2 ASYNCHRONOUS ON-CHIP NETWORKS

Current multicore processors use on-chip networks as their communication fabric. Most networks-on-chip (NoCs) are synchronous networks in which network components are driven by the same global clock or by several global clocks. Thanks to the timing assumptions enabled by the global clock and to mature EDA tools, these synchronous NoCs are fast and area efficient. However, synchronous NoCs face several design challenges that are difficult to resolve:

- Support for heterogeneous networks: Unlike chip multiprocessor (CMP) systems, where every network node is a homogeneous processing element, a multiprocessor system-on-a-chip (MPSoC) is a heterogeneous system whose network nodes are IP blocks with different functions and hardware structures. These IP blocks are provided and tested with different clock frequencies, area sizes and even supply voltages. These differences complicate the network topology, compromise the latency of synchronous networks and make chip timing closure difficult to achieve.
- Low power consumption: It is crucial to reduce the power consumption of an SoC as it determines the maximum standby time of a device. The clock tree of synchronous on-chip networks consumes a significant amount of energy [153], and it is getting worse along with the shrinking transistor geometry.
- Tolerance to variation: Process, temperature and voltage variations affect future sub-micron VLSI designs significantly [133, 138]. According to the international technology roadmap for semiconductors, the delay uncertainty caused by variations in the sign-off timing closure will reach 32% in 2024 [104]. Traditional static timing analysis is going to be replaced with statistical timing analysis methods [31] to cope with the dropping yield rate and the over-conservative timing estimation. Synchronous on-chip networks alleviate this effect by considering variations in their task mapping procedure [138]. However, this works only in homogeneous networks and the routers are still working at the worst estimated speed.

Asynchronous on-chip networks are a promising alternative that addresses the above challenges. The communication components of an asynchronous on-chip network are built from clockless asynchronous circuits. Data are transmitted using handshake protocols that are largely insensitive to delay variations [231]. Because of this delay insensitivity, every IP block can connect to the global asynchronous on-chip network through the same synchronous-to-asynchronous and asynchronous-to-synchronous interface. Since all synchronous blocks are isolated from one another by the asynchronous network, chip-level timing closure is simplified. Delay insensitivity also makes an asynchronous on-chip network naturally tolerant to variations, because the delay uncertainty they cause cannot affect the function of the handshake protocols. Finally, since asynchronous circuits need no clock, an asynchronous on-chip network consumes no dynamic power when no data is being transmitted.
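The delay insensitivity described above can be illustrated with dual-rail encoding, one common DI code. The paragraph does not name a specific code, so this choice is an assumption. The sketch models only the rising (evaluation) half of a four-phase transfer. Starting from the all-zero spacer, each bit raises exactly one of its two rails, and the rails rise in a random order that stands in for arbitrary wire delays. Because the receiver acts only when completion detection sees a full codeword, the decoded value is the same for every arrival order. The return-to-zero phase and the acknowledge wire are omitted, and all names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Rail = Tuple[int, int]  # (true rail, false rail) carrying one bit


def encode(bits: List[int]) -> List[Rail]:
    """Dual-rail encoding: 1 -> (1, 0), 0 -> (0, 1); the spacer is (0, 0)."""
    return [(1, 0) if b else (0, 1) for b in bits]


def is_complete(word: List[Rail]) -> bool:
    """Completion detection: every bit has exactly one rail raised."""
    return all(t + f == 1 for t, f in word)


def transmit(bits: List[int], rng: random.Random) -> Optional[List[int]]:
    """Raise the required rails one at a time in a random order (arbitrary
    wire delays), starting from the spacer. The receiver decodes only at
    the first instant it sees a complete codeword."""
    target = encode(bits)
    wires = [[0, 0] for _ in bits]
    # Each bit raises exactly one rail: index 0 (true) for 1, index 1 (false) for 0.
    events = [(i, 0 if t else 1) for i, (t, _) in enumerate(target)]
    rng.shuffle(events)
    for i, rail in events:
        wires[i][rail] = 1
        word = [(t, f) for t, f in wires]
        if is_complete(word):
            return [t for t, _ in word]  # decode from the true rails
    return None


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1, 0]
    for seed in range(5):
        assert transmit(data, random.Random(seed)) == data
    print("decoded correctly for every arrival order tried")
```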

However, asynchronous networks [11, 22, 28, 67, 75] are often slower than synchronous on-chip networks with similar structures and resources [153]. Although the global clock in synchronous circuits consumes considerable power, it is a speed- and area-efficient way to synchronize combinational operations. Asynchronous circuits instead rely on handshake protocols to control data transmission: the completion of every combinational operation is explicitly detected and guarded to ensure delay insensitivity. The circuits that perform this completion detection introduce area and speed overhead, which makes delay-insensitive asynchronous circuits intrinsically slow.

Another issue is that state-of-the-art asynchronous on-chip networks are usually designed by reproducing the structures of synchronous on-chip networks in asynchronous logic. Because synchronous on-chip networks synchronize data with no speed penalty, time-division multiplexing (TDM) techniques [58] are used extensively. Simply reproducing such TDM structures in asynchronous on-chip networks introduces extra completion detection circuits and causes speed penalties.

The speed penalty of completion detection itself is unavoidable, because the advantages of asynchronous circuits derive from their delay-insensitive handshake protocols. However, the scope of synchronization in asynchronous circuits can be limited to small transmission units, such as a single pipeline, which alleviates the penalty. The question that follows is how to build asynchronous networks with such limited synchronization.

This book introduces techniques that improve network throughput by exploiting spatial parallelism in asynchronous on-chip networks at different levels. Channel slicing is a new pipeline structure that alleviates the speed penalty of synchronization by removing it from bit-level data pipelines. Speed can be improved further with the lookahead pipeline style if the QDI timing assumption is slightly relaxed. Spatial division multiplexing (SDM) is a flow control method that improves network throughput by removing the synchronization between flits of different packets that TDM methods require. The main cost of SDM is a significantly larger crossbar inside each router. To reduce this area overhead, the crossbar can be replaced with novel switch structures, such as a 2-stage Clos switch dynamically reconfigured by an asynchronous dispatching algorithm.
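The following sketch gives a first-order view of why confining synchronization to small units helps. It assumes the per-bit validity signals of a channel are merged into one completion signal by a balanced tree of 2-input C-elements. The C-element count and the tree depth, which sits on the critical path of every handshake cycle, are compared for one detector spanning the whole word and for independent slices, each with its own detector and acknowledge. It ignores wire delay, acknowledge fan-out and the per-bit OR gates, and it is not the book's channel-slicing circuit. The function names are hypothetical.

```python
from typing import Tuple


def detector_cost(width: int) -> Tuple[int, int]:
    """Cost of merging `width` per-bit validity signals into one completion
    signal with a balanced tree of 2-input C-elements.

    Returns (number of C-elements, tree depth in C-element levels).
    """
    if width < 1:
        raise ValueError("width must be positive")
    depth = (width - 1).bit_length()  # equals ceil(log2(width)) for width >= 1
    return width - 1, depth


def sliced_cost(width: int, slice_width: int) -> Tuple[int, int]:
    """Split a `width`-bit channel into independent slices of `slice_width`
    bits, each with its own completion detector.

    Returns (total C-elements, depth of each slice's detector).
    """
    if width % slice_width:
        raise ValueError("width must be a multiple of slice_width")
    per_slice, depth = detector_cost(slice_width)
    return (width // slice_width) * per_slice, depth


if __name__ == "__main__":
    print(detector_cost(64))   # (63, 6): one detector spans the whole word
    print(sliced_cost(64, 2))  # (32, 1): 32 two-bit slices with shallow detectors
```

Under these assumptions, slicing removes most of the detector depth from each handshake loop while also using fewer C-elements.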

### 1.3 FAULT-TOLERANT ASYNCHRONOUS ON-CHIP NETWORKS

On the one hand, advancing semiconductor technology boosts chip performance and allows more processing cores to be integrated. On the other hand, shrinking device dimensions, together with lower supply voltages, higher clock frequencies and growing chip density, have a negative impact on chip reliability [37].

In the deep sub-micron era, variations in manufacturing and operating conditions have a proportionately greater effect than before. As transistor dimensions shrink, manufacturing variations such as dopant levels and crystal boundaries increasingly influence transistor and wire properties over time [37]. Growing chip density results in a high heat flux across the die and creates high-temperature hot spots, which degrade circuit performance and accelerate device aging. Reducing the supply voltage increases susceptibility to various noise sources [37, 56]. Increasing the clock frequency raises the probability that noise causes faults in circuits. As a result, electronic devices are significantly more sensitive to environmental variations, and device aging is accelerated.

It has been reported that the mean soft error rates (SERs) of three circuits in a 40 nm process are $2.2\times10^{-4}$ FIT, $4.7\times10^{-4}$ FIT and $1.2\times10^{-4}$ FIT, respectively (1 FIT = 1 failure per $10^9$ hours) [257]. The 24 MB Level 3 cache of an Intel processor encountered $0.2\sim2$ errors per year at an SER of $0.0001\sim0.001$ FIT/bit [221]. An SER on the order of 0.001 FIT/bit has also been observed on the Altitude SEE Test European Platform (ASTEP) [10]. It was predicted that the SER per logic state bit could increase by 8% in each technology generation [95]. The SER of static random-access memory (SRAM) was predicted to increase $6\sim7\times$ from the 130 nm to the 22 nm process [102]. In 65 nm technology, radiation can cause a $6.45\times$ increase in SER when the supply voltage decreases from 1.0 V to 0.33 V [187]. Both the SER and the aging speed are expected to increase as technology continues to scale [37, 56, 84, 173]. Although researchers disagree on the absolute number of faults in particular circuits on particular processes, they all agree that faults increase as processes shrink. Electronic systems are therefore more susceptible to faults [18], which are classified as transient, intermittent or permanent according to their duration [162]. The 2015 ITRS identifies *reliability* as one of the main challenges facing next-generation electronics and stresses the importance of runtime protection [39]. Consequently, *fault tolerance* has become an essential design objective for critical digital systems, especially in highly specialized fields such as aerospace, military and medical equipment.
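The cache figure above can be checked with simple FIT arithmetic. The sketch assumes a 24 MB cache counted in binary megabytes, a uniform per-bit SER, 8760 hours per year and no error correction. None of these assumptions is stated in the cited source. The function name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years
HOURS_PER_FIT = 1e9        # 1 FIT = 1 failure per 10^9 device-hours


def errors_per_year(capacity_bytes: int, fit_per_bit: float) -> float:
    """Expected soft errors per year for a memory array, assuming a
    uniform per-bit SER and no error correction."""
    total_fit = capacity_bytes * 8 * fit_per_bit
    return total_fit * HOURS_PER_YEAR / HOURS_PER_FIT


if __name__ == "__main__":
    l3_bytes = 24 * 2**20  # 24 MB taken as binary megabytes
    for fit in (1e-4, 1e-3):
        print(f"{fit:g} FIT/bit -> {errors_per_year(l3_bytes, fit):.2f} errors/year")
```

Running it prints about 0.18 and 1.76 errors per year, consistent with the quoted range of $0.2\sim2$.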

The fault tolerance of synchronous NoCs has been studied extensively. Faults typically cause data errors or packet loss. These errors can normally be detected and corrected within several clock cycles, because the clock signal provides a timing reference for error detection and correction. Once an error or packet loss is detected, a retransmission can be requested to obtain the correct packet [229]. However, there is no such timing reference in an asynchronous NoC. QDI implementations are robust to timing variations but not to faults. A fault may pollute a packet in transit, corrupt the handshake protocol and disrupt the normal data flow, which is a new challenge for asynchronous circuit designers. A single fault could even break the handshake protocol and result in a *fault-caused physical-layer deadlock*. This deadlock differs from the well-known network-layer deadlock induced by the cyclic dependence of multiple competing packets [61, 63]. Most conventional fault-tolerant or deadlock management techniques for synchronous NoCs cannot work in a deadlocked state. The fault tolerance of asynchronous NoCs has not been thoroughly studied: various styles of asynchronous NoCs have been proposed, but few of them have fault-tolerance capabilities [11, 30, 67, 75, 200, 212].

Faults can be classified into transient, intermittent and permanent faults according to their duration [56]. Transient faults usually last for a short time and behave as positive or negative glitches $(0\rightarrow 1\rightarrow 0 \text{ or } 1\rightarrow 0\rightarrow 1)$ [18, 111]. Permanent faults affect the victim gates or wires forever. Most permanent faults can be modeled as "stuck-at" faults [5, 137], where the logic level of a net is always 0 or always 1. Intermittent faults usually occur as an early manifestation of permanent faults as devices age [56]. During error detection or correction, they can appear as either transient or permanent faults.

In the presence of faults, QDI NoCs behave differently from synchronous ones. A fundamental difference between synchronous and QDI circuits is the timing reference used in the transmission of data symbols.

- In synchronous circuits, a data symbol typically has a constant time per bit, which the transmitter and receiver agree on and maintain for a known time. Corruption of the transmission therefore affects a known number of bits. Faults on a synchronous NoC may corrupt packets in transit, send packets to wrong destinations, cause packet loss or produce data errors. Nevertheless, the erroneous data symbol or faulty behavior can be readily detected and then corrected or recovered.
- There is no such timing reference in QDI circuits. Besides corrupting symbols, faults can insert or possibly delete them. Managing these faulty cases is a new challenge for QDI NoCs. Moreover, a permanent fault will clearly stall the handshake and cause a physical-layer deadlock, whose detection and recovery have not been thoroughly studied in a NoC environment. More seriously, a transient fault can not only cause data errors but also deadlock a QDI NoC, a problem that has been neglected by the asynchronous community. All of these factors make fault detection and recovery in QDI NoCs more challenging.
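A minimal sketch of symbol insertion and deletion on a single dual-rail bit channel follows. It assumes a simplified passive receiver that accepts a bit on every spacer-to-codeword transition and drives no acknowledge, so it only approximates a real four-phase link. The waveform is sampled in discrete time steps, and a transient fault flips one rail for one step. All names are hypothetical.

```python
from typing import List, Tuple

Rail = Tuple[int, int]  # (true rail, false rail) of a one-bit dual-rail channel
SPACER: Rail = (0, 0)


def send(bits: List[int], hold: int = 2) -> List[Rail]:
    """Sampled waveform of a dual-rail transmission: each bit is a spacer
    followed by its valid codeword held for `hold` time steps."""
    trace: List[Rail] = []
    for b in bits:
        trace.append(SPACER)
        trace.extend([(1, 0) if b else (0, 1)] * hold)
    trace.append(SPACER)
    return trace


def inject_glitch(trace: List[Rail], pos: int, rail: int) -> List[Rail]:
    """Copy of `trace` with the given rail (0 = true, 1 = false) flipped
    for the single time step `pos` (a transient fault)."""
    out = list(trace)
    t, f = out[pos]
    out[pos] = (t ^ 1, f) if rail == 0 else (t, f ^ 1)
    return out


def qdi_receiver(trace: List[Rail]) -> List[int]:
    """Accept a bit on every spacer -> valid-codeword transition. With no
    clock, any valid codeword that follows a spacer is taken as data."""
    received, armed = [], True
    for t, f in trace:
        if armed and t + f == 1:
            received.append(t)
            armed = False
        elif (t, f) == SPACER:
            armed = True
    return received


if __name__ == "__main__":
    clean = send([1, 0]) + [SPACER] * 2  # channel idle after two bits
    print(qdi_receiver(clean))                          # [1, 0]
    print(qdi_receiver(inject_glitch(clean, 7, 1)))     # [1, 0, 0]: symbol inserted
    print(qdi_receiver(inject_glitch(clean, 3, 0)))     # [1]: spacer hidden, symbol deleted
```

A glitch on an idle rail looks exactly like a new codeword. A glitch that masks the spacer merges two symbols into one. A clocked receiver sampling at fixed instants would instead always see a fixed number of symbols.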



Figure 1.1 Network-layer and physical-layer deadlocks in a QDI NoC.

Deadlock is fatal to a NoC that has no deadlock management mechanism [61]. It can reduce network performance, paralyze network function and eventually cause the chip to be discarded. The well-known *network-layer* deadlock, caused by the cyclic dependence of packets or by restricted routing [63], can happen in all NoCs. Figure 1.1a shows an example in which four packets hold and request network resources in a cyclic fashion, forming a network-layer deadlock. It can be resolved by using specific turn models or by providing extra escape channels [63]. This network-layer deadlock is common to all NoCs and is not the target of this book. In QDI NoCs, a fault may break the handshake protocol, resulting in a physical-layer deadlock, which is particular to QDI NoCs. Take a simple (req, ack) handshake as an example. If the sender sends a request to the receiver but is never acknowledged, it cannot tell whether this is caused by a fault or by a long delay, because QDI circuits are insensitive to delay variations. It would keep waiting for the lost *ack*, resulting in a physical-layer deadlock. Figure 1.1b illustrates a case in which a fault on a packet in transit deadlocks the reserved data path in the network. Note that it is precisely the adaptability of QDI circuits to timing variations that makes them more vulnerable to this kind of deadlock-type fault. This physical-layer deadlock cannot be easily resolved by the higher-layer techniques used for network-layer deadlocks.
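The (req, ack) scenario above can be sketched behaviourally. The model assumes a four-phase return-to-zero protocol between one sender and one receiver, with a permanent stuck-at fault optionally placed on the ack wire seen by the sender. Each loop iteration fires whichever handshake transition is enabled, and when none is, the channel is quiescent. Only the simulator, with its global view, can observe this quiescence. The real sender merely sees that ack has not arrived, which in a QDI circuit is indistinguishable from a long delay. The function name and parameters are hypothetical.

```python
from typing import Optional, Tuple


def four_phase(n_tokens: int, ack_stuck_at: Optional[int] = None) -> Tuple[int, bool]:
    """Four-phase (return-to-zero) req/ack handshake. `ack_stuck_at` models a
    stuck-at fault on the ack wire as seen by the sender (None = fault-free).

    Returns (tokens latched by the receiver, deadlocked?).
    """
    req = ack_out = sent = received = 0
    while True:
        # Value of ack that actually reaches the sender.
        ack_in = ack_out if ack_stuck_at is None else ack_stuck_at
        if req == 0 and ack_in == 0 and sent < n_tokens:
            req, sent = 1, sent + 1              # sender: request a transfer
        elif req == 1 and ack_in == 1:
            req = 0                              # sender: return req to zero
        elif req == 1 and ack_out == 0:
            ack_out, received = 1, received + 1  # receiver: latch data, raise ack
        elif req == 0 and ack_out == 1:
            ack_out = 0                          # receiver: return ack to zero
        else:
            break                                # no transition enabled
    finished = sent == n_tokens and req == 0 and ack_out == 0
    return received, not finished


if __name__ == "__main__":
    print(four_phase(3))                  # (3, False): all transfers complete
    print(four_phase(3, ack_stuck_at=0))  # (1, True): sender waits forever for ack
    print(four_phase(3, ack_stuck_at=1))  # (0, True): sender never starts
```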

This book studies the impact of different faults on QDI NoCs, including transient and permanent ones, and proposes thorough and systematic fault-tolerant solutions to protect QDI NoCs. The achieved fault-tolerance capability and the incurred performance and hardware overhead are two main factors considered in the evaluation.

#### 1.3.1 Protection for QDI Links

A large-scale NoC may contain many long link wires, as is common in large-scale Systems-on-Chip (SoCs). Exposed to the external environment, these wires can easily be affected by various noise or fault sources and fall victim to timing variations or transient faults [14]. These chip-level long interconnects can be implemented as QDI pipelines to achieve high bandwidth and timing robustness. However, a QDI system can accept a transient fault as a valid signal, leading to the insertion, deletion or corruption of a data symbol. Fault-tolerant codes have been widely used to protect on-chip communication [229]. Codes also play an important role in QDI circuits, where delay-insensitive (DI) codes are used to build data symbols that encode timing information. Most existing fault-tolerant codes proposed for asynchronous circuits either compromise the timing robustness of QDI circuits or incur large area and speed overhead. This book presents a novel delay-insensitive redundant coding (DIRC) scheme that protects QDI communication from transient faults and can be easily adopted by existing DI or QDI interconnects without destroying their intrinsic timing robustness. The protected QDI links can be constructed flexibly to satisfy various fault-tolerance requirements with moderate hardware overhead.
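The property that lets DI codes carry timing information can be checked mechanically. A code is unordered when no codeword covers another, meaning no codeword has 1s wherever another has them. A receiver watching wires rise from the spacer therefore cannot mistake a partially arrived codeword for a different valid one. The sketch below tests plain binary, 1-of-4, 2-of-4 and dual-rail codes. It illustrates the general DI-code property only and does not reproduce the DIRC scheme. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Codeword = Tuple[int, ...]


def covers(a: Sequence[int], b: Sequence[int]) -> bool:
    """True if every wire that is 1 in b is also 1 in a."""
    return all(x >= y for x, y in zip(a, b))


def is_unordered(code: List[Codeword]) -> bool:
    """A code is unordered (delay-insensitive) if no codeword covers another."""
    return not any(covers(a, b) or covers(b, a) for a, b in combinations(code, 2))


def m_of_n(m: int, n: int) -> List[Codeword]:
    """All n-wire codewords with exactly m wires high."""
    return [tuple(1 if i in ones else 0 for i in range(n))
            for ones in combinations(range(n), m)]


def dual_rail(nbits: int) -> List[Codeword]:
    """Dual-rail code for nbits bits: each bit b drives the wire pair (b, 1 - b)."""
    words = []
    for value in range(2 ** nbits):
        bits = [(value >> i) & 1 for i in range(nbits)]
        words.append(tuple(w for b in bits for w in (b, 1 - b)))
    return words


if __name__ == "__main__":
    binary = [(0, 0), (0, 1), (1, 0), (1, 1)]
    print(is_unordered(binary))        # False: (1, 1) covers (0, 1)
    print(is_unordered(m_of_n(1, 4)))  # True
    print(is_unordered(m_of_n(2, 4)))  # True
    print(is_unordered(dual_rail(2)))  # True
```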

#### 1.3.2 Deadlock Detection

Both permanent and transient faults can break the handshake process in QDI NoCs and generate a physical-layer deadlock, which has more serious consequences for the system than pure data errors. Managing this fault-caused physical-layer deadlock is critical to the chip's lifespan, yet it has barely been studied. To resume from a physical-layer deadlock, the network must go through two phases: deadlock *detection* and *recovery*. Detecting a fault-caused physical-layer deadlock in a QDI NoC is difficult. In a deadlocked state, error syndromes for fault analysis cannot be easily collected, and locating a specific defective wire or gate is hard. Ideally, the faulty component is precisely located so that a recovery method can then bypass or replace it, thereby restoring network functionality. An efficient and flexible detection method is therefore necessary. It should not only precisely locate the fault in the QDI NoC but also distinguish the fault-caused physical-layer deadlock from similar network scenarios, including the upper-layer network deadlock and network congestion. When both transient and permanent faults are considered, an accurate model is needed to distinguish deadlocks caused by different faults, thus enabling *fault diagnosis*. The proposed techniques should be able to detect, diagnose and locate the fault whenever it deadlocks the network.
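To make the distinction between these scenarios concrete, the sketch below uses a simplified, hypothetical model. It is not the detection method developed in this book. Each packet or stalled link is a node in a wait-for graph, with edges pointing to whatever it is waiting on. A cycle, as in Figure 1.1a, indicates a network-layer deadlock. A blocked chain whose stall is not explained by any other resource, as in Figure 1.1b, suggests a handshake that never completed. Packets that still make progress indicate congestion. The `progressed` flag and all names are assumptions. A real detector would also have to gather this information in hardware without relying on the stalled handshake itself.

```python
from typing import Dict, List, Optional


def find_cycle(wait_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph (node -> nodes it waits on), or
    None. Depth-first search with white/grey/black colouring."""
    WHITE, GREY, BLACK = 0, 1, 2
    color: Dict[str, int] = {}
    path: List[str] = []

    def dfs(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for nxt in wait_for.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:  # back edge: the path from nxt to here is a cycle
                return path[path.index(nxt):]
            if state == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in wait_for:
        if color.get(node, WHITE) == WHITE:
            found = dfs(node)
            if found:
                return found
    return None


def classify(wait_for: Dict[str, List[str]], progressed: bool) -> str:
    """Hypothetical first-cut classification of a blocked network region."""
    if progressed:
        return "congestion"
    if find_cycle(wait_for):
        return "network-layer deadlock"
    return "suspected physical-layer deadlock"


if __name__ == "__main__":
    ring = {"P1": ["P2"], "P2": ["P3"], "P3": ["P4"], "P4": ["P1"]}   # cyclic wait
    chain = {"P1": ["P2"], "P2": ["faulty_link"], "faulty_link": []}  # stalled path
    print(classify(ring, progressed=False))   # network-layer deadlock
    print(classify(chain, progressed=False))  # suspected physical-layer deadlock
```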

#### 1.3.3 Network Recovery

Once the fault has been located, the next step is to recover the network function according to the deadlocked state of the handshake protocol and the network protocol. Figure 1.1b shows one possible case in which a fault deadlocks a reserved long packet path composed of the faulty link and other healthy network resources. A direct system reboot can temporarily remove this deadlock, but it is expensive and cannot deal with deadlocks caused by permanent faults. A fine-grained recovery strategy is therefore necessary to remove the deadlock and isolate the faulty component. The recovery consists of two main processes. The first is deadlock removal, which restores the stalled packet flow in the deadlocked packet path, releasing blocked healthy network resources and eliminating the deadlock. The second is faulty link isolation: instead of detouring around the faulty link with upper network-layer methods such as fault-tolerant routing, this book proposes a fine-grained recovery technique at the lower physical layer that isolates the faulty component and restores the network function. Upper-layer recovery techniques can additionally be used to improve network performance after the loss of the faulty component. When transient or intermittent faults deadlock the network, the isolated link should be brought back into service once the fault fades.

### References

Adrijean Adriahantenaina, Hervé Charlery, Alain Greiner, Laurent Mortiez, and Cesar Albenes Zeferino. SPIN: A scalable, packet switched, on-chip micro-network. In *Proceedings of the Design, Automation & Test in Europe Conference & Exhibition*, pages 20070–20073, Washington, DC, USA, March 2003. IEEE Computer Society.

Melinda Y. Agyekum and Steven M. Nowick. An error-correcting unordered code and hardware support for robust asynchronous global communication. In *Proceedings of the Design, Automation & Test in Europe Conference & Exhibition*, pages 765–770. IEEE Computer Society, March 2010.

Melinda Y. Agyekum and Steven M. Nowick. Error-correcting unordered codes and hardware support for robust asynchronous global communication. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 31(1):75–88, January 2012.

Robert C. Aitken, Görschwin Fey, Zbigniew T. Kalbarczyk, Frank Reichenbach, and Matteo Sonza Reorda. Reliability analysis reloaded: How will we survive? In *Proceedings of the Design, Automation & Test in Europe Conference & Exhibition*, pages 358–367, 2013.

Sami A. Al-Arian and Dharma P. Agrawal. Physical failures and fault models of CMOS circuits. IEEE Transactions on Circuits and Systems, 34(3):269–279, March 1987.

Ra'ed Al-Dujaily, Terrence S. T. Mak, Fei Xia, Alexandre Yakovlev, and Maurizio Palesi. Run-time deadlock detection in networks-on-chip using coupled transitive closure networks. In *Proceedings of the Design, Automation & Test in Europe Conference & Exhibition*, pages 497–502. IEEE, March 2011.

Homa Alemzadeh, Ravishankar K. Iyer, Zbigniew Kalbarczyk, and Jai Raman. Analysis of safety-critical computer failures in medical devices. IEEE Security & Privacy, 11(4):14–26, July 2013.

Sobeeh Almukhaizim, Feng Shi, Eric Love, and Yiorgos Makris. Soft-error tolerance and mitigation in asynchronous burst-mode circuits. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 17(7):869–882, 2009.

Thomas E. Anderson, Susan S. Owicki, James B. Saxe, and Charles P. Thacker. High-speed switch scheduling for local-area networks. ACM Transactions on Computer Systems, 11(4):319–352, November 1993.

J. L. Autran, D. Munteanu, S. Moindjie, T. Saad Saoud, S. Sauze, G. Gasiot, and P. Roche. ASTEP (2005–2015): Ten years of soft error and atmospheric radiation characterization on the Plateau de Bure. Microelectronics Reliability, 55(9-10):1506–1511, August 2015.

John Bainbridge and Steve Furber. Chain: A delay-insensitive chip area interconnect. IEEE Micro, 22(5):16–23, 2002.

John Bainbridge, Will Toms, Doug Edwards, and Steve Furber. Delay-insensitive, point-to-point interconnect using m-of-n codes. In *Proceedings of the IEEE International Symposium on Asynchronous Circuits and Systems*, pages 132–140. IEEE Computer Society, May 2003.

William John Bainbridge. *Asynchronous system-on-chip interconnect*. PhD thesis, Department of Computer Science, the Faculty of Science & Engineering, the University of Manchester, March 2000. http://apt.cs.manchester.ac.uk/ftp/pub/apt/theses/bainbridge\_phd.pdf.

William John Bainbridge and Sean James Salisbury. Glitch sensitivity and defense of quasi delay-insensitive network-on-chip links. In *Proceedings of the IEEE International Symposium on Asynchronous Circuits and Systems*, pages 35–44. IEEE Computer Society, May 2009.

Ramanatha V. Balakrishnan and Desmond W. L. Young. System for reducing skew in the parallel transmission of multi-bit data slices, March 1992. US Patent 5,101,347.

Arnab Banerjee and Simon W. Moore. Flow-aware allocation for on-chip networks. In *Proceedings of the International Symposium on Networks-on-Chips*, pages 183–192. IEEE Computer Society, May 2009.

Rodrigo Possamai Bastos, Gilles Sicard, Fernanda Kastensmidt, Marc Renaudin, and Ricardo Reis. Evaluating transient-fault effects on traditional C-element's implementations. In *Proceedings of the IEEE International On-Line Testing Symposium*, pages 35–40, July 2010.

Robert C. Baumann. Soft errors in advanced semiconductor devices-part I: The three radiation sources. IEEE Transactions on Device and Materials Reliability, 1(1):17–22, 2001.

Robert C. Baumann. Radiation-induced soft errors in advanced semiconductor technologies. IEEE Transactions on Device and Materials Reliability, 5(3):305–316, 2005.

Robert C. Baumann, Tim Hossain, Eric Smith, Shinya Murata, and Hideki Kitagawa. Boron as a primary source of radiation in high density DRAMs. In *Proceedings of the Symposium on VLSI Technology*, pages 81–82, June 1995.

Edith Beigné, Fabien Clermidy, Sylvain Miermont, and Pascal Vivet. Dynamic voltage and frequency scaling architecture for units integration within a GALS NoC. In *Proceedings of the ACM/IEEE International Symposium On Networks-on-Chip*, pages 129–138. IEEE Computer Society, April 2008.

Edith Beigné, Fabien Clermidy, Pascal Vivet, Alain Clouard, and Marc Renaudin. An asynchronous NOC architecture providing low latency service and its multi-level design framework. In *Proceedings of the IEEE International Symposium on Asynchronous Circuits and Systems*, pages 54–63. IEEE Computer Society, March 2005.

Václav E. Beneš. On rearrangeable three-stage connecting networks. Bell System Technical Journal, 41(5):1481–1492, September 1962.

Luca Benini and Giovanni De Micheli. Networks on chips: A new SoC paradigm. IEEE Computer, 35(1):70–78, 2002.

Davide Bertozzi and Luca Benini. Xpipes: A network-on-chip architecture for gigascale systems-on-chip. IEEE Circuits and Systems Magazine, 4(2):18–31, 2004.

D. Binder, E. C. Smith, and A. B. Holman. Satellite anomalies from galactic cosmic rays. IEEE Transactions on Nuclear Science, 22(6):2675–2680, December 1975.

Tobias Bjerregaard, Shankar Mahadevan, Rasmus Grøndahl Olsen, and Jens Sparsø. An OCP compliant network adapter for GALS-based SoC design using the MANGO network-on-chip. In *Proceedings of the International Symposium on System-on-Chip*, pages 171–174. IEEE, November 2005.

Tobias Bjerregaard and Jens Sparsø. A router architecture for connection-oriented service guarantees in the MANGO clockless network-on-chip. In *Proceedings of the Design, Automation & Test in Europe Conference & Exhibition*, pages 1226–1231. IEEE Computer Society, March 2005.

Tobias Bjerregaard and Jens Sparsø. A scheduling discipline for latency and bandwidth guarantees in asynchronous network-on-chip. In *Proceedings of the International Symposium on Advanced Research in Asynchronous Circuits and Systems*, pages 34–43. IEEE Computer Society, March 2005.

Tobias Bjerregaard and Jens Sparsø. Implementation of guaranteed services in the MANGO clockless network-on-chip. IEE Proceedings - Computers and Digital Techniques, 153(4):217–229, 2006.

David Blaauw, Kaviraj Chopra, Ashish Srivastava, and Lou Scheffer. Statistical timing analysis: From basic principles to state of the art. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27(4):589–607, April 2008.

Geoffrey Blake, Ronald Dreslinski, and Trevor Mudge. A survey of multicore processors. IEEE Signal Processing Magazine, 26(6):26–37, November 2009.

Mario Blaum and Jehoshua Bruck. Unordered error-correcting codes and their applications. In *Proceedings of the International Symposium on Fault-Tolerant Computing*, pages 486–493, July 1992.

Paul Bogdan and Radu Mărculescu. A theoretical framework for on-chip stochastic communication analysis. In *Proceedings of the International ICST Conference on Nano-Networks*, pages 1–5. IEEE, September 2006.

Mark Bohr and Kaizad Mistry. Intel's revolutionary 22 nm transistor technology, May 2011. http://www.intel.com/content/www/us/en/silicon-innovations/revolutionary-22nm-transistor-technology-presentation.html.

Evgeny Bolotin, Israel Cidon, Ran Ginosar, and Avinoam Kolodny. Routing table minimization for irregular mesh NoCs. In *Proceedings of the Design, Automation & Test in Europe Conference & Exhibition*, pages 942–947. IEEE, April 2007.

Shekhar Y. Borkar. Designing reliable systems from unreliable components: The challenges of transistor variability and degradation. IEEE Micro, 25(6):10–16, November 2005.

Bella Bose. On unordered codes. IEEE Transactions on Computers, 40(2):125–131, 1991.

W. R. Bottoms. A roadmap for heterogeneous integration in electronics, May 2015. http://www.itrs2.net/itrs-reports.html.

Fred A. Bower, Daniel J. Sorin, and Sule Ozev. A mechanism for online diagnosis of hard faults in microprocessors. In *Proceedings of the Annual IEEE/ACM International Symposium on Microarchitecture*, pages 197–208. IEEE Computer Society, November 2005.

P. D. Bradley and E. Normand. Single event upsets in implantable cardioverter defibrillators. IEEE Transactions on Nuclear Science, 45(6):2929–2940, 1998.

John F. Bulzacchelli, Timothy O. Dickson, Frank D. Ferraiolo, Robert J. Reese, and Michael B. Spear. Calibration of multiple parallel data communications lines for high skew conditions, March 2014. US Patent 8,681,839.

Alex Bystrov, David Kinniment, and Alex Yakovlev. Priority arbiters. In *Proceedings of the International Symposium on Asynchronous Circuits and Systems*, pages 128–137. IEEE Computer Society, 2000.

Thomas J. Chaney and Charles E. Molnar. Anomalous behavior of synchronizer and arbiter circuits. IEEE Transactions on Computers, 22(4):421–422, April 1973.

H. Jonathan Chao, Kung-Li Deng, and Zhigang Jing. PetaStar: A petabit photonic packet switch. IEEE Journal on Selected Areas in Communications, 21(7):1096–1112, September 2003.

H. Jonathan Chao, Zhigang Jing, and Soung Y. Liew. Matching algorithms for three-stage bufferless Clos network switches. IEEE Communications Magazine, 41(10):46–54, October 2003.

H. Jonathan Chao, Cheuk H. Lam, and Eiji Oki. Broadband Packet Switching Technologies: A Practical Guide to ATM Switches and IP Routers. John Wiley & Sons, Inc., 2001.

Fu-Chiung Cheng and Shuen-Long Ho. Efficient systematic error-correcting codes for semi-delay-insensitive data transmission. In *Proceedings of the IEEE International Conference on Computer Design*, pages 24–29. IEEE Computer Society, September 2001.

Jan Cheyns, Chris Develder, Erik Van Breusegem, Didier Colle, Filip De Turck, Paul Lagasse, Mario Pickavet, and Piet Demeester. Clos lives on in optical packet switching. IEEE Communications Magazine, 42(2):114–121, February 2004.

Ge-Ming Chiu. The odd-even turn model for adaptive routing. IEEE Transactions on Parallel and Distributed Systems, 11(7):729–738, July 2000.

Fabio M. Chiussi, Joseph G. Kneuer, and Vijay P. Kumar. Low-cost scalable switching solutions for broadband networking: The ATLANTA architecture and chipset. IEEE Communications Magazine, 35(12):44–53, December 1997.

Tam-Anh Chu. *Synthesis of Self-Timed VLSI Circuits from Graph-Theoretic Specifications*. PhD thesis, Massachusetts Institute of Technology, 1987. https://dspace.mit.edu/handle/1721.1/14794.

Fabien Clermidy, Christian Bernard, Romain Lemaire, Jerome Martin, Ivan Miro-Panades, Yvain Thonnart, Pascal Vivet, and Norbert Wehn. A 477mW NoC-based digital baseband for MIMO 4G SDR. In *Proceedings of the IEEE International Solid-State Circuits Conference*, pages 278–279. IEEE, February 2010.

Charles Clos. A study of non-blocking switching networks. Bell System Technical Journal, 32(5):406–424, March 1953.

Nicola Concer, Luciano Bononi, Michael Soulié, and Riccardo Locatelli. CTC: An end-to-end flow control protocol for multi-core systems-on-chip. In *Proceedings of the International Symposium on Networks-on-Chips*, pages 193–202. IEEE Computer Society, May 2009.

Cristian Constantinescu. Trends and challenges in VLSI circuit reliability. IEEE Micro, 23(4):14–19, 2003.

Jordi Cortadella, Michael Kishinevsky, Alex Kondratyev, Luciano Lavagno, and Alex Yakovlev. Petrify: A tool for manipulating concurrent specifications and synthesis of asynchronous controllers. IEICE Transactions on Information and Systems, E80-D(3):315–325, 1997.

William J. Dally. Virtual-channel flow control. IEEE Transactions on Parallel and Distributed Systems, 3(2):194–205, March 1992.

William J. Dally and Hiromichi Aoki. Deadlock-free adaptive routing in multicomputer networks using virtual channels. IEEE Transactions on Parallel and Distributed Systems, 4(4):466–475, April 1993.

William J. Dally and Charles L. Seitz. The torus routing chip. Distributed Computing, 1(4):187–196, December 1986.

William J. Dally and Charles L. Seitz. Deadlock-free message routing in multiprocessor interconnection networks. IEEE Transactions on Computers, C-36(5):547–553, 1987.

William J. Dally and Brian Towles. Route packets, not wires: On-chip interconnection networks. In *Proceedings of the Design Automation Conference*, pages 684–689. IEEE, ACM, 2001.

William James Dally and Brian Patrick Towles. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers, San Francisco, 2004.

Ilana David, Ran Ginosar, and Michael Yoeli. Self-timed is self-checking. Journal of Electronic Testing, 6(2):219–228, 1995.

Mark E. Dean, Ted E. Williams, and David L. Dill. Efficient self-timing with level-encoded 2-phase dual-rail (LEDR). In *Proceedings of the University of California/Santa Cruz Conference on Advanced Research in VLSI*, pages 55–70. MIT Press, April 1991.

Charles Dike and Edward Ted Burton. Miller and noise effects in a synchronizing flip-flop. IEEE Journal of Solid-State Circuits, 34(6):849–855, June 1999.

Rostislav (Reuven) Dobkin, Ran Ginosar, and Avinoam Kolodny. QNoC asynchronous router. Integration, the VLSI Journal, 42(2):103–115, March 2009.

Rostislav (Reuven) Dobkin, Ran Ginosar, and Christos P. Sotiriou. Data synchronization issues in GALS SoCs. In *Proceedings of the International Symposium on Advanced Research in Asynchronous Circuits and Systems*, pages 170–180. IEEE Computer Society, April 2004.

Rostislav Reuven Dobkin, Yevgeny Perelman, Tuvia Liran, Ran Ginosar, and Avinoam Kolodny. High rate wave-pipelined asynchronous on-chip bit-serial data link. In *Proceedings of the IEEE International Symposium on Asynchronous Circuits and Systems*. IEEE, March 2007.

Hao Dong. Modified Berger codes for detection of unidirectional errors. IEEE Transactions on Computers, C-33(6):572–575, June 1984.

José Duato, Sudhakar Yalamanchili, and Lionel Ni. Interconnection Networks: An Engineering Approach. Morgan Kaufmann Publishers, San Francisco, CA, 2003.

Tudor Dumitras and Radu Mărculescu . On-chip stochastic communication. In *Proceedings* of the Design, Automation & Test in Europe Conference & Exhibition, pages 10790–10795. IEEE Computer Society, March 2003.

Doug Edwards and Andrew Bardsley . Balsa: An asynchronous hardware synthesis language. The Computer Journal, 45(1):12–18, 2002.

Dan Ernst , Nam Sung Kim , Shidhartha Das , Sanjay Pant , Rajeev R. Rao , Toan Pham , Conrad H. Ziesler , David T. Blaauw , Todd M. Austin , Krisztián Flautner , and Trevor N. Mudge . Razor: A low-power pipeline based on circuit-level timing speculation. In *Proceedings of the Annual International Symposium on Microarchitecture* , pages 7–18. IEEE Computer Society, December 2003.

Tomaz Felicijan. *Quality-of-Service (QoS) for Asynchronous On-Chip Networks*. PhD thesis, Department of Computer Science, the Faculty of Science and Engineering, the University of Manchester, 2004. http://apt.cs.manchester.ac.uk/ftp/pub/amulet/theses/TomazPhD.pdf.

Tomaz Felicijan, John Bainbridge, and Steve Furber. An asynchronous low latency arbiter for quality of service (QoS) applications. In *Proceedings of the International Conference on Microelectronics*, pages 123–126. IEEE, December 2003.

Tomaz Felicijan and Steve B. Furber. An asynchronous on-chip network router with quality-of-service (QoS) support. In *Proceedings of the IEEE International SOC Conference*, pages 274–277. IEEE, September 2004.

Chaochao Feng, Zhonghai Lu, Axel Jantsch, Minxuan Zhang, and Zuocheng Xing. Addressing transient and permanent faults in NoC with efficient fault-tolerant deflection router. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 21(6):1053–1066, 2013.

Werner Friesenbichler, Thomas Panhofer, and Martin Delvai. A comprehensive approach for soft error tolerant four state logic. In *Proceedings of the IEEE Symposium on Design and Diagnostics of Electronic Circuits and Systems*, pages 214–217. IEEE Computer Society, April 2009.

Werner Friesenbichler and Andreas Steininger. Soft error tolerant asynchronous circuits based on dual redundant four state logic. In *Proceedings of the Euromicro Conference on Digital System Design, Architectures, Methods and Tools*, pages 100–107. IEEE Computer Society, August 2009.

Steve Furber and John Bainbridge. Future trends in SoC interconnect. In *Proceedings of the International Symposium on System-on-Chip*, pages 183–186. IEEE, November 2005.

Steve Furber and Paul Day. Four-phase micropipeline latch control circuits. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 4(2):247–253, June 1996.

K. T. Gardiner, Alexandre Yakovlev, and Alexandre V. Bystrov. A C-element latch scheme with increased transient fault tolerance for asynchronous circuits. In *Proceedings of the IEEE International On-Line Testing Symposium*, pages 223–230, July 2007.

Gilles Gasiot, Maximilien Glorieux, Sylvain Clerc, Dimitri Soussan, Fady Abouzeid, and Philippe Roche. Experimental soft error rate of several flip-flop designs representative of production chip in 32 nm CMOS technology. IEEE Transactions on Nuclear Science, 60(6):4226–4231, December 2013.

Ran Ginosar. Fourteen ways to fool your synchronizer. In *Proceedings of the International Symposium on Asynchronous Circuits and Systems*, pages 89–96. IEEE Computer Society, May 2003.

Ran Ginosar. Metastability and synchronizers: A tutorial. IEEE Design & Test of Computers, 28(5):23–35, 2011.

Christopher J. Glass and Lionel M. Ni. The turn model for adaptive routing. Journal of the ACM, 41(5):874–902, September 1994.

Stanislavs Golubcovs, Delong Shang, Fei Xia, Andrey Mokhov, and Alex Yakovlev. Modular approach to multi-resource arbiter design. In *Proceedings of the IEEE Symposium on Asynchronous Circuits and Systems*, pages 107–116. IEEE, May 2009.

Stanislavs Golubcovs, Delong Shang, Fei Xia, Andrey Mokhov, and Alex Yakovlev. Multi-resource arbiter decomposition. Technical report, Microelectronic System Design Group, School of EECE, Newcastle University, February 2009. http://async.org.uk/tech-reports/NCL-EECE-MSD-TR-2009-143.pdf.

Crispín Gómez, María E. Gómez, Pedro López, and José Duato. Exploiting wiring resources on interconnection network: Increasing path diversity. In *Proceedings of the Euromicro Conference on Parallel, Distributed and Network-Based Processing*, pages 20–29. IEEE, February 2008.

María Engracia Gómez, Nils Agne Nordbotten, Jose Flich, Pedro López, Antonio Robles, José Duato, Tor Skeie, and Olav Lysne. A routing methodology for achieving fault tolerance in direct networks. IEEE Transactions on Computers, 55(4):400–415, April 2006.

Kees Goossens, John Dielissen, and Andrei Radulescu. Æthereal network on chip: Concepts, architectures, and implementations. IEEE Design & Test of Computers, 22(5):414–421, September 2005.

Richard Wesley Hamming. Error detecting and error correcting codes. Bell System Technical Journal, 29(2):147–160, 1950.

Jeremie Hamon and Edith Beigne. Automatic leakage control for wide range performance QDI asynchronous circuits in FD-SOI technology. In *Proceedings of the IEEE International Symposium on Asynchronous Circuits and Systems*, pages 142–149. IEEE, May 2013.

P. Hazucha, T. Karnik, J. Maiz, S. Walstra, B. Bloechel, J. Tschanz, G. Dermer, S. Hareland, P. Armstrong, and S. Borkar. Neutron soft error rate measurements in a 90-nm CMOS process and scaling trends in SRAM from 0.25-µm to 90-nm generation. In *Proceedings of the IEEE International Electron Devices Meeting*, pages 21.5.1–21.5.4. IEEE, December 2003.

Jörg Henkel, Wayne Wolf, and Srimat Chakradhar. On-chip networks: A scalable, communication-centric embedded system design paradigm. In *Proceedings of the International Conference on VLSI Design*, pages 845–851, 2004.

Jingcao Hu and Radu Mărculescu. DyAD — smart routing for networks-on-chip. In *Proceedings of the Design Automation Conference*, pages 260–263. ACM, June 2004.

Wei Huang, Mircea R. Stan, Kevin Skadron, Karthik Sankaranarayanan, Shougata Ghosh, and Sivakumar Velusamy. Compact thermal modeling for temperature-aware design. In *Proceedings of the Annual Design Automation Conference*, pages 878–883. ACM, 2004.

Henrik Hulgaard, Steven M. Burns, and Gaetano Borriello. Testing asynchronous circuits: A survey. Integration, the VLSI Journal, 19(3):111–131, 1995.

Jameel Hussein and Gary Swift. Mitigating single-event upsets, May 2015. http://www.xilinx.com/support/documentation/white\_papers/wp395-Mitigating-SEUs.pdf.

P. D. Hyde and G. Russell. A comparative study of the design of synchronous and asynchronous self-checking RISC processors. In *Proceedings of the IEEE International On-Line Testing Symposium*, pages 89–94. IEEE Computer Society, July 2004.

Eishi Ibe, Hitoshi Taniguchi, Yasuo Yahagi, Ken-ichi Shimbo, and Tadanobu Toba. Impact of scaling on neutron-induced soft error in SRAMs from a 250 nm to a 22 nm design rule. IEEE Transactions on Electron Devices, 57(7):1527–1538, July 2010.

Masashi Imai and Tomohiro Yoneda. Improving dependability and performance of fully asynchronous on-chip networks. In *Proceedings of the IEEE International Symposium on Asynchronous Circuits and Systems*, pages 65–76. IEEE Computer Society, April 2011.

ITRS. International technology roadmap for semiconductors. Technical report, Semiconductor Industry Association, 2009. http://www.itrs.net/Links/2009ITRS/2009Chapters\_2009Tables/2009\_Design.pdf.

Wonjin Jang. *Soft-error tolerant quasi delay-insensitive circuits*. PhD thesis, California Institute of Technology, 2008. https://thesis.library.caltech.edu/5260/6/jang-wonjin-2008thesis.pdf.

Wonjin Jang and Alain J. Martin. SEU-tolerant QDI circuits. In *Proceedings of the International Symposium on Advanced Research in Asynchronous Circuits and Systems*, pages 156–165. IEEE Computer Society, March 2005.

Natalie Enright Jerger and Li-Shiuan Peh. On-Chip Networks. Morgan & Claypool Publishers, 2009.

Niraj K. Jha. Separable codes for detecting unidirectional errors. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 8(5):571–574, 1989.

Mark B. Josephs and Jelio T. Yantchev. CMOS design of the tree arbiter element. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 4(4):472–476, December 1996.

Norhuzaimin Julai, Alexandre Yakovlev, and Alexandre V. Bystrov. Error detection and correction of single event upset (SEU) tolerant latch. In *Proceedings of the IEEE International On-Line Testing Symposium*, pages 1–6. IEEE Computer Society, June 2012.

Tanay Karnik, Peter Hazucha, and Jagdish Patel. Characterization of soft errors caused by single event upsets in CMOS processes. IEEE Transactions on Dependable and Secure Computing, 1(2):128–143, 2004.

Mark J. Karol, Michael G. Hluchyj, and Samuel P. Morgan. Input versus output queueing on a space-division packet switch. IEEE Transactions on Communications, 35(12):1347–1356, December 1987.

Parviz Kermani and Leonard Kleinrock. Virtual cut-through: A new computer communication switching technique. Computer Networks, 3(4):257–286, 1979.

John Kim, William J. Dally, Brian Towles, and Amit K. Gupta. Microarchitecture of a high radix router. In *Proceedings of the International Symposium on Computer Architecture*, pages 420–431, June 2005.

David J. Kinniment. Synchronization and Arbitration in Digital Systems. John Wiley & Sons Ltd, December 2007.

J. L. Knighten, N. W. Smith, L. O. Hoeft, and J. T. DiBene II. EMI common-mode current dependence on delay skew imbalance in high speed differential transmission lines operating at 1 gigabit/second data rates. In *Proceedings of the International Symposium on Quality of Electronic Design*, pages 309–314. IEEE Computer Society, March 2000.

Israel Koren and Zahava Koren. Defect tolerance in VLSI circuits: Techniques and yield analysis. Proceedings of the IEEE, 86(9):1819–1838, 1998.

Srivathsan Krishnamohan and Nihar R. Mahapatra. A highly-efficient technique for reducing soft errors in static CMOS circuits. In *Proceedings of the IEEE International Conference on Computer Design: VLSI in Computers & Processors*, pages 126–131. IEEE Computer Society, October 2004.

Milos Krstic, Eckhard Grass, Frank K. Gürkaynak, and Pascal Vivet. Globally asynchronous, locally synchronous circuits: Overview and outlook. IEEE Design & Test of Computers, 24(5):430–441, September 2007.

Weidong Kuang, Enjun Xiao, Casto Manuel Ibarra, and Peiyi Zhao. Design asynchronous circuits for soft error tolerance. In *Proceedings of the IEEE International Conference on Integrated Circuit Design and Technology*, pages 1–5, May 2007.

Weidong Kuang, Peiyi Zhao, Jiann-Shiun Yuan, and Ronald F. DeMara. Design of asynchronous circuits for high soft error tolerance in deep submicrometer CMOS circuits. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 18(3):410–422, March 2010.

Amit Kumar, Partha Kundu, Arvind Singh, Li-Shiuan Peh, and Niraj Jha. A 4.6Tbits/s 3.6GHz single-cycle NoC router with a novel switch allocator in 65nm CMOS. In *Proceedings of the International Conference on Computer Design*, pages 63–70. IEEE, October 2007.

J. Lach, W. H. Mangione-Smith, and M. Potkonjak. Runtime logic and interconnect fault recovery on diverse FPGA architectures. In *Proceedings of the Military and Aerospace Applications of Programmable Devices and Technologies International Conference*, 1999.

Christopher LaFrieda and Rajit Manohar. Fault detection and isolation techniques for quasi delay-insensitive circuits. In *Proceedings of the International Conference on Dependable Systems and Networks*, pages 41–50. IEEE Computer Society, June 2004.

Reed K. Lawrence and Andrew T. Kelly. Single event effect induced multiple-cell upsets in a commercial 90 nm CMOS digital technology. IEEE Transactions on Nuclear Science, 55(6):3367–3374, 2008.

Jakob Lechner, Martin Lampacher, and Thomas Polzer. A robust asynchronous interfacing scheme with four-phase dual-rail coding. In *Proceedings of the International Conference on Application of Concurrency to System Design*, pages 122–131, June 2012.

Jakob Lechner, Andreas Steininger, and Florian Huemer. Methods for analysing and improving the fault resilience of delay-insensitive codes. In *Proceedings of the IEEE International Conference on Computer Design*, pages 519–526. IEEE Computer Society, October 2015.

Jakob Lechner and Varadan Savulimedu Veeravalli. Modular redundancy in a GALS system using asynchronous recovery links. In *Proceedings of the IEEE International Symposium on Asynchronous Circuits and Systems*, pages 23–30. IEEE Computer Society, May 2013.

Soojung Lee. A deadlock detection mechanism for true fully adaptive routing in regular wormhole networks. Computer Communications, 30(8):1826–1840, 2007.

Teijo Lehtonen, Pasi Liljeberg, and Juha Plosila. Online reconfigurable self-timed links for fault tolerant NoC. VLSI Design, 2007:1–13, 2007.

Teijo Lehtonen, David Wolpert, Pasi Liljeberg, Juha Plosila, and Paul Ampadu. Self-adaptive system for addressing permanent errors in on-chip interconnects. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 18(4):527–540, 2010.

Anthony Leroy, Dragomir Milojevic, Diederik Verkest, Frédéric Robert, and Francky Catthoor. Concepts and implementation of spatial division multiplexing for guaranteed throughput in networks-on-chip. IEEE Transactions on Computers, 57(9):1182–1195, September 2008.

Bin Li, Li-Shiuan Peh, and Priyadarsan Patra. Impact of process and temperature variations on network-on-chip design exploration. In *Proceedings of the ACM/IEEE International Symposium on Networks-on-Chip*, pages 117–126, April 2008.

S.-H. Lo, D. A. Buchanan, Y. Taur, and W. Wang. Quantum-mechanical modeling of electron tunneling current from the inversion layer of ultra-thin-oxide nMOSFET's. IEEE Electron Device Letters, 18(5):209–211, 1997.

Faiq Khalid Lodhi, Syed Rafay Hasan, Osman Hasan, and Falah Awwad. Low power soft error tolerant macro synchronous micro asynchronous (MSMA) pipeline. In *Proceedings of the IEEE Computer Society Annual Symposium on VLSI*, pages 601–606. IEEE, July 2014.

Kia Seng Low and Alexandre Yakovlev. Token ring arbiters: An exercise in asynchronous logic design with Petri nets. Technical Report 537, Department of Computer Science, University of Newcastle upon Tyne, November 1995. https://eprint.ncl.ac.uk/160499.

Rakan Maddah, Rami G. Melhem, and Sangyeun Cho. RDIS: Tolerating many stuck-at faults in resistive memory. IEEE Transactions on Computers, 64(3):847–861, March 2015.

Sohaib Majzoub, Resve Saleh, and Rabab Ward. PVT variation impact on voltage island formation in MPSoC design. In *Proceedings of the International Symposium on Quality of Electronic Design*, pages 814–819. IEEE Computer Society, March 2009.

Alain J. Martin. The design of a self-timed circuit for distributed mutual exclusion. Technical Report CaltechCSTR:1983.5097-tr-83, California Institute of Technology, 1983. http://resolver.caltech.edu/CaltechCSTR:1983.5097-tr-83.

Alain J. Martin. The limitations to delay-insensitivity in asynchronous circuits. In *Proceedings of the MIT Conference on Advanced Research in VLSI*, pages 263–278. MIT Press, 1990.

Alain J. Martin. Synthesis of asynchronous VLSI circuits. Technical Report CaltechCSTR:1991.cs-tr-93-28, California Institute of Technology, 1991. https://resolver.caltech.edu/CaltechCSTR:1991.cs-tr-93-28.

Alain J. Martin and Pieter J. Hazewindus. Testing delay-insensitive circuits. In *Proceedings of the University of California/Santa Cruz Conference on Advanced Research in VLSI*, pages 118–132. MIT Press, 1991.

Alain J. Martin, Andrew M. Lines, and Uri V. Cummings. Asynchronous circuits with pipelined completion process, December 2002. US Patent 6,502,180.

Juan-Miguel Martinez-Rubio, Pedro López, and José Duato. A cost-effective approach to deadlock handling in wormhole networks. IEEE Transactions on Parallel and Distributed Systems, 12(7):716–729, 2001.

Philippe Maurine, Jean-Baptiste Rigaud, Ghislain Bouesse, Gilles Sicard, and Marc Renaudin. Static implementation of QDI asynchronous primitives. In *Proceedings of the International Workshop on Power and Timing Modeling, Optimization and Simulation*, pages 181–191. Springer Berlin Heidelberg, September 2003.

Timothy C. May and Murray H. Woods. A new physical mechanism for soft errors in dynamic memories. In *Proceedings of the Annual Reliability Physics Symposium*, pages 33–40, April 1978.

Timothy C. May and Murray H. Woods. Alpha-particle-induced soft errors in dynamic memories. IEEE Transactions on Electron Devices, 26(1):2–9, 1979.

Edward J. McCluskey. Built-in self-test techniques. IEEE Design & Test of Computers, 2(2):21–28, April 1985.

Peggy B. McGee, Melinda Y. Agyekum, Moustafa A. Mohamed, and Steven M. Nowick. A level-encoded transition signaling protocol for high-throughput asynchronous global communication. In *Proceedings of the IEEE International Symposium on Asynchronous Circuits and Systems*, pages 116–127. IEEE Computer Society, April 2008.

Joe W. McPherson. Time dependent dielectric breakdown physics — models revisited. Microelectronics Reliability, 52(9-10):1753–1760, 2012.

Sarah E. Michalak, Kevin W. Harris, Nicolas W. Hengartner, Bruce E. Takala, and Stephen A. Wender. Predicting the number of fatal soft errors in Los Alamos National Laboratory's ASC Q supercomputer. IEEE Transactions on Device and Materials Reliability, 5(3):329–335, 2005.

Giovanni De Micheli and Luca Benini. Networks on Chips: Technology and Tools. Morgan Kaufmann, 2006.

Ivan Miro-Panades, Fabien Clermidy, Pascal Vivet, and Alain Greiner. Physical implementation of the DSPIN network-on-chip in the FAUST architecture. In *Proceedings of the ACM/IEEE International Symposium on Networks-on-Chip*, pages 139–148. IEEE, April 2008.

Mahim Mishra and Seth Copen Goldstein. Defect tolerance at the end of the roadmap. In *Proceedings of the International Test Conference*, pages 1201–1210. IEEE Computer Society, October 2003.

Kartik Mohanram and Nur A. Touba. Cost-effective approach for reducing soft error failure rate in logic circuits. In *Proceedings of the International Test Conference*, volume 1, pages 893–901, 2003.

Yannick Monnet, Marc Renaudin, and Régis Leveugle. Asynchronous circuits sensitivity to fault injection. In *Proceedings of the IEEE International On-Line Testing Symposium*, pages 121–126. IEEE Computer Society, July 2004.

Yannick Monnet, Marc Renaudin, and Régis Leveugle. Asynchronous circuits transient faults sensitivity evaluation. In *Proceedings of the Design Automation Conference*, pages 863–868. ACM, June 2005.

Yannick Monnet, Marc Renaudin, and Régis Leveugle. Hardening techniques against transient faults for asynchronous circuits. In *Proceedings of the IEEE International On-Line Testing Symposium*, pages 129–134. IEEE Computer Society, July 2005.

Yannick Monnet, Marc Renaudin, and Régis Leveugle. Designing resistant circuits against malicious faults injection using asynchronous logic. IEEE Transactions on Computers, 55(9):1104–1115, 2006.

Fernando Moraes, Ney Calazans, Aline Mello, Leandro Möller, and Luciano Ost. HERMES: An infrastructure for low area overhead packet-switching networks on chip. Integration, the VLSI Journal, 38(1):69–93, October 2004.

Mahdi Mosaffa, Fataneh Jafari, and Siamak Mohammadi. Designing robust threshold gates against soft errors. Microelectronics Journal, 42(11):1276–1289, 2011.

Shubu Mukherjee. Architecture Design for Soft Errors. Morgan Kaufmann Publishers, Amsterdam, Boston, 2008.

David E. Muller and W. Scott Bartky. A theory of asynchronous circuits. In *Proceedings of the Annals of Computing Laboratory of Harvard University*, pages 204–243, 1959.

Robert Mullins and Simon Moore. Demystifying data-driven and pausible clocking schemes. In *Proceedings of the IEEE International Symposium on Asynchronous Circuits and Systems*, pages 175–185. IEEE Computer Society, March 2007.

Robert Mullins, Andrew West, and Simon Moore. Low-latency virtual-channel routers for on-chip networks. In *Proceedings of the Annual International Symposium on Computer Architecture*, pages 188–197. IEEE Computer Society, 2004.

Robert Mullins, Andrew West, and Simon Moore. The design and implementation of a low-latency on-chip network. In *Proceedings of the Asia and South Pacific Design Automation Conference*, pages 164–169. IEEE, January 2006.

Tadao Murata. Petri nets: Properties, analysis and applications. Proceedings of the IEEE, 77(4):541–580, 1989.

Jens Muttersbach, Thomas Villiger, and Wolfgang Fichtner. Practical design of globally-asynchronous locally-synchronous systems. In *Proceedings of the International Symposium on Advanced Research in Asynchronous Circuits and Systems*, pages 52–59. IEEE Computer Society, April 2000.

Javier Navaridas, Mikel Luján, José Miguel-Alonso, Luis A. Plana, and Steve B. Furber. Understanding the interconnection network of SpiNNaker. In *Proceedings of the International Conference on Supercomputing*, pages 286–295. ACM, June 2009.

Ted Nesson and S. Lennart Johnsson. ROMM routing on mesh and torus networks. In *Proceedings of the Annual ACM Symposium on Parallel Algorithms and Architectures*, pages 275–287. ACM Press, 1995.

Chrysostomos A. Nicopoulos, Dongkook Park, Jongman Kim, Narayanan Vijaykrishnan, Mazin S. Yousif, and Chita R. Das. ViChaR: A dynamic virtual channel regulator for network-on-chip routers. In *Proceedings of the Annual IEEE/ACM International Symposium on Microarchitecture*, pages 333–346. IEEE Computer Society, December 2006.

Arthur Nieuwoudt, Jamil Kawa, and Yehia Massoud. Crosstalk-induced delay, noise, and interconnect planarization implications of fill metal in nanoscale process technology. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 18(3):378–391, March 2010.

Jinhyun Noh, Vincent Correas, Soonyoung Lee, Jongsung Jeon, Issam Nofal, Jacques Cerba, Hafnaoui Belhaddad, Dan Alexandrescu, YoungKeun Lee, and Steve Kwon. Study of neutron soft error rate (SER) sensitivity: Investigation of upset mechanisms by comparative simulation of FinFET and planar MOSFET SRAMs. IEEE Transactions on Nuclear Science, 62(4):1642–1649, August 2015.

Steven M. Nowick. Design of a low-latency asynchronous adder using speculative completion. IEE Proceedings — Computers and Digital Techniques, 143(5):301–307, September 1996.

José L. Núñez-Yáñez, Doug A. Edwards, and Antonio Marcello Coppola. Adaptive routing strategies for fault-tolerant on-chip networks in dynamically reconfigurable systems. IET Computers & Digital Techniques, 2(3):184–198, May 2008.

Simon Ogg, Bashir M. Al-Hashimi, and Alexandre Yakovlev. Asynchronous transient resilient links for NoC. In *Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis*, pages 209–214, 2008.

Ümit Y. Ogras, Jingcao Hu, and Radu Marculescu. Key research problems in NoC design: A holistic perspective. In *Proceedings of the IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis*, pages 69–74. ACM, September 2005.

Eiji Oki, Zhigang Jing, Roberto Rojas-Cessa, and H. Jonathan Chao. Concurrent round-robin-based dispatching schemes for Clos-network switches. IEEE/ACM Transactions on Networking, 10(6):830–844, December 2002.

Eiji Oki, Nattapong Kitsuwan, and Roberto Rojas-Cessa. Analysis of space-space-space Clos-network packet switch. In *Proceedings of the International Conference on Computer Communications and Networks*, pages 1–6, August 2009.

J. Olsen , P. E. Becher , P. B. Fynbo , P. Raaby , and J. Schultz . Neutron-induced single event upsets in static RAMS observed a 10 km flight attitude. IEEE Transactions on Nuclear Science, 40(2):74–77, 1993.

Carlos Ortega , Jonathan Tse , and Rajit Manohar . Static power reduction techniques for asynchronous circuits. In *Proceedings of the IEEE International Symposium on Asynchronous Circuits and Systems* , pages 52–61. IEEE, May 2010.

Recep O. Ozdag and Peter A. Beerel . High-speed QDI asynchronous pipelines. In *Proceedings of the IEEE International Symposium on Asynchronous Circuits and Systems*, pages 13–22. IEEE Computer Society, April 2002.

Jiwoo Pak , Bei Yu , and David Z. Pan . Electromigration-aware redundant via insertion. In *Proceedings of the Asia and South Pacific Design Automation Conference* , pages 544–549, January 2015.

Thomas Panhofer , Werner Friesenbichler , and Andreas Steininger . Reliability estimation and experimental results of a self-healing asynchronous circuit: A case study. In *Proceedings of the NASA/ESA Conference on Adaptive Hardware and Systems* , pages 91–98. IEEE Computer Society, June 2010.

Giorgos Passas , Manolis Katevenis , and Dionisios N. Pnevmatikatos . A 128 × 128 × 24Gb/s crossbar interconnecting 128 tiles in a single hop and occupying 6% of their area. In *Proceedings of the ACM/IEEE International Symposium on Networks-on-Chip* , pages 87–95. IEEE Computer Society, May 2010.

Suhas S. Patil . Forward acting n x m arbiter. Technical Report 67, Massachusetts Institute of Technology, April 1972. http://csg.csail.mit.edu/CSGArchives/memos/Memo-67.pdf. Robert Pawlowski , Joseph Crop , Minki Cho , James W. Tschanz , Vivek De , Thomas Fairbanks , Heather Quinn , Shekhar Y. Borkar , and Patrick Yin Chiang . Characterization of radiation-induced SRAM and logic soft errors from 0.33V to 1.0V in 65nm CMOS. In *Proceedings of the IEEE Custom Integrated Circuits Conference* , pages 1–4. IEEE, September 2014.

Li-Shiuan Peh and William J. Dally . A delay model and speculative architecture for pipelined routers. In *Proceedings of the International Symposium on High-Performance Computer Architecture*, pages 255–266. IEEE Computer Society, January 2001.

Song Peng . *Implementing self-healing behavior in Quasi Delay-Insensitive circuits*. PhD thesis, Cornell University, August 2006.

Song Peng and Rajit Manohar . Efficient failure detection in pipelined asynchronous circuits. In *Proceedings of the IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT)*, pages 484–493. IEEE Computer Society, October 2005.

Song Peng and Rajit Manohar . Self-healing asynchronous arrays. In *Proceedings of the IEEE International Symposium on Asynchronous Circuits and Systems*, pages 34–45, March 2006.

James Lyle Peterson . Petri Net Theory and the Modeling of Systems. Prentice Hall, USA, 1981.

Stanislaw J. Piestrak and Takashi Nanya . Towards totally self-checking delay-insensitive systems. In *Proceedings of the International Symposium on Fault-Tolerant Computing* , pages 228–237. IEEE Computer Society, June 1995.

Matthew Pirretti , Greg M. Link , Richard R. Brooks , Narayanan Vijaykrishnan , Mahmut Kandemir , and Mary Jane Irwin . Fault tolerant algorithms for network-on-chip interconnect. In *Proceedings of the IEEE Computer Society Annual Symposium on VLSI* , pages 46–51. IEEE Computer Society, February 2004.

Luis A. Plana , John Bainbridge , Steve Furber , Sean Salisbury , Yebin Shi , and Jian Wu . An on-chip and inter-chip communications network for the SpiNNaker massively-parallel neural net simulator. In *Proceedings of the International Symposium on Networks-on-Chips* , pages 215–216. IEEE Computer Society, April 2008.

Luis A. Plana , David M. Clark , Simon Davidson , Steve B. Furber , Jim D. Garside , Eustace Painkras , Jeffrey Pepper , Steve Temple , and John Bainbridge . SpiNNaker: Design and implementation of a GALS multicore system-on-chip. ACM Journal on Emerging Technologies in Computing Systems, 7(4):17:1–17:18, December 2011.

Luis A. Plana , Steve B. Furber , Steve Temple , Mukaram Khan , Yebin Shi , Jian Wu , and Shufan Yang . A GALS infrastructure for a massively parallel multiprocessor. IEEE Design & Test of Computers, 24(5):454–463, 2007.

Julian José Hilgemberg Pontes . *Soft Error Mitigation in Asynchronous Networks on Chip.* PhD thesis, Pontifícia Universidade Católica do Rio Grande do Sul, 2012. http://hdl.handle.net/10923/1559.

Julian José Hilgemberg Pontes , Ney Calazans , and Pascal Vivet . Adding temporal redundancy to delay insensitive codes to mitigate single event effects. In *Proceedings of the IEEE International Symposium on Asynchronous Circuits and Systems* , pages 142–149. IEEE Computer Society, May 2012.

Julian José Hilgemberg Pontes , Matheus T. Moreira , Fernando Moraes , and Ney Calazans . Hermes-A — an asynchronous NoC router with distributed routing. In *Proceedings of the International Workshop on Integrated Circuit and System Design: Power and Timing Modeling, Optimization, and Simulation* , pages 150–159. Springer, September 2011. Martin Radetzki , Chaochao Feng , Xueqian Zhao , and Axel Jantsch . Methods for fault tolerance in Networks-on-Chip. ACM Computing Surveys, 46(1):8:1–8:38, July 2013. David A. Rennels and Hyeongil Kim . Concurrent error detection in self-timed VLSI. In *Proceedings of the International Symposium on Fault-Tolerant Computing* , pages 96–105. IEEE Computer Society, June 1994.

Phillip J. Restle , K. A. Jenkins , A. Deutsch , and P. W. Cook . Measurement and modeling of on-chip transmission line effects in a 400 MHz microprocessor. IEEE Journal of Solid-State Circuits, 33(4):662–665, April 1998.

Samuel Rodrigo , Jose Flich , Antoni Roca , Simone Medardoni , Davide Bertozzi , Jesús Camacho Villanueva , Federico Silla , and José Duato . Addressing manufacturing challenges with cost-efficient fault tolerant routing. In *Proceedings of the ACM/IEEE International Symposium on Networks-on-Chip* , pages 25–32. IEEE Computer Society, May 2010.

Roberto Rojas-Cessa and Chuan-Bi Lin . Scalable two-stage Clos-network switch and module-first matching. In *Proceedings of the Workshop on High Performance Switching and Routing*, pages 303–308, June 2006.

Thomas L. Saaty . Elements of Queueing Theory: With Applications. McGraw-Hill New York, 1961.

Yoichi Sasaki , Kazuteru Namba , and Hideo Ito . Soft error masking circuit and latch using Schmitt trigger circuit. In *Proceedings of the IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT)* , pages 327–335. IEEE Computer Society, October 2006.

Steve Scott , Dennis Abts , John Kim , and William J. Dally . The BlackWidow high-radix clos network. In *Proceedings of the International Symposium on Computer Architecture* , pages 16–28. ACM, May 2006.

Norbert Seifert and Nelson Tam . Timing vulnerability factors of sequentials. IEEE Transactions on Device and Materials Reliability, 4(3):516–522, September 2004.

Maitham Shams , Jo C. Ebergen , and Mohamed I. Elmasry . A comparison of CMOS implementations of an asynchronous circuits primitive: The C-element. In *Proceedings of the International Symposium on Low Power Electronics and Design* , pages 93–96. IEEE, August 1996.

Delong Shang , Fei Xia , Stanislavs Golubcovs , and Alex Yakovlev . The magic rule of tiles: Virtual delay insensitivity. In *Proceedings of the International Workshop on Power and Timing Modeling, Optimization and Simulation* , pages 286–296. Springer Berlin Heidelberg, 2009. Abbas Sheibanyrad . *Asynchronous Implementation of a Distributed Network-on-Chip.* PhD thesis, University of Pierre et Marie Curie, 2008.

Abbas Sheibanyrad and Alain Greiner . Two efficient synchronous  $\leftrightarrow$  asynchronous converters well-suited for networks-on-chip in GALS architectures. Integration, the VLSI Journal, 41(1):17–26, January 2008.

Abbas Sheibanyrad , Alain Greiner , and Ivan Miro-Panades . Multisynchronous and fully asynchronous NoCs for GALS architectures. IEEE Design & Test of Computers, 25(6):572–580, November 2008.

Kenneth L. Shepard and Vinod Narayanan . Noise in deep submicron digital design. In *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design*, pages 524–531. IEEE Computer Society/ACM, November 1996.

Yebin Shi . *Fault-Tolerant Delay-Insensitive Communication*. PhD thesis, University of Manchester, 2010. http://apt.cs.manchester.ac.uk/publications/thesis/YShi10\_phd.php. Yebin Shi , Steve B. Furber , Jim Garside , and Luis A. Plana . Fault tolerant delay insensitive inter-chip communication. In *Proceedings of the IEEE International Symposium on Asynchronous Circuits and Systems* , pages 77–84. IEEE, May 2009.

Keun Sup Shim, Myong Hyon Cho, Michel Kinsy, Tina Wen, Mieszko Lis, G. Edward Suh, and Srinivas Devadas. Static virtual channel allocation in oblivious routing. In *Proceedings of the ACM/IEEE International Symposium on Networks-on-Chip*, pages 38–43. IEEE Computer Society, 2009.

Montek Singh and Steven M. Nowick. MOUSETRAP: Ultra-high-speed transition-signaling asynchronous pipelines. In *Proceedings of the International Conference on Computer Design*, pages 9–17, September 2001.

Montek Singh and Steven M. Nowick. The design of high-performance dynamic asynchronous pipelines: Lookahead style. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 15(11):1256–1269, November 2007.

Charles Slayman. Soft error trends and mitigation techniques in memory devices. In *Proceedings of the Annual Reliability and Maintainability Symposium*, pages 1–5. IEEE, January 2011.

Jared C. Smolens, Brian T. Gold, James C. Hoe, Babak Falsafi, and Ken Mai. Detecting emerging wearout faults. In *Proceedings of the IEEE Workshop on Silicon Errors in Logic – System Effects*, April 2007.

Wei Song. *Spatial Parallelism in the Routers of Asynchronous On-Chip Networks*. PhD thesis, University of Manchester, 2011. https://www.escholar.manchester.ac.uk/uk-ac-man-scw:126142.

Wei Song and Doug Edwards. An asynchronous routing algorithm for Clos networks. In *Proceedings of the International Conference on Application of Concurrency to System Design*, pages 67–76. IEEE Computer Society, June 2010.

Wei Song and Doug Edwards. A low latency wormhole router for asynchronous on-chip networks. In *Proceedings of the Asia and South Pacific Design Automation Conference*, pages 437–443. IEEE, January 2010.

Wei Song and Doug Edwards. Asynchronous spatial division multiplexing router. *Microprocessors and Microsystems*, 35(2):85–97, 2011.

Wei Song, Doug Edwards, Jim Garside, and William J. Bainbridge. Area efficient asynchronous SDM routers using 2-stage Clos switches. In *Proceedings of the Design, Automation & Test in Europe Conference & Exhibition*, pages 1495–1500. IEEE, March 2012.

Wei Song, Doug A. Edwards, José Luis Núñez-Yáñez, and Sohini Dasgupta. Adaptive stochastic routing in fault-tolerant on-chip networks. In *Proceedings of the International Symposium on Networks-on-Chip*, pages 32–37. IEEE Computer Society, May 2009.

Daniel J. Sorin. *Fault Tolerant Computer Architecture*. Morgan & Claypool Publishers, January 2009.

Jens Sparsø. Asynchronous design of networks-on-chip. In *Proceedings of NORCHIP*. IEEE, November 2007.

Jens Sparsø and Steve Furber. *Principles of Asynchronous Circuit Design: A Systems Perspective*. Kluwer Academic Publishers, Boston, USA, 2001.

Jens Sparsø and Jørgen Staunstrup. Delay-insensitive multi-ring structures. *Integration, the VLSI Journal*, 15(3):313–340, 1993.

Jayanth Srinivasan, Sarita V. Adve, Pradip Bose, and Jude A. Rivers. The impact of technology scaling on lifetime reliability. In *Proceedings of the International Conference on Dependable Systems and Networks*, pages 177–186. IEEE Computer Society, June 2004.

Mikkel Bystrup Stensgaard and Jens Sparsø. ReNoC: A network-on-chip architecture with reconfigurable topology. In *Proceedings of the ACM/IEEE International Symposium on Networks-on-Chip*, pages 55–64. IEEE, April 2008.

Edward A. Stott, N. Pete Sedcole, and Peter Y. K. Cheung. Fault tolerant methods for reliability in FPGAs. In *Proceedings of the International Conference on Field Programmable Logic and Applications*, pages 415–420. IEEE, September 2008.

Ivan E. Sutherland. Micropipelines. *Communications of the ACM*, 32(6):720–738, 1989.

Ivan E. Sutherland and Scott Fairbanks. GasP: A minimal FIFO control. In *Proceedings of the IEEE International Symposium on Asynchronous Circuits and Systems*, pages 46–53. IEEE Computer Society, March 2001.

A. Taber and E. Normand. Single event upset in avionics. *IEEE Transactions on Nuclear Science*, 40(2):120–126, 1993.

Mehdi Baradaran Tahoori and Subhasish Mitra. Defect and fault tolerance of reconfigurable molecular computing. In *Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines*, pages 176–185. IEEE Computer Society, April 2004.

E. Takeda and N. Suzuki. An empirical model for device degradation due to hot-carrier injection. *IEEE Electron Device Letters*, 4(4):111–113, April 1983.

Andrew S. Tanenbaum and David J. Wetherall. *Computer Networks*. Prentice Hall, 5th edition, January 2010.

The Advanced Processor Technologies Research Group, School of Computer Science, University of Manchester. SpiNNaker: A universal spiking neural network architecture. https://apt.cs.manchester.ac.uk/projects/SpiNNaker/.

Yvain Thonnart, Edith Beigné, and Pascal Vivet. Design and implementation of a GALS adapter for ANoC based architectures. In *Proceedings of the IEEE International Symposium on Asynchronous Circuits and Systems*, pages 13–22, May 2009.

Yvain Thonnart, Pascal Vivet, and Fabien Clermidy. A fully-asynchronous low-power framework for GALS NoC integration. In *Proceedings of the Design, Automation & Test in Europe Conference & Exhibition*, pages 33–38. IEEE Computer Society, 2010.

Xuan-Tu Tran, Yvain Thonnart, Jean Durupt, Vincent Beroulle, and Chantal Robach. Design-for-test approach of an asynchronous network-on-chip architecture and its associated test pattern generation and application. *IET Computers & Digital Techniques*, 3(5):487–500, 2009.

Dean Nguyen Truong, Wayne H. Cheng, Tinoosh Mohsenin, Zhiyi Yu, Anthony T. Jacobson, Gouri Landge, Michael J. Meeuwsen, Christine Watnik, Anh Thien Tran, Zhibin Xiao, Eric W. Work, Jeremy W. Webb, Paul Vincent Mejia, and Bevan M. Baas. A 167-processor computational platform in 65 nm CMOS. *IEEE Journal of Solid-State Circuits*, 44(4):1130–1144, April 2009.

King-Ning Tu. Recent advances on electromigration in very-large-scale-integration of interconnects. *Journal of Applied Physics*, 94(9):5451–5473, 2003.

Leslie G. Valiant and Gordon J. Brebner. Universal schemes for parallel communication. In *Proceedings of the Annual ACM Symposium on Theory of Computing*, pages 263–277. ACM, May 1981.

Sriram R. Vangal, Jason Howard, Gregory Ruhl, Saurabh Dighe, Howard Wilson, James W. Tschanz, David Finan, Arvind P. Singh, Tiju Jacob, Shailendra Jain, Vasantha Erraguntla, Clark Roberts, Yatin Hoskote, Nitin Borkar, and Shekhar Borkar. An 80-tile sub-100-W TeraFLOPS processor in 65-nm CMOS. *IEEE Journal of Solid-State Circuits*, 43(1):29–41, January 2008.

Thomas Verdel and Yiorgos Makris. Duplication-based concurrent error detection in asynchronous circuits: Shortcomings and remedies. In *Proceedings of the IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems*, pages 345–353, November 2002.

Tom Verhoeff. Delay-insensitive codes: An overview. *Distributed Computing*, 3(1):1–8, March 1988.

Daniel Wiklund and Dake Liu. SoCBUS: Switched network on chip for hard real time embedded systems. In *Proceedings of the International Parallel and Distributed Processing Symposium*. IEEE Computer Society, April 2003.

Wayne Wolf, Ahmed Amine Jerraya, and Grant Martin. Multiprocessor system-on-chip (MPSoC) technology. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 27(10):1701–1713, October 2008.

Pascal T. Wolkotte, Gerard J. M. Smit, Gerard K. Rauwerda, and Lodewijk T. Smit. An energy-efficient reconfigurable circuit-switched network-on-chip. In *Proceedings of the IEEE International Parallel and Distributed Processing Symposium*. IEEE, April 2005.

Jian Wu and Steve Furber. A multicast routing scheme for a universal spiking neural network architecture. *The Computer Journal*, 53(3):280–288, 2010.

Jing-ling Yang, Oliver Chiu-sing Choy, Cheong-fat Chan, and Kong-pong Pun. Design for self-checking and self-timed datapath. In *Proceedings of the IEEE VLSI Test Symposium*, pages 417–422. IEEE Computer Society, April 2003.

Jian Yao, Zuochang Ye, Miao Li, Yanfeng Li, R. D. Schrimpf, D. M. Fleetwood, and Yan Wang. Statistical analysis of soft error rate in digital logic design including process variations. *IEEE Transactions on Nuclear Science*, 59(6):2811–2817, December 2012.

Tomohiro Yoneda, Masashi Imai, Naoya Onizawa, Atsushi Matsumoto, and Takahiro Hanyu. Multi-chip NoCs for automotive applications. In *Proceedings of the IEEE Pacific Rim International Symposium on Dependable Computing*, pages 105–110, November 2012.

Qiaoyan Yu. *Transient and Permanent Error Management for Networks-on-Chip*. PhD thesis, University of Rochester, 2011. http://hdl.handle.net/1802/14810.

Qiaoyan Yu and Paul Ampadu. Transient and permanent error co-management method for reliable networks-on-chip. In *Proceedings of the ACM/IEEE International Symposium on Networks-on-Chip*, pages 145–154. IEEE Computer Society, May 2010.

Qiaoyan Yu and Paul Ampadu. Dual-layer adaptive error control for network-on-chip links. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 20(7):1304–1317, 2012.

Kenneth Y. Yun, Peter A. Beerel, and Julio Arceo. High-performance asynchronous pipeline circuits. In *Proceedings of the IEEE International Symposium on Asynchronous Circuits and Systems*, pages 17–28. IEEE Computer Society, 1996.

Guangda Zhang, Wei Song, Jim D. Garside, Javier Navaridas, and Zhiying Wang. Transient fault tolerant QDI interconnects using redundant check code. In *Proceedings of the Euromicro Conference on Digital System Design*, pages 3–10, September 2013.

Guangda Zhang, Wei Song, Jim D. Garside, Javier Navaridas, and Zhiying Wang. Protecting QDI interconnects from transient faults using delay-insensitive redundant check codes. *Microprocessors and Microsystems*, 38(8, Part A):826–842, 2014.

Meilin Zhang, Qiaoyan Yu, and Paul Ampadu. Fine-grained splitting methods to address permanent errors in network-on-chip links. In *Proceedings of the IEEE International Symposium on Circuits and Systems*, pages 2717–2720, May 2012.

Ying Zhang, Zebo Peng, Jianhui Jiang, Huawei Li, and Masahiro Fujita. Temperature-aware software-based self-testing for delay faults. In *Proceedings of the Design, Automation & Test in Europe Conference & Exhibition*, pages 423–428, March 2015.

Zhen Zhang, Alain Greiner, and Sami Taktak. A reconfigurable routing algorithm for a fault-tolerant 2D-mesh network-on-chip. In *Proceedings of the Design Automation Conference*, pages 441–446. ACM, June 2008.