Smart Computing Review, vol. 3, no. 6, December 2013

Smart Computing Review

A Tutorial for Key Problems in the Design of Hybrid Hierarchical NoC Architectures with Wireless/RF
Chunhua Xiao*, Zhangqin Huang, and Da Li
Embedded Software and System Institution, Beijing University of Technology / 100022, Beijing, CHINA /
xiaochh@emails.bjut.edu.cn
* Corresponding Author: Chunhua Xiao

Received August 15, 2013; Revised October 31, 2013; Accepted November 8, 2013; Published December 19,
2013

Abstract: As processing nodes scale up, it is difficult for traditional electronic networks to supply
on-chip communication efficiently due to unacceptable latency, plus power and area consumption.
Alternative interconnects, such as radio frequency interconnect (RF-I) and optical interconnect,
have been explored as interconnection backbones. Hybrid hierarchical architectures with both
traditional interconnects and emerging interconnects have been widely adopted to achieve an excellent
trade-off between latency and power. The hybrid hierarchical architecture with a wireless/RF-I
backbone is more cost-efficient and feasible due to advantages in complementary metal oxide
semiconductor compatibility, compared with other alternative interconnects, and has become one of
the mainstreams of chip multi-processor systems. However, how to efficiently utilize the
wireless/RF-I backbone is a new challenge for designers. Based on an analysis of existing typical
hybrid hierarchical wireless/RF-I architectures (HHWAs), the key problems in the design of
HHWAs are identified here, and related potential solutions are provided. In particular, strategies for
resource management of wireless/RF-I are explored in detail, and different solutions are discussed.
This work is expected to serve as a basis for future HHWA designs.
Keywords: Network-on-chip, radio frequency interconnect, wireless interconnect

This research was supported by the Beijing Municipal Natural Science Foundation (No.4122010, 2012.1 - 2014.12).
DOI: 10.6029/smartcr.2013.06.004

Xiao et al.: A Tutorial for Key Problems in the Design of Hybrid Hierarchical NoC Architectures with Wireless/RF

Introduction

As we enter the era of multiple cores and beyond, the number of cores, coprocessors, and on-chip accelerators grows
rapidly. The dramatic increase in these processing elements (PEs) poses a tremendous challenge for on-chip
communication, which demands not only high performance, including lower latency and higher bandwidth, but also high
performance per unit of energy and area. According to the International Technology Roadmap for Semiconductors (ITRS) [1],
improving characteristics of metal wires will no longer satisfy performance requirements, and new interconnect paradigms
are needed. Different revolutionary approaches, such as optical interconnect [2][3], radio frequency interconnect (RF-I)
[4][5][6], and wireless interconnect with complementary metal oxide semiconductor (CMOS) ultra wide band (UWB)
technology [7][8], have been explored. But these emerging interconnects have associated antenna and transceiver area,
extra integrated components and power overheads, and thus need to be placed and used optimally to achieve the best
performance without undue overhead [9][10]. Although the traditional planar metal interconnects suffer from limitations
arising from multi-hop communication, which result in high latency and power consumption, they are still highly effective
and suitable for short distances. The vast improvements in CMOS technology have led to wires with only 0.18 pJ/bit of
energy consumption at 1 mm for a 32 nm technology design [11]. For these reasons, many
researchers have adopted hybrid hierarchical wireless/RF-I architectures (HHWAs) to get excellent trade-offs between latency
and power with limited extra cost [12][13][14][15][16]. An HHWA is characterized by local traditional wired interconnection
and global wireless/RF-I interconnection, and provides some unique benefits, including the following: (1) Instead of the
multi-hop paths of traditional interconnection, wireless/RF-I uses one hop for long-distance communication, which reduces
power consumption while providing high bandwidth and low latency without excessive overhead. (2) Taking full advantage
of traditional networks on a chip (NoCs) and emerging interconnects, HHWA employs their respective merits. (3)
Compared with optical interconnects in hybrid architectures, using wireless/RF-I as a global communication backbone
attains better feasibility and cost-efficiency due to an advantage in CMOS compatibility.
Because the architecture combines emerging technologies with traditional interconnects, new design challenges arise that
might become bottlenecks to performance improvement. This work explores the key problems in HHWA design and provides
related potential solutions, and is expected to serve as a basis for future HHWA designs. The rest
of the paper is organized as follows. In Section 2, we provide a brief overview of the new alternative interconnect
technologies (wireless and RF-I) and how they can be leveraged for on-chip communication. Based on the availability of
these two interconnect technologies, we discuss the topology of HHWAs and explore the existing typical HHWAs in
Section 3. Due to the importance of wireless/RF-I resource management in HHWAs, we survey and analyze
the resource arbitration mechanisms in depth in Section 4. In Section 5, we summarize the key problems in HHWA design and
provide related feasible solutions. Finally, we conclude our work in Section 6.

RF-I/Wireless
RF-I
Radio frequency interconnect has been proposed as a high-aggregate bandwidth, low-latency alternative to traditional
interconnect [4][5][19]. Its benefits have been demonstrated for off-chip, on-board communication, as well as for on-chip
interconnection networks [20][21][22].
Unlike conventional metallic wires that require charging and discharging the whole wire to signify either 0 or 1,
RF-I modulates information on an electromagnetic carrier wave that is continuously sent along the transmission line
(Figure 1). RF-I has been projected to scale better than traditional RC wires in terms of delay and power consumption; it
can allow signal transmission across a 400 mm² die in 0.3 ns via propagation at the effective speed of light [5] as opposed
to less than, or equal to, 4 ns on a repeated bus.
Instead of trying to aggressively expand baseband bandwidth (which often involves power-hungry compensation
techniques to achieve a flat channel frequency response), RF-I divides bandwidth into frequency domains, each becoming a
narrow-band signal, which saves power. By doing this, RF-I also improves bandwidth efficiency by sending many
simultaneous streams of data over a single transmission line. This particular technique is referred to as multi-band RF-I [6].
As shown in Figure 2, there are N mixers on the transmitting (Tx) side in multi-band RF-I, where N is the number of
senders sharing the transmission line. Each mixer up-converts an individual data stream into a specific channel (or frequency
band). On the receiving (Rx) side, N additional mixers are employed to down-convert each signal back to the original data,
and N low-pass filters (LPFs) are used to isolate the data from residual high-frequency components. Based on shortcut
selection, each transmitter or receiver in the topology is tuned to a particular frequency (or disabled entirely) to
implement the shortcuts [5][6].
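
The mixer and low-pass-filter chain described above can be illustrated with a small numerical sketch. It is only a toy model under stated assumptions, not the circuit of [5][6]: each sender uses a BPSK mapping (+1/-1), carrier k completes exactly k cycles per bit period so that the carriers are orthogonal over one bit, and the low-pass filter is approximated by averaging over the bit period. The names SAMPLES_PER_BIT, carrier, transmit and receive are hypothetical.

```python
import math

SAMPLES_PER_BIT = 64  # assumed discretization of one bit period


def carrier(k, n):
    """Sample n of carrier k, which completes k full cycles per bit period (assumed band plan)."""
    return math.cos(2 * math.pi * k * n / SAMPLES_PER_BIT)


def transmit(bits_per_sender):
    """Up-convert every sender's bit stream onto its own carrier and sum them on the shared line."""
    num_bits = len(bits_per_sender[0])
    line = []
    for b in range(num_bits):
        for n in range(SAMPLES_PER_BIT):
            sample = 0.0
            for idx, bits in enumerate(bits_per_sender):
                symbol = 1.0 if bits[b] else -1.0  # BPSK mapping (an assumption)
                sample += symbol * carrier(idx + 1, n)
            line.append(sample)
    return line


def receive(line, sender_idx, num_bits):
    """Down-convert with the matching carrier, then low-pass by summing over each bit period."""
    out = []
    for b in range(num_bits):
        acc = 0.0
        for n in range(SAMPLES_PER_BIT):
            acc += line[b * SAMPLES_PER_BIT + n] * carrier(sender_idx + 1, n)
        out.append(1 if acc > 0 else 0)
    return out


streams = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
line = transmit(streams)
recovered = [receive(line, i, 4) for i in range(len(streams))]
```

Because the assumed carriers are orthogonal over a bit period, each receiver recovers only the stream of the sender whose band it is tuned to, even though all three streams share one line.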

Figure 1. RF-I transmission line in a chip multiprocessor system (legend: C = core; $ = L2 cache bank; M = off-chip memory controller; router; RF-I node; RF-I transmission line)

Figure 2. A ten-carrier RF-I and corresponding waveform at the transmission line

Wireless
Unlike RF-I, wireless interconnection does not need a physically laid-out transmission channel; the
communication medium is free space [23]. Wireless communication can use different frequency ranges, from several
gigahertz to thousands of gigahertz [24].
The on-chip antenna is one of the most important, and most difficult, components to integrate on-chip
for HHWAs, because passive devices such as inductors consume the dominant portion of the transceiver area. Fortunately,
as CMOS technology improves, both the size and the cost of the antenna and its required circuits will decrease
dramatically, which makes integrating multiple on-chip antennas feasible [12]. An example of the necessary
components of wireless transceivers for millimeter wave (mm-wave) links in a chip multiprocessor system is shown in
Figure 3. A metal zigzag antenna was demonstrated to support wireless network-on-a-chip (WiNoC) [25] and was used to
design an mm-wave wireless NoC by Deb et al. [26]. As the transmission frequency increased to the terahertz range, carbon
nanotubes (CNTs) were explored for the on-chip antenna [27], and the feasibility of designing a WiNoC was demonstrated
by Ganguly et al. [15]. Unlike RF-I, which needs a transmission line spanning the entire chip area, wireless
interconnection does not constrain communication routes to a physical channel. However, wireless interconnection faces
interference challenges and cost problems, which are proportional to the communication distance.

Figure 3. An example of mm-wave links in a chip multiprocessor system (components shown: clusters C0 to C3; antenna; switch; transmitter side with serializer, modulator, carrier frequency, and driver amplifier; receiver side with LNA, demodulator, and deserializer)

Hybrid Hierarchical Wireless/RF-I Architectures


Topology
Topology defines how channels and routers are connected in an interconnection network and determines the performance
bounds, including zero-load latency and network throughput [17]. As shown in Figure 4, a hybrid hierarchical
wireless/RF-I network consists of two types of network: a local network, which uses traditional wire interconnects, and a
global/express network, which uses wireless/RF-I. For a conventional NoC, there can be various topologies for a local
network, such as mesh, centralized mesh, ring, star, etc. Each local network forms a subnet and is equipped with a
wireless/RF-I access point (WAP). As long as the antennas are placed within communication range (or the RF-I is enabled
between them), only a single hop is needed for inter-subnet communication. All WAPs from all subnets are connected as a
second-level network forming the global/express network. This upper level of the hierarchy can have various designs with
different characteristics to achieve the full benefit of on-chip express networks.
An important problem when creating an efficient global/express network is the placement of WAPs, which greatly
influences the trade-off between system performance and cost. If each PE is equipped with a WAP (each local network
consists of only one node) and can communicate with any other node through the express wireless/RF-I, we get the best
system performance, with low latency and high throughput, but the area cost of the equipment
(antennas, transceivers, etc.) may be unacceptable. If too many PEs share a WAP, or if the WAP is placed improperly, the
performance improvement is offset by the induced overhead. Ganguly et al. introduced small-world theory to create an HHWA, and inserted wireless
links through a simulated annealing-based algorithm to minimize the average distance (measured by the number of hops)
between all source and destination hubs [15]. Chang et al. used RF-I as an express shortcut between intensively
communicating nodes, using communication profiling of the application to accelerate and optimize region-to-region
communication. They placed the RF-enabled routers in a staggered fashion to minimize the distance any given component
would need to travel to reach the RF-I [16]. In contrast, Lee et al. [12] and Di Tomaso et al. [13] placed the
WAPs at the center of concentrated mesh-based clusters to provide distributed wireless express pathways for inter-cluster,
long-haul communication to support hundreds of PEs.

Existing Typical Architectures


Chang et al. [16] exploited dynamic RF-I bandwidth allocation to realize a reconfigurable hierarchical network-on-a-chip
architecture. As shown in Figure 5, this architecture uses a mesh topology as the baseline and places adaptive shortcuts as
an RF-overlaid topology to match different communication demands of the applications. This approach selects shortcuts
according to a cost equation optimized using application communication statistics. The selected shortcuts are
implemented through RF-I-enabled routers (standard routers extended with a port that serves as an RF-I interface). Each transmitter or
receiver in the topology is tuned to a particular frequency (or disabled entirely) to offer a shortcut. To enable the new
available paths (RF-I shortcuts) and also reduce the reconfiguration cost, the routing tables in all network routers will be
updated before executing the application. A shortest path routing strategy is adopted with RF-I shortcuts to transmit packets.
This dynamic allocation approach enables reconfiguring the topology via frequency band reassignment, thereby providing
the benefits of adaptive routing without having to pay the cost of traversing extra channels [23].

Figure 4. Hybrid hierarchical wireless/RF-I network (a global/express network on top of several local networks)

Figure 5. An example of adaptive RF-I shortcuts in a chip multiprocessor system (legend: traditional wired link; base router; shortcuts)


Modern complex network theory provides powerful methods to analyze network topologies. The small-world theory [28]
is incorporated in HHWAs to simultaneously address the latency, power consumption, and interconnect routing problems by
minimizing the hop counts in inter-core communication; we denote these architectures as small-world-based
architectures, as shown in Figure 6 [15][18]. In a small-world-based architecture, the whole system is divided into
multiple small clusters called subnets, and all PEs within each subnet are connected to a centrally located hub through
direct links. These hubs are connected to form a second-level hierarchical structure, or global network. Given the number of
wireless interfaces (WIs), the placement of WIs on these hubs is optimized through a simulated annealing-based algorithm.
The routing strategy adopted is a combination of dimension order routing for the hubs without WIs and a south-east routing
algorithm for the hubs with WIs. For inter-subnet communication, the routing path involving the wireless medium is chosen
if it reduces the total path length, compared to the wired path [18]. A token flow control strategy is adopted to alleviate the
potential hotspot problem in WIs, which occurs from the simultaneous multiple access requirements for the wireless links,
while another different token-passing protocol is used to avoid interference and contention for the wireless medium from a
particular hub at a given instant.
An example of a two-level WCube structure is shown in Figure 7. WCube is a multi-level, two-dimensional structure that
interconnects hundreds to thousands of cores in chip multiprocessors [12]. Two types of routers are included in this network:
base routers that make up the baseline concentrated mesh, and wireless routers with wireless interfaces that form a wireless
backbone. Each wireless router is responsible for a cluster of n base routers, while each base router serves k PEs because
a k-way concentrated mesh is adopted. The wireless routers, base routers and PEs are assigned exclusive addresses in
WCube to identify their exact positions in the network, and the whole architecture can be described recursively. Every
wireless router is assigned a single, distinct frequency band and is equipped with one wireless transmitter and multiple
receivers to allow parallel transmission. WCube uses wormhole-based delivery and latency-oriented routing to minimize
communication latency. The wireless link is chosen if it clearly reduces latency, compared with only using a
baseline. WCube offers scalable performance in terms of latency and connectivity, compared with other HHWAs, and the
architecture has proven cost-efficient with 1024 nodes.

Figure 6. Small-world-based hybrid hierarchical wireless architecture (legend: processing node; switch; hub; wireless links; traditional wired links)

Figure 7. A two-level WCube structure with a cluster of 16 base routers (i.e., 64 nodes) (legend: WCube 0 to WCube 2; core; L2 cache; base router; wireless router)

Unlike WCube, which uses a centralized wireless hub for each group of 64 nodes, the iWISE architecture
gives every router its own transmitter and receiver for each group of routers. As shown in Figure 8, the iWISE architecture
reduces the hop count by distributing these transceivers at each router, as opposed to the centralized hub found in WCube
[13]. A token scheme is adopted for the wireless routers to share the limited bandwidth, while frequency division
multiplexing (FDM) and time division multiplexing (TDM) are introduced to avoid transmission interference.

Wireless/RF-I Resource Management


The wireless/RF access points act as the bridges in the hybrid hierarchical wireless/RF-I architecture,
connecting the local network and the global network. If multiple packets try to access the same wireless/RF node at
once, the access point is overloaded and might become a bottleneck, resulting in higher
latency, so a reasonable control strategy is needed to alleviate the potential congestion between the multiple wireless/RF
requirements for the access points. Similarly, another arbitration scheme is needed to decide who can get access to the
particular wireless medium (or RF-I channel) in a given period, because all wireless/RF-I access points can tune to the
same channel and can send or receive data from any other wireless/RF-I access point in the network. Therefore, how to
allocate the wireless/RF resource of the specific wireless/RF access point between multiple transmission requirements from
the PEs (or the base routers in the local network) and how to allocate the specific wireless medium or RF-I channel between
multiple wireless/RF-I points in a given period are two of the important problems in wireless/RF-I resource management.
The solutions to the two problems explored so far by different research groups can be broadly classified into three classes,
depending on the specific implementation of the HHWA.

Figure 8. An iWISE architecture showing wireless communication between four sets (legend: core; router; traditional wired link; wireless link)


One is a fixed static allocation strategy with a coarse-grained arbitration mechanism, which assigns the wireless/RF-I to
predetermined communication pairs for the entire duration of an application's execution [6][16][12][29]. The chosen pairs
are allocated a specific wireless link (or RF channel), and each transmitter or receiver in the topology will be tuned to a
particular frequency; thus the specific bandwidth is exclusive to the transmitter, and contention is avoided [16]. Another
frequency band is extended to act as a multicast channel, with multiple receivers tuned to that frequency band to receive
multicast. A certain processing node is chosen as the only transmitter of the multicast channel, and other PEs that want to
send a multicast should first implicitly send the multicast message via conventional mesh links to the designated transmitter.
The destination bit vector (DBV) is used to distinguish multicast transmissions from other network communication. To
improve scalability and connectivity, Lee et al. [12] adopted wireless links instead of RF-I to support thousands of cores. A
single, distinct frequency band is assigned to every wireless router, which is exclusively used for transmission. Every
micro wireless router is equipped with a single transmit antenna and multiple receive antennas, and the receivers are statically
tuned to the frequency bands of their logical neighbors (whose addresses differ from that router's in only one bit) to
implement parallel transmission without frequency interference. However, this approach does not provide a congestion
control mechanism to alleviate the potential bottleneck if too many packets try to use the wireless backbone at once.
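
The "logical neighbor" rule above (addresses that differ in exactly one bit) can be sketched in a few lines. This is an illustration only: it assumes router addresses are plain integers of a fixed bit width and that a router's transmit band index equals its address, neither of which is stated in [12]. The names logical_neighbors and receiver_tuning are hypothetical.

```python
def logical_neighbors(addr, bits):
    """Addresses that differ from addr in exactly one bit position."""
    return [addr ^ (1 << i) for i in range(bits)]


def receiver_tuning(bits):
    """Map each router to the transmit bands its receivers listen on.

    Assumption: router r transmits on band r, so tuning a receiver to a
    neighbor's band is the same as listing that neighbor's address.
    """
    return {r: logical_neighbors(r, bits) for r in range(1 << bits)}


tuning = receiver_tuning(4)  # 16 wireless routers, 4 receivers each
```

With 4-bit addresses, every router transmits on one band and listens on four, and the relation is symmetric: if router a listens to router b, then b also listens to a.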
Another class adopts a token-based arbitration mechanism [30] to solve access contention for the wireless/RF-I resource
[13][15][18]. To address contention from multiple wireless requirements to transmit packets through the express pathway, a
token flow control along with a distributed routing strategy is adopted to alleviate congestion [18]. If taking the wireless
link for communication reduces the total hop count, and if the token of this wireless link to the destination is available, the
access transmission is allowed. To address contention between wireless routers for a specific wireless medium, a different
wireless token-passing protocol can be used [18]. The particular wireless router possessing the wireless token can broadcast
flits into the wireless medium, and the wireless token will be forwarded to the next wireless router after all flits belonging
to a packet at the current wireless token-holding router are transmitted. Different from other HHWAs that centralize the
wireless routers, iWISE distributes the transceivers at each router to avoid hotspots and reduce the hop count. In the iWISE
architecture, a sharing scheme with tokens is used to share the limited bandwidth, along with FDM and TDM mechanisms
to avoid interference. In this token-based arbitration scheme, possession of a token represents the right to transmit on a
certain frequency to a set [13]. Two different sharing schemes, token-partial and token-full, are explored with different
workloads, demonstrating how the design of token-based arbitration influences arbitration cost (latency)
and channel utilization for different traffic patterns, and thus communication performance.
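
The wireless token-passing rule described above (the token holder broadcasts all flits of one packet, then forwards the token to the next wireless router) can be sketched as a small simulation. It is a simplified model, not the protocol of [18]: it assumes a fixed round-robin token order and represents each packet as a list of flits. The name token_passing is hypothetical.

```python
from collections import deque


def token_passing(queues):
    """Simulate one shared wireless channel arbitrated by a circulating token.

    queues: one deque of packets per wireless router; a packet is a list of flits.
    Returns the order in which flits appear on the medium as (router, flit) pairs.
    """
    log = []
    holder = 0          # router currently holding the token
    idle_in_a_row = 0   # consecutive token holders with nothing to send
    n = len(queues)
    while idle_in_a_row < n:
        if queues[holder]:
            packet = queues[holder].popleft()
            for flit in packet:          # all flits of the packet go out back to back
                log.append((holder, flit))
            idle_in_a_row = 0
        else:
            idle_in_a_row += 1
        holder = (holder + 1) % n        # forward the token (assumed round-robin order)
    return log


queues = [deque([["a0", "a1"], ["a2"]]), deque(), deque([["c0"]])]
order = token_passing(queues)
```

The sketch also shows the utilization issue discussed next: the token still visits router 1, which has nothing to send, before router 0 can transmit its second packet.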
Although the fixed static allocation strategy can adaptively choose different shortcuts for different
applications, the shortcuts cannot be adjusted at run time according to real-time workload requirements. Token-based dynamic
arbitration, which allocates the channels in real time to communicating pairs on demand with low arbitration latency, power,
and hardware cost under uniform traffic, faces a channel utilization problem and long arbitration latency with non-uniform
communication. Modern and future CMPs, however, tend not to exhibit uniform traffic due to spatial communication heterogeneity.
Stream arbitration was therefore proposed by Xiao et al. [31] as an efficient dynamic bandwidth utilization scheme that can deal with
both spatial and temporal communication heterogeneity. Unlike token arbitration, where channels are coupled to receivers,
a channel in stream arbitration can be used to send packets from any sender to any receiver, which efficiently addresses the
problem of spatial communication heterogeneity. Since stream arbitration is inherently a dynamic arbitration scheme, it
also efficiently handles temporal communication heterogeneity. Stream arbitration partitions the aggregate bandwidth into
arbitration channels and data channels. Active sources (nodes that want to send flits through wireless/RF-I) compete for the
data channels in the arbitration channel in order to talk to their desired destination nodes. Stream arbitration is a distributed
mechanism without a centralized arbitrator and is implemented independently and simply. Stream arbitration proved to be
an efficient scheme for resource arbitration for emerging network technologies, with a case study consisting of a modeled
RF-I network.

Key Problems in HHWA Design


Wireless or RF-I?
Both wireless and RF-I have better CMOS compatibility than other technologies, such as optical
interconnects, and perform well as an expressway for long and critical communication in an HHWA, compared to
traditional NoCs with only wired interconnects; but each has its own merits and characteristics. When we design an HHWA,
which emerging interconnect should we choose: wireless, RF-I, or both? As we discussed in Section 2, the biggest
difference between RF-I and wireless is the transmission medium, for no channel needs to be physically laid out with
wireless interconnects, whereas a transmission line (TL) is needed for electromagnetic carrier wave transmission in RF-I.
So the area cost of RF-I will be a challenge for the design of very large scale integrated circuits since the long TL needs to
span the whole chip for remote transmission, and the crosstalk (or inter-channel interference) between adjacent TLs may
also pose problems for long TLs with very high frequencies [12]. Without a physical channel needed, wireless
interconnects provide better scalability and connectivity compared with RF-I. But the on-chip antenna is always one of the
most difficult components to integrate in large CMPs [12][15]. In addition, due to the induced cost, wireless is not as
efficient for very short-distance communication. A comparative analysis of the energy dissipation per bit between wireless
and wired communication channels was carried out by Chang et al. [18], which showed that mm-wave wireless shortcuts are
energy-efficient when the link length is 7 mm or more, but inefficient below 7 mm, compared to traditional wired links [24].
Why not employ their (wireless and RF-I) respective merits and complementary strengths? For mid-sized networks
within the range of tens to the low hundreds of PEs, we can adopt RF-I, which is more feasible for reducing latency and
energy consumption. For very large scale networks with thousands of cores, wireless interconnect can be adopted to
provide better scalability. An alternative approach is a combination of wireless links and RF-I, which uses RF-I to bridge
the gap between the baseline mesh and wireless interconnect for midrange messages, using wireless interconnects only for
long-range communication [12]. This three-level hierarchical architecture provides a better trade-off between cost and
performance, but the design of relay nodes for inter-level transmission might be a problem, which should be explored in
depth to minimize the extra cost and potential bottlenecks.

Placement of the wireless/RF-I access points


The placement of wireless/RF-I access points is crucial for optimum performance gain because it establishes high-speed,
low-energy interconnects on the network. The aim is to minimize the number of cycles between distant or critical endpoints
so as to get the optimal architecture design with minimal average latency or hop count. The existing optimization
techniques, such as evolutionary algorithms (EAs) [32], coevolutionary algorithms [33] and the simulated annealing (SA)
algorithm [34], afford us powerful methods to help with architecture construction. The choice of optimization algorithm is a
trade-off between better results and faster speed for a large search space. EAs are generally believed to give better results
but take longer. SA reaches comparably good solutions with acceptable search time [34][18]. Whichever heuristic
is adopted, a cost metric is needed to evaluate candidate solutions, which includes the distance (in hops) and the probability of
communication between sources and destinations. It is a good approach to introduce application communication statistics
into the cost metric to find the optimum position for the placement of wireless/RF-I access points, so as to accelerate
communication on paths that are most frequently used by the application [16].
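
A minimal sketch of SA-based placement with a traffic-weighted cost metric is given below. It is not the algorithm of [15], [16] or [18]: it assumes hubs arranged in a 2-D mesh and numbered row by row, access points that reach each other in a single wireless hop, a geometric cooling schedule, and a cost equal to the sum over all pairs of communication frequency times hop count. All names and parameter values (anneal, steps, t0, alpha) are hypothetical.

```python
import math
import random


def manhattan(a, b, width):
    """Wired hop count between nodes a and b in a mesh of the given width."""
    return abs(a % width - b % width) + abs(a // width - b // width)


def hops(s, d, waps, width):
    """Shortest hop count using wired links and at most one wireless hop."""
    best = manhattan(s, d, width)
    for a in waps:
        for b in waps:
            if a != b:
                best = min(best, manhattan(s, a, width) + 1 + manhattan(b, d, width))
    return best


def cost(waps, traffic, width):
    """Traffic-weighted hop count: sum of traffic[s][d] * hops(s, d)."""
    n = len(traffic)
    return sum(traffic[s][d] * hops(s, d, waps, width)
               for s in range(n) for d in range(n) if s != d)


def anneal(traffic, width, num_waps, steps=300, t0=5.0, alpha=0.98, seed=0):
    """Simulated annealing over access-point positions (assumed cooling schedule)."""
    rng = random.Random(seed)
    n = len(traffic)
    current = rng.sample(range(n), num_waps)
    cur_cost = cost(current, traffic, width)
    best, best_cost = list(current), cur_cost
    temp = t0
    for _ in range(steps):
        # Neighbor move: relocate one access point to a node that has none.
        cand = list(current)
        free = [v for v in range(n) if v not in current]
        cand[rng.randrange(num_waps)] = rng.choice(free)
        cand_cost = cost(cand, traffic, width)
        delta = cand_cost - cur_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
        temp *= alpha
    return best, best_cost


uniform = [[1] * 16 for _ in range(16)]          # hypothetical uniform traffic, 4x4 mesh
placement, placement_cost = anneal(uniform, width=4, num_waps=4)
```

Replacing the uniform matrix with profiled application traffic biases the placement toward the most frequently used paths, which is the idea behind introducing communication statistics into the cost metric.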

Routing

The routing strategy determines the path a packet takes from its source to its destination. Due to the different transmission
characteristics of RF-I/wireless compared with traditional wired interconnects, and the harsh requirements for on-chip
design of a hierarchical architecture, the routing mechanism in an HHWA should be simple and reliable, without incurring
too much power, area and latency overhead. We divide the routing mechanism into local routing and global routing according to
whether wireless/RF-I is used. Local routing depends on the topology of the subnets. For example, if the PEs within a subnet are
connected in a mesh, then data routing within the subnet follows dimension order routing. Global routing relates to whether
and how to use the RF/wireless interconnects. Flow control, deadlock avoidance and the RF-I/wireless resource management
strategy are key problems in the global routing design. Kim et al. [23] and Deb et al. [24] analyzed the different strategies
adopted by existing HHWAs, and provide very good references and guidance for future HHWA designs. A comprehensive
study quantifying the merits and limitations of different strategies and their implementation challenges still needs to be carried out,
with an informative comparative analysis [24].
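
The split between local and global routing can be sketched as follows. The local part is standard dimension order (XY) routing on a mesh; the global part is only the path-length comparison mentioned earlier for small-world-based architectures (take the wireless path if it shortens the total path). The function names and the assumption that a wireless traversal counts as one hop are illustrative.

```python
def xy_route(src, dst):
    """Dimension order routing on a mesh: travel along X first, then along Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def prefers_wireless(wired_hops, hops_to_wap, hops_from_wap):
    """Use the wireless/RF-I shortcut only if it reduces the total hop count.

    Assumption: the wireless traversal itself costs exactly one hop.
    """
    return hops_to_wap + 1 + hops_from_wap < wired_hops


route = xy_route((0, 0), (2, 1))
```

Here `route` visits (0, 0), (1, 0), (2, 0) and (2, 1); a packet facing a 6-hop wired path with access points one hop from each endpoint would take the shortcut, since 1 + 1 + 1 < 6.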

Wireless/RF-I resource allocation


According to the ITRS [36], unity current gain frequency fT and maximum available power gain fmax will be 600 GHz and
1 THz, respectively, in 16 nm CMOS technology. With the advances in CMOS circuits, tens to hundreds of gigahertz of
bandwidth will be available in the near future [26][12][15][24]. How to efficiently utilize the available bandwidth is one of
the important problems in HHWA design. The arbitration mechanisms for wireless/RF-I resource contention were
discussed in Section 4, which showed that bandwidth sharing between all the wireless/RF-I access points (referred to as a
bandwidth sharing scheme) with stream arbitration performs better in non-uniform traffic compared with token arbitration
with a specific exclusive occupancy for every wireless/RF-I access point (referred to as a bandwidth distributed scheme). If
we partition the aggregate bandwidth into a set of communication channels (aggregate bandwidth is calculated as the
number of channels multiplied by the bandwidth of each channel), each wireless/RF-I access point can only obtain a small
proportion of the total bandwidth in the distributed allocation strategy. Because every access point occupies a specific
channel, this mechanism is very efficient for uniform traffic patterns with high access contention. For a sharing mechanism,
all the available bandwidth is a public resource, and only the winners occupy the channels in a fixed period, so as to
dynamically allocate the resource as demanded in real time with better bandwidth utilization.
To further explore the influence of bandwidth allocation, Xiao et al. [31] conducted an experiment with a fixed aggregate
bandwidth, using stream arbitration and a bandwidth sharing scheme. They varied the number of channels and the
channel bandwidth while keeping that aggregate bandwidth constant. The simulation results showed that a compromise needs to be
found between high-bandwidth channels and additional channels. There is potential for optimizing bandwidth allocation
with a dynamic bandwidth partition [31].
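
The relation between aggregate bandwidth, channel count and channel bandwidth, and the difference between the distributed and sharing schemes, can be made concrete with a small calculation. This is a back-of-the-envelope sketch, not the experiment of [31]: the 256 Gbps aggregate figure and the idealized assumption that every active sender keeps its channel fully busy are hypothetical.

```python
def channel_bandwidth(aggregate_gbps, num_channels):
    """Aggregate bandwidth = number of channels x bandwidth of each channel."""
    return aggregate_gbps / num_channels


def distributed_utilization(num_access_points, active_senders):
    """Each access point owns one exclusive channel; channels of idle owners are wasted."""
    return min(active_senders, num_access_points) / num_access_points


def shared_utilization(num_channels, active_senders):
    """Channels form a common pool; each sender that wins arbitration holds one channel."""
    return min(active_senders, num_channels) / num_channels


AGGREGATE_GBPS = 256  # hypothetical figure, not taken from the paper
for num_channels in (4, 8, 16):
    print(num_channels,
          channel_bandwidth(AGGREGATE_GBPS, num_channels),
          shared_utilization(num_channels, active_senders=2))
```

With 16 access points and only 2 active senders, the distributed scheme uses 12.5% of the aggregate bandwidth, whereas a shared pool of 4 wide channels uses 50%. Fewer, wider channels serve a few heavy senders well but admit fewer simultaneous senders, which is the compromise noted above.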

Transmission reliability
Although wireless/RF-I performs well for long-distance transmission, with high bandwidth, low latency, and low energy
consumption, bit errors make reliable message transmission a challenge. Within the maximum
communication distance of future CMPs, 1.5 cm, the bit-error rate (BER) of the on-chip wireless channel is less than 10^-9,
which is far higher than that of RC wires. (Current RC wires have an extremely low BER of approximately 10^-14 [12].)
Error control coding (ECC) was explored by Ganguly et al. [37], who showed that implementing joint crosstalk avoidance
triple error correction and simultaneous quadruple error detection codes [38] in the wireline links, and Hamming code
based product codes (H-PCs) in the wireless links, of a hierarchical wireless NoC with CNT antennas makes it possible to
improve the overall reliability of the wireless NoC manifold. However, applying ECC introduces timing and area
overhead and also incurs a fixed overhead on every packet [12][15]. Research into WCube devised a novel and simple loss
management solution, overhearing-and-retransmission (OAR), a zero-signaling-overhead scheme that relies on
overhearing at intermediate hops and on on-demand, checksum-based error detection and retransmission at
the last hop [12]. OAR uses a buffer-based mechanism to detect and recover packet losses without extra signaling overhead.
The packet is verified against its checksum at the destination and is retransmitted if the checksum does not match. This solution is
simple and incurs less extra cost than ECC, but the forwarding order of packets must be preserved to ensure
correct transmission.
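The last-hop part of this scheme, together with the effect of the BER on whole packets, can be sketched as follows. This is a minimal illustration and not the WCube implementation: CRC-32 stands in for the unspecified checksum, bit errors are assumed independent, overhearing at intermediate hops is not modeled, and all function names are hypothetical.

```python
import zlib


def packet_error_probability(ber, packet_bits):
    """Probability that at least one bit of a packet is corrupted.

    Assumption: bit errors are independent and occur with probability `ber`.
    """
    return 1.0 - (1.0 - ber) ** packet_bits


def make_packet(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 (a stand-in for the checksum, which [12] does not fix here)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify_packet(packet: bytes) -> bool:
    """Recompute the checksum at the destination and compare it with the trailer."""
    payload, trailer = packet[:-4], packet[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == trailer


def deliver(payload: bytes, channel, max_attempts=4):
    """Send over `channel`; retransmit on demand until the checksum matches."""
    for attempt in range(1, max_attempts + 1):
        received = channel(make_packet(payload))
        if verify_packet(received):
            return received[:-4], attempt
    raise RuntimeError("packet not delivered within max_attempts")


def flip_first_once():
    """Hypothetical channel that corrupts one bit of the first packet only."""
    calls = {"count": 0}

    def channel(packet: bytes) -> bytes:
        calls["count"] += 1
        if calls["count"] == 1:
            corrupted = bytearray(packet)
            corrupted[0] ^= 0x01
            return bytes(corrupted)
        return packet

    return channel


data, attempts = deliver(b"flit", flip_first_once())
print(data, attempts, packet_error_probability(1e-9, 1000))
```

Under the independence assumption, a 1000-bit packet on a channel with a BER of 10^-9 is corrupted with a probability of roughly 10^-6, so retransmissions are rare and only the packets that fail the check pay for them, whereas ECC adds its overhead to every packet.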

Scalability
To target future large-scale CMPs, scalability is one of the most important problems for the design of an on-chip hybrid
hierarchical architecture. Lee et al. [12] proposed the WCube recursive wireless interconnect structure, which offers
connectivity to thousands of cores in CMPs. In a case study of a network consisting of 1024 PEs, WCube proved efficient
and reduced the observed latency by 20% to 45% compared to current 2-D wired mesh designs. Since
future communication patterns tend towards the non-uniform and heterogeneous, Xiao et al. [31] proposed a cluster-based
hierarchical architecture that uses a local transmission line for each core cluster, and a global TL to connect the local TLs.
A network with 16x16 RF nodes for a 32x32 router NoC (each 2x2 group of routers shares one RF node) proved efficient in average
network latency and energy consumption with a hierarchical TL architecture and hierarchical stream arbitration, compared
to an architecture with a single TL spanning the whole chip [31]. A three-level architecture with traditional RC interconnects,
RF-I, and wireless links is also a potential solution for scalability, and a detailed implementation needs to
be proposed in future designs.
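The router-to-RF-node mapping described above can be written down directly. In the sketch below, the 32x32 routers and the 2x2 sharing follow the text; the size of a local-TL cluster (4x4 RF nodes) is purely an assumed value, since the text does not state it, and the function names are hypothetical.

```python
def rf_grid_side(routers_per_side=32, group=2):
    """RF nodes per side when each group x group block of routers shares one RF node."""
    return routers_per_side // group


def rf_node_of(router_x, router_y, group=2):
    """RF node serving the router at (router_x, router_y)."""
    return (router_x // group, router_y // group)


def local_cluster_of(rf_node, cluster_side=4):
    """Local-TL cluster of an RF node; cluster_side=4 is an assumption."""
    return (rf_node[0] // cluster_side, rf_node[1] // cluster_side)


def uses_global_tl(src_router, dst_router, group=2, cluster_side=4):
    """True if traffic between the two routers must cross the global TL."""
    src = local_cluster_of(rf_node_of(*src_router, group=group), cluster_side)
    dst = local_cluster_of(rf_node_of(*dst_router, group=group), cluster_side)
    return src != dst


side = rf_grid_side()
print(side, side * side)                 # RF nodes per side and in total
print(rf_node_of(31, 31))                # RF node of the corner router
print(uses_global_tl((0, 0), (7, 7)))    # same assumed local cluster
print(uses_global_tl((0, 0), (8, 0)))    # different assumed local clusters
```

Traffic that stays inside one cluster contends only for its local TL, which is the intuition behind arbitrating each level of the hierarchy separately.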

Conclusion
As new architectures composed of emerging interconnects, hybrid hierarchical wireless/RF-I architectures pose new
design challenges. Based on analysis of the existing typical HHWAs, we explored strategies for
wireless/RF-I resource management for the first time and discussed the strengths and disadvantages of different solutions.
The key problems in hybrid hierarchical wireless/RF-I architecture design are explored, and related potential solutions are
provided, which we expect to serve as a basis for future HHWA designs. The performance
benefits of different HHWAs still need to be benchmarked quantitatively, and physical
implementations need to be investigated in detail in future work.

References
[1] International Technology Roadmap for Semiconductors (ITRS), 2012.
[2] A. Shacham, K. Bergman, L. P. Carloni, Photonic networks-on-chip for future generations of chip multiprocessors,
IEEE Transactions on Computers, vol. 57, no. 9, pp. 1246-1260, 2008. Article (CrossRef Link)
[3] D. Vantrease, R. Schreiber, M. Monchiero, M. McLaren, N. P. Jouppi, M. Fiorentino, A. Davis, N. Binkert, R. G.
Beausoleil, J. H. Ahn, Corona: System Implications of Emerging Nanophotonic Technology, in Proc. of the 35th
Annual International Symposium on Computer Architecture (ISCA08), Washington, DC, USA, pp. 153-164, 2008.
Article (CrossRef Link)
[4] M. F. Chang, I. Verbauwhede, C. Chien, Z. Xu, J. Kim, J. Ko, Q. Gu, B. Lai, Advanced RF/baseband interconnect
schemes for inter- and intra-ULSI communications, IEEE Transactions on Electron Devices, vol. 52, no. 7, pp. 1271-1285, 2005. Article (CrossRef Link)
[5] M. F. Chang, E. Socher, R. Tam, J. Cong, G. Reinman, RF interconnects for communications on-chip, in Proc. of
the 2008 international symposium on Physical design (ISPD08), ACM New York, NY, pp. 78-83, 2008. Article
(CrossRef Link)
[6] M. F. Chang, J. Cong, A. Kaplan, M. Naik, G. Reinman, E. Socher, S.-W. Tam, CMP Network-on-Chip Overlaid
with Multi-Band RF-Interconnect, in Proc. of the IEEE Int'l Symposium on High-Performance Computer
Architecture (HPCA), Salt Lake City, UT, February, pp. 191-202, 2008. Article (CrossRef Link)
[7] D. Zhao, Y. Wang, SD-MAC: Design and Synthesis of A Hardware-Efficient Collision-Free QoS-Aware MAC
Protocol for Wireless Network-on-Chip, IEEE Transactions on Computers, vol. 57, no. 9, pp. 1230-1245, Sep. 2008.
Article (CrossRef Link)
[8] Y. Wang, D. Zhao, The Design and Synthesis of a Synchronous and Distributed MAC Protocol for Wireless
Network-on-Chip, in Proc. IEEE Int'l Conf. Computer-Aided Design, Nov. 2007. Article (CrossRef Link)
[9] S. Deb, K. Chang, et al., Design of an Efficient NoC Architecture using Millimeter-Wave Wireless Links, in Proc.
of 13th Int'l Symposium on Quality Electronic Design, pp. 165-172, Mar. 2012. Article (CrossRef Link)
[10] L. P. Carloni, P. Pande, Y. Xie, Networks-on-chip in emerging interconnect paradigms: Advantages and challenges,
in Proc. of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip, pp. 93-102, 2009. Article
(CrossRef Link)
[11] H. S. Wang, X. Zhu, L. S. Peh, S. Malik, Orion: A power-performance simulator for interconnection networks, in
Proc. of the 35th Annual ACM/IEEE International Symposium on Microarchitecture, pp. 294-305, Nov. 2002. Article
(CrossRef Link)
[12] S. B. Lee et al., A scalable micro wireless interconnect structure for CMPs, in Proc. ACM Annu. Int. Conf. Mobile
Comput. Network. (MobiCom), pp. 20-25, 2009. Article (CrossRef Link)
[13] D. D. Tomaso et al., iWise: Inter-router wireless scalable express channels for Network-on-Chips (NoCs)
architecture, in Proc. Annu. Symp. High Performance Interconnects, pp. 11-18, 2011. Article (CrossRef Link)
[14] W. J. Dally, Express cubes: Improving the performance of k-ary n-cube interconnection networks, IEEE Trans.
Computers, vol. 40, no. 9, pp. 1016-1023, Sep. 1991. Article (CrossRef Link)
[15] A. Ganguly, K. Chang, S. Deb, P. Pande, B. Belzer, C. Teuscher, Scalable hybrid wireless network-on-chip
architectures for multicore systems, IEEE Trans. Computers, vol. 60, no. 10, pp. 1485-1502, Oct. 2011. Article
(CrossRef Link)
[16] M. F. Chang, J. Cong, A. Kaplan, C. Liu, M. Naik, J. Premkumar, G. Reinman, E. Socher, S.-W. Tam,
Power reduction of CMP communication networks via RF-interconnects, in Proc. of the 41st annual IEEE/ACM
International Symposium on Microarchitecture (MICRO 41), Washington, DC, USA, pp. 376-387, 2008. Article
(CrossRef Link)
[17] W. J. Dally, T. B, Principles and Practices of Interconnection Networks. Waltham, MA: Morgan Kaufmann, 2004.
[18] K. Chang, S. Deb, et al., Performance Evaluation and Design Trade-offs for Wireless Network-on-Chip Architecture,
ACM Journal on Emerging Technologies in Computing Systems, vol. 8, no. 8, 2012. Article (CrossRef Link)
[19] M. F. Chang, V. P. Roychowdhury, L. Zhang, H. Shin, Y. Qian, RF/wireless interconnect for inter- and intra-chip
communications, Proceedings of the IEEE, vol. 89, no. 4, Apr. 2001. Article (CrossRef Link)
[20] J. Ko, J. Kim, Z. Xu, Q. Gu, C. Chien, M. Chang, An RF/baseband FDMA-interconnect transceiver for
reconfigurable multiple access chip-to-chip communication, in Proc. of Dig. Tech. Papers Int. Solid-State Circuits
Conf., vol. 1, pp. 338-602, Feb. 2005. Article (CrossRef Link)
[21] H. Wu, L. Nan, S.-W. Tam, et al., A 60GHz on-chip RF-Interconnect with λ/4 coupler for 5Gbps bi-directional
communication and multi-drop arbitration, in Proc. of Custom Integrated Circuits Conference (CICC), pp. 1-4, 2012.
Article (CrossRef Link)
[22] Y. Kim, G.-S. Byun, A. Tang, C.-P. Jou, H.-H. Hsien, G. Reinman, J. Cong, M. F. Chang, An 8Gb/s/pin 4pJ/b/pin
single-t-line dual (Base+RF) band simultaneous bidirectional mobile memory I/O interface, in Proc. of the IEEE
International Solid-State Circuits Conference (ISSCC), pp. 50-51, 2012. Article (CrossRef Link)
[23] J. Kim, K. Choi, et al., Exploiting New Interconnect Technologies in On-Chip Communication, IEEE Journal on
emerging and selected topics in circuits and systems, vol. 2, no. 2, pp. 124-136, June 2012. Article (CrossRef Link)
[24] S. Deb, A. Ganguly, P. Pande, D. Heo, B. Belzer, Wireless NOC as interconnection backbone for multicore chips:
Promises and challenges, IEEE Journal on emerging and selected topics in circuits and systems, vol. 2, no. 2, pp. 228-239,
June 2012. Article (CrossRef Link)
[25] J. Lin et al., Communication using antennas fabricated in silicon integrated circuits, IEEE J. Solid-State Circuits,
vol. 42, no. 8, pp. 1678-1687, Aug. 2007. Article (CrossRef Link)
[26] S. Deb et al., Enhancing performance of Network-on-Chip architectures with millimeter-wave wireless interconnects,
in Proc. IEEE Int. Conf. ASAP, pp. 73-80, 2010. Article (CrossRef Link)
[27] K. Kempa et al., Carbon nanotubes as optical antennae, Adv. Mater., vol. 19, pp. 421-426, 2007. Article (CrossRef
Link)
[28] D. J. Watts, S. H. Strogatz, Collective dynamics of small-world networks, Nature, vol. 393, pp. 440-442, 1998.
Article (CrossRef Link)
[29] M. F. Chang, J. Cong, A. Kaplan, M. Naik, G. Reinman, E. Socher, S.-W. Tam, CMP Network-on-Chip Overlaid
with Multi-Band RF-Interconnect, UCLA Computer Science Department Technical Report UCLA/CSD-TR-07-0032,
Dec. 2007.
[30] A. Kumar, L.-S. Peh, N. K. Jha, Token flow control, in Proc. of the 41st IEEE/ACM International Symposium on
Microarchitecture (MICRO 08), pp. 342-353, 2008. Article (CrossRef Link)
[31] C. Xiao, M.-C. Frank Chang, J. Cong, M. Gill, Z. Huang, C. Liu, G. Reinman, H. Wu, Stream Arbitration: Towards
Efficient Bandwidth Utilization for Emerging On-Chip Interconnects, ACM Transactions on Architecture and Code
Optimization, vol. 9, no. 4, Jan. 2013. Article (CrossRef Link)
[32] A. E. Eiben, J. E. Smith, Introduction to Evolutionary Computing, Springer Berlin, 2003. Article (CrossRef Link)
[33] M. Sipper, Evolution of Parallel Cellular Machines: The Cellular Programming Approach, Springer Berlin, 1997.
Article (CrossRef Link)
[34] S. Kirkpatrick, C. D. Gelatt Jr., M. P. Vecchi, Optimization by simulated annealing, Science, vol. 220, pp. 671-680,
1983. Article (CrossRef Link)
[35] T. Jansen, I. Wegener, A comparison of simulated annealing with a simple evolutionary algorithm on pseudo-boolean
functions of unitation, Theor. Comput. Sci., vol. 386, pp. 73-93, 2007. Article (CrossRef Link)
[36] International technology roadmap for semiconductors, 2007 edition.
[37] A. Ganguly et al., A unified error control coding scheme to enhance the reliability of a hybrid wireless Network-on-Chip,
in Proc. IEEE Int. Symp. Defect Fault Tolerance VLSI Nanotechnol. Syst., pp. 277-285, 2011. Article (CrossRef Link)
[38] A. Ganguly et al., Crosstalk-aware channel coding schemes for energy efficient and reliable NoC interconnects,
IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 11, pp. 1626-1639, Nov. 2009. Article (CrossRef Link)
[39] N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki, Reactive NUCA: near-optimal block placement and replication
in distributed caches, in Proc. of the 36th annual international symposium on Computer architecture (ISCA '09).
ACM, New York, NY, USA, pp. 184-195, 2009. Article (CrossRef Link)
[40] H. Lee, S. Cho, R. C. Bruce, StimulusCache: Boosting Performance of Chip Multiprocessors with Excess Cache,
Proc. of the IEEE Int'l Symposium on High-Performance Computer Architecture (HPCA), Bangalore, India, Jan. 2010.

Chunhua Xiao received her B.S. in Electronic Information Engineering from Shijiazhuang
Tiedao University, Hebei Province, China, in 2007, and her M.S. in Computer Science from
Beijing University of Technology, Beijing, China, in 2010. She is currently a PhD student in
the Department of Computer Science and Technology, Beijing University of Technology. Her
research interests include embedded system co-design, multi-processor system-on-chip, and
network-on-chip.

Zhangqin Huang received his B.S., M.S., and PhD in Computer Science from Xi'an Jiaotong
University, China, in 1986, 1989 and 2000, respectively. He is currently the Deputy Director of
the Embedded Software and Systems Institute (ESSI), Beijing University of Technology (BJUT),
China. His current research interests include co-design for embedded software and hardware,
Internet-based human-computer interaction, multi-processor system-on-chip, mass data
storage, and network information security.

Da Li received his B.S., M.S., and PhD in Computer Science from Xi'an Jiaotong University,
China, in 2002, 2006 and 2012, respectively. He is currently an instructor at the Embedded Software
and Systems Institute (ESSI), Beijing University of Technology (BJUT). His research interests
include embedded FPGA system design and multi-core processors.

Copyright 2013 KAIS
