
1607 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 4, APRIL 2016

A New CDMA Encoding/Decoding Method for On-Chip Communication Network
Jian Wang, Zhonghai Lu, and Yubai Li

Abstract: As a high-performance on-chip communication method, the code division multiple access (CDMA) technique has recently been applied to networks on chip (NoCs). We propose a new standard-basis-based encoding/decoding method to improve CDMA NoCs in terms of area, power consumption, and network throughput. In the transmitter module, source data from different senders are separately encoded with an orthogonal code of a standard basis, and these coded data are mixed together by an XOR operation. Then, the sums of data can be transmitted to their destinations through the on-chip communication infrastructure. In the receiver module, a sequence of chips is retrieved by taking an AND operation between the sums of data and the corresponding orthogonal code. After a simple accumulation of these chips, the original data can be reconstructed. We implement our encoding/decoding method and apply it to a CDMA NoC with a star topology. Compared with the state-of-the-art Walsh-code-based (WB) encoding/decoding technique, our method achieves up to 67.46% power saving and 81.24% area saving, together with a 30%–50% decrease in encoding/decoding latency. Moreover, CDMA NoCs of different sizes that apply our encoding/decoding method gain up to 20.25% power saving, 22.91% area saving, and 103.26% maximal throughput improvement compared with the WB CDMA NoC.

Index Terms: Code division multiple access (CDMA), integrated circuit (IC), network on chip (NoC).

I. INTRODUCTION

With the rapid growth of computational complexity, more and more processing elements (PEs) are integrated onto a single chip, and the network on chip (NoC) has been proposed to address the scalability, throughput, and reliability issues of on-chip communication. However, conventional packet-switched NoCs suffer from nondeterministic transmission latency and limited opportunities for parallel data transfer, since multiple flows cannot traverse a link at the same time [1]. To resolve these problems, the CDMA technique [2], an effective method for implementing high-performance on-chip communication [3], has been applied to NoCs [4], [5].

The previously proposed CDMA NoCs are based on a digital encoding and decoding method that requires the spreading codes to have both the orthogonality and balance properties. To this end, the Walsh code is typically used. However, the Walsh-code-based (WB) encoding and decoding method has the following inherent shortcomings.

1) Design Complexity: In the encoding method, an arithmetic addition unit, whose logic overhead increases with the number of senders, is used to mix the coded data together. In the decoding method, a key demux-accumulation-compare unit is used to retrieve the source data from the mixed data chips (in this brief, each bit of a spreading code is called a chip, and thus the encoded data are called data chips). This unit, however, is area-consuming.

2) Low Code Utilization: In an S-chip Walsh code set [6], S must be equal to $2^N$, where N is a natural number, and at most $S - 1$ sequences can be used to encode the original data (the all-zero sequence is excluded because it is not balanced). This wastes sequences in the code set. For example, a 16-node network needs a 32-chip Walsh code set, because a 16-chip Walsh code set can provide only 15 sequences for data encoding and thus cannot satisfy the requirement of 16 sequences, one for each node.

To address these weaknesses, we propose a new standard-basis-based (SB) encoding/decoding method, which outperforms the WB encoding/decoding method. The SB encoding/decoding method can be applied to any CDMA NoC to improve its performance.

The rest of this brief is organized as follows. In Section II, we discuss related work. In Section III, we detail the SB encoding/decoding method and formally prove its correctness. The simulation results and comparisons between the SB and WB methods are presented in Section IV. Finally, we draw conclusions in Section V.

II. RELATED WORK

The CDMA technique has recently attracted research attention in the NoC community. To show the advantages of CDMA NoCs, Kim et al. [4] used Walsh codes to distinguish different senders and developed a hierarchical star-mesh topology to handle a large number of communication processors. Their simulation results show that the CDMA NoC performs well in latency and throughput. In [7], a specific application is scheduled onto both a CDMA-based NoC and a conventional crossbar-based packet-switched NoC. The experimental results show that the CDMA NoC achieves lower packet transfer latency and less area overhead.

To further improve CDMA NoC performance, a globally asynchronous locally synchronous (GALS) CDMA NoC is modeled and simulated in [8]. By applying the GALS strategy, the CDMA NoC can be used in asynchronous chips. In [9], a multicasting function is realized in a CDMA NoC to address the hotspot problem at the center of a NoC with mesh topology, and the results show that traffic congestion at the center of the NoC is reduced.
Since the code utilization rate affects CDMA NoC performance, two code-word assignment methods are proposed in [10] and [11]. In [10], the code-word length is adjusted depending on the number of nodes that have packets to send at the same time. In [11], a scheduling method is proposed to make a compromise on the utilization rate of code words. Simulation results show that these methods improve the utilization efficiency of the orthogonal codes.

Besides the traditional wired NoC, the CDMA technique has also been applied to photonic NoCs [12] and wireless NoCs [13]. The results show that their performance can be significantly improved, with lower energy and area than the crossbar-based NoC.

Manuscript received March 31, 2015; revised July 1, 2015; accepted August 6, 2015. This work was supported in part by the Sichuan Province Science and Technology Support Program under Grant 2014SZ0093, in part by the China Post-Doctoral Science Foundation under Grant 2014M562303, and in part by the National Natural Science Funds under Grant 61201005.

J. Wang is with the University of Electronic Science and Technology of China, Chengdu 610051, China, and also with the KTH Royal Institute of Technology, Stockholm 114 28, Sweden (e-mail: wangjian3630@uestc.edu.cn).

Z. Lu is with the KTH Royal Institute of Technology, Stockholm 114 28, Sweden (e-mail: zhonghai@kth.se).

Y. Li is with the University of Electronic Science and Technology of China, Chengdu 610051, China (e-mail: ybli@uestc.edu.cn).

Digital Object Identifier 10.1109/TVLSI.2015.2471077

1063-8210 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
WANG et al.: CDMA ENCODING/DECODING FOR ON-CHIP COMMUNICATION 1608

Fig. 1. Structure of CDMA NoC.

Fig. 2. Block diagram of encoding scheme. (a) WB encoder. (b) SB encoder.

Fig. 3. Data encoding example. (a) WB encoding. (b) SB encoding.

Fig. 4. Block diagram of decoding scheme. (a) WB decoder. (b) SB decoder.

All the above-mentioned CDMA NoCs use the WB encoding/decoding method, and thus they share the drawbacks mentioned in Section I. Our proposed SB encoding/decoding method is intended to overcome these common weaknesses.
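The low-code-utilization drawback described in Section I can be checked numerically. The following Python sketch is illustrative and not part of the original design: it assumes the natural-order Sylvester construction of binary (0/1) Walsh codes, treats the all-zero sequence as unusable because it is not balanced, and uses hypothetical helper names (`walsh_codes`, `usable_sequences`, `pairwise_orthogonal`, `min_walsh_length`).

```python
def walsh_codes(s):
    """Binary (0/1) Walsh-Hadamard code set of length s (a power of two).

    Chip j of sequence i is the parity of popcount(i & j): the Sylvester
    construction written over {0, 1} instead of {+1, -1}.
    """
    if s < 1 or s & (s - 1):
        raise ValueError("Walsh code length must be a power of two")
    return [[bin(i & j).count("1") % 2 for j in range(s)] for i in range(s)]


def usable_sequences(codes):
    """Keep balanced sequences only (equal numbers of 0 and 1 chips)."""
    s = len(codes)
    return [c for c in codes if 2 * sum(c) == s]


def pairwise_orthogonal(codes):
    """In 0/1 form, two sequences are orthogonal when they differ in
    exactly half of their chips."""
    s = len(codes[0])
    return all(
        2 * sum(a != b for a, b in zip(x, y)) == s
        for k, x in enumerate(codes)
        for y in codes[k + 1:]
    )


def min_walsh_length(nodes):
    """Smallest Walsh length whose balanced sequences cover every node."""
    s = 2
    while s - 1 < nodes:
        s *= 2
    return s


if __name__ == "__main__":
    balanced16 = usable_sequences(walsh_codes(16))
    print(len(balanced16))                 # number of usable 16-chip sequences
    print(pairwise_orthogonal(balanced16))
    for n in (6, 8, 16):
        print(n, "nodes ->", min_walsh_length(n), "chips")
```

Running it prints 15 and True for the 16-chip set, and 8, 16, and 32 chips for 6, 8, and 16 nodes, which are the Walsh code lengths q used in Section IV-A. A standard basis, by contrast, needs only as many chips as there are nodes.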
III. NEW CDMA ENCODING AND DECODING METHOD

A. Overall Structure of CDMA NoC

The basic structure of applying the CDMA technique to a NoC with a star topology is shown in Fig. 1. In this figure, a PE executes the tasks of the application, and the network interface (NI) divides data flows from the PE into packets and reconstructs data flows from the packets received from the NoC. In the sender, packet flits from the NI are transformed into a sequential bit stream by a parallel-to-serial (P2S) module. This bit stream is encoded with an orthogonal code in the Encoding module (E in Fig. 1). The coded data from different encoding modules are added together in the Addition module (A in Fig. 1), and the sums of data chips are then transmitted to the receivers. In the receiver, Decoding modules (D in Fig. 1) reconstruct the original data bits from the sums of data chips. These sequential bit streams are then transformed into packet flits by serial-to-parallel (S2P) modules. Finally, the packet flits are transferred to the NI. In the CDMA NoC, the network scheduler receives transmission requests from senders and assigns proper spreading codes to the senders and the requested receivers. Note that the all-zero codeword is assigned to nodes that have no data to transmit/receive. Moreover, when multiple senders request the same receiver, the scheduler applies an arbitration scheme, for example, round-robin. The chip counters count how many orthogonal chips are used in one encoding/decoding operation. Each node needs two chip counters, one for the sender and the other for the receiver. Note that packet flits from the NI can also be transformed into multiple bit streams in the P2S module to trade off power/area cost against packet transfer latency, and the scheduler should provide a bit-synchronous scheme to maintain the orthogonality of the transmitted channels, as discussed in [8].

In this brief, we focus on the design and comparison of the WB- and SB-based CDMA encoding/decoding methods, which correspond to the E, A, and D modules in Fig. 1.

B. CDMA Encoder

Two different encoding methods, the WB encoder and the SB encoder, are compared in Fig. 2.

Fig. 2(a) shows the WB encoder architecture. An original data bit is first encoded with a Walsh code by taking an XOR operation. Then, these encoded data are added up into a multibit sum signal by arithmetic addition. Each sender needs an XOR gate, and multiple wires are needed to represent the sum signal when there are two or more senders. Moreover, the number of wires increases with the number of senders.

Fig. 2(b) shows our SB encoding scheme. An original data bit from a sender is fed into an AND gate in a chip-by-chip manner, and it is spread into n-chip encoded data with an orthogonal code of a standard basis. The relationship between a bit and a chip is shown in Fig. 3. Then, the encoded data from different senders are mixed together through an XOR operation, generating a binary sum signal. Therefore, the output signal is always a binary sequence that is transferred to its destination over a single wire. The progressions of both encoding schemes are depicted in Fig. 3: Fig. 3(a) and (b) illustrate the WB encoding process with four-chip Walsh codes and the SB encoding process with four-chip standard orthogonal codes, respectively.

C. CDMA Decoder

The WB decoding scheme is presented in Fig. 4(a). According to the chip value of the Walsh code, the received multibit sums are accumulated into the positive part (if the chip value is 0) or the negative part (if the chip value is 1). Therefore, each of the two accumulators in the WB decoder contains a multibit adder to accumulate the incoming chips and a group of registers to hold the accumulated value. The comparison module after the two accumulators reconstructs the original data: if the value of the positive part is larger than that of the negative part, the original data bit is 1; otherwise, it is 0.

The SB decoding scheme is shown in Fig. 4(b). When the binary sum signal arrives at a receiver, an AND operation is taken between the binary sum and the corresponding orthogonal code in a chip-by-chip manner. Then, the resulting chips are sent to an accumulator. After m chips are accumulated (m is the length of the orthogonal code), the output value of the accumulator is the corresponding original data. Note that there is always only one chip equal to 1 and all other

chips are equal to 0 for an orthogonal code in standard basis. Hence, the maximal accumulated value in the SB accumulator is 1, and it can be stored in a 1-bit register. Therefore, the SB decoding module uses only one AND gate and an accumulator with one 1-bit register, resulting in fewer logic resources.

An example of the decoding process is illustrated in Fig. 5. In Fig. 5(a), at the WB decoder of receiver 1, the accumulated value 3 in the positive part is larger than the accumulated value 1 in the negative part. By the WB decoding scheme, the decoded data bit is 1, which equals the source data bit from sender 1. In Fig. 5(b), at the SB decoder of receiver 1, the output value of the accumulator is 1, which also equals the source data bit from sender 1. The decoding results at receiver 2 are also correct but are not shown in the figure. Hence, both methods can reconstruct the original data bit from the sum signal by using their respective spreading codes.

Fig. 5. Data decoding example. (a) WB decoding at receiver 1. (b) SB decoding at receiver 1.

D. Correctness of Proposed Method

We formally prove the correctness of our proposed SB scheme. Notations are defined in Table I.

TABLE I
DEFINITION OF NOTATIONS

According to the SB scheme, the encoded data $Y_j$ are generated from $I_j$ and $C_j$ with an AND operation in a chip-by-chip manner. Therefore, the relationship among $Y_j^i$, $I_j^i$, and $C_j^i$ can be expressed by (1), and the value of the $i$th chip in the sum data can be expressed by (2):

$$Y_j^i = I_j^i \wedge C_j^i \tag{1}$$

$$S^i = Y_1^i \oplus Y_2^i \oplus \cdots \oplus Y_n^i \tag{2}$$

where $\wedge$ and $\oplus$ denote the AND and XOR operations, respectively.

The decoder module first extracts a sequence of chips by an AND operation and then reconstructs the source data bit by accumulating the sequence of chips. The process can be expressed by

$$M_j^i = S^i \wedge C_j^i \tag{3}$$

$$O_j = \sum_{i=1}^{m} M_j^i. \tag{4}$$

Now, we prove that the original input bit $I_j$ is equal to the output bit $O_j$ (without loss of generality, we assume that the packets from sender $j$ are transmitted to receiver $j$, and thus encoder $j$ and decoder $j$ have the same orthogonal code $C_j$). From (1)–(4), we get

$$
\begin{aligned}
O_j &= \sum_{i=1}^{m} \Big[ \big(I_1^i \wedge C_1^i\big) \oplus \big(I_2^i \wedge C_2^i\big) \oplus \cdots \oplus \big(I_n^i \wedge C_n^i\big) \Big] \wedge C_j^i \\
&\overset{1)}{=} \sum_{i=1}^{m} \Big[ \big(I_1^i \wedge C_1^i \wedge C_j^i\big) \oplus \cdots \oplus \big(I_n^i \wedge C_n^i \wedge C_j^i\big) \Big] \\
&\overset{2)}{=} \sum_{i=1}^{m} I_j^i \wedge C_j^i \wedge C_j^i \overset{3)}{=} \sum_{i=1}^{m} I_j^i \wedge C_j^i \\
&\overset{4)}{=} I_j \wedge \sum_{i=1}^{m} C_j^i \overset{5)}{=} I_j.
\end{aligned}
\tag{5}
$$

The numbered equalities in (5) follow from these properties.

1) By the properties of Boolean logic, logical conjunction distributes over XOR.

2) In the standard basis, codes are orthogonal to each other and each code has one and only one chip equal to 1. Therefore, if $p \neq q$, then $C_p^i \wedge C_q^i = 0$, and every XOR term other than the $j$th vanishes.

3) For an orthogonal code in standard basis, each chip is a Boolean variable, and thus $C_j^i \wedge C_j^i = C_j^i$ by the idempotency property of the logical AND operation.

4) When a bit is divided into sequential chips in the time domain, the logical value of that bit does not change. Therefore, all chips of the bit have the same logical value, equal to that of the original bit; that is, $I_j = I_j^1 = I_j^2 = \cdots = I_j^m$.

5) An orthogonal code in standard basis has one and only one chip equal to 1, and the other chips are equal to 0. Therefore, the sum of all its chips equals 1.

From (5), the logical value of the decoder output bit is always equal to the logical value of the original bit injected into the encoder. This proves that our proposed SB encoding/decoding scheme is correct.

E. Summary of the Two Methods

TABLE II
DIFFERENCES BETWEEN WB AND SB METHOD

Table II summarizes the differences between the WB method and the SB method. It shows the details of the logic modules (third and fourth columns) and the latency (fifth and sixth columns) in each step of the encoding/decoding operation.

The first step is to encode original bits with a spreading code (an XOR operation in the WB encoder and an AND operation in

the SB encoder). Since the chip frequency equals the clock frequency, the number of clock cycles spent on spreading equals the length of the spreading code. Because we use a standard basis in the SB scheme, only $p$ clock cycles are required to finish the spreading operation for $p$ senders. For the WB scheme, however, a $q$-chip Walsh code is required to spread the original data bits for $p$ senders; as mentioned in Section I, $q = 2^{\lfloor \log_2 p \rfloor + 1} \geq p + 1$.

In the following steps, both methods need sum, extract, and accumulate operations, which are realized using a multibit adder (an XOR gate), a demux logic module (an AND gate), and two multibit accumulators (one single accumulator) in the WB (SB) method. Since these operations work as a pipeline, only one clock cycle is required for each operation. Moreover, the WB scheme needs an additional comparator that spends one more clock cycle (corresponding to the last line of Table II).

Therefore, the total logic cost of the SB method is lower than that of the WB method, since each operation needs fewer logic resources. Besides, the total latency of the SB method is $(p + 3)$ clock cycles, which is always lower than the $(q + 4)$ clock cycles of the WB method because $q \geq p + 1$.

F. Discussion

Although this brief focuses on a new encoding/decoding method for CDMA NoCs, we discuss some general concerns in scaling, topology, routing, and traffic pattern.

1) Scaling: The CN (CDMA NoC) can be scaled to different network sizes using two basic methods, as shown in Fig. 6. In the direct scaling method, the length of the orthogonal code increases with the number of PEs, so this method is more suitable for small NoCs (e.g., CDMA NoCs with several PEs [5], [7], [8]). In contrast, the cluster-based scaling method, in which each cluster has several PEs and the clusters are connected with each other, can be used to scale the network hierarchically to any required size [4], [9].

2) Topology: Although a CDMA node cluster may be limited to the star topology, other topologies can be obtained by using the cluster-based scaling method. For example, Kim et al. [7] developed a hierarchical star topology, Lee and Sobelman [9] developed a mesh topology, and so on.

3) Routing: Various incremental and global routing schemes exist for CDMA NoCs. Consider incremental routing, where the routing scheme depends on the packet format. In general, the packet header contains the destination PE address. The source CN checks the destination address, determines the next-hop CN or PE, and allocates a corresponding spreading code for packet encoding and decoding to reach the right output port. The next-hop CN continues the process until the destination PE is reached. More details and other routing schemes can be found in [4] and [9].

4) Traffic Patterns: For the CDMA NoC, the influence of traffic patterns has been discussed before (see [14]), and some real applications have also been mapped onto CDMA NoCs to show their advantages over packet-switched NoCs (see [7]).

Fig. 6. CDMA NoC scaling. (a) Direct scaling. (b) Cluster-based scaling.

IV. EXPERIMENTAL RESULTS

We implemented the SB and WB CDMA NoCs at RTL in Verilog. Synopsys DC with a 40-nm standard cell library is used for synthesis and to obtain the power and area results. The clock frequency, which equals the chip frequency, is set to 2.5 GHz, as advised in [13].

A. Encoder/Decoder Area and Power Cost

We compare the area and power cost of the two encoding/decoding schemes. To be more precise, only the encoders and decoders, together with their external chip counters, are taken into consideration. The chip counters are implemented using multibit registers. For 6, 8, and 16 nodes, the Walsh code length q is 8, 16, and 32 (with 3-, 4-, and 5-bit chip counters), and the standard basis code length p is 6, 8, and 16 (with 3-, 3-, and 4-bit chip counters), respectively.

TABLE III
COMPARISON OF ENCODER/DECODER AREA AND POWER

Table III shows the encoder/decoder area cost (third, fourth, and fifth columns) and power consumption (sixth, seventh, and eighth columns) of the two schemes. IMP denotes the percentage improvement, calculated as |WB − SB|/WB. The table shows that the SB scheme always outperforms the WB scheme, and the total encoding/decoding modules achieve up to 81.24% area saving and 67.46% power saving. More precisely, for the SB encoder (decoder), the area saving is up to 22.15% (86.88%) and the power saving is up to 12.9% (75.87%). Moreover, since the counters take a considerable proportion of the encoder/decoder area (e.g., in the six-node SB encoder, the 3-bit chip counters consume about 90 μm², more than 50% of the SB encoder area), the improvement is nonlinear as the number of nodes grows.

B. Comparison of Encoding/Decoding Latency

Fig. 7 compares the encoding/decoding latency of the two schemes under the same configurations as in the previous subsection. The sums of data chips generated by the encoders are directly injected into the decoders, and sender i transfers its data to receiver i. The SB encoding/decoding latency is always lower than that of the WB scheme, with a latency saving of about 30%–50%.

Fig. 7. Comparison of encoding and decoding latency.
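To make the setup above concrete, the following Python sketch simulates one data bit per sender at the chip level for both schemes, with the encoder sums fed directly into the decoders and sender i paired with receiver i. It is a behavioral model under stated assumptions, not the authors' RTL: Walsh codes are assumed to be in natural Sylvester order with the all-zero row skipped, every sender is assumed to have data, and `latency_cycles` only evaluates the $(p + 3)$ and $(q + 4)$ counts from Section III-E. All function names are hypothetical.

```python
import random


def walsh_length(p):
    """Smallest power-of-two Walsh length q with at least p balanced codes."""
    q = 2
    while q - 1 < p:
        q *= 2
    return q


def walsh_codes(q):
    """Binary (0/1) Walsh-Hadamard codes in natural Sylvester order."""
    return [[bin(i & j).count("1") % 2 for j in range(q)] for i in range(q)]


def standard_basis(p):
    """SB codes: code j is 1 only at chip j and 0 elsewhere."""
    return [[1 if i == j else 0 for i in range(p)] for j in range(p)]


def sb_transfer(bits):
    """SB path: AND spreading, XOR mixing, AND extraction, 1-bit accumulation."""
    codes = standard_basis(len(bits))
    sums = []
    for i in range(len(bits)):
        chip = 0
        for bit, code in zip(bits, codes):
            chip ^= bit & code[i]          # encoder AND, then XOR mixing
        sums.append(chip)                  # one binary sum chip per cycle
    # Decoder j ANDs each sum chip with its own code and accumulates.
    return [sum(s & code[i] for i, s in enumerate(sums)) for code in codes]


def wb_transfer(bits):
    """WB path: XOR spreading, arithmetic sum, demux-accumulate-compare."""
    p = len(bits)
    q = walsh_length(p)
    codes = walsh_codes(q)[1:p + 1]        # skip the unbalanced all-zero row
    sums = [sum(bit ^ code[i] for bit, code in zip(bits, codes))
            for i in range(q)]             # multibit sum chip per cycle
    decoded = []
    for code in codes:
        positive = sum(s for s, c in zip(sums, code) if c == 0)
        negative = sum(s for s, c in zip(sums, code) if c == 1)
        decoded.append(1 if positive > negative else 0)
    return decoded


def latency_cycles(p):
    """(SB, WB) latency in clock cycles per Section III-E: p + 3 and q + 4."""
    return p + 3, walsh_length(p) + 4


if __name__ == "__main__":
    rng = random.Random(0)
    for p in (6, 8, 16):
        for _ in range(200):
            bits = [rng.randint(0, 1) for _ in range(p)]
            assert sb_transfer(bits) == bits
            assert wb_transfer(bits) == bits
        sb, wb = latency_cycles(p)
        print(f"p={p}: SB {sb} cycles, WB {wb} cycles")
```

For p = 6, 8, and 16, `latency_cycles` gives 9 versus 12, 11 versus 20, and 19 versus 36 cycles. These analytic counts follow the per-bit model of Section III-E and need not coincide with the measured latencies in Fig. 7. In the SB model, each sum chip carries exactly one sender's bit, because only one standard-basis code is 1 at any chip position.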

C. Comparison of CDMA NoCs

We compare the performance of CDMA NoCs using the two encoding/decoding schemes. Besides the encoder and decoder modules, other on-chip modules, such as the network scheduler, parallel-to-serial modules, and serial-to-parallel modules, are all included in the NoCs. The area cost and power consumption of the CDMA NoCs (measured with Synopsys DC) are given in Table IV.

TABLE IV
PERFORMANCE COMPARISON OF CDMA NoCs

Table IV shows that the SB CDMA NoC has lower area and power cost than the WB CDMA NoC. Since both NoCs contain the parallel-to-serial module, the scheduler module, and the serial-to-parallel module, the percentage of NoC power and area saving is smaller than that of the encoder/decoder power and area saving in Table III. However, the SB CDMA NoC still gains 12.11%–20.25% power saving and 15.20%–22.91% area saving.

Fig. 8 evaluates the maximal throughput of both NoCs using one uniform and two hotspot traffic patterns (X%-hotspot means that a single hotspot node receives X% more packets than the other nodes). Each packet has sixteen 32-bit flits. A packet can be injected into the NoC as soon as the previous packet of that flow reaches its destination. Compared with the WB CDMA NoC, the SB CDMA NoC improves the maximal throughput by 35.52% to 103.26% owing to faster processing of packets with shorter orthogonal codes. These results are in accordance with the analysis in Section III-E.

Fig. 8. Maximal throughput of CDMA NoC.

V. CONCLUSION

We propose a new CDMA encoding/decoding method for on-chip communication. It can be realized with simple logic and costs less power and area. The standard basis, rather than the Walsh code, is used as the spreading code in our method, which decreases the encoding/decoding latency and increases the maximal throughput of NoCs. A formal proof establishes the correctness of our method. The experimental results show that our method outperforms the WB encoding/decoding scheme and that CDMA NoC performance also improves when our method is applied.

REFERENCES

[1] D. Sigüenza-Tortosa, T. Ahonen, and J. Nurmi, “Issues in the development of a practical NoC: The Proteo concept,” Integr., VLSI J., vol. 38, no. 1, pp. 95–105, 2004.
[2] A. J. Viterbi, CDMA: Principles of Spread Spectrum Communication. Reading, MA, USA: Addison-Wesley, 1995.
[3] S. Shimizu, T. Matsuoka, and K. Taniguchi, “Parallel bus systems using code-division multiple access technique,” in Proc. Int. Symp. Circuits Syst., May 2003, pp. II-240–II-243.
[4] D. Kim, M. Kim, and G. E. Sobelman, “CDMA-based network-on-chip architecture,” in Proc. IEEE Asia-Pacific Conf. Circuits Syst., Dec. 2004, pp. 137–140.
[5] X. Wang and J. Nurmi, “An on-chip CDMA communication network,” in Proc. Int. Symp. Syst.-Chip, Nov. 2005, pp. 155–160.
[6] E. H. Dinan and B. Jabbari, “Spreading codes for direct sequence CDMA and wideband CDMA cellular networks,” IEEE Commun. Mag., vol. 36, no. 9, pp. 48–54, Sep. 1998.
[7] M. Kim, D. Kim, and G. E. Sobelman, “MPEG-4 performance analysis for a CDMA network-on-chip,” in Proc. Int. Conf. Commun., Circuits, Syst., May 2005, pp. 493–496.
[8] X. Wang, T. Ahonen, and J. Nurmi, “Applying CDMA technique to network-on-chip,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 10, pp. 1091–1100, Oct. 2007.
[9] W. Lee and G. E. Sobelman, “Mesh-star hybrid NoC architecture with CDMA switch,” in Proc. IEEE Int. Symp. Circuits Syst., May 2009, pp. 1349–1352.
[10] M. Kim, D. Kim, and G. E. Sobelman, “Adaptive scheduling for CDMA-based networks-on-chip,” in Proc. 3rd Int. IEEE-NEWCAS Conf., Jun. 2005, pp. 357–360.
[11] W. Lee and G. E. Sobelman, “Semi-distributed scheduling for flexible codeword assignment in a CDMA network-on-chip,” in Proc. IEEE 8th Int. Conf. ASIC, Oct. 2009, pp. 431–434.
[12] S. Poddar, P. Ghosal, P. Mukherjee, S. Samui, and H. Rahaman, “Design of an NoC with on-chip photonic interconnects using adaptive CDMA links,” in Proc. IEEE Int. Conf. SOC, Sep. 2012, pp. 352–357.
[13] A. Vidapalapati, V. Vijayakumaran, A. Ganguly, and A. Kwasinski, “NoC architectures with adaptive code division multiple access based wireless links,” in Proc. IEEE Int. Symp. Circuits Syst., May 2012, pp. 636–639.
[14] X. Wang and J. Nurmi, “Modeling a code-division multiple-access network-on-chip using SystemC,” in Proc. Norchip, Nov. 2007, pp. 1–5.
