
DESIGN OF RESIDUE-TO-BINARY CONVERTER FOR A NEW 5-MODULI SUPERSET RESIDUE NUMBER SYSTEM


Bin Cao, Thambipillai Srikanthan and Chip-Hong Chang
Centre for High Performance Embedded Systems, Nanyang Technological University,
N4-B3b-06, Nanyang Avenue, Singapore 639798
ABSTRACT
This paper presents an efficient residue-to-binary (R/B)
conversion algorithm for a new 5-moduli superset
$\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1, 2^{n-1} - 1\}$ residue number
system (RNS) when n is even. The new moduli set is provided for larger
dynamic range and higher parallelism. Our R/B conversion algorithm is
based on a 4-moduli set R/B converter and the mixed-radix conversion
(MRC) technique. The proposed architecture is built around full
adders, which can be easily pipelined to achieve a high throughput
rate. Our investigations show that the resulting architecture is
notably more efficient than that proposed for an existing 5-moduli set
RNS in terms of area, delay and power consumption.

1. INTRODUCTION
The intense drive for low power-delay product has been at the
forefront of the escalated development for the leading-edge VLSI
and Giga-scale-integration (GSI) circuits. Residue Number
System (RNS) is popular in high performance arithmetic
applications such as digital signal processing systems because of
its inherent carry-free operations, parallelism and fault-tolerance
properties [1]. By decomposing large binary numbers into
smaller residues, addition and subtraction in RNS arithmetic have
no inter-digit carries or borrows, and multiplication can be
performed without the need to generate partial products.
A typical RNS consists of three parts: the binary-to-residue
(B/R) converter which converts the binary data into their residue
representations, the residue arithmetic units (RAUs) which
perform the necessary arithmetic operations required by the
applications, and the residue-to-binary (R/B) converter, which
converts the RNS-represented results into their weighted
representations. The efficiency of each part largely depends on
the moduli set. The architectures of the RNS for general moduli
sets have been widely studied. Because of the lack of special
number theoretic properties of the general moduli sets, the R/B
converters and RAUs for the general moduli set RNS are usually
area-consuming [2] or implemented based on lookup tables
(LUTs) [3]. Although the cost of memory has been driven low
nowadays, the number of LUTs and the access time incurred by
the need to read these LUTs iteratively have made the
implementations inefficient for ASIC realization for RNS with
large dynamic ranges.
Special moduli sets have been used extensively to reduce
the hardware complexity in the implementation of RNS,
especially for R/B converters [4]-[14]. The moduli sets in the form
of $2^n \pm 1$ are most important, not only because of their efficient
B/R and R/B converters, but also due to the existence of efficient
modular adders and modular multipliers for their RAUs.
There are some efficient R/B converters for the 3-moduli
set [4]-[7], but the granularity of its dynamic range and parallelism
is limited and insufficient for contemporary high performance
and fault-tolerant applications. Moduli sets obtained by
extending the popular 3-moduli set through the addition of
moduli in the form of $2^n \pm 1$ are called supersets. The 4-moduli
superset $\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} + 1\}$ was proposed by Bhardwaj
et al. [8], but two of the moduli are in the form of $2^n + 1$, which
leaves part of the dynamic range unused. Vinod and
Premkumar proposed a more efficient 4-moduli superset
$\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1\}$ [9], and its R/B converters were
improved by Cao et al. [10] using the efficient R/B conversion
algorithm for the 3-moduli set $\{2^n - 1, 2^n, 2^n + 1\}$. Skavantzos and
Abdallah proposed a class of conjugate moduli sets [11]. Although it is a
high-cardinality set, the moduli are not pairwise relatively prime.
Consequently, its dynamic range is reduced, the moduli are
unbalanced and the conversion delay is long. Skavantzos also
proposed a 5-moduli set
$\{2^{n+1}, 2^n - 1, 2^n + 1, 2^n + 2^{(n+1)/2} + 1, 2^n - 2^{(n+1)/2} + 1\}$,
valid for odd n [12]. As the two extended
moduli are not in the form of $2^n \pm 1$, the B/R and R/B
converters and the RAUs will be less efficient when compared to
those of supersets.
In this paper, we propose a new 5-moduli superset
$\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1, 2^{n-1} - 1\}$, which is valid for
even values of n.
Our proposed algorithm for its R/B converter is based on the
mixed-radix conversion (MRC) technique and the R/B
conversion algorithm for the 4-moduli superset [10], wherein an
efficient R/B converter for the popular 3-moduli set proposed in
either [6] or [7] has been used. The derived R/B converter is
based on full adders (FAs). Such an FA-based design has the
advantage of being design automation friendly and can be readily
pipelined to suit the throughput rate constrained by the
application.

2. BACKGROUND
In an RNS, an integer X can be represented by an n-tuple of
residues, $(x_1, x_2, \ldots, x_n)$, defined over a set of pairwise
relatively prime moduli $S = \{P_1, P_2, \ldots, P_n\}$, where
$\gcd(P_i, P_j) = 1$ for $1 \le i, j \le n$ and $i \ne j$, and where
$x_i = |X|_{P_i}$ is the residue of X modulo $P_i$. The set S is called
the moduli set, while the dynamic range of the system is defined by the
product M of all the moduli. Any integer X belonging to the ring
$Z_M = \{0, 1, 2, \ldots, M - 1\}$ has a unique RNS representation. The
decomposition of the binary number X into an n-tuple of residues is
called the B/R conversion. The reverse process of combining all the
residues into a single binary number is called the R/B conversion. The
R/B conversion can be performed using the Chinese Remainder
Theorem (CRT) [1] as follows:

$$X = \left| \sum_{i=1}^{n} M_i \left| M_i^{-1} \right|_{P_i} x_i \right|_M, \qquad (1)$$

where $M = \prod_{i=1}^{n} P_i$, $M_i = M / P_i$, and
$\left| M_i^{-1} \right|_{P_i}$ denotes the
multiplicative inverse of $M_i$ modulo $P_i$.
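The CRT sum in (1) can be illustrated with a short Python sketch. It is a behavioural model only, not the hardware architecture of this paper; the function name `crt`, the choice n = 4 and the test value are illustrative assumptions.

```python
from math import prod

def crt(residues, moduli):
    """Rebuild X from its residues with the CRT sum of equation (1)."""
    M = prod(moduli)                      # dynamic range M
    total = 0
    for x_i, p_i in zip(residues, moduli):
        M_i = M // p_i                    # M_i = M / P_i
        inv = pow(M_i, -1, p_i)           # |M_i^{-1}|_{P_i} (Python 3.8+)
        total += M_i * inv * x_i
    return total % M

# Proposed 5-moduli superset for n = 4: 15, 16, 17, 31, 7 (illustrative choice).
n = 4
moduli = [2**n - 1, 2**n, 2**n + 1, 2**(n + 1) - 1, 2**(n - 1) - 1]

X = 123456                                # any value below M = 885360
residues = [X % p for p in moduli]        # B/R conversion
assert crt(residues, moduli) == X         # R/B conversion recovers X
```

The final reduction modulo M is the costly step for general moduli sets, which is one reason special moduli sets are preferred.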

The MRC technique can also be used for the calculation of
X [3]. For a simple 2-moduli set $\{P_1, P_2\}$, the integer X can
be converted from its residue representation $(x_1, x_2)$ by

$$X = x_1 + P_1 \left| (x_2 - x_1) \left| P_1^{-1} \right|_{P_2} \right|_{P_2}. \qquad (2)$$
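A minimal Python sketch of (2) is given below. The helper name `mrc2` is hypothetical, and the two moduli are chosen, as an assumption for illustration, to be the composite modulus $2^n(2^{2n} - 1)(2^{n+1} - 1)$ and $2^{n-1} - 1$ for n = 4, which is the pairing to which the proposed converter later applies MRC.

```python
def mrc2(x1, x2, p1, p2):
    """Two-modulus mixed-radix conversion following equation (2)."""
    inv_p1 = pow(p1, -1, p2)              # |P1^{-1}|_{P2} (Python 3.8+)
    digit = ((x2 - x1) * inv_p1) % p2     # second mixed-radix digit, in [0, P2)
    return x1 + p1 * digit

n = 4
p1 = 2**n * (2**(2 * n) - 1) * (2**(n + 1) - 1)   # 16 * 255 * 31 = 126480
p2 = 2**(n - 1) - 1                               # 7

X = 500000                                        # any value below p1 * p2
assert mrc2(X % p1, X % p2, p1, p2) == X
```

Unlike the CRT sum, (2) needs no final reduction modulo the full dynamic range; only one reduction modulo the small modulus $P_2$ is required.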

If the integers X and Y have RNS representations
$(x_1, x_2, \ldots, x_n)$ and $(y_1, y_2, \ldots, y_n)$ respectively, then
the RNS representation of $Z = X \circ Y$ is given by
$(z_1, z_2, \ldots, z_n)$, where $z_i = |x_i \circ y_i|_{P_i}$, and
the operation $\circ$ denotes either addition, subtraction or
multiplication [1]. This means that arithmetic operations on large
numbers can be performed by a collection of smaller arithmetic
operations, and these operations can be executed independently
and in parallel without inter-channel carry propagation.
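The channel-wise arithmetic described above can be sketched in Python as follows. The names `MODULI`, `to_rns` and `rns_op` are illustrative assumptions, and the moduli are those of the proposed superset for n = 4.

```python
import operator
from math import prod

MODULI = (15, 16, 17, 31, 7)   # {2^n-1, 2^n, 2^n+1, 2^(n+1)-1, 2^(n-1)-1}, n = 4

def to_rns(x, moduli=MODULI):
    """B/R conversion: one residue per modulus."""
    return tuple(x % p for p in moduli)

def rns_op(a, b, op, moduli=MODULI):
    """Apply op independently in every residue channel (no inter-channel carries)."""
    return tuple(op(x, y) % p for x, y, p in zip(a, b, moduli))

M = prod(MODULI)
X, Y = 700, 900
for op in (operator.add, operator.sub, operator.mul):
    # The channel-wise result equals the residues of the integer result modulo M.
    assert rns_op(to_rns(X), to_rns(Y), op) == to_rns(op(X, Y) % M)
```

Each channel works on operands of at most n + 1 bits, which is the source of the parallelism noted above.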

3. REVERSE CONVERSION ALGORITHM


The following lemma and properties are necessary for our
proposed algorithm.
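Lemma 1 For any even number n (n > 2), the solution of the
modular equation

$$\left| 18 k_0 \right|_{2^{n-1}-1} = 1 \qquad (3)$$

is given by:
when n = 6k - 2, k = 2, 3, 4, ...,

$$k_0 = 2^{6k-5} + \sum_{i=0}^{k-2} 2^{6i+4} - \sum_{i=0}^{k-2} 2^{6i+1} = 2^{n-3} + 2^{4} + 2^{6 \cdot 1+4} + \cdots + 2^{6(k-2)+4} - 2^{1} - 2^{6 \cdot 1+1} - \cdots - 2^{6(k-2)+1} \qquad (4)$$

when n = 6k, k = 1, 2, 3, ...,

$$k_0 = 2^{6k-2} + \sum_{i=0}^{k-1} 2^{6i+2} - 1 - \sum_{i=0}^{k-2} 2^{6i+5} = 2^{n-2} + 2^{2} + 2^{6 \cdot 1+2} + \cdots + 2^{6(k-1)+2} - 2^{0} - 2^{5} - 2^{6 \cdot 1+5} - \cdots - 2^{6(k-2)+5} \qquad (5)$$

when n = 6k + 2, k = 1, 2, 3, ...,

$$k_0 = \sum_{i=0}^{k-1} 2^{6i} - \sum_{i=1}^{k} 2^{6i-3} = 2^{0} + 2^{6 \cdot 1} + 2^{6 \cdot 2} + \cdots + 2^{6(k-1)} - 2^{6 \cdot 1-3} - 2^{6 \cdot 2-3} - \cdots - 2^{6k-3} \qquad (6)$$

Due to the page constraint, only the proof of case (4) is
provided. The other cases can be proven similarly.
Proof of (4): When n = 6k - 2, the modulus is $2^{n-1} - 1 = 2^{6k-3} - 1$.
Summing the two geometric series in (4) gives
$\sum_{i=0}^{k-2} (2^{6i+4} - 2^{6i+1}) = 2(2^{6k-6} - 1)/9$, so the
left-hand side of (3) becomes:

$$\left| 18 \left( 2^{6k-5} + \frac{2 (2^{6k-6} - 1)}{9} \right) \right|_{2^{n-1}-1} = \left| 5 \cdot 2^{6k-3} - 4 \right|_{2^{n-1}-1} = \left| 5 (2^{6k-3} - 1) + 1 \right|_{2^{n-1}-1} = 1.$$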
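As a numerical sanity check of Lemma 1, the following Python sketch evaluates the closed forms of $k_0$ for the three cases n = 6k - 2, n = 6k and n = 6k + 2 and confirms that each one satisfies (3). The function name `k0_closed_form` and the tested range of n are illustrative assumptions; the sketch is not a proof and does not model the hardware.

```python
def k0_closed_form(n):
    """Closed-form k0 of Lemma 1, reduced modulo 2^(n-1) - 1 (even n >= 6)."""
    if n < 6 or n % 2:
        raise ValueError("this sketch expects an even n >= 6")
    m = 2**(n - 1) - 1
    if n % 6 == 4:                                   # n = 6k - 2, case (4)
        k = (n + 2) // 6
        value = (2**(6 * k - 5)
                 + sum(2**(6 * i + 4) for i in range(k - 1))
                 - sum(2**(6 * i + 1) for i in range(k - 1)))
    elif n % 6 == 0:                                 # n = 6k, case (5)
        k = n // 6
        value = (2**(6 * k - 2)
                 + sum(2**(6 * i + 2) for i in range(k))
                 - 1
                 - sum(2**(6 * i + 5) for i in range(k - 1)))
    else:                                            # n = 6k + 2, case (6)
        k = (n - 2) // 6
        value = (sum(2**(6 * i) for i in range(k))
                 - sum(2**(6 * i - 3) for i in range(1, k + 1)))
    return value % m

for n in range(6, 40, 2):
    m = 2**(n - 1) - 1
    # 2^n (2^(2n) - 1)(2^(n+1) - 1) reduces to 2 * 3 * 3 = 18 modulo m.
    assert (2**n * (2**(2 * n) - 1) * (2**(n + 1) - 1)) % m == 18
    assert (18 * k0_closed_form(n)) % m == 1         # equation (3)
```

Because every term of $k_0$ is a signed power of two, multiplying by $k_0$ modulo $2^{n-1} - 1$ needs no general multiplier.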

One useful property, which is iteratively used in [10], is
stated below for ease of reference:

$$\left| x \, 2^{r} \right|_{2^{n}-1} = \mathrm{CLS}(x, r), \qquad (7)$$

where the function CLS(x, r) is used to denote a circular shift of
the n-bit binary number x by r bits to the left. If the multiplicand
is negative, then Property (7) can be expressed as follows:

$$\left| -x \, 2^{r} \right|_{2^{n}-1} = \left| \bar{x} \, 2^{r} \right|_{2^{n}-1} = \mathrm{CLS}(\bar{x}, r), \qquad (8)$$

where $\bar{x}$ is the 1's complement of the n-bit binary number x.
Properties (7) and (8) can be utilized to completely
eliminate the logic circuits needed to implement the modulo
$2^n - 1$ multiplication. Only rewiring of bits is required, which incurs
virtually no cost in area and delay.
The R/B conversion algorithm consists of two steps. In the
first step, one 4-moduli R/B converter is required to obtain the
interim integer $X^{(2)} = (x_1, x_2, x_3, x_4)$ with respect to the
moduli set $\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1\}$ [10]. Then, MRC
equation (2) can be used to calculate the final $X = (X^{(2)}, x_5)$
corresponding to the 2-moduli set
$\{2^n (2^{2n} - 1)(2^{n+1} - 1), 2^{n-1} - 1\}$.
Let the binary representations of the residues be
$x_1 = (X_{1,n-1} X_{1,n-2} \ldots X_{1,0})_2$,
$x_2 = (X_{2,n-1} X_{2,n-2} \ldots X_{2,0})_2$,
$x_3 = (X_{3,n} X_{3,n-1} \ldots X_{3,0})_2$,
$x_4 = (X_{4,n} X_{4,n-1} \ldots X_{4,0})_2$, and
$x_5 = (X_{5,n-2} X_{5,n-3} \ldots X_{5,0})_2$.
According to the conversion algorithm [10], the interim integer
$X^{(2)} = (x_1, x_2, x_3, x_4)$ can be calculated by

$$X^{(2)} = x_2 + 2^{n} \left( 2^{2n} Z + 2^{n} Y_2 + Y_1 - Z \right) = x_2 + 2^{n} T, \qquad (9)$$

where $Y_1$, $Y_2$ and Z are the interim values of the R/B converter for
the 4-moduli superset, and T is a (3n+1)-bit number.
Let the binary representations of $Y_1$, $Y_2$ and Z be
$Y_1 = (Y_{1,n-1} Y_{1,n-2} \ldots Y_{1,0})_2$,
$Y_2 = (Y_{2,n-1} Y_{2,n-2} \ldots Y_{2,0})_2$, and
$Z = (Z_n Z_{n-1} \ldots Z_0)_2$. According to the MRC Equation (2), X can
be calculated from the 2-moduli set
$\{2^n (2^{2n} - 1)(2^{n+1} - 1), 2^{n-1} - 1\}$ as follows:

$$X = X^{(2)} + 2^{n} (2^{2n} - 1)(2^{n+1} - 1) \left| k_0 \left( x_5 - X^{(2)} \right) \right|_{2^{n-1}-1}, \qquad (10)$$

where $k_0$ is the multiplicative inverse of
$2^n (2^{2n} - 1)(2^{n+1} - 1)$ modulo $2^{n-1} - 1$, i.e.,

$$\left| k_0 \, 2^{n} (2^{2n} - 1)(2^{n+1} - 1) \right|_{2^{n-1}-1} = 1. \qquad (11)$$

The term on the left-hand side of (11) can be simplified to

$$\left| k_0 \, 2^{n} (2^{2n} - 1)(2^{n+1} - 1) \right|_{2^{n-1}-1} = \left| k_0 \cdot 2 \cdot 3 \cdot 3 \right|_{2^{n-1}-1} = \left| 18 k_0 \right|_{2^{n-1}-1},$$

so that the solution of (11) can be obtained by Lemma 1.
To simplify (10), let

$$L = \left| x_5 - X^{(2)} \right|_{2^{n-1}-1} \qquad (12)$$

and

$$R = \left| k_0 L \right|_{2^{n-1}-1}, \qquad (13)$$

then (10) can be expressed as:

$$X = X^{(2)} + 2^{n} (2^{2n} - 1)(2^{n+1} - 1) R. \qquad (14)$$

By substituting the expression of $X^{(2)}$ given by (9) into (12),
L can be calculated as follows:

$$L = \left| x_5 - x_2 - 2^{n} T \right|_{2^{n-1}-1} = \left| x_5 - x_2 - 2 \left( (2^{2n} - 1) Z + 2^{n} Y_2 + Y_1 \right) \right|_{2^{n-1}-1} = \left| x_5 - x_2 - 6Z - 4Y_2 - 2Y_1 \right|_{2^{n-1}-1}.$$

Now, let $x_2' = x_2 - 2^{n-1} X_{2,n-1}$,
$Z' = Z - 2^{n} Z_n - 2^{n-1} Z_{n-1}$,
$Y_2' = Y_2 - 2^{n-1} Y_{2,n-1}$ and
$Y_1' = Y_1 - 2^{n-1} Y_{1,n-1}$. By introducing these
terms into the expression of L, and with the aid of Properties (7)
and (8), L can be simplified as follows:

$$L = \left| \sum_{i=1}^{13} L_i \right|_{2^{n-1}-1}, \qquad (15)$$

where each $L_i$ is an (n-1)-bit vector and $1^{j}$ denotes a string of j
consecutive 1s:

$$L_1 = X_{5,n-2} X_{5,n-3} \ldots X_{5,1} X_{5,0} \qquad (16a)$$
$$L_2 = \bar{X}_{2,n-2} \bar{X}_{2,n-3} \ldots \bar{X}_{2,1} \bar{X}_{2,0} \qquad (16b)$$
$$L_3 = \bar{Z}_{n-3} \ldots \bar{Z}_{1} \bar{Z}_{0} \bar{Z}_{n-2} \qquad (16c)$$
$$L_4 = \bar{Z}_{n-4} \ldots \bar{Z}_{0} \bar{Z}_{n-2} \bar{Z}_{n-3} \qquad (16d)$$
$$L_5 = \bar{Y}_{2,n-4} \ldots \bar{Y}_{2,0} \bar{Y}_{2,n-2} \bar{Y}_{2,n-3} \qquad (16e)$$
$$L_6 = \bar{Y}_{1,n-3} \ldots \bar{Y}_{1,1} \bar{Y}_{1,0} \bar{Y}_{1,n-2} \qquad (16f)$$
$$L_7 = 1^{n-2} \, \bar{X}_{2,n-1} \qquad (16g)$$
$$L_8 = 1^{n-5} \, \bar{Z}_{n} \, 1^{3} \qquad (16h)$$
$$L_9 = 1^{n-4} \, \bar{Z}_{n} \, 1^{2} \qquad (16i)$$
$$L_{10} = 1^{n-4} \, \bar{Z}_{n-1} \, 1^{2} \qquad (16j)$$
$$L_{11} = 1^{n-3} \, \bar{Z}_{n-1} \, 1^{1} \qquad (16k)$$
$$L_{12} = 1^{n-4} \, \bar{Y}_{2,n-1} \, 1^{2} \qquad (16l)$$
$$L_{13} = 1^{n-3} \, \bar{Y}_{1,n-1} \, 1^{1} \qquad (16m)$$

As there are some embedded constant strings of 1s in $L_7$ to
$L_{13}$, the modular summation M of $L_7$ to $L_{13}$ can be simplified
substantially. After the simplification, L can be calculated by (17),
where $C_M$ and $S_M$ are the carry and sum of M, respectively.

$$L = \left| \sum_{i=1}^{6} L_i + 2 \times C_M + S_M \right|_{2^{n-1}-1} \qquad (17)$$

Only one $(8, 2^{n-1} - 1)$ multi-operand modular adder
(MOMA) [13] is required to implement (17). Fig. 1 shows the
architecture for the calculation of L.

Fig. 1 Calculation of L
(Block diagram: the operands $L_1$ to $L_6$, $c_M$ and $s_M$ are reduced by a tree of six (n-1)-bit CSAs with EAC, followed by an (n-1)-bit CPA with EAC.)

Once L has been obtained, R can be calculated by (13)
according to the value of n due to the different closed form
expressions of $k_0$ obtained by Lemma 1.
When n = 6k - 2,

$$R = \left| \mathrm{CLS}(L, n-3) + \mathrm{CLS}(L, 4) + \mathrm{CLS}(L, 6 \cdot 1+4) + \cdots + \mathrm{CLS}(L, 6(k-2)+4) + \mathrm{CLS}(\bar{L}, 1) + \mathrm{CLS}(\bar{L}, 6 \cdot 1+1) + \mathrm{CLS}(\bar{L}, 6 \cdot 2+1) + \cdots + \mathrm{CLS}(\bar{L}, 6(k-2)+1) \right|_{2^{n-1}-1} \qquad (18)$$

When n = 6k,

$$R = \left| \mathrm{CLS}(L, n-2) + \mathrm{CLS}(L, 2) + \mathrm{CLS}(L, 6 \cdot 1+2) + \cdots + \mathrm{CLS}(L, 6(k-1)+2) + \mathrm{CLS}(\bar{L}, 0) + \mathrm{CLS}(\bar{L}, 5) + \mathrm{CLS}(\bar{L}, 6 \cdot 1+5) + \cdots + \mathrm{CLS}(\bar{L}, 6(k-2)+5) \right|_{2^{n-1}-1} \qquad (19)$$

When n = 6k + 2,

$$R = \left| L + \mathrm{CLS}(L, 6 \cdot 1) + \mathrm{CLS}(L, 6 \cdot 2) + \cdots + \mathrm{CLS}(L, 6(k-1)) + \mathrm{CLS}(\bar{L}, 6 \cdot 1-3) + \mathrm{CLS}(\bar{L}, 6 \cdot 2-3) + \cdots + \mathrm{CLS}(\bar{L}, 6k-3) \right|_{2^{n-1}-1} \qquad (20)$$

One $(2k - 1, 2^{n-1} - 1)$ MOMA is required to implement (18),
whereas for (19) and (20), one $(2k, 2^{n-1} - 1)$ MOMA is required.
Fig. 2 shows the architecture for the calculation of R when k = 3
and n = 6k - 2 = 16, where only one $(5, 2^{15} - 1)$ MOMA is
required.

Fig. 2 Calculation of R
(Block diagram for n = 16: five circularly shifted operands of L, with shift amounts 13, 10, 4, 7 and 1, are reduced by (n-1)-bit CSAs with EAC and a final (n-1)-bit CPA with EAC.)

The final value of X can be calculated by (14) after the
(n-1)-bit R is obtained. Let the binary representation of R be
$(R_{n-2} R_{n-3} \ldots R_0)_2$. First, let

$$V = (2^{2n} - 1)(2^{n+1} - 1) R. \qquad (21)$$

By expanding the right hand side of V, we have

$$V = 2^{n+1} \left( 2^{2n} R - 2^{n-1} R - R \right) + R = 2^{n+1} U + R, \qquad (22)$$

where U is defined as follows:

$$U = 2^{2n} R - 2^{n-1} R - R = (R_{n-2} R_{n-3} \ldots R_1 R_0 \, 0^{2n})_2 - (R_{n-2} \ldots R_0 \, R_{n-2} \ldots R_0)_2. \qquad (23)$$

In (23), the (3n-1)-bit U can be easily computed with only a
(3n-1)-bit binary subtractor. The result can be concatenated to R
to form the 4n-bit vector V by (22). Then, by substituting the
computed values of $X^{(2)}$ from (9) and V from (21) into (14), we
have

$$X = X^{(2)} + 2^{n} V = x_2 + 2^{n} (T + V). \qquad (24)$$

A 4n-bit binary adder is required to sum the values of T and
V in (24), and the resultant sum is concatenated to the residue $x_2$
to obtain X. Fig. 3 shows the final architecture of the R/B
converter for the proposed 5-moduli superset.

Fig. 3 Architecture of the proposed residue-to-binary converter
(Block diagram: the residue-to-binary converter for the 4-moduli set, comprising the residue-to-binary converter for the 3-moduli set with outputs $Y_2$ and $Y_1$, the calculation of Z and a (3n+1)-bit subtractor, is followed by the calculation of L, the calculation of R, a (3n-1)-bit subtractor and a 4n-bit adder computing T + V; inputs $x_1$ to $x_5$, output X.)

4. PERFORMANCE COMPARISONS

We compare our R/B converter with the closest and most
efficient R/B converter recently reported for the special
five-moduli set based on
$\{2^{n+1}, 2^n - 1, 2^n + 1, 2^n + 2^{(n+1)/2} + 1, 2^n - 2^{(n+1)/2} + 1\}$ [12].
This special 5-moduli set RNS is valid for odd n.
The performances are evaluated by simulating the prototyped
architectures modeled in structural VHDL code. The technology
cells used for the synthesis come from the Avant!'s Libra-Passport
0.35 µm V2.6 libraries. The operating voltage is 3.3 V.
The synthesis tool used is the Design Compiler (DC) of
Synopsys Inc., and the simulation results of area, delay and power
are extracted from the reports of DC.
The R/B converter architecture of [12] consists of two parts.
The first part is the calculation of the interim terms A, B, C and D,
where 4 modular multiplications with 4 different constants are
required. The second part consists of one $(12, 2^{4n} - 1)$-MOMA.
As the implementation of the modular multipliers in the first part
is not given in [12], for fair comparison, the calculations of A, B,
C and D are also simplified and optimized. The pre-layout
simulation results of both architectures in terms of area, delay
and power for different dynamic ranges are compared in Figs. 4 to
6. The results show that our proposed converter outperforms that
of [12] for the same dynamic range in all metrics compared.
The main reason for the inferior performance of the converter of
[12] is that the hardware resources required by the modular
multipliers cannot be eliminated. Worse still, they escalate with
increasing dynamic range, thereby deteriorating its overall
performance severely.

Fig. 4 Area comparison
(Plot: area in NANDs, scale ×10^4, versus dynamic range in bits, for [12] and the proposed converter.)

Fig. 5 Delay comparison
(Plot: total conversion delay in ns versus dynamic range in bits, for [12] and the proposed converter.)

Fig. 6 Power consumption comparison
(Plot: power in mW versus dynamic range in bits, for [12] and the proposed converter.)

5. CONCLUSION
In this paper, a new 5-moduli superset
$\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1, 2^{n-1} - 1\}$ RNS has been
proposed, which is valid for even n. The
new R/B converter incorporates the R/B conversion algorithm of
the 4-moduli superset $\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1\}$, which is
based on the most efficient algorithms cited in the literature for the
popular three-moduli set. Being a fundamentally FA-based design,
the proposed architecture of the R/B converter can be easily
pipelined to achieve a high throughput rate and is more amenable to
optimization by silicon compilers for different dynamic ranges.
Performance comparisons show that the proposed R/B converter
achieves better performance in area, delay and power
consumption than the existing advanced R/B converter of the
5-moduli set RNS [12]. The proposed 5-moduli set will also be
more efficient for the B/R conversion and the RAUs than its
counterpart, as all moduli are in the form of $2^n \pm 1$.

6. REFERENCES
[1] M. A. Soderstrand, W. K. Jenkins, G. A. Jullien and F. J.
Taylor, Residue Number System Arithmetic: Modern
Applications in Digital Signal Processing. New York: IEEE
Press, 1986.
[2] R. M. Capocelli and R. Giancarlo, "Efficient VLSI networks
for converting an integer from binary system to residue number
system and vice versa," IEEE Trans. Circuits Syst., vol. 35, no.
11, pp. 1425-1430, 1988.
[3] C. H. Huang, "A fully parallel mixed radix conversion
algorithm for residue number applications," IEEE Trans.
Comput., vol. 32, no. 4, pp. 398-402, 1983.
[4] S. Andraos and H. Ahmad, "A new efficient memoryless
residue to binary converter," IEEE Trans. Circuits Syst., vol. 35,
no. 11, pp. 1441-1444, 1988.
[5] A. A. Hiasat and H. S. Abdel-Aty-Zohdy, "Residue-to-binary
arithmetic converter for the moduli set $(2^k, 2^k - 1, 2^{k-1} - 1)$,"
IEEE Trans. Circuits Syst. -II, vol. 45, no. 2, pp. 204-209,
Feb. 1998.
[6] Z. Wang, G. A. Jullien and W. C. Miller, "An improved
residue-to-binary converter," IEEE Trans. Circuits Syst. -I, vol.
47, no. 9, pp. 1437-1440, Sep. 2000.
[7] Y. Wang, X. Song, M. Aboulhamid and H. Shen, "Adder
based residue to binary number converters for
$(2^n - 1, 2^n, 2^n + 1)$," IEEE Trans. Signal Processing, vol. 50,
no. 7, pp. 1772-1779, 2002.
[8] M. Bhardwaj, T. Srikanthan and C. T. Clarke, "A reverse
converter for the 4-moduli superset
$\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} + 1\}$,"
in Proc. of 14th IEEE Symp. on Computer Arithmetic, Adelaide,
Australia, pp. 168-175, Apr. 1999.
[9] A. P. Vinod and A. B. Premkumar, "A memoryless reverse
converter for the 4-moduli superset
$\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1\}$,"
J. of Circuits, Systems, and Computers, vol. 10, no. 1&2,
pp. 85-99, 2000.
[10] B. Cao, C. H. Chang, and T. Srikanthan, "New efficient
residue-to-binary converters for 4-moduli set
$\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1\}$," in Proc. of IEEE Symp. on
Circuits and Systems (ISCAS-2003), vol. 4, pp. 536-539, May 2003.
[11] A. Skavantzos and M. Abdallah, "Implementation issues of
the two-level residue number system with pairs of conjugate
moduli," IEEE Trans. Signal Processing, vol. 47, no. 3,
pp. 826-838, Mar. 1999.
[12] A. Skavantzos, "An efficient residue to weighted converter
for a new residue number system," in Proc. of the 8th Great
Lakes Symp. VLSI, LA, no. 9, pp. 185-191, Feb. 1998.
[13] S. J. Piestrak, "A high speed realization of residue to binary
number system converter," IEEE Trans. Circuits Syst. -II, vol.
42, no. 10, pp. 661-663, 1995.
[14] B. Cao, C. H. Chang and T. Srikanthan, "An efficient
reverse converter for the 4-moduli set
$\{2^n - 1, 2^n, 2^n + 1, 2^{2n} + 1\}$
based on the New Chinese Remainder Theorem," IEEE Trans.
Circuits Syst. -I, vol. 50, no. 10, pp. 1296-1303, Oct. 2003.
