Design of Residue-To-Binary Converter For A New 5-Moduli Superset Residue Number System

Bin Cao, Thambipillai Srikanthan and Chip-Hong Chang
Centre for High Performance Embedded Systems, Nanyang Technological University,
N4-B3b-06, Nanyang Avenue, Singapore 639798
ABSTRACT
This paper presents an efficient residue-to-binary (R/B)
conversion algorithm for a new 5-moduli superset
$\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1, 2^{n-1} - 1\}$ residue number
system (RNS) when n is even. The new moduli set is provided for
larger dynamic range and higher parallelism. Our R/B conversion
algorithm is based on a 4-moduli set R/B converter and the
mixed-radix conversion (MRC) technique. The proposed architecture
is built around full adders, which can be easily pipelined to
achieve high throughput rate. Our investigations show that the
resulting architecture is notably more efficient than that
proposed for an existing 5-moduli set RNS in terms of area, delay
and power consumption.
1. INTRODUCTION
The intense drive for low power-delay product has been at the
forefront of the escalated development for the leading-edge VLSI
and Giga-scale-integration (GSI) circuits. Residue Number
System (RNS) is popular in high performance arithmetic
applications such as digital signal processing systems because of
its inherent carry-free operations, parallelism and fault-tolerance
properties [1]. By decomposing large binary numbers into
smaller residues, addition and subtraction in RNS arithmetic have
no inter-digit carries or borrows, and multiplication can be
performed without the need to generate partial products.
A typical RNS consists of three parts: the binary-to-residue
(B/R) converter which converts the binary data into their residue
representations, the residue arithmetic units (RAUs) which
perform the necessary arithmetic operations required by the
applications, and the residue-to-binary (R/B) converter, which
converts the RNS-represented results into their weighted
representations. The efficiency of each part largely depends on
the moduli set. The architectures of the RNS for general moduli
sets have been widely studied. Because of the lack of special
number theoretic properties of the general moduli sets, the R/B
converters and RAUs for the general moduli set RNS are usually
area-consuming [2] or implemented based on lookup tables
(LUTs) [3]. Although the cost of memory has been driven low
nowadays, the number of LUTs and the access time incurred by
the need to read these LUTs iteratively have made the
implementations inefficient for ASIC realization for RNS with
large dynamic ranges.
Special moduli sets have been used extensively to reduce
the hardware complexity in the implementation of RNS,
especially for R/B converters [4-14]. The moduli sets in the form
of $2^n \pm 1$ are most important, not only because of their efficient
B/R and R/B converters, but also due to the existence of efficient
modular adders and modular multipliers for their RAUs.
There are some efficient R/B converters for the 3-moduli
set [4-7], but the granularity of its dynamic range and parallelism
is limited and insufficient for the contemporary high performance
and fault-tolerant applications. Moduli sets obtained by
extending the popular 3-moduli set through the addition of
moduli in the form of $2^n \pm 1$ are called supersets. The 4-moduli
superset $\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} + 1\}$ was proposed by
Bhardwaj et al. [8], but two of the moduli are in the form of
$2^n + 1$, causing some excess of the dynamic range being unused.
Vinod and Premkumar proposed a more efficient 4-moduli superset
$\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1\}$ [9] and its R/B converters
are improved by Cao et al. [10] using the efficient R/B conversion
algorithm for the 3-moduli set $\{2^n - 1, 2^n, 2^n + 1\}$.
Skavantzos and Abdallah proposed a class of conjugate moduli
sets [11]. Although it is a high-cardinality set, the moduli are
not pairwise relatively prime. Consequently, its dynamic range is
reduced, the moduli are unbalanced and the conversion delay is
long. Skavantzos also proposed a 5-moduli set
$\{2^{n+1}, 2^n - 1, 2^n + 1, 2^n + 2^{(n+1)/2} + 1, 2^n - 2^{(n+1)/2} + 1\}$,
valid for odd number n [12]. As the two extended moduli are not in
the form of $2^n \pm 1$, the B/R and R/B converters and the RAUs
will be less efficient when compared to those of supersets.
In this paper, we propose a new 5-moduli superset
$\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1, 2^{n-1} - 1\}$, which is valid
for even values of n.
Our proposed algorithm for its R/B converter is based on the
mixed-radix conversion (MRC) technique and the R/B
conversion algorithm for the 4-moduli superset [10] wherein an
efficient R/B converter for the popular 3-moduli set proposed in
either [6] or [7] has been used. The derived R/B converter is
based on full adders (FAs). Such FA-based design has the
advantage of being design automation friendly and can be readily
pipelined to suit the throughput rate constrained by the
application.
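The restriction to even n can be checked numerically. The following
is a minimal Python sketch, not part of the original design flow; the
helper names superset_moduli, pairwise_coprime and dynamic_range_bits
are illustrative assumptions. It tests whether the five moduli are
pairwise relatively prime and reports the width of the dynamic range.

```python
from itertools import combinations
from math import gcd, prod


def superset_moduli(n):
    """Proposed 5-moduli superset for a given n (helper name is illustrative)."""
    return [2**n - 1, 2**n, 2**n + 1, 2**(n + 1) - 1, 2**(n - 1) - 1]


def pairwise_coprime(moduli):
    """True if every pair of moduli has gcd 1."""
    return all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))


def dynamic_range_bits(moduli):
    """Bits needed to represent M - 1, where M is the product of the moduli."""
    return (prod(moduli) - 1).bit_length()


if __name__ == "__main__":
    for n in range(4, 13):
        m = superset_moduli(n)
        print(n, pairwise_coprime(m), dynamic_range_bits(m))
```

For odd n, both $2^{n+1} - 1$ and $2^{n-1} - 1$ have even exponents and
are therefore divisible by 3, so the check fails for odd n.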
2. BACKGROUND
In an RNS, an integer X can be represented by an n-tuple of
residues, $(x_1, x_2, \ldots, x_n)$ defined over a set of pairwise
relatively prime moduli $S = \{P_1, P_2, \ldots, P_n\}$, where
$\gcd(P_i, P_j) = 1$ for $1 \le i, j \le n$, and $i \ne j$. The set S
is called the moduli set, while the dynamic range of the system is
defined by the product M of all the moduli. Any integer X belonging
to the ring $Z_M = \{0, 1, 2, \ldots, M - 1\}$ has a unique RNS
representation. The decomposition of the binary number X into an
n-tuple of residues is called the B/R conversion. The reverse
process of combining all the residues into a single binary number
is called the R/B conversion. The R/B conversion can be performed
using the Chinese Remainder Theorem (CRT) [1] as follows:
$$X = \left| \sum_{i=1}^{n} M_i \left| M_i^{-1} \right|_{P_i} x_i \right|_M \tag{1}$$
where $M = \prod_{i=1}^{n} P_i$, $M_i = M / P_i$, and
$\left| M_i^{-1} \right|_{P_i}$ denotes the multiplicative inverse of
$M_i$ modulo $P_i$.
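Equation (1) can be exercised directly in software. The sketch below
is a plain Python illustration under our own naming (the function
crt is not from the paper); it relies on the three-argument pow with
exponent -1, available from Python 3.8, to obtain the multiplicative
inverse.

```python
from math import prod


def crt(residues, moduli):
    """Reconstruct X from its residues using the CRT sum of equation (1)."""
    M = prod(moduli)
    total = 0
    for x_i, P_i in zip(residues, moduli):
        M_i = M // P_i
        inv = pow(M_i, -1, P_i)  # multiplicative inverse of M_i modulo P_i
        total += M_i * inv * x_i
    return total % M


if __name__ == "__main__":
    moduli = [15, 16, 17, 31, 7]  # the proposed superset for n = 4
    X = 123456
    print(crt([X % p for p in moduli], moduli) == X)
```

The final reduction modulo M is the step that makes the direct CRT
costly in hardware for large dynamic ranges.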
The MRC technique can also be used for the calculation of
X [3]. For a simple 2-moduli set $\{P_1, P_2\}$, the integer X can
be converted from its residue representation $(x_1, x_2)$ by
$$X = x_1 + P_1 \left| (x_2 - x_1) \left| P_1^{-1} \right|_{P_2} \right|_{P_2} \tag{2}$$
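As an illustration of equation (2), the following Python sketch
converts a residue pair back to X for two relatively prime moduli.
The function name mrc2 is an assumption made for this example only.

```python
def mrc2(x1, x2, P1, P2):
    """Two-moduli mixed-radix conversion following equation (2)."""
    inv = pow(P1, -1, P2)  # multiplicative inverse of P1 modulo P2
    return x1 + P1 * (((x2 - x1) * inv) % P2)


if __name__ == "__main__":
    P1, P2 = 126480, 7  # 15 * 16 * 17 * 31 and 2^(4-1) - 1, i.e. n = 4
    X = 654321
    print(mrc2(X % P1, X % P2, P1, P2) == X)
```

Unlike the CRT sum, only a reduction modulo the smaller modulus $P_2$
is needed, which is why this form is attractive when one modulus is
appended to an existing converter.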
If the integers X and Y have RNS representations
$(x_1, x_2, \ldots, x_n)$ and $(y_1, y_2, \ldots, y_n)$ respectively,
then the RNS representation of $Z = X \circ Y$ is given by
$(z_1, z_2, \ldots, z_n)$, where $z_i = \left| x_i \circ y_i \right|_{P_i}$,
and the operation $\circ$ denotes either addition, subtraction or
multiplication [1]. This means that arithmetic operation on large
numbers can be performed by a collection of smaller arithmetic
operations, and these operations can be executed independently
and in parallel without inter-channel carry propagation.
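The channel-wise behaviour described above can be seen in a few
lines of Python. This is only a sketch with hypothetical helper names
(rns_encode, rns_op), using the small 3-moduli set {7, 8, 9} for
readability.

```python
import operator


def rns_encode(X, moduli):
    """Residue representation of X over the given moduli."""
    return tuple(X % p for p in moduli)


def rns_op(op, xs, ys, moduli):
    """Apply op independently in every residue channel."""
    return tuple(op(x, y) % p for x, y, p in zip(xs, ys, moduli))


if __name__ == "__main__":
    moduli = (7, 8, 9)  # {2^n - 1, 2^n, 2^n + 1} with n = 3, M = 504
    X, Y = 25, 13
    xs, ys = rns_encode(X, moduli), rns_encode(Y, moduli)
    for op in (operator.add, operator.sub, operator.mul):
        # Each channel result equals the residue of the full-width result.
        print(rns_op(op, xs, ys, moduli) == rns_encode(op(X, Y), moduli))
```

Each channel only ever handles operands as wide as its own modulus,
which is the source of the carry-free parallelism.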
3. REVERSE CONVERSION ALGORITHM

The following lemma and properties are necessary for our
proposed algorithm.
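Lemma 1 For any even number n (n > 2), the solution of the
modular equation
$$\left| 18 k_0 \right|_{2^{n-1}-1} = 1 \tag{3}$$
is given by:
when n = 6k − 2, k = 2, 3, 4, ...,
$$k_0 = 2^{6k-5} + \sum_{i=0}^{k-2} 2^{6i+4} - \sum_{i=0}^{k-2} 2^{6i+1} = 2^{n-3} + 2^{4} + 2^{6 \cdot 1+4} + \cdots + 2^{6(k-2)+4} - 2^{1} - 2^{6 \cdot 1+1} - \cdots - 2^{6(k-2)+1} \tag{4}$$
when n = 6k, k = 1, 2, 3, ...,
$$k_0 = 2^{6k-2} + \sum_{i=0}^{k-1} 2^{6i+2} - 1 - \sum_{i=0}^{k-2} 2^{6i+5} = 2^{n-2} + 2^{2} + 2^{6 \cdot 1+2} + \cdots + 2^{6(k-1)+2} - 2^{0} - 2^{5} - 2^{6 \cdot 1+5} - \cdots - 2^{6(k-2)+5} \tag{5}$$
when n = 6k + 2, k = 1, 2, 3, ...,
$$k_0 = 1 + \sum_{i=1}^{k-1} 2^{6i} - \sum_{i=1}^{k} 2^{6i-3} = 2^{0} + 2^{6 \cdot 1} + 2^{6 \cdot 2} + \cdots + 2^{6(k-1)} - 2^{6 \cdot 1-3} - 2^{6 \cdot 2-3} - \cdots - 2^{6k-3} \tag{6}$$
Due to the page constraint, only the proof of case (4) is
provided. The other cases can be proven similarly.
Proof of (4): When n = 6k − 2, the left-hand side of (3) becomes:
$$\left| 18 k_0 \right|_{2^{n-1}-1} = \left| 18 \left( 2^{6k-5} + \frac{2 \left( 2^{6k-6} - 1 \right)}{9} \right) \right|_{2^{n-1}-1} = \left| 5 \cdot 2^{6k-3} - 4 \right|_{2^{n-1}-1} = \left| 5 \left( 2^{6k-3} - 1 \right) + 1 \right|_{2^{n-1}-1} = 1$$
since $2^{6k-3} - 1 = 2^{n-1} - 1$.
One useful property, which is iteratively used in [10], is
stated below for ease of reference:
$$\left| x \cdot 2^{r} \right|_{2^n-1} = \mathrm{CLS}(x, r), \tag{7}$$
where the function CLS(x, r) is used to denote a circular shift of
the n-bit binary number x by r bits to the left. If the multiplicand
is negative, then Property (7) can be expressed as follows:
$$\left| -x \cdot 2^{r} \right|_{2^n-1} = \mathrm{CLS}(\overline{x}, r), \tag{8}$$
where $\overline{x}$ is 1's complement of n-bit binary number x.
Properties (7) and (8) can be utilized to completely
eliminate the logic circuits needed to implement the modulo
$2^n - 1$ multiplication. Only rewiring of bits is required which
incurs virtually no cost of area and delay.
The R/B conversion algorithm consists of two steps. In the
first step, one 4-moduli R/B converter is required to obtain the
interim integer $X^{(2)} = (x_1, x_2, x_3, x_4)$ with respect to the
moduli set $\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1\}$ [10]. Then, MRC
equation (2) can be used to calculate the final
$X = (X^{(2)}, x_5)$ corresponding to the 2-moduli set
$\{2^n (2^{2n} - 1)(2^{n+1} - 1), 2^{n-1} - 1\}$.
Let the binary representations of the residues be
$x_1 = (X_{1,n-1} X_{1,n-2} \cdots X_{1,0})_2$,
$x_2 = (X_{2,n-1} X_{2,n-2} \cdots X_{2,0})_2$,
$x_3 = (X_{3,n} X_{3,n-1} \cdots X_{3,0})_2$,
$x_4 = (X_{4,n} X_{4,n-1} \cdots X_{4,0})_2$, and
$x_5 = (X_{5,n-2} X_{5,n-3} \cdots X_{5,0})_2$.
According to the conversion algorithm [10], the interim integer
$X^{(2)} = (x_1, x_2, x_3, x_4)$ can be calculated by
$$X^{(2)} = x_2 + 2^n \left( 2^{2n} Z + 2^n Y_2 + Y_1 - Z \right) = x_2 + 2^n T, \tag{9}$$
where $Y_1$, $Y_2$ and Z are the interim values of the R/B converter
for the 4-moduli superset, and T is a (3n+1)-bit number.
Let the binary representations of $Y_1$, $Y_2$ and Z be
$Y_1 = (Y_{1,n-1} Y_{1,n-2} \cdots Y_{1,0})_2$,
$Y_2 = (Y_{2,n-1} Y_{2,n-2} \cdots Y_{2,0})_2$,
$Z = (Z_n Z_{n-1} \cdots Z_0)_2$. According to the MRC Equation (2),
X can be calculated from the 2-moduli set
$\{2^n (2^{2n} - 1)(2^{n+1} - 1), 2^{n-1} - 1\}$ as follows:
$$X = X^{(2)} + 2^n \left( 2^{2n} - 1 \right) \left( 2^{n+1} - 1 \right) \left| k_0 \left( x_5 - X^{(2)} \right) \right|_{2^{n-1}-1} \tag{10}$$
where $k_0$ is the multiplicative inverse of
$2^n (2^{2n} - 1)(2^{n+1} - 1)$ modulo $2^{n-1} - 1$, i.e.,
$$\left| k_0 \cdot 2^n \left( 2^{2n} - 1 \right) \left( 2^{n+1} - 1 \right) \right|_{2^{n-1}-1} = 1 \tag{11}$$
The term on the left-hand side of (11) can be simplified to
$$\left| k_0 \cdot 2^n \left( 2^{2n} - 1 \right) \left( 2^{n+1} - 1 \right) \right|_{2^{n-1}-1} = \left| k_0 \times 2 \times 3 \times 3 \right|_{2^{n-1}-1} = \left| 18 k_0 \right|_{2^{n-1}-1},$$
so that the solution of (11) can be obtained by Lemma 1.
To simplify (10), let
$$L = \left| x_5 - X^{(2)} \right|_{2^{n-1}-1} \tag{12}$$
and
$$R = \left| k_0 L \right|_{2^{n-1}-1}, \tag{13}$$
then (10) can be expressed as:
$$X = X^{(2)} + 2^n \left( 2^{2n} - 1 \right) \left( 2^{n+1} - 1 \right) R \tag{14}$$
By substituting the expression of $X^{(2)}$ given by (9) into (12),
L can be calculated as follows:
$$L = \left| x_5 - x_2 - 2^n T \right|_{2^{n-1}-1} = \left| x_5 - x_2 - 2 \left( \left( 2^{2n} - 1 \right) Z + 2^n Y_2 + Y_1 \right) \right|_{2^{n-1}-1} = \left| x_5 - x_2 - 6Z - 4Y_2 - 2Y_1 \right|_{2^{n-1}-1}$$
Now, let $x_2' = x_2 - 2^{n-1} X_{2,n-1}$,
$Z' = Z - 2^n Z_n - 2^{n-1} Z_{n-1}$,
$Y_2' = Y_2 - 2^{n-1} Y_{2,n-1}$ and
$Y_1' = Y_1 - 2^{n-1} Y_{1,n-1}$. By introducing these
terms into the expression of L, and with the aid of Properties (7)
and (8), L can be simplified as follows:
$$L = \left| \sum_{i=1}^{13} L_i \right|_{2^{n-1}-1} \tag{15}$$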
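The MRC step of (10) to (14), together with Lemma 1 and Properties
(7) and (8), can be cross-checked in software. The Python sketch
below is our own illustration and not the hardware architecture: it
assumes that $X^{(2)}$ is already available as an integer, takes the
exponents of $k_0$ from the closed forms (4) to (6), and forms R
purely from circular shifts of L and of its 1's complement. The names
k0_terms, cls and extend_to_fifth_modulus are hypothetical.

```python
def k0_terms(n):
    """Exponents (pos, neg) such that k0 = sum(2**e for e in pos)
    - sum(2**e for e in neg), following cases (4), (5), (6) of Lemma 1."""
    if n % 2 or n < 6:
        raise ValueError("closed forms are used here for even n >= 6")
    if n % 6 == 4:                       # n = 6k - 2, k >= 2
        k = (n + 2) // 6
        pos = [n - 3] + [6 * i + 4 for i in range(k - 1)]
        neg = [6 * i + 1 for i in range(k - 1)]
    elif n % 6 == 0:                     # n = 6k, k >= 1
        k = n // 6
        pos = [n - 2] + [6 * i + 2 for i in range(k)]
        neg = [0] + [6 * i + 5 for i in range(k - 1)]
    else:                                # n = 6k + 2, k >= 1
        k = (n - 2) // 6
        pos = [6 * i for i in range(k)]
        neg = [6 * i - 3 for i in range(1, k + 1)]
    return pos, neg


def cls(x, r, width):
    """Circular left shift of a width-bit word x by r positions."""
    r %= width
    mask = (1 << width) - 1
    return ((x << r) | (x >> (width - r))) & mask


def extend_to_fifth_modulus(X2, x5, n):
    """Combine X2 (value over the 4-moduli set) with x5 as in (12)-(14)."""
    w = n - 1
    m5 = (1 << w) - 1                    # 2^(n-1) - 1
    M4 = (1 << n) * ((1 << (2 * n)) - 1) * ((1 << (n + 1)) - 1)
    L = (x5 - X2) % m5                   # equation (12)
    L_bar = L ^ m5                       # 1's complement, congruent to -L
    pos, neg = k0_terms(n)
    # Equation (13): multiplication by k0 replaced by shifts, per (7) and (8).
    R = (sum(cls(L, e, w) for e in pos)
         + sum(cls(L_bar, e, w) for e in neg)) % m5
    return X2 + M4 * R                   # equation (14)


if __name__ == "__main__":
    n = 16
    m5 = 2**(n - 1) - 1
    M4 = 2**n * (2**(2 * n) - 1) * (2**(n + 1) - 1)
    X = (M4 * m5) // 3
    print(extend_to_fifth_modulus(X % M4, X % m5, n) == X)
```

The software sum over shifted copies of L mirrors the multi-operand
modular addition used in the architecture, but makes no attempt to
model carry-save adders or end-around carries.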
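In (15), the (n−1)-bit operands $L_1$ to $L_{13}$ are given by
$$L_1 = (X_{5,n-2} X_{5,n-3} \cdots X_{5,1} X_{5,0})_2 \tag{16a}$$
$$L_2 = (\overline{X}_{2,n-2} \overline{X}_{2,n-3} \cdots \overline{X}_{2,1} \overline{X}_{2,0})_2 \tag{16b}$$
$$L_3 = (\overline{Z}_{n-3} \cdots \overline{Z}_{1} \overline{Z}_{0} \overline{Z}_{n-2})_2 \tag{16c}$$
$$L_4 = (\overline{Z}_{n-4} \cdots \overline{Z}_{0} \overline{Z}_{n-2} \overline{Z}_{n-3})_2 \tag{16d}$$
$$L_5 = (\overline{Y}_{2,n-4} \cdots \overline{Y}_{2,0} \overline{Y}_{2,n-2} \overline{Y}_{2,n-3})_2 \tag{16e}$$
$$L_6 = (\overline{Y}_{1,n-3} \cdots \overline{Y}_{1,1} \overline{Y}_{1,0} \overline{Y}_{1,n-2})_2 \tag{16f}$$
$$L_7 = (1_{n-2} \overline{X}_{2,n-1})_2 \tag{16g}$$
$$L_8 = (1_{n-5} \overline{Z}_{n} 1_{3})_2 \tag{16h}$$
$$L_9 = (1_{n-4} \overline{Z}_{n} 1_{2})_2 \tag{16i}$$
$$L_{10} = (1_{n-4} \overline{Z}_{n-1} 1_{2})_2 \tag{16j}$$
$$L_{11} = (1_{n-3} \overline{Z}_{n-1} 1_{1})_2 \tag{16k}$$
$$L_{12} = (1_{n-4} \overline{Y}_{2,n-1} 1_{2})_2 \tag{16l}$$
$$L_{13} = (1_{n-3} \overline{Y}_{1,n-1} 1_{1})_2 \tag{16m}$$
where $1_j$ denotes a string of j consecutive 1s.
As there are some embedded constant strings of 1s in $L_7$ to
$L_{13}$, the modular summation M of $L_7$ to $L_{13}$ can be
simplified substantially. After the simplification, L can be
calculated by (17), where $C_M$ and $S_M$ are the carry and sum of
M, respectively.
$$L = \left| \sum_{i=1}^{6} L_i + 2 C_M + S_M \right|_{2^{n-1}-1} \tag{17}$$
Only one (8, $2^{n-1} - 1$) Multi-operand modular adder
(MOMA) [13] is required to implement (17). Fig. 1 shows the
architecture for the calculation of L.

Fig. 1 Calculation of L ($L_1$ to $L_6$, $c_M$ and $s_M$ reduced by
(n−1)-bit CSAs with EAC, followed by an (n−1)-bit CPA with EAC)

Once L has been obtained, R can be calculated by (13)
according to the value of n due to the different closed form
expressions of $k_0$ obtained by Lemma 1.
When n = 6k − 2,
$$R = \left| \mathrm{CLS}(L, n-3) + \mathrm{CLS}(L, 4) + \mathrm{CLS}(L, 6 \cdot 1+4) + \cdots + \mathrm{CLS}(L, 6(k-2)+4) + \mathrm{CLS}(\overline{L}, 1) + \mathrm{CLS}(\overline{L}, 6 \cdot 1+1) + \mathrm{CLS}(\overline{L}, 6 \cdot 2+1) + \cdots + \mathrm{CLS}(\overline{L}, 6(k-2)+1) \right|_{2^{n-1}-1} \tag{18}$$
When n = 6k,
$$R = \left| \mathrm{CLS}(L, n-2) + \mathrm{CLS}(L, 2) + \mathrm{CLS}(L, 6 \cdot 1+2) + \cdots + \mathrm{CLS}(L, 6(k-1)+2) + \mathrm{CLS}(\overline{L}, 0) + \mathrm{CLS}(\overline{L}, 5) + \mathrm{CLS}(\overline{L}, 6 \cdot 1+5) + \cdots + \mathrm{CLS}(\overline{L}, 6(k-2)+5) \right|_{2^{n-1}-1} \tag{19}$$
When n = 6k + 2,
$$R = \left| L + \mathrm{CLS}(L, 6 \cdot 1) + \mathrm{CLS}(L, 6 \cdot 2) + \cdots + \mathrm{CLS}(L, 6(k-1)) + \mathrm{CLS}(\overline{L}, 6 \cdot 1-3) + \mathrm{CLS}(\overline{L}, 6 \cdot 2-3) + \cdots + \mathrm{CLS}(\overline{L}, 6k-3) \right|_{2^{n-1}-1} \tag{20}$$
One (2k − 1, $2^{n-1} - 1$) MOMA is required to implement (18)
whereas for (19) and (20), one (2k, $2^{n-1} - 1$) MOMA is required.
Fig. 2 shows the architecture for the calculation of R when k = 3
and n = 6k − 2 = 16, where only one (5, $2^{15} - 1$) MOMA is
required.

Fig. 2 Calculation of R for n = 16 (inputs CLS(L,13), CLS(L,10),
CLS(L,4), CLS($\overline{L}$,7) and CLS($\overline{L}$,1); (n−1)-bit
CSAs with EAC followed by an (n−1)-bit 1's complement adder)

The final value of X can be calculated by (14) after the
(n−1)-bit R is obtained. Let the binary representation of R be
$(R_{n-2} R_{n-3} \cdots R_0)_2$. First, let
$$V = \left( 2^{2n} - 1 \right) \left( 2^{n+1} - 1 \right) R, \tag{21}$$
By expanding the right hand side of V, we have
$$V = 2^{n+1} \left( 2^{2n} R - 2^{n-1} R - R \right) + R = 2^{n+1} U + R \tag{22}$$
where U is defined as follows:
$$U = 2^{2n} R - 2^{n-1} R - R = (R_{n-2} R_{n-3} \cdots R_1 R_0 0_{2n})_2 - (R_{n-2} R_{n-3} \cdots R_1 R_0 R_{n-2} R_{n-3} \cdots R_1 R_0)_2 \tag{23}$$
where $0_{2n}$ denotes a string of 2n consecutive 0s.
In (23), (3n−1)-bit U can be easily computed with only a
(3n−1)-bit binary subtractor. The result can be concatenated to R
to form the 4n-bit vector V by (22). Then, by substituting the
computed values of $X^{(2)}$ from (9) and V from (21) into (14), we
have
$$X = X^{(2)} + 2^n V = x_2 + 2^n (T + V) \tag{24}$$
A 4n-bit binary adder is required to sum the values of T and
V in (24) and the resultant sum is concatenated to the residue $x_2$
to obtain X. Fig. 3 shows the final architecture of the R/B
converter for the proposed 5-moduli superset.

Fig. 3 Architecture of the proposed residue-to-binary converter
(R/B converter for the 4-moduli set, built on the R/B converter for
the 3-moduli set and the calculation of Z; calculation of R;
(3n−1)-bit subtractor; 4n-bit adder forming T + V)

4. PERFORMANCE COMPARISONS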
We compare our R/B converter with the closest and most
efficient R/B converter recently reported for the special
five-moduli set based on
$\{2^{n+1}, 2^n - 1, 2^n + 1, 2^n + 2^{(n+1)/2} + 1, 2^n - 2^{(n+1)/2} + 1\}$ [12].
This special 5-moduli set RNS is valid for odd n.
The performances are evaluated by simulating the prototyped
architectures modeled in structural VHDL codes. The technology
cells used for the synthesis come from the Avant!'s LibraPassport
0.35 µm V2.6 libraries. The operation voltage is 3.3 V.
The synthesis tool used is the Design Compiler (DC) of
Synopsys Inc, and the simulation results of area, delay and power
are extracted from the reports of DC.
The R/B converter architecture of [12] consists of two parts.
The first part is the calculation of the interim terms A, B, C and D,
where 4 modular multiplications with 4 different constants are
required. The second part consists of one (12, $2^{4n} - 1$)-MOMA.
As the implementation of the modular multipliers in the first part
is not given in [12], for fair comparison, the calculation of A, B,
C and D is also simplified and optimized. The pre-layout
simulation results of both architectures in terms of the area, delay
and power for different dynamic ranges are compared in Figs. 4 to
6. The results show that our proposed converter outperforms that
of [12] for the same dynamic range in all the metrics compared.
The main reason for the inferior performance of the converter of
[12] is that the hardware resources required by the modular
multipliers cannot be eliminated. Worse still, they escalate with
increasing dynamic range, thereby severely deteriorating its
overall performance.
Fig. 4 Area comparison (area in NANDs versus dynamic range in bits,
for [12] and the proposed converter)
Fig. 5 Delay comparison (total conversion delay in ns versus dynamic
range in bits, for [12] and the proposed converter)
Fig. 6 Power consumption comparison (power in mW versus dynamic
range in bits, for [12] and the proposed converter)
5. CONCLUSION
In this paper, a new 5-moduli superset
$\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1, 2^{n-1} - 1\}$ RNS has been
proposed, which is valid for even n. The new R/B converter
incorporates the R/B conversion algorithm of the 4-moduli superset
$\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1\}$, which is based on the most
efficient algorithm cited in the literature for the popular
three-moduli set. Being a fundamentally FA-based design, the
proposed architecture of the R/B converter can be easily pipelined
to achieve high throughput rate and is more amenable to
optimization by silicon compilers for different dynamic ranges.
Performance comparisons show that the proposed R/B converter
achieves better performance in area, delay and power
consumption than the existing advanced R/B converter of the
5-moduli set RNS [12]. The proposed 5-moduli set will also be
more efficient for the B/R conversion and the RAUs than its
counterpart as all moduli are in the form of $2^n \pm 1$.
6. REFERENCES
[1] M. A. Soderstrand, W. K. Jenkins, G. A. Jullien and F. J.
Taylor, Residue Number System Arithmetic: Modern
Applications in Digital Signal Processing. New York: IEEE
Press, 1986.
[2] R. M. Capocelli and R. Giancarlo, "Efficient VLSI networks
for converting an integer from binary system to residue number
system and vice versa," IEEE Trans. Circuits Syst., vol. 35, no.
11, pp. 1425-1430, 1988.
[3] C. H. Huang, "A fully parallel mixed radix conversion
algorithm for residue number applications," IEEE Trans.
Comput., vol. 32, no. 4, pp. 398-402, 1983.
[4] S. Andraos and H. Ahmad, "A new efficient memoryless
residue to binary converter," IEEE Trans. Circuits Syst., vol. 35,
no. 11, pp. 1441-1444, 1988.
[5] A. A. Hiasat and H. S. Abdel-Aty-Zohdy, "Residue-to-binary
arithmetic converter for the moduli set
$(2^k, 2^k - 1, 2^{k-1} - 1)$," IEEE Trans. Circuits Syst. -II,
vol. 45, no. 2, pp. 204-209, Feb. 1998.
[6] Z. Wang, G. A. Jullien and W. C. Miller, "An improved
residue-to-binary converter," IEEE Trans. Circuits Syst. -I, vol.
47, no. 9, pp. 1437-1440, Sep. 2000.
[7] Y. Wang, X. Song, M. Aboulhamid and H. Shen, "Adder
based residue to binary number converters for
$(2^n - 1, 2^n, 2^n + 1)$," IEEE Trans. Signal Processing, vol. 50,
no. 7, pp. 1772-1779, 2002.
[8] M. Bhardwaj, T. Srikanthan and C. T. Clarke, "A reverse
converter for the 4-moduli superset
$\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} + 1\}$," in Proc. of 14th IEEE
Symp. on Computer Arithmetic, Adelaide, Australia, pp. 168-175,
Apr. 1999.
[9] A. P. Vinod and A. B. Premkumar, "A memoryless reverse
converter for the 4-moduli superset
$\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1\}$," J. of Circuits, Systems,
and Computers, vol. 10, no. 1&2, pp. 85-99, 2000.
[10] B. Cao, C. H. Chang, and T. Srikanthan, "New efficient
residue-to-binary converters for 4-moduli set
$\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1\}$," in Proc. of IEEE Symp. On
Circuits and Systems (ISCAS-2003), vol. 4, pp. 536-539, May 2003.
[11] A. Skavantzos and M. Abdallah, "Implementation issues of
the two-level residue number system with pairs of conjugate
moduli," IEEE Trans. Signal Processing, vol. 47, no. 3,
pp. 826-838, Mar. 1999.
[12] A. Skavantzos, "An efficient residue to weighted converter
for a new residue number system," in Proc. of the 8th Great
Lakes Symp. VLSI, LA, no. 9, pp. 185-191, Feb. 1998.
[13] S. J. Piestrak, "A high speed realization of residue to binary
number system converter," IEEE Trans. Circuits Syst. -II, vol.
42, no. 10, pp. 661-663, 1995.
[14] B. Cao, C. H. Chang and T. Srikanthan, "An efficient
reverse converter for the 4-moduli set
$\{2^n - 1, 2^n, 2^n + 1, 2^{2n} + 1\}$ based on the New Chinese
Remainder Theorem," IEEE Trans. Circuits Syst. -I, vol. 50, no. 10,
pp. 1296-1303, October 2003.
