06820253

Latin American Symposium On Circuits And Systems
Design of Elliptic Curve Cryptoprocessors over GF(2^163) on Koblitz Curves
Paulo Realpe-Muñoz, Vladimir Trujillo-Olaya and Jaime Velasco-Medina
Bionanoelectronics Research Group¹
Universidad del Valle, Escuela de Ingeniería Eléctrica y Electrónica, Cali, Colombia

Abstract: This paper presents the design of cryptoprocessors using two multipliers over the finite field GF(2^163) with digit-level processing. The arithmetic operations were implemented in hardware using a Gaussian normal basis (GNB) representation, and the scalar multiplication kP was performed on Koblitz curves using the window τNAF (w-τNAF) algorithm with w = 2, 4, 8 and 16. The cryptoprocessors were described in VHDL, synthesized on a Stratix IV FPGA using Quartus II 12.0, and verified using SignalTap II and Matlab. The simulation results show that the cryptoprocessors achieve very good performance with low area: the computation times for the scalar multiplication with w = 2, 4, 8 and 16 were 9.88, 7.37, 6.17 and 5.05 µs, respectively.
Index Terms: Elliptic curve cryptography, Gaussian normal basis, digit-level multiplier, Koblitz curves.
I. INTRODUCTION
The use of computer networks and the huge number of users have increased the need for a high level of security in the transmission and storage of information. Many applications require privacy, integrity or authentication of the information transmitted or stored. Elliptic curve cryptosystems (ECC) appear as a solution for achieving the integrity of the information because, compared with other cryptosystems, they offer smaller key sizes and lower bandwidth while preserving the same computational security. For example, 160-bit ECC has the same level of security as 1024-bit RSA.
In this context, the performance of an ECC depends on the efficient implementation of the scalar multiplication kP, which consists of successively adding a point, called the base point, to itself along an elliptic curve. Several algorithms carry out this operation on specific curves and calculate kP efficiently by using special properties, e.g., algorithms for Koblitz curves, also known as anomalous binary curves (ABC).
Several scalar multiplication algorithms have been proposed and implemented in hardware [1]-[4]. Reference [1] presents an algorithm that generates a sparse and joint τ-adic representation for a pair of scalars and applies it to double scalar multiplication. Reference [2] proposes a high-performance architecture for scalar multiplication based on the Montgomery ladder method and a pseudo-pipelined word-serial finite field multiplier with word size w over GF(2^m). Reference [3] presents the hardware implementation of the elliptic curve digital signature algorithm (ECDSA) over GF(2^163) on Koblitz curves. Reference [4] implements an FPGA-based accelerator for elliptic curve cryptography on Koblitz curves using window methods.
¹ This work was supported partially by Colciencias and Universidad del Valle.
978-1-4799-2507-0/14/$31.00 ©2014 IEEE
In this paper, we present the hardware design of cryptoprocessors over GF(2^163) using the GNB representation and the w-τNAF algorithm with w = 2, 4, 8 and 16 on Koblitz curves. The main contribution of this work is the efficient hardware implementation of cryptoprocessors based on the w-τNAF algorithm for different window sizes on Koblitz curves, which obtains a higher performance than the other designs compared in Section IV. The designed cryptoprocessors present a very good trade-off between computation time and area, and they are well suited to hardware cryptosystems.
This paper is organized as follows: Section II gives a brief mathematical summary of finite field and elliptic curve arithmetic over GF(2^m). Section III presents the hardware design. Section IV presents the hardware verification and synthesis results for the designed cryptoprocessors. Finally, conclusions are presented in Section V.
II. MATHEMATICAL BACKGROUND
A. Field Arithmetic Operations
ANSI X9.62 [6] gives detailed specifications of ECC protocols and allows a GNB to be used to represent finite field elements [7]. The following arithmetic operations can be performed over GF(2^m) when using a GNB of type T.
1) Addition: If $A = (a_0 a_1 a_2 \ldots a_{m-1})$ and $B = (b_0 b_1 b_2 \ldots b_{m-1})$ are elements of GF(2^m), then $A + B = C = (c_0 c_1 c_2 \ldots c_{m-1})$, where $c_i = (a_i + b_i) \bmod 2$.
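2) Squaring: Let $A = (a_0 a_1 a_2 \ldots a_{m-1}) \in GF(2^m)$, then
$$A^2 = \left(\sum_{i=0}^{m-1} a_i \beta^{2^i}\right)^2 = \sum_{i=0}^{m-1} a_i \beta^{2^{i+1}} = \sum_{i=0}^{m-1} a_{i-1} \beta^{2^i} \tag{1}$$
with subscripts taken modulo m, because $\beta^{2^m} = \beta$ by Fermat's theorem. Then
$$A^2 = (a_{m-1} a_0 a_1 \ldots a_{m-2}) \tag{2}$$
so squaring is a simple rotation of the vector representation.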
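As a small illustration of the two operations above, the following sketch models a field element as a Python list of bits $(a_0 \ldots a_{m-1})$. The function names `gnb_add` and `gnb_square` and the 5-bit sample vectors are illustrative assumptions, not part of the described hardware.

```python
def gnb_add(a, b):
    """Add two GF(2^m) elements given as equal-length bit lists (bitwise XOR)."""
    return [x ^ y for x, y in zip(a, b)]

def gnb_square(a):
    """Square in a normal basis: cyclic rotation that moves a_{m-1} to position 0."""
    return a[-1:] + a[:-1]

A = [1, 0, 1, 1, 0]
B = [0, 1, 1, 0, 0]
print(gnb_add(A, B))    # [1, 1, 0, 1, 0]
print(gnb_square(A))    # [0, 1, 0, 1, 1]
```

Because squaring is only a rotation, applying it m times returns the original vector, which mirrors $\beta^{2^m} = \beta$.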
3) Multiplication: The multiplication C = AB is based on the multiplication matrix R presented in [8]. If $A = (a_0 a_1 a_2 \ldots a_{m-1})$ and $B = (b_0 b_1 b_2 \ldots b_{m-1})$ are elements of GF(2^m) represented using a GNB, then $AB = C = (c_0 c_1 c_2 \ldots c_{m-1})$, where the coefficient $c_0$ is given by
$$c_0 = a_0 b_1 + \sum_{i=1}^{m-1} a_i \sum_{j=1}^{T} b_{R(i,j)} \tag{3}$$
where $R(i, j)$ denotes the (i, j)th element of the matrix R. To obtain the ith coefficient of C, i.e., $c_i$, one adds i (mod m) to all indices in (3).
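A bit-level software model can make the index shifting in the multiplication rule concrete. The sketch below does not build the matrix R of [8]. As a stand-in, it uses the index-table formulation of GNB multiplication for even T found in standards such as IEEE Std 1363, which yields the same product. The names `gnb_table` and `gnb_mul` are hypothetical, and the code is a functional model only, not the digit-level architecture.

```python
import random

def gnb_table(m, T):
    """Index table F[1..p-1] for a type-T GNB of GF(2^m), where p = T*m + 1 is prime."""
    p = T * m + 1
    # u: any integer of multiplicative order T modulo p
    u = next(x for x in range(1, p)
             if pow(x, T, p) == 1 and all(pow(x, e, p) != 1 for e in range(1, T)))
    F = [0] * p
    w = 1
    for _ in range(T):
        n = w
        for i in range(m):
            F[n] = i
            n = (2 * n) % p
        w = (u * w) % p
    return p, F

def gnb_mul(a, b, m, p, F):
    """Bit-level product C = AB for even T; a and b are bit lists (a_0 ... a_{m-1})."""
    c = []
    for k in range(m):                  # c_k: add k (mod m) to every index used for c_0
        bit = 0
        for j in range(1, p - 1):
            bit ^= a[(F[j + 1] + k) % m] & b[(F[p - j] + k) % m]
        c.append(bit)
    return c

m, T = 163, 4
p, F = gnb_table(m, T)
A = [random.randint(0, 1) for _ in range(m)]
one = [1] * m                           # multiplicative identity in a normal basis
print(gnb_mul(A, one, m, p, F) == A)                 # checks A * 1 against A
print(gnb_mul(A, A, m, p, F) == A[-1:] + A[:-1])     # checks A * A against the rotation in (2)
```

The two printed comparisons are sanity checks: multiplying by the all-ones vector should leave A unchanged, and multiplying A by itself should reproduce the rotation of equation (2).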
B. Elliptic Curve Arithmetic

A non-supersingular elliptic curve $E(F_q)$ is defined as the set of points $(x, y) \in GF(2^m) \times GF(2^m)$ that satisfy the affine equation
$$y^2 + xy = x^3 + ax^2 + b \tag{4}$$
where $a, b \in F_q$ are constants with $b \neq 0$, together with the point at infinity, denoted by O.
Let E be an elliptic curve over GF(2^m), let Q and $P \in E$ be two arbitrary points satisfying equation (4), and let k be an arbitrary positive integer. Then, the elliptic curve scalar multiplication Q = kP is defined as
$$kP = \underbrace{P + P + \cdots + P}_{k \text{ times}} \tag{5}$$
Among the arithmetic operations used by the group operations, inversion is the most expensive over GF(2^m). It can be avoided by means of a projective coordinate representation, which replaces the inversion with finite field multiplications.
A point P in projective coordinates is represented using three coordinates X, Y and Z. In López-Dahab (LD) projective coordinates [9], the projective point (X : Y : Z) with $Z \neq 0$ corresponds to the affine coordinates $x = X/Z$ and $y = Y/Z^2$. Equation (4) can then be mapped from affine coordinates to LD projective coordinates, and it is given as
$$Y^2 + XYZ = X^3 Z + aX^2 Z^2 + bZ^4 \tag{6}$$
The two group operations for the elliptic curve arithmetic in projective and affine coordinates are computed as follows.
1) Point doubling Q = 2P, where $Q = (X_3 : Y_3 : Z_3)$ and $P = (X_1 : Y_1 : Z_1)$ are in projective coordinates, can be performed using 4 finite field multiplications:
$$\begin{aligned} Z_3 &= X_1^2 Z_1^2 \\ X_3 &= X_1^4 + bZ_1^4 \\ Y_3 &= bZ_1^4 Z_3 + X_3 (aZ_3 + Y_1^2 + bZ_1^4) \end{aligned} \tag{7}$$
2) Point addition Q + P, where $Q = (X_1 : Y_1 : Z_1)$ is in projective coordinates and $P = (x_2, y_2)$ is in affine coordinates, can be performed using 8 finite field multiplications:
$$\begin{aligned} A &= y_2 Z_1^2 + Y_1, & B &= x_2 Z_1 + X_1, \\ C &= Z_1 B, & D &= B^2 (C + aZ_1^2), \\ Z_3 &= C^2, & E &= AC, \\ X_3 &= A^2 + D + E, & F &= X_3 + x_2 Z_3, \\ G &= (x_2 + y_2) Z_3^2, & Y_3 &= (E + Z_3) F + G \end{aligned} \tag{8}$$
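The formulas in (7) and (8) can be exercised in software on a toy field. The sketch below is an assumption-laden model: it uses GF(2^5) in polynomial basis with the reduction polynomial x^5 + x^2 + 1 instead of GF(2^163) in GNB, takes a = b = 1, and uses illustrative helper names (`fmul`, `fsq`, `ld_double`, `ld_add_mixed`, `on_curve_ld`). It checks that the results of doubling and mixed addition still satisfy the projective curve equation (6).

```python
M, POLY = 5, 0b100101      # toy field GF(2^5), reduction polynomial x^5 + x^2 + 1
a, b = 1, 1                # curve y^2 + xy = x^3 + x^2 + 1

def fmul(x, y):
    """Shift-and-add multiplication in GF(2^M), polynomial basis."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x >> M:
            x ^= POLY
    return r

def fsq(x):
    return fmul(x, x)

def ld_double(X1, Y1, Z1):
    """Point doubling in LD projective coordinates, equation (7)."""
    Z3 = fmul(fsq(X1), fsq(Z1))
    bZ4 = fmul(b, fsq(fsq(Z1)))
    X3 = fsq(fsq(X1)) ^ bZ4
    Y3 = fmul(bZ4, Z3) ^ fmul(X3, fmul(a, Z3) ^ fsq(Y1) ^ bZ4)
    return X3, Y3, Z3

def ld_add_mixed(X1, Y1, Z1, x2, y2):
    """Mixed LD/affine point addition, equation (8); assumes Q is neither P nor -P."""
    A = fmul(y2, fsq(Z1)) ^ Y1
    B = fmul(x2, Z1) ^ X1
    C = fmul(Z1, B)
    D = fmul(fsq(B), C ^ fmul(a, fsq(Z1)))
    Z3 = fsq(C)
    E = fmul(A, C)
    X3 = fsq(A) ^ D ^ E
    F = X3 ^ fmul(x2, Z3)
    G = fmul(x2 ^ y2, fsq(Z3))
    Y3 = fmul(E ^ Z3, F) ^ G
    return X3, Y3, Z3

def on_curve_ld(X, Y, Z):
    """Check the projective curve equation (6)."""
    Z2 = fsq(Z)
    lhs = fsq(Y) ^ fmul(fmul(X, Y), Z)
    rhs = fmul(fmul(fsq(X), X), Z) ^ fmul(a, fmul(fsq(X), Z2)) ^ fmul(b, fsq(Z2))
    return lhs == rhs

# Affine points with x != 0, found by brute force on the toy curve
points = [(x, y) for x in range(1, 1 << M) for y in range(1 << M) if on_curve_ld(x, y, 1)]
x, y = points[0]
D2 = ld_double(x, y, 1)            # 2P
S3 = ld_add_mixed(*D2, x, y)       # 2P + P
print(on_curve_ld(*D2), on_curve_ld(*S3))
```

Counting the `fmul` calls whose operands are both general field elements gives 4 for doubling and 8 for mixed addition when a and b are 0 or 1, which matches the multiplication counts stated above.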
C. Koblitz Curves
Koblitz curves, also known as anomalous binary curves, are elliptic curves defined over GF(2^m). The main advantage of these curves is that the scalar multiplication can be performed without using point doubling operations.
The Koblitz curves are defined over GF(2^m) by
$$E_a : y^2 + xy = x^3 + ax^2 + 1 \tag{9}$$
where $a \in \{0, 1\}$, that is, the curves $E_0$ and $E_1$.
These curves have the following property: if P = (x, y) is a point on the curve $E_a$, then the point $(x^2, y^2)$ is also a point on $E_a$. Moreover, they satisfy $(x^4, y^4) + 2(x, y) = \mu (x^2, y^2)$ for each point (x, y) on $E_a$, where $\mu = (-1)^{1-a}$. In GF(2^m), the Frobenius map τ is an endomorphism that raises every element to the power of two, that is, $\tau : x \rightarrow x^2$ [10]. In this case, if the scalar k is represented in τNAF, then
$$k = \sum_{i=0}^{l-1} k_i \tau^i, \quad k_i \in \{0, 1, -1\} \tag{10}$$
Using expression (10), the scalar multiplication can be computed as
$$kP = \sum_{i=0}^{l-1} k_i \tau^i (P) \tag{11}$$
The Hamming weight of this τNAF corresponds to that of the binary NAF representation, i.e., approximately $(\log_2 k)/3$, while the length of the τ-adic representation of k is approximately 2m, which is twice the length of the binary NAF representation. However, Solinas [5] presents a method that reduces the length of the τ-adic representation to approximately m. Koblitz curve arithmetic is therefore based on point addition and the Frobenius map τ.
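The τ-adic expansion of equation (10) can be illustrated with the standard digit-generation procedure described by Solinas [5]. The sketch below is a minimal software model under stated assumptions: it expands a plain integer k without the length reduction mentioned above, so the expansion is roughly twice as long as the binary one, and the names `tnaf` and `evaluate` are illustrative. The check evaluates the digits in Z[τ] using $\tau^2 = \mu\tau - 2$.

```python
def tnaf(k, mu):
    """tau-adic NAF digits of the integer k, least significant first (mu = +1 or -1)."""
    r0, r1 = k, 0                      # current value r0 + r1*tau
    digits = []
    while r0 != 0 or r1 != 0:
        if r0 % 2:
            u = 2 - ((r0 - 2 * r1) % 4)   # u is +1 or -1
            r0 -= u
        else:
            u = 0
        digits.append(u)
        # divide (r0 + r1*tau) by tau
        r0, r1 = r1 + mu * (r0 // 2), -(r0 // 2)
    return digits

def evaluate(digits, mu):
    """Horner evaluation of sum(u_i * tau^i) in Z[tau], using tau^2 = mu*tau - 2."""
    c0, c1 = 0, 0                      # accumulated value c0 + c1*tau
    for u in reversed(digits):
        c0, c1 = -2 * c1 + u, c0 + mu * c1
    return c0, c1

digits = tnaf(9, mu=1)
print(digits, evaluate(digits, mu=1))
```

The Horner loop in `evaluate` has the same shape as the scalar multiplication in (11): multiplying by τ plays the role of the Frobenius map on Q, and adding a digit plays the role of adding or subtracting P.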
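Algorithm 1: Window τNAF scalar multiplication method for Koblitz curves
Input: Window width w, integer k, $P \in E_a(GF(2^m))$ of order n
Output: kP
1. Compute the width-w τNAF of k, $\sum_{i=0}^{l-1} u_i \tau^i$
2. Compute $P_u = \alpha_u P$, for $u \in \{1, 3, 5, \ldots, 2^{w-1} - 1\}$
3. $Q \leftarrow O$
4. For i from l−1 downto 0 do
   4.1 $Q \leftarrow \tau Q$
   4.2 If $u_i \neq 0$ then
       Let u be such that $\alpha_u = u_i$ or $\alpha_{-u} = -u_i$
       If u > 0 then $Q \leftarrow Q + P_u$
       Else $Q \leftarrow Q - P_{-u}$
5. Return Q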
An efficient scalar multiplication algorithm that uses the window τNAF (w-τNAF) is presented in Algorithm 1. This algorithm requires, on average, m − 1 Frobenius maps and m/(w + 1) point additions. Since the Frobenius maps can be computed with free squarings in a normal basis, the computation of point additions determines the efficiency of the scalar multiplication. In this work, a Maple program was written to obtain the w-τNAF expansion of the scalar k, generating 8-bit expansion coefficients for w = 2, 4 and 8 and 16-bit expansion coefficients for w = 16.
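To see how the window size affects the workload, the average operation counts quoted above can be evaluated for m = 163. This is only arithmetic on the stated formulas. The function name `expected_operations` is illustrative.

```python
def expected_operations(m, w):
    """Average operation counts quoted for Algorithm 1."""
    frobenius_maps = m - 1
    point_additions = m / (w + 1)
    return frobenius_maps, point_additions

for w in (2, 4, 8, 16):
    frob, adds = expected_operations(163, w)
    print(f"w = {w:2d}: {frob} Frobenius maps, about {adds:.1f} point additions")
```

The number of Frobenius maps does not depend on w, so a larger window reduces only the number of point additions, at the cost of more pre-computed points.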
III. HARDWARE ARCHITECTURE DESIGN
Finite field multiplication is the most important operation in the scalar multiplication, so it must be implemented efficiently in hardware in order to achieve very high performance. There are several algorithms to perform the finite field multiplication [11]-[14]. In [11], Reyhani-Masoleh proposed a serial or parallel digit-level multiplier with digit size d, where 1 ≤ d ≤ m. Fig. 1 shows the digit-level GF(2^m) multiplier for T = 4, where $A = (a_0 a_1 a_2 \ldots a_{m-1})$, $B = (b_0 b_1 b_2 \ldots b_{m-1})$ and $C = AB = (c_0 c_1 c_2 \ldots c_{m-1})$ are elements represented in GNB over GF(2^m) and stored in registers A, B and C, respectively.
Fig. 1. Digit-level multiplier over GF(2^m)
The main functional blocks of the multiplier are U, J, CS and the adder. The block U is formed by the blocks U1 and U2, and its structure depends on the type T of the GNB, with T ≥ 2, and on the matrix R for the multiplication; the block J is a set of m two-input AND gates; the block CS is a d-fold cyclic shift; and the GF(2^163) adder is a set of two-input XOR gates.
In order to implement the digit-level multiplier in hardware with digit size d = 55, that is, M = 3 clock cycles, a Matlab program was written to generate the equations of the blocks U1 and U2, which are synthesized using VHDL.
Fig. 2. Elliptic curve cryptoprocessor for Koblitz curves
The architecture of the cryptoprocessor over GF(2^163) using the w-τNAF algorithm for Koblitz curves is shown in Fig. 2. It is designed using two register files; two digit-level finite field multipliers; one Frobenius map block; one RAM for storing the w-τNAF expansion coefficients of the scalar k; two ROMs for storing the pre-computed points $P_u$ in affine coordinates, which were obtained from Matlab for w = 2, 4, 8 and 16; several squaring and adder blocks; and two FSMs. The first FSM is the main control, and the second one generates the control signals for calculating the point addition, point doubling and τQ.
The data bus width is 163 bits for the lines A to F and W to Y. For the line Z, it is 8 bits when w = 2, 4 and 8, and 16 bits when w = 16. In this case, the Frobenius map is performed on the coordinates X, Y and Z independently.
The functional blocks that perform the finite field arithmetic over GF(2^163) of the cryptoprocessor for Koblitz curves are shown in Fig. 3. The second FSM is designed using the data dependence graph shown in Fig. 4 and Fig. 5, and it performs the point addition in LD mixed coordinates and the point doubling with b = 1 in LD projective coordinates.
Fig. 3. Functional blocks of the finite field arithmetic (BLOCK I, BLOCK II and BLOCK III)
According to Fig. 4, the latencies for point addition and point doubling are 5M and M + 3 clock cycles, respectively, where M is the latency of a finite field multiplication.
Fig. 4. Data dependence graph for point addition and point doubling (node types: addition, squaring, multiplication)
The main control generates the control signals to perform the scalar multiplication, process the key, initialize the cryptoprocessor and control the I/O registers. The processing sequence of the main control is as follows: initialize the Q coordinates according to the sign of the digit $u_i$ of the w-τNAF expansion; evaluate the digits $u_i$ for i < t − 1: if $u_i \neq 0$, compute the point addition in LD mixed coordinates and the Frobenius map on Q, else compute τQ; and return Q = kP in LD projective coordinates when i = t − 1.
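The latency figures quoted in this section can be tied together with a short calculation. The sketch assumes that a digit-level multiplier with digit size d needs ⌈m/d⌉ clock cycles per multiplication, which agrees with M = 3 for m = 163 and d = 55. The function names are illustrative.

```python
import math

def multiplier_latency(m, d):
    """Clock cycles per field multiplication, assuming ceil(m / d) digit steps."""
    return math.ceil(m / d)

def group_operation_latencies(M):
    """Latencies quoted from Fig. 4: point addition 5M, point doubling M + 3."""
    return {"point_addition": 5 * M, "point_doubling": M + 3}

M = multiplier_latency(163, 55)
print(M, group_operation_latencies(M))   # 3 {'point_addition': 15, 'point_doubling': 6}
```

Under this assumption, a smaller digit size would increase M and therefore every point addition, which is the operation that dominates the scalar multiplication on Koblitz curves.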
IV. HARDWARE VERIFICATION AND SYNTHESIS RESULTS
The cryptoprocessors are described using generic structural VHDL, synthesized on the Stratix IV EP4SGX180HF35C2 using Quartus II version 12, and verified using SignalTap II and Matlab. The synthesis results of the cryptoprocessors over GF(2^163) are shown in Table I for w = 2, 4, 8 and 16, and they are also presented in Fig. 5. From Fig. 5, it can be observed that the time to compute kP decreases while the area remains roughly the same as the window size increases. The area does not increase because the pre-computed points $P_u$ are stored in a ROM.
TABLE I
SYNTHESIS RESULTS FOR THE CRYPTOPROCESSORS
Cryptoprocessor    Area (ALUTs)  Fmax (MHz)  Registers  kP (µs)  ROM (kbits)
Koblitz 2-τNAF     24223         226.6       2046       9.88     0.31
Koblitz 4-τNAF     24257         226.7       2050       7.37     1.27
Koblitz 8-τNAF     24249         211.6       2108       6.17     20.37
Koblitz 16-τNAF    24270         177.1       2135       5.05     5216
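Two columns of the results can be cross-checked with simple arithmetic. The sketch below rests on two assumptions that are inferred from the numbers rather than stated in the text: that the ROM stores 2^(w−2) pre-computed affine points of two 163-bit coordinates each, and that the time-area product is the area in ALUTs multiplied by the kP time in seconds. The function names are illustrative.

```python
M_BITS = 163

def rom_kbits(w):
    """ROM size, assuming 2^(w-2) affine points with two 163-bit coordinates each."""
    points = 2 ** (w - 2)
    return points * 2 * M_BITS / 1024

def time_area(aluts, kp_us):
    """Time-area product in ALUT-seconds (area times kP time)."""
    return aluts * kp_us * 1e-6

# (area in ALUTs, kP time in microseconds) per window size, taken from Table I
table_i = {2: (24223, 9.88), 4: (24257, 7.37), 8: (24249, 6.17), 16: (24270, 5.05)}
for w, (aluts, kp_us) in table_i.items():
    print(f"w = {w:2d}: ROM = {rom_kbits(w):.2f} kbits, T x A = {time_area(aluts, kp_us):.3f}")
```

Under these assumptions the computed ROM sizes are about 0.32, 1.27, 20.38 and 5216 kbits, close to the listed values, and the products fall between 0.12 and 0.24, in line with the T × A figures reported for the four designs.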
Fig. 5. (a) Area resources. (b) Frequency. (c) Scalar multiplication kP. (d) Time-area product for the cryptoprocessors.
In order to compare the performance of the designed cryptoprocessors with others presented in the literature, Table II shows the synthesis results, the time to compute kP and the time-area product. However, it is important to mention that a fair comparison in hardware design is very difficult because of differences in technology, hardware platform, software tools, scalar multiplication algorithm, finite field representation, field size, etc.
TABLE II
COMPARISON RESULTS
Design     FPGA        Area (ALUTs)  Frequency (MHz)  kP (µs)  T × A
[15]       Stratix II  57762         152.2            13.38    0.70
[16]       Stratix II  44832         146.7            28.92    1.29
[17]       Stratix II  47160         162.4            9.48     0.44
[18]       Stratix II  37928         192.5            9.85     0.37
2-τNAF     Stratix IV  24223         226.6            9.88     0.23
4-τNAF     Stratix IV  24257         226.7            7.37     0.17
8-τNAF     Stratix IV  24249         211.6            6.17     0.14
16-τNAF    Stratix IV  24270         177.1            5.05     0.12
V. CONCLUSIONS AND FUTURE WORK
This work presents the design of elliptic curve cryptoprocessors on Koblitz curves to calculate the scalar multiplication over GF(2^163) using GNB. The cryptoprocessors are implemented using the window method with w = 2, 4, 8 and 16, and they are described using generic structural VHDL and synthesized on the Stratix IV EP4SGX180HF35C2.
The hardware implementation results show that the designed cryptoprocessors present a very good trade-off between computation time and area, obtaining a higher performance than the other designs reported in the literature. They are therefore very suitable for embedded hardware cryptosystems. The designed cryptoprocessors use roughly 17% of the ALUTs of the FPGA, and the best one performs the scalar multiplication in 5.05 µs for w = 16. Future work includes improving the design performance and adapting the proposed hardware architecture to other elliptic curves, such as K-233, K-283, K-409 and K-571.
ACKNOWLEDGMENT
P. Realpe-Muñoz thanks Colciencias for the scholarship. V. Trujillo-Olaya thanks Colciencias for the scholarship and the Altera University Program.
REFERENCES
[1] J. Adikari, V. S. Dimitrov and R. J. Cintra, "A new algorithm for double scalar multiplication over Koblitz curves," in Proc. IEEE ISCAS, pp. 709-712, 2011.
[2] B. Ansari and M. Anwar Hasan, "High-performance architecture of elliptic curve scalar multiplication," IEEE Trans. Comput., pp. 1443-1453, 2009.
[3] G. Nabil, K. Naziha, F. Lamia and K. Lotfi, "Hardware implementation of elliptic curve digital signature (ECDSA) on Koblitz curves," IEEE Int. Symposium on Communications Syst. Network and DSP, pp. 1-6, 2008.
[4] K. U. Järvinen and J. O. Skyttä, "High-speed elliptic curve cryptography accelerator for Koblitz curves," Int. Symp. on Field-Programmable Custom Comp. Machines, pp. 109-118, 2008.
[5] J. A. Solinas, "Efficient arithmetic on Koblitz curves," Des., Codes Cryptograph., Vol. 19, No. 2, pp. 195-249, 2000.
[6] ANSI X9.62-1999, The Elliptic Curve Digital Signature Algorithm, Technical report, ANSI, 1999.
[7] National Institute of Standards and Technology, Digital Signature Standard, FIPS Publication, 2000.
[8] A. Reyhani-Masoleh, "Efficient algorithms and architectures for field multiplication using Gaussian normal basis," IEEE Trans. Comput., Vol. 55, No. 1, pp. 34-47, 2006.
[9] J. López and R. Dahab, "Fast multiplication on elliptic curves over GF(2^m) without precomputation," in Proc. Workshop Cryptograph. Hardware Embedded Syst., pp. 316-327, 1999.
[10] D. Hankerson, A. Menezes and S. Vanstone, Guide to Elliptic Curve Cryptography, Springer-Verlag, 2003.
[11] R. Azarderakhsh and A. Reyhani-Masoleh, "A modified low complexity digit-level Gaussian normal basis multiplier," in Proc. 3rd Int. Workshop Arithmetic of Finite Fields, pp. 25-40, 2010.
[12] W. T. Huang, C. H. Chang and S. Y. Tan, "Non-XOR approach for low-cost bit-parallel polynomial basis multiplier over GF(2^m)," IET Inf. Security, pp. 152-162, 2011.
[13] Z. Wang and F. Fan, "Efficient Montgomery-based semi-systolic multiplier for even-type GNB of GF(2^m)," IEEE Trans. Comput., pp. 415-419, 2012.
[14] C. Y. Lee and C. W. Chiou, "Scalable Gaussian normal basis multipliers over GF(2^m) using Hankel matrix-vector representation," Journal of Signal Processing Systems, pp. 197-211, 2012.
[15] V. S. Dimitrov, K. U. Järvinen, M. J. Jacobson, Jr., W. F. Chan and Z. Huang, "Provably sublinear point multiplication on Koblitz curves and its hardware implementation," IEEE Trans. Comput., pp. 1469-1481, 2008.
[16] K. Järvinen and J. Skyttä, "On parallelization of high-speed processors for elliptic curve cryptography," IEEE Trans. Very Large Scale Integr. (VLSI), pp. 1162-1175, 2008.
[17] K. Järvinen and J. Skyttä, "Fast point multiplication on Koblitz curves: Parallelization method and implementations," Microprocess. Microsyst., pp. 106-116, 2009.
[18] R. Azarderakhsh and A. Reyhani-Masoleh, "High-performance implementation of point multiplication on Koblitz curves," IEEE Trans. on Circuits and Systems, pp. 41-45, 2013.
