1. Introduction
The singular value decomposition (SVD) is one of the most important matrix
factorisations in linear algebra. Its applications range from
beamforming and source localization to spectral
analysis, digital image processing, Principal Component
Analysis (PCA), and Latent Semantic Analysis (LSA)
[3].
matrix:

$$
J_k^{l\,T} A_k J_k^{r} = A_{k+1} \tag{3}
$$
A, is 'more
(4)
$$
D = A_\infty; \quad V = \prod_k J_k^{r}; \quad U = \prod_k J_k^{l} \tag{5}
$$
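$J(p, q, \theta)$ of the form:
(where $p < q$)

$$
\begin{bmatrix} \cos\theta_l & -\sin\theta_l \\ \sin\theta_l & \cos\theta_l \end{bmatrix}
\begin{bmatrix} a_{pp} & a_{pq} \\ a_{qp} & a_{qq} \end{bmatrix}
\begin{bmatrix} \cos\theta_r & \sin\theta_r \\ -\sin\theta_r & \cos\theta_r \end{bmatrix}
=
\begin{bmatrix} d_{pp} & 0 \\ 0 & d_{qq} \end{bmatrix} \tag{6}
$$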
Begin
    Determine θ_l, θ_r (or cos/sin);
    A := J(p, q, θ_l)^T . A . J(p, q, θ_r)
end.
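The step above (determine the two angles, then apply the two-sided rotation) can be sketched for a single 2 x 2 block. This is a minimal floating-point illustration, not the paper's hardware method. The sign convention of the rotation matrix, the closed-form angle formulas and all function names are assumptions chosen for the sketch.

```python
import math

def rotation(theta):
    """Plane rotation [[c, s], [-s, c]] (sign convention assumed here)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

def matmul2(x, y):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose2(x):
    return [[x[0][0], x[1][0]], [x[0][1], x[1][1]]]

def solve_angles(a):
    """Left and right angles that make R(theta_l)^T . a . R(theta_r) diagonal."""
    (a11, a12), (a21, a22) = a
    angle_sum = math.atan2(a21 + a12, a22 - a11)   # theta_r + theta_l
    angle_diff = math.atan2(a21 - a12, a22 + a11)  # theta_r - theta_l
    theta_l = 0.5 * (angle_sum - angle_diff)
    theta_r = 0.5 * (angle_sum + angle_diff)
    return theta_l, theta_r

def jacobi_step_2x2(a):
    """Determine the angles, then apply the two-sided rotation."""
    theta_l, theta_r = solve_angles(a)
    d = matmul2(matmul2(transpose2(rotation(theta_l)), a), rotation(theta_r))
    return theta_l, theta_r, d

theta_l, theta_r, d = jacobi_step_2x2([[1.0, 2.0], [3.0, 4.0]])
print(d)  # off-diagonal entries vanish up to rounding
```

With this convention the diagonal entries may come out negative, so they equal the singular values only up to sign.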
Algorithm processor
If T ≥ Δ and T − Δ ≡ 0 (mod 3) then
Begin
    If T ≠ Δ then read new 2 x 2 matrix
    If Δ = 0 then {diagonal processor}
        Solve 2 x 2 SVD
        Output rotation parameters
        Apply rotations to submatrix
        Output data (2 x 2 submatrix)
        Wait for new data
    Else {off-diagonal processor}
        Wait for new rotation parameters
        Output rotation parameters
        Apply rotations to submatrix
        Output data (2 x 2 submatrix)
        Wait for new data
two time steps while waiting for the processors $P_{j\pm1,\,i\pm1}$ to
complete their (possibly delayed) steps. Thus the price
paid to avoid broadcasting is that each processor is active
for only one third of the total computation (figure 3) [2].
In the next section we show a new method of
synchronizing processor operations to increase the
efficiency of this systolic array.
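The activity condition in the per-processor algorithm (T ≥ Δ and T − Δ ≡ 0 (mod 3)) can be explored with a small simulation. This sketch assumes that Δ is a processor's distance from the diagonal and that T counts time steps from zero. The function names are hypothetical.

```python
def is_active(t, delta):
    """True when a processor at distance delta starts a step at time t."""
    return t >= delta and (t - delta) % 3 == 0

def utilisation(delta, horizon):
    """Fraction of time steps in which the processor is active,
    counted over `horizon` steps after its first activation."""
    active = sum(is_active(t, delta) for t in range(delta, delta + horizon))
    return active / horizon

# One row per distance from the diagonal: '#' marks an active time step.
for delta in range(4):
    row = "".join("#" if is_active(t, delta) else "." for t in range(12))
    print(delta, row)
```

Each row shows one active step in every three, shifted by one step per unit of distance from the diagonal, which is the staggering that replaces broadcasting.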
how the
computations are staggered to avoid broadcasting.
The value inside each box indicates the iteration
number.
Table: comparison of the BLV array and the proposed array.
Computation time of one step: about the same.
Efficiency: 1/3 for the BLV array; improved 3 times in the proposed array.
Computation time of the decomposition: 3 times faster in the proposed array.
The resulting x,
$$
U^T = J(p, q, \theta)\, U^T \tag{8}
$$
The application of the rotations to the housed submatrices is described below:
[Equations (9)-(11): application of the rotations to the 2 × 2 submatrices]

6. Hardware implementation of plane rotations
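CORDIC iteration
For i = 0, ..., m-1
Begin
    ...
End

A 2 × 2 matrix $A$ can be decomposed as:

$$
A = A_1 + A_2 =
\begin{bmatrix} p_1 & -q_1 \\ q_1 & p_1 \end{bmatrix} +
\begin{bmatrix} -p_2 & q_2 \\ q_2 & p_2 \end{bmatrix} \tag{12}
$$

The matrix A is then rotated into B as follows:

[Equation (13)]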
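The decomposition of a 2 × 2 matrix into two parts lets a two-sided rotation be carried out as two independent vector rotations, one on (p1, q1) and one on (p2, q2). The sketch below illustrates this in floating point. The formulas for p1, q1, p2, q2, the rotation sign convention R(θ) = [[cos θ, sin θ], [−sin θ, cos θ]] and every function name are assumptions made for the example, not taken from the paper.

```python
import math

def decompose(a):
    """Split a 2 x 2 matrix into (p1, q1) and (p2, q2) so that
    a = [[p1, -q1], [q1, p1]] + [[-p2, q2], [q2, p2]]."""
    (a11, a12), (a21, a22) = a
    p1 = (a22 + a11) / 2
    q1 = (a21 - a12) / 2
    p2 = (a22 - a11) / 2
    q2 = (a21 + a12) / 2
    return p1, q1, p2, q2

def recompose(p1, q1, p2, q2):
    """Inverse of decompose."""
    return [[p1 - p2, -q1 + q2], [q1 + q2, p1 + p2]]

def rotate_vector(p, q, angle):
    """Rotate the vector (p, q) counter-clockwise by angle."""
    c, s = math.cos(angle), math.sin(angle)
    return c * p - s * q, s * p + c * q

def two_sided_rotation(a, theta_l, theta_r):
    """Compute R(theta_l)^T . a . R(theta_r) using two vector rotations."""
    p1, q1, p2, q2 = decompose(a)
    p1, q1 = rotate_vector(p1, q1, theta_l - theta_r)
    p2, q2 = rotate_vector(p2, q2, -(theta_l + theta_r))
    return recompose(p1, q1, p2, q2)

b = two_sided_rotation([[1.0, 2.0], [3.0, 4.0]], 0.3, -0.7)
print(b)
```

Each vector rotation here is the kind of operation a CORDIC unit performs, which is why this form suits a hardware implementation.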
$d_i$ represents the rotation direction and is chosen
according to the CORDIC mode: in the 'rotation mode',
the input vector is rotated by an angle $z_0 = \theta$ and
b/x)
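The rotation mode can be illustrated with a floating-point emulation of the standard circular CORDIC recurrence. This is a sketch under assumptions: the recurrence and the scale-factor product are the textbook forms rather than ones recovered from this paper, the function name and the default iteration count are hypothetical, and a hardware version would use shifts, adds and a stored arctangent table instead of library calls.

```python
import math

def cordic_rotate(x, y, theta, m=32):
    """Rotate (x, y) counter-clockwise by theta with m CORDIC iterations.

    Converges for |theta| up to roughly 1.74 rad.
    """
    z = theta
    for i in range(m):
        d = 1 if z >= 0 else -1          # rotation direction d_i
        shift = 2.0 ** -i                # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)        # residual angle
    # Each micro-rotation stretches the vector; divide out the total gain.
    gain = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(m))
    return x / gain, y / gain

print(cordic_rotate(1.0, 0.0, math.pi / 4))
```

The direction d_i is chosen to drive the residual angle z towards zero, so after m iterations the accumulated micro-rotations approximate a rotation by theta.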
[Figure: block diagram; recoverable labels: "Solve 2 × 2 SVD", "word serial", "output rotation", "rotation of matrix V"]
processor, and A ,
T = 23W +Tsc+T, + 11 1
Word length (W)    Latency
< 16               10
16-22              11
23-26              12
27                 13
28-30
31-33
Pop =
+c+ 108
+ T, + I
11
Figure 12. Area vs word length
9. References
[1] R.P. Brent and F.T. Luk, The solution of singular-value and symmetric eigenvalue problems on
multiprocessor arrays, SIAM J. Sci. Stat. Comput.,
vol. 6, no. 1, pp. 69-84, January 1985.
[2] R.P. Brent, F.T. Luk, and C. Van Loan,
Computation of singular value decomposition using
mesh-connected processors, J. VLSI Comput. Syst., vol.
1, no. 3, pp. 242-270, 1985.
[3] B. Yang and J.F. Böhme, Reducing the
computations of the singular value decomposition array
given by Brent and Luk, SIAM J. Matrix Anal. Appl.,
vol. 12, no. 4, pp. 713-725, October 1991.
[4] J.R. Cavallaro and F.T. Luk, CORDIC arithmetic for
an SVD processor, Journal of Parallel and Distributed
Computing, vol. 5, pp. 271-290, 1988.
[5] Ray Andraka, A survey of CORDIC algorithms for
FPGA based computers, in ACM/SIGDA Sixth
International Symposium on Field Programmable Gate
Arrays, Monterey, California, United States, 1998, pp. 191-200.
[6] J. Volder, The CORDIC Computing Technique, IRE
Trans. Comput., Sept. 1959, pp. 330-334.
[7] J.S. Walther, A Unified Algorithm for Elementary
Functions, Proc. AFIPS Spring Joint Computer
Conference, pp. 379-385, 1971.
[8] G.H. Golub and C.F. Van Loan, Matrix
Computations, The Johns Hopkins University Press,
London, 1996.
[9] www.Celoxica.com.
[10] Celoxica application note, The technology behind
DK1, AN 18 v1.0, 2001.