Outline
Basic solver descriptions
Direct and iterative methods
Why so many choices?
Performance issues
Usage rules of thumb
Usage examples
How do I choose the fastest solver?
Solver Basics: Ax = b
Direct Methods
Factor: $A = LDL^T$ (compute matrix L)
Solve (triangular systems):
$Lz = b$
$z = D^{-1}z$
$L^T x = z$
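The factor and solve steps above can be sketched in a few lines of plain Python. This is a minimal dense illustration with no pivoting and no sparsity handling, not the production sparse solver; the function names `ldlt_factor` and `ldlt_solve` and the 3x3 test matrix are assumptions made for the example.

```python
def ldlt_factor(A):
    """Return (L, D) with A = L * diag(D) * L^T for a symmetric matrix A.

    L is unit lower triangular (list of rows), D is the list of pivots.
    No pivoting is done, so this sketch assumes all pivots are nonzero.
    """
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in range(n):
        D[j] = A[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * D[k] for k in range(j))
            L[i][j] = (A[i][j] - s) / D[j]
    return L, D


def ldlt_solve(L, D, b):
    """Solve A x = b given the factors: L z = b, z = D^-1 z, L^T x = z."""
    n = len(b)
    z = list(b)
    for i in range(n):                      # forward substitution
        z[i] -= sum(L[i][k] * z[k] for k in range(i))
    z = [z[i] / D[i] for i in range(n)]     # diagonal scaling
    x = list(z)
    for i in reversed(range(n)):            # back substitution
        x[i] -= sum(L[k][i] * x[k] for k in range(i + 1, n))
    return x


# Small symmetric test system; b was chosen as A times (1, 1, 1).
A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
b = [8.0, 10.0, 11.0]
L, D = ldlt_factor(A)
x = ldlt_solve(L, D, b)
```

The factor step does all the expensive work; once L and D exist, each additional right-hand side costs only the three cheap sweeps in `ldlt_solve`.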
Solver Basics: Ax = b
Direct Methods
Factor: $A = LDL^T$ (compute matrix L)
Solve:
$Lz = b$
$z = D^{-1}z$
$L^T x = z$
Iterative Methods
Choose $x_0$
Stationary Methods (guess and go)
Iterate: $x_{k+1} = G x_k + c$
Until $\|x_{k+1} - x_k\| < \epsilon$
Projection Methods (project and minimize)
Iterate:
Compute $A p_{k-1}$
Update $x_k = x_{k-1} + \alpha_k p_{k-1}$
$r_k = r_{k-1} - \alpha_k A p_{k-1}$
$p_k = r_k + \beta_k p_{k-1}$
Until $\|r_k\| < \epsilon$
Vector updates
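The two iterative families above can be sketched as follows. Jacobi stands in for the stationary method (one possible choice of G and c, an assumption of this example), and unpreconditioned conjugate gradient stands in for the projection method. The helper names and the small test matrix are illustrative only.

```python
def matvec(A, x):
    """Dense matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def jacobi(A, b, tol=1e-10, max_iter=500):
    """Stationary iteration x_{k+1} = G x_k + c with G = I - D^-1 A, c = D^-1 b."""
    n = len(b)
    x = [0.0] * n
    for k in range(1, max_iter + 1):
        x_new = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
                 for i in range(n)]
        step = max(abs(x_new[i] - x[i]) for i in range(n))
        x = x_new
        if step < tol:                      # until ||x_{k+1} - x_k|| < eps
            return x, k
    return x, max_iter


def conjugate_gradient(A, b, tol=1e-10, max_iter=500):
    """Projection method for symmetric positive definite A (no preconditioner)."""
    x = [0.0] * len(b)
    r = list(b)                             # r_0 = b - A x_0 with x_0 = 0
    p = list(r)
    rr = dot(r, r)
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)                   # the one matrix multiply per iteration
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        if rr_new ** 0.5 < tol:             # until ||r_k|| < eps
            return x, k
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter


# Diagonally dominant SPD test matrix; b was chosen as A times (1, 2, 3).
A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [6.0, 12.0, 14.0]
x_jacobi, it_jacobi = jacobi(A, b)
x_cg, it_cg = conjugate_gradient(A, b)
```

Both loops consist of one multiply with A plus a handful of vector updates, which is why the per-iteration cost is low and the total cost is governed by the iteration count.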
Direct Methods
Factor is expensive
Memory & lots of flops
Huge file to store L
Iterative Methods
Sparse Ax multiply: cheap but slow
Memory bandwidth and cache limited
Harder to parallelize
Preconditioners are not always robust
Convergence is not guaranteed
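Multi-Point Constraints
Direct elimination method
$x_1 = G^T x_2 + g$
$\begin{bmatrix} A_{11} & A_{12} \\ A_{12}^T & A_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$
Solve:
$(G A_{11} G^T + G A_{12} + A_{12}^T G^T + A_{22})\, x_2 = b_2 + G b_1 - A_{12}^T g - G A_{11} g$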
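A small numeric instance of the direct elimination formula can help check the bookkeeping. The sketch below uses two constrained dofs and one retained dof, so the reduced system is a single scalar equation; every matrix value and helper name here is made up for illustration.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def add(*Ms):
    """Element-wise sum of matrices of equal shape."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*Ms)]


def scale(s, X):
    return [[s * v for v in row] for row in X]


# Hypothetical partitioned system (vectors are stored as n x 1 matrices).
A11 = [[4.0, 1.0], [1.0, 3.0]]
A12 = [[1.0], [0.0]]
A22 = [[2.0]]
b1 = [[1.0], [2.0]]
b2 = [[3.0]]
GT = [[1.0], [2.0]]          # constraint: x1 = G^T x2 + g
g = [[0.5], [0.0]]

G = transpose(GT)
A12T = transpose(A12)
GA11 = matmul(G, A11)

# Reduced operator and right-hand side from the slide formula.
K = add(matmul(GA11, GT), matmul(G, A12), matmul(A12T, GT), A22)
f = add(b2, matmul(G, b1), scale(-1.0, matmul(A12T, g)), scale(-1.0, matmul(GA11, g)))

x2 = [[f[0][0] / K[0][0]]]   # 1 x 1 system because only one dof is retained
x1 = add(matmul(GT, x2), g)  # recover the eliminated dofs from the constraint

# Residuals of the original block rows; the reduced equation is G r1 + r2 = 0.
r1 = add(b1, scale(-1.0, matmul(A11, x1)), scale(-1.0, matmul(A12, x2)))
r2 = add(b2, scale(-1.0, matmul(A12T, x1)), scale(-1.0, matmul(A22, x2)))
check = add(matmul(G, r1), r2)
```

Note that the individual block residuals `r1` and `r2` are not zero; they are the constraint forces. Only their combination `G r1 + r2` vanishes.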
Solver Usage
Sparse, PCG and ICCG solvers cover 95% of all ANSYS applications
Sparse solver is now default in most cases for robustness and efficiency reasons
Usage Guidelines: Sparse
Resource requirements
Total factorization time depends on model geometry and element type
Shell models best
Bulky 3-D models with higher order elements more expensive
System requirements
1 Gbyte per million dofs
10 Gbyte disk per million dofs
Eventually runs out of resource
10 million dofs = 100 Gbyte file
100 Gbytes x 3 = 300 Gbytes I/O
300 Gbytes @ 30 Mbytes/sec = approx. 10,000 seconds I/O wait time
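The arithmetic above can be wrapped in a small estimator to try other model sizes. This is only a sketch of the slide's rules of thumb: the function name, the `io_passes` parameter (the "x 3" factor), and the use of 1 Gbyte = 1000 Mbytes are assumptions of the example.

```python
def sparse_resource_estimate(dofs, io_passes=3, io_rate_mb_per_s=30.0):
    """Rule-of-thumb resources for the sparse solver, per the slide."""
    million_dofs = dofs / 1e6
    memory_gb = 1.0 * million_dofs            # 1 Gbyte per million dofs
    file_gb = 10.0 * million_dofs             # 10 Gbyte disk per million dofs
    io_gb = file_gb * io_passes               # total data moved to and from disk
    io_wait_s = io_gb * 1000.0 / io_rate_mb_per_s
    return {
        "memory_gb": memory_gb,
        "file_gb": file_gb,
        "io_gb": io_gb,
        "io_wait_s": io_wait_s,
    }


est = sparse_resource_estimate(10_000_000)
```

For 10 million dofs this reproduces the slide's figures: a 100 Gbyte file, 300 Gbytes of I/O and about 10,000 seconds of I/O wait at 30 Mbytes/sec.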
Usage Guidelines: PCG
Resource requirements
1 Gbyte per million dofs
Memory grows automatically for large problems
I/O requirement is minimal
Convergence is best for meshes with good aspect ratios
3-D cube elements converge better than thin shells or high aspect solids
Over 500k dofs shows best performance compared to sparse
Performance Summary
Where to look
PCG solver: file.PCS
Sparse solver: output file
Add Bcsopt ,,, ,,, -5 (undocumented option)
What to look for
Degrees of freedom
Memory usage
Total iterations (iterative only)
Usage Guidelines
Tuning sparse solver performance
Bcsopt command (undocumented)
Optimal I/O for largest jobs
In-core for large memory systems and small to medium jobs (< 250,000 dofs)
Use parallel processing
Bcsopt arguments:
forc / limit: force or limit solver memory space in Mbytes
nnnn: Mbytes, up to 2048
-5: print performance stats
Model       DOF
Beam        110838
Car         421677
Joint       502851
Carrier2    502851
Carrier1    980484
RailCar     1470915
Engine      1676660
Assembly    3388179
4215  1014  763  1147  7488  13770  x
Peak Memory
5.7  6  58  1124  480  1115  1665  1084  x  x
124  940  312  1115  1196  1084  1466  2873
268  294  349  677  862  1235  x
269  294  349  677  862  1235  x
Benchmark study: Modal Analysis
Model       DOF
Beam        110838
Car         421677
Joint       502851
Carrier2    502851
Peak Memory
5.7  6  58  1124  480  1115  124  940  312  1115
9.79 Mbytes
1262.59 Mbytes
588214172 : 560.96 Mbytes
256482560 : 244.60 Mbytes
63.53 cpu   69.02 wall
576.93 Mbytes
275
537.670
542.897
Dofs: 2.1M (2090946)
Nzeros in K (40/1): 84M (84553633)
Nzeros in L (1142/29): 2.4B (2434337580)
9645
46517835
Trillion F.P. ops: 7.4 (7.4048E+12)
cpu             wall
193.810000      223.923436
456.970000      461.192527
7.400000        7.471412
287.240000      384.408332
12671.900000    13367.557193
584.351978      553.941885
765.760000      1411.694416
12.729582       6.905039
unit number     length          amount
-----------     ------          ------
20.             138591505.      288939194.
25.             9310592.        32587072.
9.              2434337580.     7894888171.
11.             169107266.      507331541.
1,698,525 Equations
20,299 Multi-Point CEs
dim         coefs       mxcolmlth
*******     *********   *********
1698525     55419678    209
20299       58147       6
20299       860830      99
1698525     865275      381
1698525     58304862    404
= 547.22 MB
= 9417.10 MB
out-of-core = 527.29 MB
out-of-core = 127.25 MB
cpu             wall
162.05          172.82
340.37          342.63
7.93            7.96
225.89          267.12
4812.04         5086.30
592.55          560.60
365.90          663.77
12.54           6.91
unit number     length          amount
-----------     ------          ------
20              91464700        192879558
25              5319910         18619685
9               1145424928      3816657199
11              108680690       615455923
70.33 Mbytes
583.75 Mbytes
70.33 Mbytes
583.75 Mbytes
= 488.21 MB
= 7348.80 MB
out-of-core = 651.66 MB
out-of-core = 63.18 MB
cpu             wall
40.130000       62.959089
269.940000      296.114490
6.780000        8.303449
127.230000      624.087842
5312.130000     25199.576871
740.325416      156.062337
0.0000D+00
117.400000      890.047902
28.598027       3.772166
5956.170        27177.617
Elapsed time 5X larger than CPU time!
unit number     length          amount
-----------     ------          ------
20.             77728422.       173599480.
25.             3414388.        11950358.
9.              838554505.      2753998119.
11.             84582582.       723248228.
32.             77134410.       11000063203.
cpu             wall
40.170000       62.870488
270.470000      294.560051
6.770000        8.156075
116.300000      360.649415
4773.720000     3418.024528
823.823853      1150.578169     (1 Gflop sustained)
0.0000D+00
115.450000      880.488530
29.081060       3.813120
5405.520        5122.035
unit number     length          amount
-----------     ------          ------
20.             77728422.       173599480.
25.             3414388.        11950358.
9.              838554505.      2674553251.
11.             80694876.       470460286.
Solid45 Elements
Solid95 Elements
Solid92 Elements
Surf154 Elements
Constraint Equations
274.72 Mbytes
536.85 Mbytes
274.72 Mbytes
536.85 Mbytes
4971036 : 4.74 Mbytes
1502114112 : 1432.53 Mbytes
804449536 : 767.18 Mbytes
185689952 : 177.09 Mbytes
79.95 cpu   80.48 wall
51678
cpu             wall
25.484000
108.000000
4.531000
217.094000
2224.359000
1335.468214
0.0000D+00
824.610000      829.907434
3.759268        3.735272
3432.969        3455.119
length          amount
------          ------
55724809.       118615367.
3780444.        13231554.
774037991.      2451597713.
Version 6.1
Automatic in most cases
Tuning possible using bcsopt
0.00 Mbytes
1690.57 Mbytes
155.14 Mbytes
1543.50 Mbytes
89.94 Mbytes
1543.50 Mbytes
0.00 Mbytes
1690.57 Mbytes
221585885
Dofs: 2.1M (2149066)
Nzeros in K (37/1): 78M (78007632)
Nzeros in L (1286/35): 2.7B (2698281519.)
20062
201251953
Trillion F.P. ops: 19 (1.9072E+13)
1.0804E+10
cpu             wall
181.640000      191.469832
800.110000      807.598554
10.520000       10.572956
259.820000      345.327975
30088.200000    31027.369107
633.883809      614.696745
768.060000      1634.010750
14.066442       6.611873
unit number     length          amount
-----------     ------          ------
20.             129721091.      272147055.
25.             7964778.        27876723.
9.              2698281519.     8406875085.
11.             250835024.      1964282370.
File.PCS output
Good convergence
(1000 or more is bad)
Default
Degrees of Freedom: 228030
DOF Constraints: 3832
Elements: 16646
Assembled: 16646
Implicit: 0
Nonzeros in Upper Triangular part of
Global Stiffness Matrix : 18243210
Nonzeros in Preconditioner: 4412553
Total Operation Count: 4.60199e+10
Total Iterations In PCG: 488
****************************************************
Total PCG Solver CP Time: User: 850.2 secs:
****************************************************
Estimate of Memory Usage In CG : 208.261 MB
Estimate of Disk Usage : 215.985 MB
****************************************************
Multiply with A MFLOP Rate:62.3031 MFlops
Solve With Precond MFLOP Rate:53.5653 MFlops
*****************************************************
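The quantities worth checking in this file (degrees of freedom, memory usage, total iterations) can be pulled out with a short script. This is a sketch that assumes the line wording shown on the slide; the actual file.PCS layout may differ between versions, and `parse_pcs` is a hypothetical helper, not part of ANSYS.

```python
import re

# A few lines copied from the slide's file.PCS listing.
PCS_SAMPLE = """\
Degrees of Freedom: 228030
Total Iterations In PCG: 488
Estimate of Memory Usage In CG : 208.261 MB
Multiply with A MFLOP Rate:62.3031 MFlops
"""


def parse_pcs(text):
    """Extract dofs, iteration count and memory estimate from PCS-style text."""
    patterns = {
        "dofs": r"Degrees of Freedom:\s*(\d+)",
        "iterations": r"Total Iterations In PCG:\s*(\d+)",
        "memory_mb": r"Estimate of Memory Usage In CG\s*:\s*([\d.]+)\s*MB",
    }
    stats = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            value = match.group(1)
            stats[key] = float(value) if "." in value else int(value)
    return stats


stats = parse_pcs(PCS_SAMPLE)
# Slide's rule of thumb: 1000 or more iterations is bad convergence.
converged_well = stats["iterations"] < 1000
```

With the sample above, the 488 iterations fall under the slide's threshold of 1000.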
AMG Performance
Solver time (sec) vs number of processors
[Figure: Time (sec) vs Number of Processors; plotted values 728.9, 438.7, 241.6, 157]
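The four solver times from the chart can be turned into speedups relative to the first (slowest) run. The processor count behind each bar is not recoverable from the extracted text, so this sketch deliberately reports ratios only and makes no claim about which count each time belongs to.

```python
# Solver times in seconds, in the order they appear on the chart.
times = [728.9, 438.7, 241.6, 157.0]

# Speedup of each run relative to the first run.
speedups = [times[0] / t for t in times]

for t, s in zip(times, speedups):
    print(f"{t:7.1f} sec  speedup {s:.2f}x")
```

The last ratio comes out a little under 5, in line with the "5 times faster with 8 processors" claim on the following slide.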
AMG vs PowerSolver
Advantages:
Insensitive to matrix ill-conditioning. Performance doesn't deteriorate for high aspect ratio elements, rigid links, etc.
5x faster than the PowerSolver for difficult problems on a single processor
Scalable up to 8 processors (shared-memory only), 5 times faster with 8 processors
AMG vs PowerSolver
Disadvantages:
30% more memory required than PowerSolver
20% slower than PowerSolver for well-conditioned problems on a single processor
Doesn't work for distributed-memory architecture (neither does PowerSolver).
Scalability is limited by memory bandwidth (so is PowerSolver)
1758.000
= 32283.010
But, your results may vary
ANSYS DesignSpace
Detailed Solid Model
Solvers: PCG, AMG, SPAR
Column headings: Memory (Mbytes), Iter (10^-6), NP=2, NP=3, NP=4
Values: 300, 722, 290, 5301, 200, 6909, 1884
-dofs 50
Extrude 2-D mesh to obtain 50,000 Dofs
Elements sized to maintain nice aspect ratios
Static Analysis
HP L-Class, four 550 MHz CPUs, 4 Gbytes memory
[Figure, two panels: Time (sec) vs Degrees of Freedom (134k, 245k, 489k), by Solver (aspect)]
[Figure: Time (sec) for spar, amg and pcg on 1, 2 and 3 CPUs]
Summary
ANSYS has industry-leading solver technology to support robust and comprehensive simulation capability
Attention to solver capabilities and performance characteristics will extend analysis capabilities
Future improvements will include increasing parallel processing capabilities and new breakthrough solver technologies