About Chronos
Introduction
Chronos is a library of sparse linear algebra functions specifically designed for High Performance Computing. Chronos allows for the iterative solution of extremely large linear systems of equations and eigenproblems arising from real-world industrial and scientific applications. Designed as object-oriented software, it runs on several platforms and takes full advantage of the many-core processors and GPU accelerators available on modern HPC systems.
Chronos implements state-of-the-art algorithms for the solution of linear algebra problems. In particular, Chronos is most effective when the matrices are very large, with up to hundreds of millions of unknowns.
The numerical methods implemented in Chronos include several Krylov subspace methods as iterative schemes, such as CG, GMRES, BiCGstab and SQMR. Moreover, the library implements preconditioners that accelerate the convergence of the iterative scheme. Both single-level (Approximate Inverses) and multilevel (AMG) preconditioners are available.
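To illustrate how a preconditioner plugs into a Krylov iteration, here is a minimal sketch of the preconditioned Conjugate Gradient method on a matrix stored in CSR format. It is not the Chronos API: all function names (`csr_matvec`, `pcg`, `jacobi_preconditioner`, `laplacian_1d`) are hypothetical, and a simple Jacobi (diagonal) preconditioner stands in for the approximate inverse and AMG preconditioners mentioned above.

```python
import math

def csr_matvec(indptr, indices, data, x):
    """Return y = A @ x for a matrix A stored in CSR format."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def jacobi_preconditioner(indptr, indices, data):
    """Stand-in preconditioner: returns a function applying diag(A)^-1."""
    n = len(indptr) - 1
    inv_diag = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            if indices[k] == i:
                inv_diag[i] = 1.0 / data[k]
    return lambda r: [d * ri for d, ri in zip(inv_diag, r)]

def pcg(indptr, indices, data, b, apply_prec, tol=1e-8, max_iter=1000):
    """Preconditioned CG for a symmetric positive definite CSR matrix.

    Stops when the relative residual ||r|| / ||b|| drops below tol.
    Returns the approximate solution and the number of iterations.
    """
    x = [0.0] * len(b)
    b_norm = math.sqrt(dot(b, b))
    if b_norm == 0.0:
        return x, 0
    r = list(b)              # residual for the zero initial guess
    z = apply_prec(r)        # preconditioned residual
    p = list(z)              # first search direction
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = csr_matvec(indptr, indices, data, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = apply_prec(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

def laplacian_1d(n):
    """CSR arrays of the SPD tridiagonal matrix with stencil [-1, 2, -1]."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        if i > 0:
            indices.append(i - 1); data.append(-1.0)
        indices.append(i); data.append(2.0)
        if i < n - 1:
            indices.append(i + 1); data.append(-1.0)
        indptr.append(len(indices))
    return indptr, indices, data

# Small test problem whose exact solution is the vector of ones.
indptr, indices, data = laplacian_1d(50)
b = csr_matvec(indptr, indices, data, [1.0] * 50)
x, iters = pcg(indptr, indices, data, b,
               jacobi_preconditioner(indptr, indices, data))
```

The preconditioner enters only through `apply_prec`, so a stronger operator can be swapped in without touching the iteration itself.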
Mainly written in C++, Chronos uses OpenMP directives for shared-memory processing and CUDA for GPU accelerators. Interprocess communication for distributed computation is handled through MPI.
Performance
The performance of Chronos has been evaluated with several benchmark matrices from real-world problems, arising from various application fields.
The table below collects the main information concerning the benchmark matrices: number of rows (\(n\)), number of non-zeroes (\(nnz\)) and average number of non-zeroes per row (avg. \(nnz\)/row). More information is available at the M3E matrix collection webpage. The benchmarks are subdivided into two classes, denoted as Fluid dynamic (F) and Mechanical (M).
Matrix | Class | Application field | \(n\) | \(nnz\) | avg. \(nnz\)/row |
---|---|---|---|---|---|
spe10 | F | Diffusion in heterogeneous media | 3,410,693 | 90,568,237 | 26.55 |
geo4m | M | Geomechanics | 4,224,870 | 335,738,340 | 79.47 |
finger4m | F | Porous flow | 4,718,592 | 23,591,424 | 5.00 |
guenda11m | M | Geomechanics | 11,452,398 | 512,484,300 | 44.75 |
M10 | M | Mechanical | 11,593,008 | 940,598,090 | 81.13 |
agg14m | M | Mesoscale | 14,106,408 | 633,142,730 | 44.88 |
M20 | M | Mechanical | 20,056,050 | 1,634,926,088 | 81.52 |
tripod24m | M | Mechanical | 24,186,993 | 1,111,751,217 | 45.96 |
geo61m | M | Geomechanics | 61,813,395 | 4,966,380,225 | 80.34 |
poi65m | F | CFD | 65,939,264 | 460,595,552 | 6.99 |
Pflow73m | F | Reservoir | 73,623,733 | 2,201,828,891 | 29.91 |
c4zz134m | M | Biomedicine | 134,395,551 | 10,806,265,323 | 80.41 |
pois198m | F | Diffusion in homogeneous media | 198,076,032 | 1,384,390,392 | 6.99 |
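The last column of the table is simply \(nnz / n\). The short sketch below recomputes it for three of the listed matrices and also shows how the same quantity could be read from a CSR row-pointer array. The helper names are hypothetical, and the CSR layout (row pointer of length \(n + 1\) whose last entry equals \(nnz\)) is an assumption about how a matrix might be stored, not a statement about Chronos internals.

```python
def avg_nnz_per_row(n, nnz):
    """Average number of non-zeroes per row: nnz / n."""
    return nnz / n

def avg_nnz_from_csr(indptr):
    """Same quantity from a CSR row pointer (length n + 1, last entry = nnz)."""
    return indptr[-1] / (len(indptr) - 1)

# (n, nnz) pairs copied from the benchmark table
benchmarks = {
    "spe10": (3_410_693, 90_568_237),
    "finger4m": (4_718_592, 23_591_424),
    "poi65m": (65_939_264, 460_595_552),
}

for name, (n, nnz) in benchmarks.items():
    print(f"{name}: {avg_nnz_per_row(n, nnz):.2f}")
```

Low values such as those of finger4m and poi65m are typical of scalar finite-difference or finite-volume stencils, while values around 80 reflect the denser coupling of three-dimensional mechanical problems.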
The table below shows the performance of Chronos AMG (Algebraic Multigrid used as a preconditioner for the Conjugate Gradient) compared with two state-of-the-art parallel AMG preconditioners: BoomerAMG from the Hypre package (mostly suited for fluid dynamic simulations) and GAMG from PETSc (mostly suited for mechanical simulations).
For each problem, the table provides the number of Marconi100 nodes allocated (see footnote 1; selected to have about 3-4 million unknowns per node), the total solution time \(T_t\) in seconds (see footnote 2) and the operator complexity \(C_{op}\) (see footnote 3).
Chronos performance was evaluated on the Marconi100 supercomputer of CINECA, the Italian supercomputing consortium.
Matrix | Class | # of nodes | # of cores | Solver | \(C_{op}\) | \(T_t\) [s] |
---|---|---|---|---|---|---|
poi65m | F | 12 | 384 | Chronos AMG | 4.036 | 4.46 |
poi65m | F | 12 | 384 | BoomerAMG | 4.450 | 86.7 |
Pflow73m | F | 15 | 480 | Chronos AMG | 2.346 | 178.6 |
Pflow73m | F | 15 | 480 | BoomerAMG | 1.593 | 1108.2 |
guenda11m | M | 2 | 64 | Chronos AMG | 1.240 | 123.0 |
guenda11m | M | 2 | 64 | GAMG | - | 324.5 |
agg14m | M | 4 | 128 | Chronos AMG | 1.287 | 52.8 |
agg14m | M | 4 | 128 | GAMG | - | 18.2 |
M20 | M | 4 | 128 | Chronos AMG | 1.292 | 149.1 |
M20 | M | 4 | 128 | GAMG | - | 602.4 |
tripod24m | M | 5 | 160 | Chronos AMG | 1.116 | 44.8 |
tripod24m | M | 5 | 160 | GAMG | - | 92.6 |
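A convenient way to read the comparison is the ratio between the competitor's total time and the Chronos AMG total time on the same matrix and node count. The sketch below computes that ratio from the \(T_t\) values in the table; the variable names are hypothetical and no figures beyond those listed are used.

```python
# (matrix, competitor, T_t Chronos AMG [s], T_t competitor [s]),
# copied from the comparison table
results = [
    ("poi65m", "BoomerAMG", 4.46, 86.7),
    ("Pflow73m", "BoomerAMG", 178.6, 1108.2),
    ("guenda11m", "GAMG", 123.0, 324.5),
    ("agg14m", "GAMG", 52.8, 18.2),
    ("M20", "GAMG", 149.1, 602.4),
    ("tripod24m", "GAMG", 44.8, 92.6),
]

# Ratio > 1 means Chronos AMG was faster; ratio < 1 means the competitor was.
ratios = {m: t_other / t_chronos for m, _, t_chronos, t_other in results}

for matrix, competitor, _, _ in results:
    print(f"{matrix}: {competitor} / Chronos AMG = {ratios[matrix]:.2f}")
```

With the listed values the ratio is above 1 for every matrix except agg14m, where the reported GAMG time is lower than the Chronos AMG time.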
The strong scalability of Chronos is demonstrated by solving poi65m and c4zz134m with an increasing number of cores. The measured scalability is very close to the ideal speed-up (dashed line), with a slight departure from the ideal line mainly due to the reduced size of the lower (coarser) levels in the AMG hierarchy.
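For reference, strong-scaling speed-up and parallel efficiency are usually computed relative to the smallest run, as in the sketch below. The function name is hypothetical and the timings are placeholder values chosen only to exercise the code; they are not Chronos measurements.

```python
def strong_scaling(cores, times):
    """Return (cores, speedup, ideal_speedup, efficiency) for each run.

    The first entry is taken as the baseline: speedup = T(p0) / T(p),
    ideal speedup = p / p0, efficiency = speedup / ideal speedup.
    """
    p0, t0 = cores[0], times[0]
    rows = []
    for p, t in zip(cores, times):
        speedup = t0 / t
        ideal = p / p0
        rows.append((p, speedup, ideal, speedup / ideal))
    return rows

# Placeholder timings in seconds (NOT measured data), fixed problem size.
rows = strong_scaling([64, 128, 256], [80.0, 42.0, 23.0])
for p, speedup, ideal, eff in rows:
    print(f"{p} cores: speed-up {speedup:.2f} (ideal {ideal:.0f}), "
          f"efficiency {eff:.0%}")
```

An efficiency that slowly decreases as cores are added is the numerical counterpart of a measured curve bending slightly below the ideal line.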
License
Chronos is freely available for scientific use, only for the purpose of internal research, excluding any commercial use of Chronos as such or as a part of other software products.
Chronos is also available under a license agreement for commercial use and/or for integration into other commercial software products.
For the Chronos license agreement and further information, please send a request e-mail to: products@m3eweb.it
Footnotes
1. Marconi100 is composed of 980 nodes based on the IBM Power9 architecture, each equipped with two 16-core IBM POWER9 AC922 processors at 3.1 GHz and four NVIDIA Volta V100 GPUs with NVLink 2.0. The internal network is Mellanox InfiniBand EDR DragonFly+.
2. The exit tolerance on the relative residual to achieve convergence is \(10^{-8}\).
3. The operator complexity \(C_{op}\) (the memory footprint of the preconditioner) gives a measure of the preconditioner weight relative to the system matrix.
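The formula is not reproduced in this text. The definition commonly used for AMG hierarchies, assumed here rather than taken from the Chronos documentation, is the total number of non-zeroes of the operators on all levels divided by the number of non-zeroes of the system matrix, where \(A_0\) is the system matrix and \(A_l\) is the operator on level \(l\) of a hierarchy with coarsest level \(L\):

```latex
\[
  C_{op} = \frac{\sum_{l=0}^{L} nnz(A_l)}{nnz(A_0)}
\]
```

Under this definition \(C_{op} \ge 1\), and a value such as 1.240 means that the whole hierarchy stores about 24% more non-zeroes than the system matrix alone.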