About Chronos

Introduction

Chronos is a library of sparse linear algebra functions specifically designed for High Performance Computing. Chronos allows for the iterative solution of extremely large linear systems of equations and eigenproblems arising from real-world industrial and scientific applications. Designed as object-oriented software, it runs on several platforms and takes full advantage of the many-core processors and GPU accelerators currently available on modern HPC systems.

Chronos implements state-of-the-art algorithms for the solution of linear algebra problems. It is particularly effective when the matrices are very large, with up to hundreds of millions of unknowns.

The numerical methods implemented in Chronos include several Krylov subspace iterative schemes, such as CG, GMRES, BiCGStab and SQMR. Moreover, the library implements advanced preconditioners to accelerate the convergence of the iterative scheme. Both single-level (Approximate Inverses) and multilevel (AMG) preconditioners are available.
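To make the idea of a preconditioned Krylov method concrete, here is a minimal sketch of the preconditioned Conjugate Gradient iteration on a small sparse matrix stored in CSR form. It is not the Chronos API: the function names (`csr_matvec`, `pcg`, `poisson_1d`) are hypothetical, the test matrix is a 1D Laplacian chosen only for illustration, and a simple Jacobi (diagonal) preconditioner stands in for the approximate-inverse and AMG preconditioners mentioned above.

```python
import math

def csr_matvec(indptr, indices, data, x):
    """Multiply a CSR matrix by a dense vector."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(indptr, indices, data, b, inv_diag, tol=1e-8, max_iter=1000):
    """Preconditioned CG for a symmetric positive definite CSR matrix.

    inv_diag holds 1/a_ii, i.e. a Jacobi preconditioner (an assumption made
    for brevity, standing in for a stronger preconditioner).
    Returns the approximate solution and the number of iterations performed.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                                   # residual for x0 = 0
    z = [m * ri for m, ri in zip(inv_diag, r)]    # apply preconditioner
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for it in range(1, max_iter + 1):
        Ap = csr_matvec(indptr, indices, data, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        # Stop when the relative residual drops below the tolerance.
        if math.sqrt(dot(r, r)) / b_norm < tol:
            return x, it
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

def poisson_1d(n):
    """CSR arrays for the 1D Laplacian tridiag(-1, 2, -1) (illustrative test matrix)."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data

n = 50
indptr, indices, data = poisson_1d(n)
b = [1.0] * n
inv_diag = [0.5] * n                              # 1 / diagonal entry (2.0)
x, iters = pcg(indptr, indices, data, b, inv_diag)
```

The preconditioner enters only through the line that computes `z`, which is why a library can swap in a different preconditioner without changing the Krylov loop itself.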

Mainly written in C++, Chronos uses OpenMP directives for shared-memory parallelism and CUDA to exploit GPU accelerators. Interprocess communication for distributed computation is handled through MPI.

Performance

The performance of Chronos has been evaluated on several benchmark matrices from real-world problems, arising from various application fields.

The table below collects the main information concerning the benchmark matrices: the number of rows (\(n\)), the number of non-zeroes (\(nnz\)), and the average number of non-zeroes per row (avg. \(nnz\)/row). More information is available at the M3E matrix collection webpage. The benchmarks are subdivided into two classes, denoted as Fluid dynamics (F) and Mechanical (M).

| Matrix | Class | Application field | \(n\) | \(nnz\) | avg. \(nnz\)/row |
|---|---|---|---|---|---|
| spe10 | F | Diffusion in heterogeneous media | 3,410,693 | 90,568,237 | 26.55 |
| geo4m | M | Geomechanics | 4,224,870 | 335,738,340 | 79.47 |
| finger4m | F | Porous flow | 4,718,592 | 23,591,424 | 5.00 |
| guenda11m | M | Geomechanics | 11,452,398 | 512,484,300 | 44.75 |
| M10 | M | Mechanical | 11,593,008 | 940,598,090 | 81.13 |
| agg14m | M | Mesoscale | 14,106,408 | 633,142,730 | 44.88 |
| M20 | M | Mechanical | 20,056,050 | 1,634,926,088 | 81.52 |
| tripod24m | M | Mechanical | 24,186,993 | 1,111,751,217 | 45.96 |
| geo61m | M | Geomechanics | 61,813,395 | 4,966,380,225 | 80.34 |
| poi65m | F | CFD | 65,939,264 | 460,595,552 | 6.99 |
| Pflow73m | F | Reservoir | 73,623,733 | 2,201,828,891 | 29.91 |
| c4zz134m | M | Biomedicine | 134,395,551 | 10,806,265,323 | 80.41 |
| pois198m | F | Diffusion in homogeneous media | 198,076,032 | 1,384,390,392 | 6.99 |
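The last column of the benchmark table is simply \(nnz / n\). As a quick check, the short sketch below recomputes it for three of the matrices, using the \(n\) and \(nnz\) values copied from the table; the function name `avg_nnz_per_row` is only illustrative.

```python
def avg_nnz_per_row(n_rows, nnz):
    """Average number of non-zero entries per matrix row."""
    return nnz / n_rows

# (n, nnz) pairs copied from the benchmark table
benchmarks = {
    "spe10": (3_410_693, 90_568_237),
    "geo4m": (4_224_870, 335_738_340),
    "pois198m": (198_076_032, 1_384_390_392),
}

averages = {
    name: round(avg_nnz_per_row(n, nnz), 2)
    for name, (n, nnz) in benchmarks.items()
}

for name, value in averages.items():
    print(f"{name}: {value:.2f}")
```

Mechanical problems show far denser rows than the scalar fluid-dynamics problems, so two matrices with a similar number of rows can differ by an order of magnitude in the cost of a matrix-vector product.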

The table below shows the performance of Chronos AMG (Algebraic Multigrid used as a preconditioner for the Conjugate Gradient method) compared with two state-of-the-art parallel AMG preconditioners: BoomerAMG from the hypre package (mostly suited for fluid dynamics simulations) and GAMG from PETSc (mostly suited for mechanical simulations).

For each problem, the table provides the number of Marconi100 nodes allocated (footnote 1), selected to have about 3-4 million unknowns per node, the total solution time \(T_t\) in seconds (footnote 2), and the operator complexity \(C_{op}\) (footnote 3).

The performance of Chronos is evaluated on the Marconi100 supercomputer, operated by CINECA, the Italian consortium for supercomputing.

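| Matrix | Class | # of nodes | # of cores | Solver | \(C_{op}\) | \(T_t\) [s] |
|---|---|---|---|---|---|---|
| poi65m | F | 12 | 384 | Chronos AMG | 4.036 | 4.46 |
| | | | | BoomerAMG | 4.450 | 86.7 |
| Pflow73m | F | 15 | 480 | Chronos AMG | 2.346 | 178.6 |
| | | | | BoomerAMG | 1.593 | 1108.2 |
| guenda11m | M | 2 | 64 | Chronos AMG | 1.240 | 123.0 |
| | | | | GAMG | - | 324.5 |
| agg14m | M | 4 | 128 | Chronos AMG | 1.287 | 52.8 |
| | | | | GAMG | - | 18.2 |
| M20 | M | 4 | 128 | Chronos AMG | 1.292 | 149.1 |
| | | | | GAMG | - | 602.4 |
| tripod24m | M | 5 | 160 | Chronos AMG | 1.116 | 44.8 |
| | | | | GAMG | - | 92.6 |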
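The operator complexity \(C_{op}\) reported in the results table is the total number of non-zeroes over all levels of the AMG hierarchy divided by the number of non-zeroes of the system matrix (see footnote 3). The sketch below evaluates that ratio; the function name and the per-level counts are hypothetical and are not taken from any Chronos run.

```python
def operator_complexity(nnz_per_level):
    """C_op = sum of nnz(A_i) over all levels divided by nnz(A_0).

    nnz_per_level[0] is the non-zero count of the system matrix A_0,
    followed by the counts of the progressively coarser operators.
    """
    if not nnz_per_level or nnz_per_level[0] <= 0:
        raise ValueError("the finest-level matrix must have non-zero entries")
    return sum(nnz_per_level) / nnz_per_level[0]

# Hypothetical four-level hierarchy: nnz of A_0, A_1, A_2, A_3
hypothetical_levels = [1_000_000, 200_000, 40_000, 8_000]
c_op = operator_complexity(hypothetical_levels)
print(f"C_op = {c_op:.3f}")
```

A value close to 1 means the coarse levels add little storage on top of the system matrix, while larger values indicate a heavier hierarchy.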

Scalability

The strong scalability of Chronos is demonstrated by solving poi65m and c4zz134m with an increasing number of cores. The measured speed-up is very close to the ideal one (dashed line), with a slight departure mainly due to the reduced size of the lower (coarser) levels in the AMG hierarchy.

[Figures: strong scalability plots for poi65m (Scalab_Poi65) and c4zz134m (Scalab_c4zz)]
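In a strong scalability test the problem size is fixed and the core count grows, so the speed-up is measured against the smallest run and compared with the ideal ratio of core counts. The sketch below shows that computation; the core counts and timings are hypothetical placeholders, not measurements from the plots, and the function name is illustrative.

```python
def strong_scaling(cores, times):
    """Return (cores, speed-up, ideal speed-up, efficiency) for each run.

    All quantities are relative to the first (smallest) run.
    """
    base_cores, base_time = cores[0], times[0]
    rows = []
    for c, t in zip(cores, times):
        speedup = base_time / t          # measured speed-up
        ideal = c / base_cores           # ideal (linear) speed-up
        rows.append((c, speedup, ideal, speedup / ideal))
    return rows

# Hypothetical measurements, for illustration only
cores = [64, 128, 256, 512]
times = [100.0, 52.0, 28.0, 16.0]        # solution time in seconds

table = strong_scaling(cores, times)
for c, s, ideal, eff in table:
    print(f"{c:4d} cores: speed-up {s:5.2f} (ideal {ideal:4.1f}), efficiency {eff:.2f}")
```

An efficiency that slowly drops below 1 as cores are added is the numerical counterpart of the measured curve bending away from the dashed ideal line.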

License

Chronos is freely available for scientific use, only for the purpose of internal research, excluding any commercial use of Chronos as such or as a part of other software products.

Chronos is also available under a license agreement for commercial use and/or for integration into other commercial software products.

For the Chronos license agreement and further information, please send a request e-mail to: products@m3eweb.it

Footnotes

  1. Marconi100 is composed of 980 nodes based on the IBM Power9 architecture, each equipped with 2 x 16-core IBM POWER9 AC922 processors at 3.1 GHz and 4 x NVIDIA Volta V100 GPUs with NVLink 2.0. The internal network is Mellanox InfiniBand EDR DragonFly+.

  2. The exit tolerance on the relative residual to achieve convergence is \(10^{-8}\).

  3. The operator complexity \(C_{op}\) gives a measure of the preconditioner weight (its memory footprint) relative to the system matrix and is computed as

     \[ C_{op} = \frac{\sum_{i = 0}^{n_{lev}} \mbox{nnz}(A_i)}{\mbox{nnz}(A_0)} \]

     where \(A_0\) is the system matrix and \(A_i\) is the matrix at level \(i\) of the AMG hierarchy.