About Chronos
Introduction
Chronos is a library of sparse linear algebra functions specifically designed for High Performance Computing. It allows the iterative solution of extremely large linear systems of equations and eigenproblems arising from real-world industrial and scientific applications. Designed as object-oriented software, it runs on several platforms and takes full advantage of the many-core processors and GPU accelerators available on modern HPC systems.
Chronos implements advanced algorithms for the solution of linear algebra problems and is particularly effective when the matrices are very large, with up to hundreds of millions of unknowns.
The numerical methods implemented in Chronos include several Krylov subspace iterative schemes, such as CG, GMRES, BiCGstab and SQMR. Moreover, the library implements best-in-class preconditioners that greatly accelerate the convergence of the iterative scheme. Both single-level (Approximate Inverses) and multilevel (AMG) preconditioners are available.
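To illustrate how a preconditioner enters a Krylov scheme, the following is a minimal sketch of the preconditioned Conjugate Gradient method in plain C++. It is not the Chronos API: the names `pcg`, `LinOp`, `laplacian1d` and `jacobi1d` are hypothetical, the test operator is a small 1D Laplacian, and a simple Jacobi (diagonal) preconditioner stands in for the Approximate Inverse and AMG preconditioners that the library provides.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
// A linear operator maps an input vector to an output vector (y = Op * x).
using LinOp = std::function<void(const Vec&, Vec&)>;

static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Preconditioned CG for a symmetric positive definite operator A.
// applyMinv applies an approximation of A^{-1} (the preconditioner).
// Returns the number of iterations performed; x holds the iterate on exit.
int pcg(const LinOp& applyA, const LinOp& applyMinv, const Vec& b, Vec& x,
        double relTol, int maxIter) {
    assert(b.size() == x.size());
    const std::size_t n = b.size();
    Vec r(n), z(n), p(n), Ap(n);

    applyA(x, Ap);
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ap[i];
    const double bnorm = std::sqrt(dot(b, b));
    if (std::sqrt(dot(r, r)) <= relTol * bnorm) return 0;

    applyMinv(r, z);
    p = z;
    double rz = dot(r, z);

    for (int k = 1; k <= maxIter; ++k) {
        applyA(p, Ap);
        const double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        // Stop when the relative residual drops below the tolerance.
        if (std::sqrt(dot(r, r)) <= relTol * bnorm) return k;
        applyMinv(r, z);
        const double rzNew = dot(r, z);
        const double beta = rzNew / rz;
        for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
        rz = rzNew;
    }
    return maxIter;
}

// Illustrative operator: 1D Laplacian tridiag(-1, 2, -1), applied matrix-free.
void laplacian1d(const Vec& x, Vec& y) {
    const std::size_t n = x.size();
    y.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double left = (i > 0) ? x[i - 1] : 0.0;
        const double right = (i + 1 < n) ? x[i + 1] : 0.0;
        y[i] = 2.0 * x[i] - left - right;
    }
}

// Jacobi preconditioner for the operator above: its diagonal is constant (2).
void jacobi1d(const Vec& r, Vec& z) {
    z.resize(r.size());
    for (std::size_t i = 0; i < r.size(); ++i) z[i] = r[i] / 2.0;
}
```

A call such as `pcg(laplacian1d, jacobi1d, b, x, 1e-10, 200)` solves the system for a right-hand side `b` starting from the initial guess in `x`. Replacing `jacobi1d` with a stronger preconditioner changes only the `applyMinv` argument, which is why the preconditioner can be swapped without touching the Krylov loop.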
Mainly written in C++, Chronos uses OpenMP directives for shared-memory parallelism and CUDA for GPU accelerators. Inter-process communication for distributed computation is handled through MPI.
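As an illustration of shared-memory parallelism with OpenMP directives, the sketch below computes a sparse matrix-vector product in Compressed Sparse Row format. The `CsrMatrix` struct and the `spmv` function are hypothetical and are not taken from Chronos; they only show how a single directive distributes independent rows among threads. Without OpenMP enabled at compile time the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed Sparse Row storage (hypothetical struct, not a Chronos type).
struct CsrMatrix {
    std::size_t nrows;
    std::vector<std::size_t> rowPtr;  // size nrows + 1, start of each row
    std::vector<std::size_t> colIdx;  // size nnz, column of each entry
    std::vector<double> values;       // size nnz, value of each entry
};

// Returns y = A * x. Each row writes only its own y[i], so the rows can be
// processed by different threads without synchronization.
std::vector<double> spmv(const CsrMatrix& A, const std::vector<double>& x) {
    assert(A.rowPtr.size() == A.nrows + 1);
    std::vector<double> y(A.nrows, 0.0);
    const long long n = static_cast<long long>(A.nrows);
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < n; ++i) {
        double sum = 0.0;
        for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k)
            sum += A.values[k] * x[A.colIdx[k]];
        y[i] = sum;
    }
    return y;
}
```

In a distributed setting each MPI process would typically own a block of rows and exchange the entries of `x` it needs from its neighbours before a local product of this kind; that communication step is omitted here.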
Performance
The performance of Chronos has been evaluated on several benchmark matrices from real-world problems arising in various application fields.
The table below collects the main information on the benchmark matrices: number of rows (\(n\)), number of nonzeroes (\(nnz\)) and average number of nonzeroes per row (avg. \(nnz\)/row). More information is available at the M3E matrix collection webpage. The benchmarks are divided into two classes, Fluid dynamics (F) and Mechanical (M).
Matrix  Class  Application field  \(n\)  \(nnz\)  avg. \(nnz\)/row
spe10  F  Diffusion in heter. media  3,410,693  90,568,237  26.55
geo4m  M  Geomechanics  4,224,870  335,738,340  79.47
finger4m  F  Porous flow  4,718,592  23,591,424  5.00
guenda11m  M  Geomechanics  11,452,398  512,484,300  44.75
M10  M  Mechanical  11,593,008  940,598,090  81.13
agg14m  M  Mesoscale  14,106,408  633,142,730  44.88
M20  M  Mechanical  20,056,050  1,634,926,088  81.52
tripod24m  M  Mechanical  24,186,993  1,111,751,217  45.96
geo61m  M  Geomechanics  61,813,395  4,966,380,225  80.34
poi65m  F  CFD  65,939,264  460,595,552  6.99
Pflow73m  F  Reservoir  73,623,733  2,201,828,891  29.91
c4zz134m  M  Biomedicine  134,395,551  10,806,265,323  80.41
pois198m  F  Diffusion in homog. media  198,076,032  1,384,390,392  6.99
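The last column of the benchmark table follows directly from the two preceding ones. As a worked check on the spe10 entry, using only the values listed for that matrix:

\[
\text{avg. } nnz/\text{row} = \frac{nnz}{n}, \qquad \frac{90{,}568{,}237}{3{,}410{,}693} \approx 26.55
\]

The fluid dynamics matrices finger4m, poi65m and pois198m have 5 to 7 nonzeroes per row, while the mechanical matrices have roughly 45 to 80, so the two classes differ markedly in the cost of each matrix-vector product per unknown.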
The table below compares the performance of Chronos AMG (Algebraic Multigrid used as a preconditioner for the Conjugate Gradient) with two state-of-the-art parallel AMG preconditioners: BoomerAMG from the Hypre package (best suited to fluid dynamics simulations) and GAMG from PETSc (best suited to mechanical simulations).
For each problem, the table provides the number of Marconi100 ^{1} nodes allocated (selected to have about 3-4 million unknowns per node), the corresponding number of cores, the total solution time \(T_t\) in seconds ^{2} and the operator complexity \(C_{op}\) ^{3}.
The performance of Chronos is evaluated on the Marconi100 supercomputer of CINECA, the Italian consortium for supercomputing.
Matrix  Class  # of nodes  # of cores  Solver  \(C_{op}\)  \(T_t\) [s]
poi65m  F  12  384  Chronos AMG  4.036  4.46
poi65m  F  12  384  BoomerAMG  4.450  86.7
Pflow73m  F  15  480  Chronos AMG  2.346  178.6
Pflow73m  F  15  480  BoomerAMG  1.593  1108.2
guenda11m  M  2  64  Chronos AMG  1.240  123.0
guenda11m  M  2  64  GAMG  -  324.5
agg14m  M  4  128  Chronos AMG  1.287  52.8
agg14m  M  4  128  GAMG  -  18.2
M20  M  4  128  Chronos AMG  1.292  149.1
M20  M  4  128  GAMG  -  602.4
tripod24m  M  5  160  Chronos AMG  1.116  44.8
tripod24m  M  5  160  GAMG  -  92.6
The strong scalability of Chronos is demonstrated by solving poi65m and c4zz134m with an increasing number of cores. The scalability is very close to the ideal speedup (dashed line), with a slight departure from the ideal line mainly due to the reduced size of the lower levels in the AMG hierarchy.
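Strong scalability is usually quantified by comparing the measured speedup with the ideal one as the core count grows for a fixed problem size. The sketch below shows this calculation under the usual definitions; the `ScalingPoint` and `ScalingResult` types and the `strongScaling` function are hypothetical helpers, and any timings passed to them are the caller's own measurements, not Chronos results.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ScalingPoint {
    int cores;       // number of cores used for the run
    double seconds;  // measured wall-clock time of the run
};

struct ScalingResult {
    double speedup;     // T(p0) / T(p), relative to the first run
    double ideal;       // p / p0, the ideal (linear) speedup
    double efficiency;  // speedup / ideal, equal to 1 for perfect scaling
};

// Strong scaling relative to the first (smallest) run in the list.
std::vector<ScalingResult> strongScaling(const std::vector<ScalingPoint>& runs) {
    assert(!runs.empty());
    const ScalingPoint& base = runs.front();
    std::vector<ScalingResult> out;
    out.reserve(runs.size());
    for (const ScalingPoint& r : runs) {
        assert(r.cores > 0 && r.seconds > 0.0);
        const double speedup = base.seconds / r.seconds;
        const double ideal = static_cast<double>(r.cores) / base.cores;
        out.push_back({speedup, ideal, speedup / ideal});
    }
    return out;
}
```

An efficiency slightly below 1 at the largest core counts corresponds to the measured curve falling just under the ideal line.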
License
Chronos is freely available for scientific use, only for the purpose of internal research, excluding any commercial use of Chronos as such or as a part of other software products.
Chronos is also available under a license agreement for commercial use and/or for integration into other commercial software products.
For the Chronos license agreement and further information, please send a request email to: products@m3eweb.it
Footnotes

^{1} Marconi100 is composed of 980 nodes based on the IBM Power9 architecture, each equipped with 2 x 16-core IBM POWER9 AC922 processors at 3.1 GHz and 4 x NVIDIA Volta V100 GPUs with NVLink 2.0. The internal network is Mellanox InfiniBand EDR DragonFly+.

^{2} The exit tolerance on the relative residual to achieve convergence is \(10^{-8}\).

^{3} The operator complexity \(C_{op}\) (the memory footprint of the preconditioner) gives a measure of the preconditioner weight relative to the system matrix and is computed as
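
For reference, a definition of operator complexity widely used in the AMG literature is the total number of nonzeroes over all levels of the hierarchy divided by the number of nonzeroes of the system matrix. It is given here as an assumption, not as the formula adopted by Chronos, and the symbols \(A_l\) and \(L\) are introduced only for illustration: \(A_0\) is the system matrix and \(A_1, \dots, A_L\) are the coarse-level operators.

\[
C_{op} = \frac{\sum_{l=0}^{L} nnz(A_l)}{nnz(A_0)}
\]

Under this assumed definition \(C_{op} \ge 1\), and a value of 1.240 would mean that the operators of the hierarchy hold 24% more nonzeroes than the system matrix alone.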