Solver options and performance

PETGEM assembles a complex-symmetric linear system per frequency and solves it with PETSc. This page describes the default solver configurations, how to choose a solver as the polynomial order increases, and the mesh/parallelism considerations that drive performance.

Default solver configurations

The parameter file emitted by utils/preprocess.py carries a working solver configuration for each mode.

Forward modeling

The forward kernel defaults to a preconditioned iterative solver - FGMRES with the BDDC preconditioner (deluxe scaling, LU coarse solve):

-dm_mat_type is
-ksp_type fgmres
-pc_type bddc
-pc_bddc_use_deluxe_scaling 1
-pc_bddc_coarse_pc_type lu

This converges in a handful of iterations at low to moderate order and scales well across MPI ranks.

Inverse modeling

The inverse kernel re-solves the system at every L-BFGS iteration (forward and adjoint, for every frequency), so it defaults to a direct factorization with MUMPS:

-ksp_type preonly
-pc_type                    lu
-pc_factor_mat_solver_type  mumps
-mat_mumps_icntl_14         80
-mat_mumps_icntl_28         1

Choosing a solver by polynomial order

The BDDC preconditioner is built from a topological (order-1) discrete gradient hint. As the polynomial order rises, the high-order curl-kernel is increasingly invisible to that order-1 coarse space, and BDDC convergence degrades:

nord

Recommended forward solver

Notes

1-5

BDDC (FGMRES)

Converges in a handful of iterations

6

LU + MUMPS

BDDC degrades sharply; use a direct solver

At nord=6, switch the forward run to a direct solver by overriding the solver keys on the command line (or editing the parameter file):

mpirun -n 4 build/fm.csem -options_file params_p6.txt \
   -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type mumps

Adaptive BDDC coarse-space enrichment is not applicable here: the CSEM operator is complex-symmetric and indefinite, which violates the positive-definiteness assumption of the adaptive eigenvalue selection.

Mesh resolution and the h-p tradeoff

Accuracy scales with both mesh refinement (h) and polynomial order (p). Higher order delivers more accuracy per element, so the mesh can be coarsened as the order increases - the canonical case ships a separate mesh_p${NORD}.geo for each order, growing coarser with nord.

The binding constraint for accuracy is resolving the conductivity contrast (e.g. a resistive block), controlled by the local element size in that region. Refining the source line or the far field has little effect on the misfit once the contrast is resolved. See Mesh generation for the meshing parameters.

Parallelism

The kernels are MPI-parallel and the numerical result is invariant to the rank count (within solver tolerance). For best wall-clock performance, match the number of MPI ranks to the available physical cores - oversubscribing (more ranks than cores) makes MPI ranks contend and inflates run time. A typical parallel invocation is:

mpirun -n <ranks> build/fm.csem -options_file params.txt