Steady-state solvers
Finding a steady state of
The outer nonlinear solver can be any algorithm from NonlinearSolve.jl (
NewtonRaphson,TrustRegion,LimitedMemoryBroyden, …), or AIBECS's ownCTKAlg— a bespoke Newton–Chord–Shamanskii method adapted from C. T. Kelley's MATLAB implementations (Kelley, 2003) and specialised to the AIBECS state Jacobian.The inner linear solver can be any factorisation from LinearSolve.jl — usually a sparse direct factorisation (
UMFPACKFactorization,KLUFactorization,MKLPardisoFactorize).
The NonlinearSolve how-to walks through the call pattern for the SciML side. For CTKAlg the call is just solve(prob, CTKAlg()).
Recommended pairing: CTKAlg() + UMFPACKFactorization()
Calling CTKAlg() with no arguments yields CTKAlg(linsolve = UMFPACKFactorization()) — the AIBECS-recommended default. Two reasons it's the recommendation:
CTKAlg uses a Shamanskii update. The Newton iteration only refreshes the Jacobian when the residual stops shrinking fast enough (
r > rSham₀ = 0.5). On AIBECS systems where the steady-state Jacobian changes slowly between iterations, one factorisation typically covers several Newton steps — generally faster in our experience than the "refresh every iteration" baseline thatNewtonRaphsonuses. The tradeoff is robustness: when the Jacobian does drift, Shamanskii's stale factors can take an extra Newton step to recover. Armijo line search is the safety net.UMFPACKFactorizationreuses its factors.CTKAlgruns the linear solve through aLinearSolve.LinearCache, so when Shamanskii recycles the Jacobian, UMFPACK's LU factors are reused directly — no refactorisation. Pairing CTKAlg with a factorisation that supports cache reuse is what makes the Shamanskii win compound.
If CTKAlg fails to converge on a pathological problem, the standard fallback is NewtonRaphson (still with UMFPACK) via the NonlinearSolve route below — fresh Jacobian every iteration, slower but more forgiving. More robust schemes still — for example pseudo-transient continuation (NonlinearSolve.PseudoTransient), which embeds the steady-state problem in a fictitious time evolution and is known to help on stiff systems — may also be worth trying; we have not benchmarked them on AIBECS systems.
When to use the NonlinearSolve route instead
You want a different outer algorithm (
TrustRegion,LimitedMemoryBroyden, …).You want to benchmark several algorithms against
CTKAlgon your own circulation.You need NonlinearSolve features that
CTKAlgdoes not surface (custom termination modes, callbacks, sensitivity hooks).
The cost is the load-time of NonlinearSolve.jl and LinearSolve.jl plus their transitive deps — neither is a hard AIBECS dependency. The underlying SteadyStateProblem and sparse Jacobian are reused across both routes; switching is a one-line change at the call site.
Verified algorithm/factorisation pairs
The combinations below have been checked on the toy Primeau_2x2x2 circulation and converge to the same root as CTKAlg().
| Solver call | Match vs CTKAlg() | Notes |
|---|---|---|
solve(prob, CTKAlg()) | (baseline) | Native AIBECS Newton–Chord–Shamanskii; CTKAlg + UMFPACK default. |
solve(nlprob, NewtonRaphson(linsolve = UMFPACKFactorization())) | exact | Recommended NonlinearSolve fallback. |
solve(nlprob, NewtonRaphson(linsolve = KLUFactorization())) | 1e-10 | KLU may win on very-sparse Jacobians. |
solve(nlprob, AIBECS.recommended_nlalg()) | exact | Convenience for NewtonRaphson + UMFPACK. |
For the NonlinearSolve rows, nlprob = AIBECS.nonlinearproblem(prob) — see the how-to for the full pattern. The helper attaches the sparse Jacobian buffer type so LinearSolve's UMFPACK / KLU factorisations allocate sparse buffers instead of erroring on a dense default.
Latest benchmark results
Auto-generated from benchmark/solvers.jl; the timings below are wall clock for one steady-state solve on each circulation/tracer pair, on the maintainer's laptop. The small-tier suite covers OCCA and OCIM0 — enough for relative comparisons across the available algorithm/factorisation pairs without needing a workstation. (Primeau_2x2x2 is used elsewhere as a smoke-test circulation but is too small to time meaningfully, so it's not in the benchmark tier.)
OCCA / idealage (n = 84661)
| Algorithm | Time (s) | Max rel drift | τ (yr) | Converged |
|---|---|---|---|---|
| CTKAlg + UMFPACK | 1 | 4.07e-11 | age=8.3e+14 | ✓ |
| CTKAlg + KLU | 5 | 6.82e-11 | age=5.3e+14 | ✓ |
| NewtonRaphson + UMFPACK | 1 | 4.07e-11 | age=8.3e+14 | ✓ |
| NewtonRaphson + KLU | 4 | 6.82e-11 | age=5.3e+14 | ✓ |
| NLsolveJL + Newton/UMFPACK | 1 | 4.07e-11 | age=8.3e+14 | ✓ |
| NewtonRaphson + MKLPardisoFactorize | 0.8 | 7.28e-12 | age=2.6e+15 | ✓ |
| NewtonRaphson + MKLPardisoIterate | 0.77 | 7.28e-12 | age=2.6e+15 | ✓ |
| CTKAlg + MKLPardisoFactorize | 1 | 7.28e-12 | age=2.6e+15 | ✓ |
| CTKAlg + MKLPardisoIterate | 0.75 | 7.28e-12 | age=2.6e+15 | ✓ |
OCCA / radiocarbon (n = 84661)
| Algorithm | Time (s) | Max rel drift | τ (yr) | Converged |
|---|---|---|---|---|
| CTKAlg + UMFPACK | 0.77 | 4.10e-08 | R=2.2e+13 | ✓ |
| CTKAlg + KLU | 5 | 7.36e-08 | R=1.7e+13 | ✓ |
| NewtonRaphson + UMFPACK | 1 | 4.10e-08 | R=2.2e+13 | ✓ |
| NewtonRaphson + KLU | 5 | 7.36e-08 | R=1.7e+13 | ✓ |
| NLsolveJL + Newton/UMFPACK | 0.74 | 4.10e-08 | R=2.2e+13 | ✓ |
| NewtonRaphson + MKLPardisoFactorize | 0.78 | 2.24e-08 | R=3.9e+13 | ✓ |
| NewtonRaphson + MKLPardisoIterate | 0.83 | 2.24e-08 | R=3.9e+13 | ✓ |
| CTKAlg + MKLPardisoFactorize | 0.77 | 2.24e-08 | R=3.9e+13 | ✓ |
| CTKAlg + MKLPardisoIterate | 0.77 | 2.24e-08 | R=3.9e+13 | ✓ |
OCCA / po4pop (n = 169322)
| Algorithm | Time (s) | Max rel drift | τ (yr) | Converged |
|---|---|---|---|---|
| CTKAlg + UMFPACK | 2 | 1.50e-07 | DIP=5e+09, POP=2.5e+06 | ✓ |
| CTKAlg + KLU | 5 | 1.50e-07 | DIP=5e+09, POP=2.5e+06 | ✓ |
| NewtonRaphson + UMFPACK | 4 | 1.81e-10 | DIP=3.8e+12, POP=1.9e+09 | ✓ |
| NewtonRaphson + KLU | 13 | 1.81e-10 | DIP=3.8e+12, POP=1.9e+09 | ✓ |
| NLsolveJL + Newton/UMFPACK | 4 | 1.81e-10 | DIP=3.8e+12, POP=1.9e+09 | ✓ |
| NewtonRaphson + MKLPardisoFactorize | 5 | 1.81e-10 | DIP=3.8e+12, POP=1.9e+09 | ✓ |
| NewtonRaphson + MKLPardisoIterate | 5 | 1.81e-10 | DIP=3.8e+12, POP=1.9e+09 | ✓ |
| CTKAlg + MKLPardisoFactorize | 2 | 1.50e-07 | DIP=5e+09, POP=2.5e+06 | ✓ |
| CTKAlg + MKLPardisoIterate | 2 | 1.50e-07 | DIP=5e+09, POP=2.5e+06 | ✓ |
OCIM0 / idealage (n = 191169)
| Algorithm | Time (s) | Max rel drift | τ (yr) | Converged |
|---|---|---|---|---|
| CTKAlg + UMFPACK | 4 | 5.68e-10 | age=1.3e+14 | ✓ |
| CTKAlg + KLU | 51 | 1.06e-09 | age=8.4e+13 | ✓ |
| NewtonRaphson + UMFPACK | 5 | 5.68e-10 | age=1.3e+14 | ✓ |
| NewtonRaphson + KLU | 51 | 1.06e-09 | age=8.4e+13 | ✓ |
| NLsolveJL + Newton/UMFPACK | 4 | 5.68e-10 | age=1.3e+14 | ✓ |
| NewtonRaphson + MKLPardisoFactorize | 9 | 3.51e-11 | age=1.2e+15 | ✓ |
| NewtonRaphson + MKLPardisoIterate | 10 | 3.08e-11 | age=1.3e+15 | ✓ |
| CTKAlg + MKLPardisoFactorize | 5 | 3.51e-11 | age=1.2e+15 | ✓ |
| CTKAlg + MKLPardisoIterate | 5 | 3.08e-11 | age=1.3e+15 | ✓ |
OCIM0 / radiocarbon (n = 191169)
| Algorithm | Time (s) | Max rel drift | τ (yr) | Converged |
|---|---|---|---|---|
| CTKAlg + UMFPACK | 4 | 3.13e-09 | R=8.7e+13 | ✓ |
| CTKAlg + KLU | 52 | 1.14e-08 | R=5.5e+13 | ✓ |
| NewtonRaphson + UMFPACK | 5 | 3.13e-09 | R=8.7e+13 | ✓ |
| NewtonRaphson + KLU | 52 | 1.14e-08 | R=5.5e+13 | ✓ |
| NLsolveJL + Newton/UMFPACK | 4 | 3.13e-09 | R=8.7e+13 | ✓ |
| NewtonRaphson + MKLPardisoFactorize | 10 | 1.06e-09 | R=4.3e+14 | ✓ |
| NewtonRaphson + MKLPardisoIterate | 10 | 1.02e-09 | R=4.3e+14 | ✓ |
| CTKAlg + MKLPardisoFactorize | 5 | 1.06e-09 | R=4.3e+14 | ✓ |
| CTKAlg + MKLPardisoIterate | 5 | 1.02e-09 | R=4.3e+14 | ✓ |
OCIM0 / po4pop (n = 382338)
| Algorithm | Time (s) | Max rel drift | τ (yr) | Converged |
|---|---|---|---|---|
| CTKAlg + UMFPACK | 7 | 5.73e-09 | DIP=3.7e+10, POP=2.9e+07 | ✓ |
| CTKAlg + KLU | 50 | 5.73e-09 | DIP=3.7e+10, POP=2.9e+07 | ✓ |
| NewtonRaphson + UMFPACK | 11 | 1.02e-08 | DIP=2.5e+10, POP=2e+07 | ✓ |
| NewtonRaphson + KLU | 98 | 1.02e-08 | DIP=2.5e+10, POP=2e+07 | ✓ |
| NLsolveJL + Newton/UMFPACK | 13 | 1.02e-08 | DIP=2.5e+10, POP=2e+07 | ✓ |
| NewtonRaphson + MKLPardisoFactorize | 26 | 1.01e-07 | DIP=3.9e+09, POP=2.7e+10 | ✓ |
| NewtonRaphson + MKLPardisoIterate | 19 | 2.42e-07 | DIP=1.9e+09, POP=2e+07 | ✓ |
| CTKAlg + MKLPardisoFactorize | 10 | 7.90e-07 | DIP=5.3e+08, POP=9.4e+06 | ✓ |
| CTKAlg + MKLPardisoIterate | 11 | 5.73e-09 | DIP=3.7e+10, POP=2.9e+07 | ✓ |
Last updated: 2026-05-15T13:19:45.584, commit ca7453fe66a0f4ed8dc40b258a9653f0d2d89842
GPU execution (untested)
The underlying call chain — SteadyStateProblem → AIBECS state function → LinearSolve factorisation — has no inherently CPU-only piece. Moving the sparse Jacobian and state vector to a GPU array type (via e.g. CUDA.jl) and using a GPU-aware factorisation from LinearSolve should in principle just work, but we have not exercised this path. Anyone interested in a GPU run should expect to discover and fix a few rough edges on the way.
Krylov solvers
Iterative Krylov backends from LinearSolve (e.g. KrylovJL_GMRES, KrylovJL_BICGSTAB) are intentionally not in the supported matrix above: AIBECS does not ship a preconditioner suited to the steady-state advection–diffusion–reaction system, and unpreconditioned Krylov on these operators is uncompetitive with the sparse direct factorisations. Adding a preconditioner is a future direction.