Steady-state solvers

Finding a steady state of runs an outer nonlinear solver that calls an inner linear solver at every Jacobian-update / Newton-step iteration. AIBECS lets you pick each component independently:

The outer nonlinear solver can be any algorithm from NonlinearSolve.jl (NewtonRaphson, TrustRegion, LimitedMemoryBroyden, …), or AIBECS's own CTKAlg — a bespoke Newton–Chord–Shamanskii method adapted from C. T. Kelley's MATLAB implementations (Kelley, 2003) and specialised to the AIBECS state Jacobian.
The inner linear solver can be any factorisation from LinearSolve.jl — usually a sparse direct factorisation (UMFPACKFactorization, KLUFactorization, MKLPardisoFactorize).

The NonlinearSolve how-to walks through the call pattern for the SciML side. For CTKAlg the call is just solve(prob, CTKAlg()).

Recommended pairing: `CTKAlg() + UMFPACKFactorization()`

Calling CTKAlg() with no arguments yields CTKAlg(linsolve = UMFPACKFactorization()) — the AIBECS-recommended default. Two reasons it's the recommendation:

CTKAlg uses a Shamanskii update. The Newton iteration only refreshes the Jacobian when the residual stops shrinking fast enough (r > rSham₀ = 0.5). On AIBECS systems where the steady-state Jacobian changes slowly between iterations, one factorisation typically covers several Newton steps — generally faster in our experience than the "refresh every iteration" baseline that NewtonRaphson uses. The tradeoff is robustness: when the Jacobian does drift, Shamanskii's stale factors can take an extra Newton step to recover. Armijo line search is the safety net.
UMFPACKFactorization reuses its factors. CTKAlg runs the linear solve through a LinearSolve.LinearCache, so when Shamanskii recycles the Jacobian, UMFPACK's LU factors are reused directly — no refactorisation. Pairing CTKAlg with a factorisation that supports cache reuse is what makes the Shamanskii win compound.

If CTKAlg fails to converge on a pathological problem, the standard fallback is NewtonRaphson (still with UMFPACK) via the NonlinearSolve route below — fresh Jacobian every iteration, slower but more forgiving. More robust schemes still — for example pseudo-transient continuation (NonlinearSolve.PseudoTransient), which embeds the steady-state problem in a fictitious time evolution and is known to help on stiff systems — may also be worth trying; we have not benchmarked them on AIBECS systems.

When to use the NonlinearSolve route instead

You want a different outer algorithm (TrustRegion, LimitedMemoryBroyden, …).
You want to benchmark several algorithms against CTKAlg on your own circulation.
You need NonlinearSolve features that CTKAlg does not surface (custom termination modes, callbacks, sensitivity hooks).

The cost is the load-time of NonlinearSolve.jl and LinearSolve.jl plus their transitive deps — neither is a hard AIBECS dependency. The underlying SteadyStateProblem and sparse Jacobian are reused across both routes; switching is a one-line change at the call site.

Verified algorithm/factorisation pairs

The combinations below have been checked on the toy Primeau_2x2x2 circulation and converge to the same root as CTKAlg().

Solver call	Match vs `CTKAlg()`	Notes
`solve(prob, CTKAlg())`	(baseline)	Native AIBECS Newton–Chord–Shamanskii; CTKAlg + UMFPACK default.
`solve(nlprob, NewtonRaphson(linsolve = UMFPACKFactorization()))`	exact	Recommended NonlinearSolve fallback.
`solve(nlprob, NewtonRaphson(linsolve = KLUFactorization()))`	1e-10	KLU may win on very-sparse Jacobians.
`solve(nlprob, AIBECS.recommended_nlalg())`	exact	Convenience for `NewtonRaphson + UMFPACK`.

For the NonlinearSolve rows, nlprob = AIBECS.nonlinearproblem(prob) — see the how-to for the full pattern. The helper attaches the sparse Jacobian buffer type so LinearSolve's UMFPACK / KLU factorisations allocate sparse buffers instead of erroring on a dense default.

Latest benchmark results

Auto-generated from benchmark/solvers.jl; the timings below are wall clock for one steady-state solve on each circulation/tracer pair, on the maintainer's laptop. The small-tier suite covers OCCA and OCIM0 — enough for relative comparisons across the available algorithm/factorisation pairs without needing a workstation. (Primeau_2x2x2 is used elsewhere as a smoke-test circulation but is too small to time meaningfully, so it's not in the benchmark tier.)

OCCA / idealage (n = 84661)

Algorithm	Time (s)	Max rel drift	τ (yr)	Converged
CTKAlg + UMFPACK	1	4.07e-11	age=8.3e+14	✓
CTKAlg + KLU	5	6.82e-11	age=5.3e+14	✓
NewtonRaphson + UMFPACK	1	4.07e-11	age=8.3e+14	✓
NewtonRaphson + KLU	4	6.82e-11	age=5.3e+14	✓
NLsolveJL + Newton/UMFPACK	1	4.07e-11	age=8.3e+14	✓
NewtonRaphson + MKLPardisoFactorize	0.8	7.28e-12	age=2.6e+15	✓
NewtonRaphson + MKLPardisoIterate	0.77	7.28e-12	age=2.6e+15	✓
CTKAlg + MKLPardisoFactorize	1	7.28e-12	age=2.6e+15	✓
CTKAlg + MKLPardisoIterate	0.75	7.28e-12	age=2.6e+15	✓

OCCA / radiocarbon (n = 84661)

Algorithm	Time (s)	Max rel drift	τ (yr)	Converged
CTKAlg + UMFPACK	0.77	4.10e-08	R=2.2e+13	✓
CTKAlg + KLU	5	7.36e-08	R=1.7e+13	✓
NewtonRaphson + UMFPACK	1	4.10e-08	R=2.2e+13	✓
NewtonRaphson + KLU	5	7.36e-08	R=1.7e+13	✓
NLsolveJL + Newton/UMFPACK	0.74	4.10e-08	R=2.2e+13	✓
NewtonRaphson + MKLPardisoFactorize	0.78	2.24e-08	R=3.9e+13	✓
NewtonRaphson + MKLPardisoIterate	0.83	2.24e-08	R=3.9e+13	✓
CTKAlg + MKLPardisoFactorize	0.77	2.24e-08	R=3.9e+13	✓
CTKAlg + MKLPardisoIterate	0.77	2.24e-08	R=3.9e+13	✓

OCCA / po4pop (n = 169322)

Algorithm	Time (s)	Max rel drift	τ (yr)	Converged
CTKAlg + UMFPACK	2	1.50e-07	DIP=5e+09, POP=2.5e+06	✓
CTKAlg + KLU	5	1.50e-07	DIP=5e+09, POP=2.5e+06	✓
NewtonRaphson + UMFPACK	4	1.81e-10	DIP=3.8e+12, POP=1.9e+09	✓
NewtonRaphson + KLU	13	1.81e-10	DIP=3.8e+12, POP=1.9e+09	✓
NLsolveJL + Newton/UMFPACK	4	1.81e-10	DIP=3.8e+12, POP=1.9e+09	✓
NewtonRaphson + MKLPardisoFactorize	5	1.81e-10	DIP=3.8e+12, POP=1.9e+09	✓
NewtonRaphson + MKLPardisoIterate	5	1.81e-10	DIP=3.8e+12, POP=1.9e+09	✓
CTKAlg + MKLPardisoFactorize	2	1.50e-07	DIP=5e+09, POP=2.5e+06	✓
CTKAlg + MKLPardisoIterate	2	1.50e-07	DIP=5e+09, POP=2.5e+06	✓

OCIM0 / idealage (n = 191169)

Algorithm	Time (s)	Max rel drift	τ (yr)	Converged
CTKAlg + UMFPACK	4	5.68e-10	age=1.3e+14	✓
CTKAlg + KLU	51	1.06e-09	age=8.4e+13	✓
NewtonRaphson + UMFPACK	5	5.68e-10	age=1.3e+14	✓
NewtonRaphson + KLU	51	1.06e-09	age=8.4e+13	✓
NLsolveJL + Newton/UMFPACK	4	5.68e-10	age=1.3e+14	✓
NewtonRaphson + MKLPardisoFactorize	9	3.51e-11	age=1.2e+15	✓
NewtonRaphson + MKLPardisoIterate	10	3.08e-11	age=1.3e+15	✓
CTKAlg + MKLPardisoFactorize	5	3.51e-11	age=1.2e+15	✓
CTKAlg + MKLPardisoIterate	5	3.08e-11	age=1.3e+15	✓

OCIM0 / radiocarbon (n = 191169)

Algorithm	Time (s)	Max rel drift	τ (yr)	Converged
CTKAlg + UMFPACK	4	3.13e-09	R=8.7e+13	✓
CTKAlg + KLU	52	1.14e-08	R=5.5e+13	✓
NewtonRaphson + UMFPACK	5	3.13e-09	R=8.7e+13	✓
NewtonRaphson + KLU	52	1.14e-08	R=5.5e+13	✓
NLsolveJL + Newton/UMFPACK	4	3.13e-09	R=8.7e+13	✓
NewtonRaphson + MKLPardisoFactorize	10	1.06e-09	R=4.3e+14	✓
NewtonRaphson + MKLPardisoIterate	10	1.02e-09	R=4.3e+14	✓
CTKAlg + MKLPardisoFactorize	5	1.06e-09	R=4.3e+14	✓
CTKAlg + MKLPardisoIterate	5	1.02e-09	R=4.3e+14	✓

OCIM0 / po4pop (n = 382338)

Algorithm	Time (s)	Max rel drift	τ (yr)	Converged
CTKAlg + UMFPACK	7	5.73e-09	DIP=3.7e+10, POP=2.9e+07	✓
CTKAlg + KLU	50	5.73e-09	DIP=3.7e+10, POP=2.9e+07	✓
NewtonRaphson + UMFPACK	11	1.02e-08	DIP=2.5e+10, POP=2e+07	✓
NewtonRaphson + KLU	98	1.02e-08	DIP=2.5e+10, POP=2e+07	✓
NLsolveJL + Newton/UMFPACK	13	1.02e-08	DIP=2.5e+10, POP=2e+07	✓
NewtonRaphson + MKLPardisoFactorize	26	1.01e-07	DIP=3.9e+09, POP=2.7e+10	✓
NewtonRaphson + MKLPardisoIterate	19	2.42e-07	DIP=1.9e+09, POP=2e+07	✓
CTKAlg + MKLPardisoFactorize	10	7.90e-07	DIP=5.3e+08, POP=9.4e+06	✓
CTKAlg + MKLPardisoIterate	11	5.73e-09	DIP=3.7e+10, POP=2.9e+07	✓

Last updated: 2026-05-15T13:19:45.584, commit ca7453fe66a0f4ed8dc40b258a9653f0d2d89842

GPU execution (untested)

The underlying call chain — SteadyStateProblem → AIBECS state function → LinearSolve factorisation — has no inherently CPU-only piece. Moving the sparse Jacobian and state vector to a GPU array type (via e.g. CUDA.jl) and using a GPU-aware factorisation from LinearSolve should in principle just work, but we have not exercised this path. Anyone interested in a GPU run should expect to discover and fix a few rough edges on the way.

Krylov solvers

Iterative Krylov backends from LinearSolve (e.g. KrylovJL_GMRES, KrylovJL_BICGSTAB) are intentionally not in the supported matrix above: AIBECS does not ship a preconditioner suited to the steady-state advection–diffusion–reaction system, and unpreconditioned Krylov on these operators is uncompetitive with the sparse direct factorisations. Adding a preconditioner is a future direction.

Steady-state solvers ​

Recommended pairing: CTKAlg() + UMFPACKFactorization() ​

When to use the NonlinearSolve route instead ​

Verified algorithm/factorisation pairs ​

Latest benchmark results ​

GPU execution (untested) ​