Reproducing DInf-Grid Benchmarks in 10 Steps
When convergence plots and Pareto curves disagree across papers, it's almost always a protocol problem, not a model breakthrough. The DInf-Grid protocol tackles this head-on by standardizing refinement ladders, error norms, and cost reporting across classical and learned solvers for ODEs and PDEs on structured grids. It focuses on empirical order-of-convergence (EOC), accuracy-cost trade-offs, and stability, all measured reproducibly across multiple seeds and hardware settings. This article is a practitioner's playbook: 10 concrete steps, tooling choices, pitfalls to avoid, and what to package so your convergence and Pareto plots are trustworthy and comparable.
You'll learn how to: (1) pick representative problems and ground truths; (2) align reference solvers and tolerances to the physics; (3) build spatial, temporal, and spectral refinement ladders; (4) implement classical baselines for soundness checks; (5) configure learned methods with clear training regimes; (6) design controlled sweeps and ablations; (7) compute errors and EOC consistently; (8) profile cost and assemble Pareto frontiers; (9) stress-test stability and generalization; and (10) package artifacts for external verification. Tooling spans DifferentialEquations.jl, SUNDIALS, PETSc TS, FEniCS/deal.II, Clawpack, Dedalus, torchdiffeq/Diffrax, DeepXDE/NeuralPDE.jl, FNO/DeepONet/PINO, torchsde, PDEBench, and JAX-CFD.
Architecture/Implementation Details
Step 1: Select representative problems and define ground truth
Map your application to canonical families: nonstiff or stiff ODEs, steady elliptic PDEs, diffusive parabolic equations, and hyperbolic transport with shocks. Fix domains, coefficients, and initial/boundary conditions. Include both smooth cases and known challenges (e.g., stiff layers, discontinuities) and write precise specs in configuration files to eliminate ambiguity downstream. Use PDEBench and JAX-CFD where relevant to standardize periodic flows and datasets.
Examples per the protocol include: harmonic oscillator and Lorenz-63/96 (nonstiff), Van der Pol (μ ≫ 1), Robertson, HIRES (stiff), Poisson/Darcy (elliptic), heat/Allen–Cahn/Cahn–Hilliard (parabolic), and advection/Burgers/shallow water/Euler (hyperbolic), with Dirichlet, Neumann, and periodic boundaries represented.
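To remove ambiguity downstream, each problem spec can live in a small machine-readable file. Below is a minimal sketch in Python; the field names and the Burgers parameters are illustrative, not mandated by the protocol:

```python
# Minimal sketch of a machine-readable problem spec (illustrative fields).
# Version-controlling specs like this pins down domains, BCs, and ICs.
import json

spec = {
    "family": "hyperbolic",
    "equation": "burgers_1d",
    "domain": {"x": [0.0, 1.0]},
    "boundary_conditions": "periodic",
    "initial_condition": "sin(2*pi*x)",
    "viscosity": 0.0,
    "final_time": 0.5,
}

with open("burgers_1d.json", "w") as f:
    json.dump(spec, f, indent=2)
```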
Step 2: Establish reference solvers and tolerances
Pick trusted methods aligned with the physics: implicit multistep or collocation for stiff ODEs (e.g., BDF/Radau/SDIRK via SUNDIALS, DifferentialEquations.jl), multigrid-accelerated FEM for elliptic solves (FEniCS or deal.II with HYPRE), stable finite-volume schemes for hyperbolic PDEs (WENO + SSP Runge–Kutta in Clawpack), and spectral stacks for periodic flows (Dedalus). Run references at tight tolerances or very fine resolutions and archive solver settings: nonlinear/linear solve thresholds, preconditioners, and mesh/time resolutions.
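As a sketch of reference generation, here is the stiff Robertson problem solved with SciPy's Radau at tight tolerances, with the exact solver settings archived next to the output. The tolerance values are illustrative; SUNDIALS or DifferentialEquations.jl would play the same role at scale:

```python
# Sketch: reference trajectory for the stiff Robertson problem with a
# tightly-toleranced implicit solver. Archive the settings with the data.
import numpy as np
from scipy.integrate import solve_ivp

def robertson(t, y):
    y1, y2, y3 = y
    return [-0.04 * y1 + 1e4 * y2 * y3,
            0.04 * y1 - 1e4 * y2 * y3 - 3e7 * y2**2,
            3e7 * y2**2]

settings = {"method": "Radau", "rtol": 1e-10, "atol": [1e-12, 1e-14, 1e-12]}
ref = solve_ivp(robertson, (0.0, 1e4), [1.0, 0.0, 0.0], **settings)

# Store the solution together with the exact solver configuration.
np.savez("robertson_ref.npz", t=ref.t, y=ref.y,
         **{k: str(v) for k, v in settings.items()})
```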
Step 3: Build refinement ladders
For structured grids, halve h at each spatial level; for spectral methods, double modes with consistent de-aliasing; for time, build fixed dt ladders for uniform schemes and tolerance ladders for adaptive solvers. For explicit PDE integrators, scale dt with h to maintain a stable Courant number; fix a common final time for ODEs and time-dependent PDEs to ensure apples-to-apples comparisons. For adaptive ODE solvers, use tolerance ladders (e.g., 1e-2 down to 1e-8), and log realized step sizes and counts.
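A minimal sketch of the three ladder types, with illustrative base resolutions and a fixed Courant number for the explicit-time ladder:

```python
h_ladder = [0.1 / 2**k for k in range(5)]             # spatial: halve h per level
mode_ladder = [32 * 2**k for k in range(5)]           # spectral: double modes per level
tol_ladder = [10.0**(-p) for p in range(2, 9)]        # adaptive: 1e-2 down to 1e-8

cfl, wave_speed = 0.5, 1.0
dt_ladder = [cfl * h / wave_speed for h in h_ladder]  # keep the Courant number fixed

for h, dt in zip(h_ladder, dt_ladder):
    print(f"h={h:.5f}  dt={dt:.5f}  Courant={wave_speed * dt / h:.2f}")
```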
Step 4: Implement classical baselines
Before introducing learning, validate the environment: explicit Runge–Kutta and adaptive pairs for nonstiff ODEs; BDF, Radau, and IMEX for stiff ODEs; FEM/FV/FD or spectral for PDEs; WENO with SSP time integrators for hyperbolic problems. Check that convergence slopes in smooth regimes match formal orders and degrade near shocks as theory predicts.
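For instance, an order check for classic RK4 on y' = -y should show slopes near 4 between successive dt halvings; a self-contained sketch:

```python
import numpy as np

# One classic RK4 run to the final time with n_steps fixed steps.
def rk4_final(f, y0, t1, n_steps):
    dt, y, t = t1 / n_steps, y0, 0.0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + dt / 2, y + dt / 2 * k1)
        k3 = f(t + dt / 2, y + dt / 2 * k2)
        k4 = f(t + dt, y + dt * k3)
        y += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return y

# y' = -y on [0, 1]: exact terminal value is exp(-1).
f = lambda t, y: -y
errors = [abs(rk4_final(f, 1.0, 1.0, n) - np.exp(-1.0)) for n in (10, 20, 40, 80)]
slopes = [np.log2(errors[i] / errors[i + 1]) for i in range(len(errors) - 1)]
print(slopes)  # formal order is 4; expect slopes close to 4
```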
Step 5: Configure learned methods with clear training regimes
- Neural ODEs: define vector-field architectures, training trajectories, and integration backend (adaptive/fixed), using torchdiffeq or Diffrax; a minimal training sketch follows this list.
- PINNs: specify collocation sampling, boundary enforcement, and quadrature via DeepXDE or NeuralPDE.jl.
- Neural operators: set input/output resolutions, spectral padding, anti-aliasing rules, and physics regularization for FNO/DeepONet/PINO.
- Learned time-steppers: design trainable corrections/fluxes consistent with the host scheme.
- SDEs: choose strong/weak objectives and discretizations with torchsde. Document model sizes, regularization, optimizers, and schedulers in the same configuration style used for baselines.
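Here is a minimal Neural ODE training sketch with torchdiffeq (requires `pip install torchdiffeq`). The architecture, horizon, and optimizer settings are illustrative, and the random targets are a stand-in; in practice the trajectories would come from the Step 2 reference solver:

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint

# Learned vector field: a small MLP mapping state to its time derivative.
class VectorField(nn.Module):
    def __init__(self, dim=3, width=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, width), nn.Tanh(),
                                 nn.Linear(width, dim))

    def forward(self, t, y):
        return self.net(y)

func = VectorField()
opt = torch.optim.Adam(func.parameters(), lr=1e-3)
t = torch.linspace(0.0, 1.0, 50)
y0 = torch.randn(16, 3)          # batch of initial states
y_true = torch.randn(50, 16, 3)  # stand-in for reference trajectories

for step in range(100):
    opt.zero_grad()
    # Adaptive integration (dopri5 by default); log rtol/atol in the config.
    y_pred = odeint(func, y0, t, rtol=1e-6, atol=1e-8)
    loss = torch.mean((y_pred - y_true) ** 2)
    loss.backward()
    opt.step()
print(loss.item())
```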
Step 6: Run controlled sweeps and ablations
Stratify by discretization (uniform vs adaptive, explicit vs implicit), data regime (supervised vs physics-informed), and model/training factors (depth/width, residual weighting, optimizer). Treat training budget as a first-class variable: log GPU-hours and wall-clock. Apply consistent early stopping criteria across runs; consider curriculum and domain decomposition for higher dimensions in physics-informed setups.
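A sketch of a stratified sweep driver; the factor names and values are illustrative, and `train_and_evaluate` is a hypothetical stand-in for your training entry point:

```python
# Every run gets a full config record so training budget can be treated
# as a first-class variable alongside accuracy.
import itertools
import json
import time

grid = {
    "discretization": ["uniform", "adaptive"],
    "data_regime": ["supervised", "physics_informed"],
    "width": [64, 128],
    "residual_weight": [0.1, 1.0],
}

for values in itertools.product(*grid.values()):
    cfg = dict(zip(grid.keys(), values))
    t0 = time.perf_counter()
    # ... train_and_evaluate(cfg) would run here (hypothetical) ...
    cfg["wall_clock_s"] = time.perf_counter() - t0
    print(json.dumps(cfg))
```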
Step 7: Measure errors with consistent norms and compute convergence
Compute errors on the same grid as the references. For PDEs, report discrete L2 and L∞ (and relative versions when appropriate); for parabolic problems, include terminal-time and time-averaged errors; for hyperbolic, complement smooth-regime EOC with shock-time error and total variation. For ODEs, include terminal-state error and optionally trajectory MSE on a checkpoint grid. Estimate EOC via p̂ = log(E(h)/E(h/2))/log(2), with bootstrap confidence intervals across ≥5 seeds for learned solvers.
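A compact sketch of this EOC-with-CIs computation, using illustrative per-seed errors on an h-halving ladder:

```python
import numpy as np

# Illustrative per-seed errors, shape (n_seeds, n_levels), h halved per column.
errors = np.array([[1.1e-2, 2.9e-3, 7.4e-4],
                   [1.0e-2, 2.6e-3, 6.8e-4],
                   [1.2e-2, 3.1e-3, 8.0e-4],
                   [0.9e-2, 2.4e-3, 6.1e-4],
                   [1.0e-2, 2.7e-3, 7.0e-4]])

def eoc(e):
    # p_hat = log(E(h)/E(h/2)) / log(2) for each refinement step
    return np.log2(e[:, :-1] / e[:, 1:])

# Bootstrap over seeds: resample rows with replacement, recompute mean EOC.
rng = np.random.default_rng(0)
boot = np.array([eoc(errors[rng.integers(0, len(errors), len(errors))]).mean()
                 for _ in range(2000)])
print(f"EOC = {eoc(errors).mean():.2f}, "
      f"95% CI [{np.quantile(boot, 0.025):.2f}, {np.quantile(boot, 0.975):.2f}]")
```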
Step 8: Profile cost and build Pareto frontiers
Decompose training vs inference cost. Use consistent FLOPs/memory profilers (ptflops, fvcore) and synchronized timing with warm-ups. For adaptive classical codes, log accepted steps and nonlinear/linear iterations; for learned models, record FLOPs per forward pass and peak memory. Plot error versus cost with separate views for inference-only and total (training + inference). Add matched-accuracy plots for adaptive integrators to isolate the virtues of adaptivity.
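A sketch of warmed-up, synchronized inference timing; the linear layer is a stand-in for a real surrogate, and FLOPs/peak memory would come from profilers such as ptflops or fvcore:

```python
import time
import torch

def time_inference(model, x, warmup=10, reps=50):
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):          # warm-up: caches, cuDNN autotuning
            model(x)
        if torch.cuda.is_available():
            torch.cuda.synchronize()     # don't time queued, unfinished kernels
        t0 = time.perf_counter()
        for _ in range(reps):
            model(x)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    return (time.perf_counter() - t0) / reps

model = torch.nn.Linear(256, 256)        # stand-in for a real surrogate
print(f"{time_inference(model, torch.randn(32, 256)) * 1e3:.3f} ms/forward")
```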
Step 9: Stress-test stability and generalization
Roll out beyond training windows to probe long-horizon behavior. Track invariants/energy drift for Hamiltonian dynamics and energy spectra/enstrophy for incompressible flows. For operator learners, evaluate at resolutions higher than training to map resolution generalization until error saturates. Vary boundaries (periodic → Dirichlet) to assess BC embedding sensitivity. Confirm learned time-stepper corrections diminish under refinement so the host scheme's formal order is recovered.
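As one such diagnostic, here is a sketch of an isotropic energy spectrum for a 2D periodic scalar field, the kind of quantity compared against JAX-CFD references on long rollouts. The normalization convention is a choice; fix one and apply it identically to learned rollouts and references:

```python
import numpy as np

def energy_spectrum(u):
    # Shell-averaged spectrum E(k) of a 2D periodic field on an n x n grid.
    n = u.shape[0]
    uh = np.fft.fft2(u) / u.size
    e = 0.5 * np.abs(uh) ** 2
    freqs = np.fft.fftfreq(n, 1 / n)              # integer wavenumbers
    kx, ky = np.meshgrid(freqs, freqs)
    k = np.sqrt(kx**2 + ky**2).round().astype(int)
    return np.bincount(k.ravel(), weights=e.ravel())

u = np.random.randn(64, 64)  # stand-in for a rollout snapshot
print(energy_spectrum(u)[:8])
```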
Step 10: Package reproducible artifacts and publish diagnostics
Consolidate configuration files, logs, trained checkpoints, raw outputs, and per-experiment metadata (hardware, software versions, seeds, exact solver settings). Curate convergence plots with slopes and confidence bands, accuracy-cost Pareto charts, and stability dashboards. Provide a short README per benchmark detailing problem definitions and rerun instructions. This enables external verification and downstream meta-analyses.
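A sketch of per-experiment metadata capture, assuming a PyTorch-based stack; the exact fields will vary, but hardware, library versions, and seeds should always be present:

```python
import json
import platform
import sys
import numpy as np
import torch

meta = {
    "python": sys.version,
    "platform": platform.platform(),
    "numpy": np.__version__,
    "torch": torch.__version__,
    "cuda": torch.version.cuda,  # None on CPU-only builds
    "gpu": torch.cuda.get_device_name(0) if torch.cuda.is_available() else None,
    "seed": 1234,
}

with open("run_metadata.json", "w") as f:
    json.dump(meta, f, indent=2)
```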
Comparison Tables
Tool stacks by equation class
| Equation class | Classical stack (baseline) | Learned stack (examples) |
|---|---|---|
| Nonstiff ODEs | Explicit RK (fixed/adaptive), Dormand–Prince RK45 pairs via DifferentialEquations.jl/SciPy | Neural ODEs with torchdiffeq/Diffrax |
| Stiff ODEs | BDF/Radau/SDIRK/IMEX via SUNDIALS, OrdinaryDiffEq.jl | Neural ODEs with implicit integrators |
| Elliptic PDEs | FEM (FEniCS/deal.II) with HYPRE multigrid | PINNs (DeepXDE/NeuralPDE.jl), DeepONet/FNO/PINO |
| Parabolic PDEs | PETSc TS (backward Euler, CN, BDF, IMEX), Dedalus for spectral | PINNs and neural operators |
| Hyperbolic PDEs | Clawpack WENO + SSP-RK | PINN variants, FNO/PINO (careful with shocks) |
| Periodic flows | Dedalus spectral; JAX-CFD references | FNO/DeepONet/PINO |
| SDEs | Euler–Maruyama/Milstein baselines (scheme-dependent) | torchsde with strong/weak objectives |
Strengths and risks per solver family (for benchmarking)
| Solver family | Strengths (per protocol) | Common risks |
|---|---|---|
| Classical ODE/PDE | Formal orders in smooth regimes; robust stability | Cost per query higher than neural surrogates (specific metrics unavailable) |
| Neural ODEs | Continuous-time modeling; adaptive integration | Stiffness without implicit backends; model error can cap EOC |
| PINNs | Good on smooth elliptic/diffusive problems with proper weighting | Failure modes on stiff/hyperbolic unless stabilized |
| Neural operators | Strong resolution generalization; fast inference on periodic domains | Sensitive to anti-aliasing/padding and BC embeddings |
| Learned time-steppers | Can approach host scheme order if consistent | Out-of-distribution and long-horizon drift risks |
| Neural SDEs | Strong/weak order tied to discretization; efficient statistics | Training stability needs variance reduction (specific details not provided) |
Best Practices
- Standardize boundaries and padding: On periodic problems and operator-learning pipelines, ensure consistent spectral padding and de-aliasing to avoid artificial gains.
- Keep CFL consistent: For explicit PDE integrators, scale dt with h; otherwise, your refinement study mixes accuracy with stability effects.
- Align tolerances: For stiff systems, use matched implicit tolerances in training and evaluation; mismatches can create misleading EOC plateaus.
- Verify classical soundness first: Confirm baseline orders in smooth regimes before introducing learned models; degradation near shocks should align with theory.
- Report both inference-only and total cost: Neural surrogates shine in inference-heavy regimes, but training can dominate total cost; plot both Pareto views.
- Use consistent seeds and CIs: Run ≥5 seeds for learned methods and bootstrap confidence intervals; repeat adaptive classical runs to account for solver stochasticity and scheduling variance.
- Document everything: Problem definitions, discretizations, solver thresholds, preconditioners, model sizes, optimizers, schedules, hardware, software versions, and random seeds belong in your configs and README.
Quality checklist before release:
- Baselines match expected orders in smooth regimes.
- Learned-model convergence fits include ≥3 refinement levels with CIs.
- Pareto positions validated under inference-only and total cost.
- Long-horizon stability validated with physics-relevant diagnostics (e.g., spectra, enstrophy).
- Subset reproduced on a second machine or cloud (specific hardware steps not prescribed).
Pitfalls and safeguards:
- Inconsistent de-aliasing/padding in spectral/operator-learning pipelines → standardize settings.
- Non-constant CFL during refinement → decouple stability from accuracy.
- Unbalanced residual vs boundary losses in PINNs → consider adaptive weighting/curriculum.
- Tolerance mismatches across training/eval on stiff systems → align or report both.
- Apparent gains that vanish at higher resolution or across seeds → verify robustness with refinement and multiple seeds.
Practical Examples
The protocol includes worked setups you can mirror end-to-end:
- Lorenz-63 (nonstiff ODE): Set T=10 with fixed-step dt refinement from 1e-2 to 1.25e-3 and a tolerance ladder for adaptive RK45 (a runnable sketch of the tolerance-ladder leg follows this list). Generate a high-accuracy reference with a high-order or tightly-toleranced solver. Train a Neural ODE on trajectories and, at each dt/tolerance, compute terminal-state and trajectory MSE errors, EOC, and step counts. Report inference wall-clock, FLOPs per step, and memory; run ≥5 seeds and use bootstrap CIs.
- Van der Pol (μ=1000, stiff ODE): Use BDF/Radau references with tight tolerances via SUNDIALS or DifferentialEquations.jl. Compare against Neural ODEs integrated by implicit solvers (e.g., BDF in Diffrax). Sweep tolerances and report EOC in terminal-state error; include nonlinear iteration counts and stiffness indicators as available.
- 2D Poisson (elliptic): Choose a manufactured solution on [0,1]² with Dirichlet and Neumann cases. Run FEM (p=1 and p=2 elements) with h-halving and multigrid preconditioning (HYPRE). Train DeepONet and PINN variants. Compute L2/L∞ errors vs h and EOC; for PINNs, increase collocation density and quadrature accuracy; for operators, assess error vs evaluation resolution until saturation.
- 1D Burgers (hyperbolic): Run a smooth-regime case and a shock-forming case with periodic BCs. Use WENO5 + SSP-RK baselines via Clawpack and compute EOC in smooth windows; measure shock-time error and total variation. Evaluate FNO/PINO and PINNs for dispersion/Gibbs artifacts, apply anti-aliasing and padding consistently, and report stability/failure rates (specific rates unavailable).
- 2D Navier–Stokes on a torus: Follow FNO/PDEBench configurations; train at 64² and test at 128² and 256². Report error scaling, long-horizon drift, energy spectra, and enstrophy relative to JAX-CFD references.
- 2D Darcy with mixed BCs: Generate parametric permeability fields; run FEM baselines and train DeepONet/FNO on PDEBench splits. Evaluate resolution generalization and parameter shifts; report L2/L∞ and EOC under h-halving.
These examples exercise every component of the protocol (refinement ladders, solver selection, learned-model configuration, measurement, cost profiling, and stability diagnostics) while staying within widely used stacks that the community recognizes and can reproduce.
Conclusion
Reproducible convergence and fair Pareto comparisons don't happen by accident; they're the product of disciplined problem selection, reference-grade numerics, standardized refinement, consistent error norms, and meticulous cost accounting. The DInf-Grid approach codifies this for ODEs and PDEs on structured grids and provides a common stage on which classical and learned solvers can be measured with confidence. With the 10-step playbook above, grounded in mature numerical libraries and widely adopted ML toolkits, you can build benchmarks that stand up to scrutiny and actually inform solver selection.
Key takeaways:
- Use physics-appropriate reference solvers and document every tolerance, preconditioner, and mesh/time resolution.
- Build refinement ladders and compute EOC with consistent norms and confidence intervals.
- Separate inference-only from total (training + inference) cost and plot both Pareto views.
- Stress-test for long horizons and resolution generalization, especially on periodic flows and operator learners.
- Package full artifacts (configs, logs, checkpoints, raw outputs) for external verification.
Next steps: start with one problem per equation family relevant to your application; validate classical baselines and EOC first; then introduce learned models with clear training regimes and controlled ablations. Once your convergence and Pareto plots survive higher-resolution checks and multiple seeds, publish the full artifact set with a concise README and rerun instructions. That's how your DInf-Grid runs will be both useful and trustworthy.