Reproducing DInf-Grid Benchmarks in 10 Steps
When convergence plots and Pareto curves disagree across papers, it's almost always a protocol problem, not a model breakthrough. The DInf-Grid protocol tackles this head-on by standardizing refinement ladders, error norms, and cost reporting across classical and learned solvers for ODEs and PDEs on structured grids. It focuses on empirical order-of-convergence (EOC), accuracy-cost trade-offs, and stability, all measured reproducibly across multiple seeds and hardware settings. This article is a practitioner's playbook: 10 concrete steps, tooling choices, pitfalls to avoid, and what to package so your convergence and Pareto plots are trustworthy and comparable.
You'll learn how to: (1) pick representative problems and ground truths; (2) align reference solvers and tolerances to the physics; (3) build spatial, temporal, and spectral refinement ladders; (4) implement classical baselines for soundness checks; (5) configure learned methods with clear training regimes; (6) design controlled sweeps and ablations; (7) compute errors and EOC consistently; (8) profile cost and assemble Pareto frontiers; (9) stress-test stability and generalization; and (10) package artifacts for external verification. Tooling spans DifferentialEquations.jl, SUNDIALS, PETSc TS, FEniCS/deal.II, Clawpack, Dedalus, torchdiffeq/Diffrax, DeepXDE/NeuralPDE.jl, FNO/DeepONet/PINO, torchsde, PDEBench, and JAX-CFD.
Architecture/Implementation Details
Step 1: Select representative problems and define ground truth
Map your application to canonical families: nonstiff or stiff ODEs, steady elliptic PDEs, diffusive parabolic equations, and hyperbolic transport with shocks. Fix domains, coefficients, and initial/boundary conditions. Include both smooth cases and known challenges (e.g., stiff layers, discontinuities) and write precise specs in configuration files to eliminate ambiguity downstream. Use PDEBench and JAX-CFD where relevant to standardize periodic flows and datasets.
Examples per the protocol include: harmonic oscillator and Lorenz-63/96 (nonstiff), Van der Pol (μ ≫ 1), Robertson, HIRES (stiff), Poisson/Darcy (elliptic), heat/Allen–Cahn/Cahn–Hilliard (parabolic), and advection/Burgers/shallow water/Euler (hyperbolic), with Dirichlet, Neumann, and periodic boundaries represented.
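To remove ambiguity downstream, each problem spec can live in a small machine-readable file. Below is a minimal sketch in Python; the field names and the Burgers parameters are illustrative, not mandated by the protocol:

```python
# Minimal sketch of a machine-readable problem spec (illustrative fields).
# Version-controlling specs like this pins down domains, BCs, and ICs.
import json

spec = {
    "family": "hyperbolic",
    "equation": "burgers_1d",
    "domain": {"x": [0.0, 1.0]},
    "boundary_conditions": "periodic",
    "initial_condition": "sin(2*pi*x)",
    "viscosity": 0.0,
    "final_time": 0.5,
}

with open("burgers_1d.json", "w") as f:
    json.dump(spec, f, indent=2)
```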
Step 2: Establish reference solvers and tolerances
Pick trusted methods aligned with the physics: implicit multistep or collocation for stiff ODEs (e.g., BDF/Radau/SDIRK via SUNDIALS, DifferentialEquations.jl), multigrid-accelerated FEM for elliptic solves (FEniCS or deal.II with HYPRE), stable finite-volume schemes for hyperbolic PDEs (WENO + SSP Runge–Kutta in Clawpack), and spectral stacks for periodic flows (Dedalus). Run references at tight tolerances or very fine resolutions and archive solver settings: nonlinear/linear solve thresholds, preconditioners, and mesh/time resolutions.
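As a sketch of reference generation, here is the stiff Robertson problem solved with SciPy's Radau at tight tolerances, with the exact solver settings archived next to the output. The tolerance values are illustrative; SUNDIALS or DifferentialEquations.jl would play the same role at scale:

```python
# Sketch: reference trajectory for the stiff Robertson problem with a
# tightly-toleranced implicit solver. Archive the settings with the data.
import numpy as np
from scipy.integrate import solve_ivp

def robertson(t, y):
    y1, y2, y3 = y
    return [-0.04 * y1 + 1e4 * y2 * y3,
            0.04 * y1 - 1e4 * y2 * y3 - 3e7 * y2**2,
            3e7 * y2**2]

settings = {"method": "Radau", "rtol": 1e-10, "atol": [1e-12, 1e-14, 1e-12]}
ref = solve_ivp(robertson, (0.0, 1e4), [1.0, 0.0, 0.0], **settings)

# Store the solution together with the exact solver configuration.
np.savez("robertson_ref.npz", t=ref.t, y=ref.y,
         **{k: str(v) for k, v in settings.items()})
```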
Step 3: Build refinement ladders
For structured grids, halve h at each spatial level; for spectral methods, double modes with consistent de-aliasing; for time, build fixed dt ladders for uniform schemes and tolerance ladders for adaptive solvers. For explicit PDE integrators, scale dt with h to maintain a stable Courant number; fix a common final time for ODEs and time-dependent PDEs to ensure apples-to-apples comparisons. For adaptive ODE solvers, use tolerance ladders (e.g., 1e-2 down to 1e-8), and log realized step sizes and counts.
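A minimal sketch of the three ladder types, with illustrative base resolutions and a fixed Courant number for the explicit-time ladder:

```python
h_ladder = [0.1 / 2**k for k in range(5)]             # spatial: halve h per level
mode_ladder = [32 * 2**k for k in range(5)]           # spectral: double modes per level
tol_ladder = [10.0**(-p) for p in range(2, 9)]        # adaptive: 1e-2 down to 1e-8

cfl, wave_speed = 0.5, 1.0
dt_ladder = [cfl * h / wave_speed for h in h_ladder]  # keep the Courant number fixed

for h, dt in zip(h_ladder, dt_ladder):
    print(f"h={h:.5f}  dt={dt:.5f}  Courant={wave_speed * dt / h:.2f}")
```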
Step 4: Implement classical baselines
Before introducing learning, validate the environment: explicit Runge–Kutta and adaptive pairs for nonstiff ODEs; BDF, Radau, and IMEX for stiff ODEs; FEM/FV/FD or spectral for PDEs; WENO with SSP time integrators for hyperbolic problems. Check that convergence slopes in smooth regimes match formal orders and degrade near shocks as theory predicts.
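For instance, an order check for classic RK4 on y' = -y should show slopes near 4 between successive dt halvings; a self-contained sketch:

```python
import numpy as np

# One classic RK4 run to the final time with n_steps fixed steps.
def rk4_final(f, y0, t1, n_steps):
    dt, y, t = t1 / n_steps, y0, 0.0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + dt / 2, y + dt / 2 * k1)
        k3 = f(t + dt / 2, y + dt / 2 * k2)
        k4 = f(t + dt, y + dt * k3)
        y += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return y

# y' = -y on [0, 1]: exact terminal value is exp(-1).
f = lambda t, y: -y
errors = [abs(rk4_final(f, 1.0, 1.0, n) - np.exp(-1.0)) for n in (10, 20, 40, 80)]
slopes = [np.log2(errors[i] / errors[i + 1]) for i in range(len(errors) - 1)]
print(slopes)  # formal order is 4; expect slopes close to 4
```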
Step 5: Configure learned methods with clear training regimes
- Neural ODEs: define vector-field architectures, training trajectories, and integration backend (adaptive/fixed), using torchdiffeq or Diffrax; a minimal training sketch follows this list.
- PINNs: specify collocation sampling, boundary enforcement, and quadrature via DeepXDE or NeuralPDE.jl.
- Neural operators: set input/output resolutions, spectral padding, anti-aliasing rules, and physics regularization for FNO/DeepONet/PINO.
- Learned time-steppers: design trainable corrections/fluxes consistent with the host scheme.
- SDEs: choose strong/weak objectives and discretizations with torchsde. Document model sizes, regularization, optimizers, and schedulers in the same configuration style used for baselines.
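Here is a minimal Neural ODE training sketch with torchdiffeq (requires `pip install torchdiffeq`). The architecture, horizon, and optimizer settings are illustrative, and the random targets are a stand-in; in practice the trajectories would come from the Step 2 reference solver:

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint

# Learned vector field: a small MLP mapping state to its time derivative.
class VectorField(nn.Module):
    def __init__(self, dim=3, width=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, width), nn.Tanh(),
                                 nn.Linear(width, dim))

    def forward(self, t, y):
        return self.net(y)

func = VectorField()
opt = torch.optim.Adam(func.parameters(), lr=1e-3)
t = torch.linspace(0.0, 1.0, 50)
y0 = torch.randn(16, 3)          # batch of initial states
y_true = torch.randn(50, 16, 3)  # stand-in for reference trajectories

for step in range(100):
    opt.zero_grad()
    # Adaptive integration (dopri5 by default); log rtol/atol in the config.
    y_pred = odeint(func, y0, t, rtol=1e-6, atol=1e-8)
    loss = torch.mean((y_pred - y_true) ** 2)
    loss.backward()
    opt.step()
print(loss.item())
```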
Step 6: Run controlled sweeps and ablations
Stratify by discretization (uniform vs adaptive, explicit vs implicit), data regime (supervised vs physics-informed), and model/training factors (depth/width, residual weighting, optimizer). Treat training budget as a first-class variable: log GPU-hours and wall-clock. Apply consistent early stopping criteria across runs; consider curriculum and domain decomposition for higher dimensions in physics-informed setups.
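A sketch of a stratified sweep driver; the factor names and values are illustrative, and `train_and_evaluate` is a hypothetical stand-in for your training entry point:

```python
# Every run gets a full config record so training budget can be treated
# as a first-class variable alongside accuracy.
import itertools
import json
import time

grid = {
    "discretization": ["uniform", "adaptive"],
    "data_regime": ["supervised", "physics_informed"],
    "width": [64, 128],
    "residual_weight": [0.1, 1.0],
}

for values in itertools.product(*grid.values()):
    cfg = dict(zip(grid.keys(), values))
    t0 = time.perf_counter()
    # ... train_and_evaluate(cfg) would run here (hypothetical) ...
    cfg["wall_clock_s"] = time.perf_counter() - t0
    print(json.dumps(cfg))
```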
Step 7: Measure errors with consistent norms and compute convergence
Compute errors on the same grid as the references. For PDEs, report discrete L2 and L∞ (and relative versions when appropriate); for parabolic problems, include terminal-time and time-averaged errors; for hyperbolic, complement smooth-regime EOC with shock-time error and total variation. For ODEs, include terminal-state error and optionally trajectory MSE on a checkpoint grid. Estimate EOC via p̂ = log(E(h)/E(h/2))/log(2), with bootstrap confidence intervals across ≥5 seeds for learned solvers.
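A compact sketch of this EOC-with-CIs computation, using illustrative per-seed errors on an h-halving ladder:

```python
import numpy as np

# Illustrative per-seed errors, shape (n_seeds, n_levels), h halved per column.
errors = np.array([[1.1e-2, 2.9e-3, 7.4e-4],
                   [1.0e-2, 2.6e-3, 6.8e-4],
                   [1.2e-2, 3.1e-3, 8.0e-4],
                   [0.9e-2, 2.4e-3, 6.1e-4],
                   [1.0e-2, 2.7e-3, 7.0e-4]])

def eoc(e):
    # p_hat = log(E(h)/E(h/2)) / log(2) for each refinement step
    return np.log2(e[:, :-1] / e[:, 1:])

# Bootstrap over seeds: resample rows with replacement, recompute mean EOC.
rng = np.random.default_rng(0)
boot = np.array([eoc(errors[rng.integers(0, len(errors), len(errors))]).mean()
                 for _ in range(2000)])
print(f"EOC = {eoc(errors).mean():.2f}, "
      f"95% CI [{np.quantile(boot, 0.025):.2f}, {np.quantile(boot, 0.975):.2f}]")
```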
Step 8: Profile cost and build Pareto frontiers
Decompose training vs inference cost. Use consistent FLOPs/memory profilers (ptflops, fvcore) and synchronized timing with warm-ups. For adaptive classical codes, log accepted steps and nonlinear/linear iterations; for learned models, record FLOPs per forward pass and peak memory. Plot error versus cost with separate views for inference-only and total (training + inference). Add matched-accuracy plots for adaptive integrators to isolate the virtues of adaptivity.
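A sketch of warmed-up, synchronized inference timing; the linear layer is a stand-in for a real surrogate, and FLOPs/peak memory would come from profilers such as ptflops or fvcore:

```python
import time
import torch

def time_inference(model, x, warmup=10, reps=50):
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):          # warm-up: caches, cuDNN autotuning
            model(x)
        if torch.cuda.is_available():
            torch.cuda.synchronize()     # don't time queued, unfinished kernels
        t0 = time.perf_counter()
        for _ in range(reps):
            model(x)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    return (time.perf_counter() - t0) / reps

model = torch.nn.Linear(256, 256)        # stand-in for a real surrogate
print(f"{time_inference(model, torch.randn(32, 256)) * 1e3:.3f} ms/forward")
```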
Step 9: Stress-test stability and generalization
Roll out beyond training windows to probe long-horizon behavior. Track invariants/energy drift for Hamiltonian dynamics and energy spectra/enstrophy for incompressible flows. For operator learners, evaluate at resolutions higher than training to map resolution generalization until error saturates. Vary boundaries (periodic → Dirichlet) to assess BC embedding sensitivity. Confirm learned time-stepper corrections diminish under refinement so the host scheme's formal order is recovered.
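As one such diagnostic, here is a sketch of an isotropic energy spectrum for a 2D periodic scalar field, the kind of quantity compared against JAX-CFD references on long rollouts. The normalization convention is a choice; fix one and apply it identically to learned rollouts and references:

```python
import numpy as np

def energy_spectrum(u):
    # Shell-averaged spectrum E(k) of a 2D periodic field on an n x n grid.
    n = u.shape[0]
    uh = np.fft.fft2(u) / u.size
    e = 0.5 * np.abs(uh) ** 2
    freqs = np.fft.fftfreq(n, 1 / n)              # integer wavenumbers
    kx, ky = np.meshgrid(freqs, freqs)
    k = np.sqrt(kx**2 + ky**2).round().astype(int)
    return np.bincount(k.ravel(), weights=e.ravel())

u = np.random.randn(64, 64)  # stand-in for a rollout snapshot
print(energy_spectrum(u)[:8])
```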
Step 10: Package reproducible artifacts and publish diagnostics
Consolidate configuration files, logs, trained checkpoints, raw outputs, and per-experiment metadata (hardware, software versions, seeds, exact solver settings). Curate convergence plots with slopes and confidence bands, accuracy-cost Pareto charts, and stability dashboards. Provide a short README per benchmark detailing problem definitions and rerun instructions. This enables external verification and downstream meta-analyses.
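A sketch of per-experiment metadata capture, assuming a PyTorch-based stack; the exact fields will vary, but hardware, library versions, and seeds should always be present:

```python
import json
import platform
import sys
import numpy as np
import torch

meta = {
    "python": sys.version,
    "platform": platform.platform(),
    "numpy": np.__version__,
    "torch": torch.__version__,
    "cuda": torch.version.cuda,  # None on CPU-only builds
    "gpu": torch.cuda.get_device_name(0) if torch.cuda.is_available() else None,
    "seed": 1234,
}

with open("run_metadata.json", "w") as f:
    json.dump(meta, f, indent=2)
```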
Comparison Tables
Tool stacks by equation class
| Equation class | Classical stack (baseline) | Learned stack (examples) |
|---|---|---|
| Nonstiff ODEs | Explicit RK (fixed/adaptive), Dormand–Prince RK45 pairs via DifferentialEquations.jl/SciPy | Neural ODEs with torchdiffeq/Diffrax |
| Stiff ODEs | BDF/Radau/SDIRK/IMEX via SUNDIALS, OrdinaryDiffEq.jl | Neural ODEs with implicit integrators |
| Elliptic PDEs | FEM (FEniCS/deal.II) with HYPRE multigrid | PINNs (DeepXDE/NeuralPDE.jl), DeepONet/FNO/PINO |
| Parabolic PDEs | PETSc TS (backward Euler, CN, BDF, IMEX), Dedalus for spectral | PINNs and neural operators |
| Hyperbolic PDEs | Clawpack WENO + SSP-RK | PINN variants, FNO/PINO (careful with shocks) |
| Periodic flows | Dedalus spectral; JAX-CFD references | FNO/DeepONet/PINO |
| SDEs | Euler–Maruyama/Milstein baselines (scheme-dependent) | torchsde with strong/weak objectives |
Strengths and risks per solver family (for benchmarking)
| Solver family | Strengths (per protocol) | Common risks |
|---|---|---|
| Classical ODE/PDE | Formal orders in smooth regimes; robust stability | Cost per query higher than neural surrogates (specific metrics unavailable) |
| Neural ODEs | Continuous-time modeling; adaptive integration | Stiffness without implicit backends; model error can cap EOC |
| PINNs | Good on smooth elliptic/diffusive problems with proper weighting | Failure modes on stiff/hyperbolic unless stabilized |
| Neural operators | Strong resolution generalization; fast inference on periodic domains | Sensitive to anti-aliasing/padding and BC embeddings |
| Learned time-steppers | Can approach host scheme order if consistent | Out-of-distribution and long-horizon drift risks |
| Neural SDEs | Strong/weak order tied to discretization; efficient statistics | Training stability needs variance reduction (specific details not provided) |
Best Practices
- Standardize boundaries and padding: On periodic problems and operator-learning pipelines, ensure consistent spectral padding and de-aliasing to avoid artificial gains.
- Keep CFL consistent: For explicit PDE integrators, scale dt with h; otherwise, your refinement study mixes accuracy with stability effects.
- Align tolerances: For stiff systems, use matched implicit tolerances in training and evaluation; mismatches can create misleading EOC plateaus.
- Verify classical soundness first: Confirm baseline orders in smooth regimes before introducing learned models; degradation near shocks should align with theory.
- Report both inference-only and total cost: Neural surrogates shine in inference-heavy regimes, but training can dominate total cost; plot both Pareto views.
- Use consistent seeds and CIs: Run ≥5 seeds for learned methods and bootstrap confidence intervals; repeat adaptive classical runs to account for solver stochasticity and scheduling variance.
- Document everything: Problem definitions, discretizations, solver thresholds, preconditioners, model sizes, optimizers, schedules, hardware, software versions, and random seeds belong in your configs and README.
Quality checklist before release:
- Baselines match expected orders in smooth regimes.
- Learned-model convergence fits include ≥3 refinement levels with CIs.
- Pareto positions validated under inference-only and total cost.
- Long-horizon stability validated with physics-relevant diagnostics (e.g., spectra, enstrophy).
- Subset reproduced on a second machine or cloud (specific hardware steps not prescribed).
Pitfalls and safeguards:
- Inconsistent de-aliasing/padding in spectral/operator-learning pipelines → standardize settings.
- Non-constant CFL during refinement → decouple stability from accuracy.
- Unbalanced residual vs boundary losses in PINNs → consider adaptive weighting/curriculum.
- Tolerance mismatches across training/eval on stiff systems → align or report both.
- Apparent gains that vanish at higher resolution or across seeds → verify robustness with refinement and multiple seeds.
Practical Examples
The protocol includes worked setups you can mirror end-to-end:
- Lorenz-63 (nonstiff ODE): Set T=10 with fixed-step dt refinement from 1e-2 to 1.25e-3 and a tolerance ladder for adaptive RK45 (a runnable sketch of the tolerance-ladder leg follows this list). Generate a high-accuracy reference with a high-order or tightly-toleranced solver. Train a Neural ODE on trajectories and, at each dt/tolerance, compute terminal-state and trajectory MSE errors, EOC, and step counts. Report inference wall-clock, FLOPs per step, and memory; run ≥5 seeds and use bootstrap CIs.
- Van der Pol (μ=1000, stiff ODE): Use BDF/Radau references with tight tolerances via SUNDIALS or DifferentialEquations.jl. Compare against Neural ODEs integrated by implicit solvers (e.g., BDF in Diffrax). Sweep tolerances and report EOC in terminal-state error; include nonlinear iteration counts and stiffness indicators as available.
- 2D Poisson (elliptic): Choose a manufactured solution on [0,1]² with Dirichlet and Neumann cases. Run FEM (p=1 and p=2 elements) with h-halving and multigrid preconditioning (HYPRE). Train DeepONet and PINN variants. Compute L2/L∞ errors vs h and EOC; for PINNs, increase collocation density and quadrature accuracy; for operators, assess error vs evaluation resolution until saturation.
- 1D Burgers (hyperbolic): Run a smooth-regime case and a shock-forming case with periodic BCs. Use WENO5 + SSP-RK baselines via Clawpack and compute EOC in smooth windows; measure shock-time error and total variation. Evaluate FNO/PINO and PINNs for dispersion/Gibbs artifacts, apply anti-aliasing and padding consistently, and report stability/failure rates (specific rates unavailable).
- 2D Navier–Stokes on a torus: Follow FNO/PDEBench configurations; train at 64² and test at 128² and 256². Report error scaling, long-horizon drift, energy spectra, and enstrophy relative to JAX-CFD references.
- 2D Darcy with mixed BCs: Generate parametric permeability fields; run FEM baselines and train DeepONet/FNO on PDEBench splits. Evaluate resolution generalization and parameter shifts; report L2/L∞ and EOC under h-halving.
These examples exercise every component of the protocol (refinement ladders, solver selection, learned-model configuration, measurement, cost profiling, and stability diagnostics) while staying within widely used stacks that the community recognizes and can reproduce.
Conclusion
Reproducible convergence and fair Pareto comparisons don't happen by accident; they're the product of disciplined problem selection, reference-grade numerics, standardized refinement, consistent error norms, and meticulous cost accounting. The DInf-Grid approach codifies this for ODEs and PDEs on structured grids and provides a common stage on which classical and learned solvers can be measured with confidence. With the 10-step playbook above, grounded in mature numerical libraries and widely adopted ML toolkits, you can build benchmarks that stand up to scrutiny and actually inform solver selection.
Key takeaways:
- Use physics-appropriate reference solvers and document every tolerance, preconditioner, and mesh/time resolution.
- Build refinement ladders and compute EOC with consistent norms and confidence intervals.
- Separate inference-only from total (training + inference) cost and plot both Pareto views.
- Stress-test for long horizons and resolution generalization, especially on periodic flows and operator learners.
- Package full artifacts (configs, logs, checkpoints, raw outputs) for external verification.
Next steps: start with one problem per equation family relevant to your application; validate classical baselines and EOC first; then introduce learned models with clear training regimes and controlled ablations. Once your convergence and Pareto plots survive higher-resolution checks and multiple seeds, publish the full artifact set with a concise README and rerun instructions. That's how your DInf-Grid runs will be both useful and trustworthy.