Tag
Numerical Analysis
Numerical analysis studies the design, convergence, and stability of algorithms that approximate continuous mathematical objects with finite, computable structures. Classical methods (finite differences, finite elements, spectral methods) each preserve fragments of the underlying geometry by accident. Discrete exterior calculus (DEC) preserves it by construction, discretising differential forms on simplicial complexes so that topological identities like Stokes' theorem hold exactly at the discrete level. My work uses DEC through the cartan library for applications in fluid dynamics, electromagnetics, and quantum mechanics.
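To make "by construction" concrete, here is a minimal NumPy sketch on a two-triangle mesh. It does not use cartan; the mesh, the orientation convention (edges point from the lower to the higher vertex index), and every name below are assumptions chosen for illustration. It builds the signed incidence matrices `d0` and `d1`, checks that `d1 @ d0` vanishes identically, and checks that summing `d1 @ omega` over the faces equals the circulation of `omega` around the boundary loop, which is the discrete form of Stokes' theorem.

```python
import numpy as np

# Hypothetical mesh: a square split into two triangles, vertices listed in increasing order.
n_vertices = 4
triangles = [(0, 1, 2), (0, 2, 3)]

# Collect the unique edges, each oriented from lower to higher vertex index.
edges = sorted({(t[i], t[j]) for t in triangles for i, j in ((0, 1), (1, 2), (0, 2))})
edge_index = {e: i for i, e in enumerate(edges)}

# d0 maps 0-cochains (vertex values) to 1-cochains (edge values): head minus tail.
d0 = np.zeros((len(edges), n_vertices), dtype=int)
for i, (a, b) in enumerate(edges):
    d0[i, a] = -1
    d0[i, b] = 1

# d1 maps 1-cochains to 2-cochains: boundary of (a, b, c) is (b, c) - (a, c) + (a, b).
d1 = np.zeros((len(triangles), len(edges)), dtype=int)
for f, (a, b, c) in enumerate(triangles):
    d1[f, edge_index[(b, c)]] += 1
    d1[f, edge_index[(a, c)]] -= 1
    d1[f, edge_index[(a, b)]] += 1

# Exactness: d1 @ d0 is the zero matrix, in integer arithmetic, with no tolerance.
dd = d1 @ d0

# Boundary of the whole patch as a 1-chain; the shared edge (0, 2) cancels.
boundary_chain = np.ones(len(triangles), dtype=int) @ d1

# Discrete Stokes: sum of d(omega) over faces equals omega summed along the boundary.
omega = np.array([3, -1, 4, 1, 5])  # arbitrary 1-cochain, one value per edge
flux_through_faces = int((d1 @ omega).sum())
circulation_on_boundary = int(boundary_chain @ omega)
```

Because the operators are integer incidence matrices, both identities hold exactly rather than up to discretisation error; the metric enters only later, through the Hodge star.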
Blog
April 15, 2026
From RVE to Mesh: A Pipeline for Heterogeneous Continua
A single pipeline from microstructure to discrete solver: mean-field homogenisation on a representative volume element produces an SPD permeability tensor field, which induces a Riemannian metric, whose Hodge star discretises the Laplace-Beltrami operator, and whose scalar curvature drives adaptive remeshing.
April 4, 2026
Numerical Analysis via Discrete Exterior Calculus
A self-contained reconstruction of numerical analysis through discrete exterior calculus: simplicial complexes, cochains, the discrete Hodge star, and the Hodge Laplacian, applied to quantum mechanics, computational electromagnetics, and fluid dynamics.
Engineering
ferrum-gpu
Pure-Rust GPU compute substrate with Python bindings. FFT kernels compile from Rust source straight to PTX via cuda-oxide (no CUDA C in the build) and run on NVIDIA GPUs today; cross-vendor support through spirv-oxide and Vulkan is on the v0.2 roadmap. Provides a `no_std` `Backend` trait, typed `Device<B>` and `Buffer<T, B>` facades, and 1D/2D radix-2 Stockham C2C kernels, cross-validated against `numpy.fft` across 29 GPU integration tests to within 1e-4 to 1e-3 relative error. Published on PyPI.
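For readers unfamiliar with the Stockham formulation, the following is a CPU-side NumPy sketch of the textbook radix-2 Stockham autosort FFT, checked against `numpy.fft.fft`. It is not the ferrum-gpu kernel and does not use its API; the function name `stockham_fft` and the test size are assumptions for illustration only. The point it shows is that each stage reads one buffer and writes another in an interleaved layout, so the output comes out in natural order with no bit-reversal pass.

```python
import numpy as np

def stockham_fft(x):
    """Forward complex-to-complex DFT of a power-of-two-length sequence (illustrative sketch)."""
    src = np.asarray(x, dtype=np.complex128).copy()
    total = src.size
    if total == 0 or total & (total - 1):
        raise ValueError("length must be a power of two")

    n, s = total, 1  # n: current sub-transform length, s: stride (number of interleaved sequences)
    while n > 1:
        m = n // 2
        # Twiddle factors exp(-2*pi*i*p/n) for p = 0 .. m-1.
        w = np.exp(-2j * np.pi * np.arange(m) / n)
        # Element q + s*p of the flat buffer sits at row p, column q.
        v = src.reshape(n, s)
        a, b = v[:m], v[m:]
        # Write to index q + s*(2p + j): axis order (p, j, q) in C layout.
        dst = np.empty((m, 2, s), dtype=np.complex128)
        dst[:, 0, :] = a + b
        dst[:, 1, :] = (a - b) * w[:, None]
        src = dst.reshape(-1)
        n, s = m, 2 * s
    return src

# Cross-check against NumPy's reference FFT on a fixed random input.
rng = np.random.default_rng(0)
signal = rng.standard_normal(64) + 1j * rng.standard_normal(64)
max_abs_error = float(np.max(np.abs(stockham_fft(signal) - np.fft.fft(signal))))
```

On a GPU the two buffers are typically ping-ponged between stages, which is why the out-of-place layout suits kernels better than an in-place Cooley-Tukey pass followed by a permutation.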
gpufft
Cross-vendor GPU FFT for Rust, backed by VkFFT on Vulkan and cuFFT on CUDA. A single trait surface runs identically on NVIDIA, AMD, and Intel; buffers and plans are typed at the backend-and-scalar level, so mixing a Vulkan buffer with a CUDA plan, or an `f32` plan with a `Complex64` buffer, is a compile error. `plan_c2c`, `plan_r2c`, and `plan_c2r` at `f32` and `f64` over 1D, 2D, and 3D. Ships a dual-backend manylinux Python wheel cross-validated against `numpy`. Sibling to ferrum-gpu, sharing its FFT API.
Elworthy
JIT compiler that specialises Bismut-Elworthy-Li formulas into SIMD kernels for unbiased Monte Carlo Greeks on non-stationary SDEs. Symbolic AST, Cranelift lowering (scalar and 2-lane F64X2), multi-dimensional Heston driver, pathwise and likelihood-ratio Malliavin parameter Greeks (machine-checked with SymPy). European call price and BEL delta cross-validated against the Black-Scholes closed form and the independent blackscholes crate; both agree within four Monte Carlo standard errors. About 22x faster than a tree-walking interpreter on GBM paths.
Kloeden
Hand-written SIMD C++ vs Rust (LLVM + Cranelift) benchmark companion to pathwise and elworthy. Same Brownian-increment fixture across four implementations; single-thread pinned-core throughput for scalar Euler / Milstein / Taylor 1.5 on GBM, plus a digital-delta correctness table showing that naive pathwise differentiation silently returns 0 in both languages, while the Bismut-Elworthy-Li constant-flow weight matches the analytic value within four Monte Carlo standard errors (bitwise-identical between hand-rolled C++, hand-rolled Rust, and elworthy_rt::from_paths). Named after Peter Kloeden.
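The digital-delta failure mode can be reproduced in a few lines of NumPy. The sketch below is not taken from kloeden or elworthy: the function name, the parameter values, and the exact-in-law GBM sampling are assumptions, and the weight used is the standard Bismut-Elworthy-Li weight for GBM, W_T / (S_0 σ T), which I take to be what "constant-flow weight" refers to. The pathwise estimator differentiates the indicator payoff, whose derivative is zero almost everywhere, so it returns exactly 0; the weighted estimator never differentiates the payoff and recovers the closed-form delta.

```python
import math
import numpy as np

def digital_delta_estimates(s0=100.0, k=100.0, r=0.05, sigma=0.2, t=1.0,
                            n_paths=400_000, seed=0):
    """Return (pathwise, bel, bel_stderr, analytic) deltas for a digital call under GBM."""
    rng = np.random.default_rng(seed)
    w_t = math.sqrt(t) * rng.standard_normal(n_paths)            # terminal Brownian value
    s_t = s0 * np.exp((r - 0.5 * sigma**2) * t + sigma * w_t)    # exact GBM terminal price
    discount = math.exp(-r * t)
    payoff = (s_t > k).astype(float)                             # digital call payoff 1{S_T > K}

    # Naive pathwise: payoff'(S_T) * dS_T/dS_0, but payoff' is 0 almost everywhere.
    payoff_derivative = np.zeros_like(s_t)
    pathwise = discount * float(np.mean(payoff_derivative * s_t / s0))

    # Bismut-Elworthy-Li: payoff times the weight W_T / (S_0 * sigma * T) for GBM.
    samples = discount * payoff * w_t / (s0 * sigma * t)
    bel = float(samples.mean())
    bel_stderr = float(samples.std(ddof=1) / math.sqrt(n_paths))

    # Closed form: exp(-rT) * phi(d2) / (S_0 * sigma * sqrt(T)).
    d2 = (math.log(s0 / k) + (r - 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    phi_d2 = math.exp(-0.5 * d2**2) / math.sqrt(2.0 * math.pi)
    analytic = discount * phi_d2 / (s0 * sigma * math.sqrt(t))
    return pathwise, bel, bel_stderr, analytic

pathwise, bel, bel_stderr, analytic = digital_delta_estimates()
```

Comparing `abs(bel - analytic)` with a small multiple of `bel_stderr` gives the same kind of standard-error check the correctness table reports, while `pathwise` stays at zero regardless of the number of paths.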
cuda-oxide #117
Merged upstream PR to NVlabs/cuda-oxide. feat(codegen): fma contraction and an opt -O3 pass to match nvcc defaults. Merged 2026-06-18T05:37:45Z.
cuda-oxide #256
Merged upstream PR to NVlabs/cuda-oxide. feat(cargo-oxide): add `emit-ltoir` to build a crate's LTOIR in one step. Merged 2026-06-20T12:56:37Z.
cuda-oxide #257
Merged upstream PR to NVlabs/cuda-oxide. fix(cargo-oxide): rebuild cached backend when its source advances. Merged 2026-06-20T12:57:09Z.