Research Note
Inverse Problems & Regularization
Definition
Inverse Problem: given a forward model $y = Ax + \varepsilon$ (known observation/physics operator $A$, data $y$, unknown $x$, noise $\varepsilon$), recover $x$.
Ill-posed (Hadamard 1902): fails at least one of existence, uniqueness, or stability (stability meaning that small data perturbations yield only small solution perturbations).
Discrete Inverse Problem: the discretization often has a very large condition number — small data noise amplifies into large solution perturbations.
Regularization trades strict data fit for stability: solve $\min_x \|Ax - y\|_2^2 + \lambda R(x)$ with regularizer $R$ and parameter $\lambda > 0$. Common choices: Tikhonov $R(x) = \|x\|_2^2$ or $\|Lx\|_2^2$ (for a derivative operator $L$), sparsity $R(x) = \|x\|_1$ (compressive sensing).
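A minimal numerical sketch of the Tikhonov objective above (the toy Vandermonde forward model and the helper `tikhonov_solve` are illustrative choices, not from the source notes):

```python
import numpy as np

def tikhonov_solve(A, y, lam):
    """Minimize ||Ax - y||_2^2 + lam * ||x||_2^2 via the normal equations."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

# Toy ill-conditioned forward model (illustrative): Vandermonde matrices
# have rapidly decaying singular values, mimicking a smoothing operator.
rng = np.random.default_rng(0)
A = np.vander(np.linspace(0, 1, 50), 10, increasing=True)
x_true = rng.standard_normal(10)
y = A @ x_true + 1e-3 * rng.standard_normal(50)   # noisy data

x_ls = np.linalg.lstsq(A, y, rcond=None)[0]  # unregularized: noise-amplified
x_reg = tikhonov_solve(A, y, lam=1e-6)       # regularized: stabilized
```

Even at this small scale the unregularized solution can be badly noise-corrupted because $\kappa(A)$ is large; the regularized one is not.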
Core Arguments: choosing $\lambda$
$\lambda$ too small → noise-dominated; $\lambda$ too large → regularizer-dominated. Zhenyu compared four methods (a minimal numerical sketch follows the list):
- L-Curve: log-log plot of the residual norm $\|Ax_\lambda - y\|_2$ vs the solution norm $\|x_\lambda\|_2$; the “corner” marks the best $\lambda$. Geometric intuition, needs no noise estimate; fragile when no clear corner exists.
- L-Curve Curvature: numerical curvature of the L-curve automates corner detection; sensitive to noise amplified by the numerical differentiation.
- Generalized Cross Validation (GCV) [Golub, Heath, Wahba 1979]: minimize $G(\lambda) = \dfrac{m\,\|Ax_\lambda - y\|_2^2}{[\operatorname{tr}(I - A(\lambda))]^2}$ with influence matrix $A(\lambda) = A(A^\top A + \lambda I)^{-1}A^\top$; closed-form approximation to leave-one-out CV. Statistically optimal for prediction MSE; may be flat; sometimes underregularizes.
- Morozov Discrepancy Principle: pick $\lambda$ s.t. $\|Ax_\lambda - y\|_2 = \delta$ (noise level $\delta \approx \|\varepsilon\|_2$). Principled; requires known $\delta$.
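The sketch referenced above: the three automatic rules on a Tikhonov/SVD toy problem (the test matrix and variable names are illustrative assumptions; the L-curve corner is left to visual or curvature-based inspection):

```python
import numpy as np

def svd_tikhonov(U, s, Vt, y, lam):
    """Tikhonov solution via SVD filter factors f_i = s_i^2 / (s_i^2 + lam)."""
    f = s**2 / (s**2 + lam)
    return Vt.T @ (f * (U.T @ y) / s)

# Toy ill-posed system (illustrative)
rng = np.random.default_rng(1)
A = np.vander(np.linspace(0, 1, 60), 20, increasing=True)
noise = 1e-4 * rng.standard_normal(60)
y = A @ rng.standard_normal(20) + noise
delta = np.linalg.norm(noise)          # Morozov assumes this is known

U, s, Vt = np.linalg.svd(A, full_matrices=False)
lams = np.logspace(-14, 2, 200)
m = len(y)
res, sol, gcv = [], [], []
for lam in lams:
    x = svd_tikhonov(U, s, Vt, y, lam)
    r = np.linalg.norm(A @ x - y)
    res.append(r)
    sol.append(np.linalg.norm(x))
    tr = m - np.sum(s**2 / (s**2 + lam))   # tr(I - A(lam))
    gcv.append(m * r**2 / tr**2)           # GCV objective G(lam)

lam_gcv = lams[np.argmin(gcv)]                            # GCV minimizer
lam_dp = lams[np.argmin(np.abs(np.array(res) - delta))]   # discrepancy match
# L-curve: plot np.log(res) vs np.log(sol); the corner (max curvature) is lam*
```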
Different Perspectives
- Hansen (2010): L-Curve preferred when noise is unknown — strongest geometric intuition
- Golub (1979): GCV statistically optimal under Gaussian noise
- Zhenyu’s DSCOVR observation: L-Curve ≈ Curvature; GCV tends to underregularize on real data — no silver bullet, use multiple + check physical plausibility
Core Tools
- SVD + Tikhonov closed form: $x_\lambda = \sum_i \frac{\sigma_i^2}{\sigma_i^2 + \lambda}\,\frac{u_i^\top y}{\sigma_i}\, v_i$; the filter-factor view reveals which singular directions are kept vs suppressed (see the sketch after this list)
- Condition number analysis: $\kappa(A) = \sigma_{\max}/\sigma_{\min}$; relative errors in $y$ can be amplified by up to $\kappa(A)$ in $x$
- Picard condition: check that the coefficients $|u_i^\top y|$ decay faster than the singular values $\sigma_i$ down to the noise level; otherwise recovery is fundamentally hopeless
- Filter factors: Tikhonov $f_i = \sigma_i^2/(\sigma_i^2 + \lambda)$ → smooth transition; TSVD → step filter (keep the top $k$ singular components, $f_i \in \{0, 1\}$)
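The sketch referenced above: Picard check, condition number, and the two filter-factor shapes on the same kind of toy system (setup and names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.vander(np.linspace(0, 1, 60), 20, increasing=True)
y = A @ rng.standard_normal(20) + 1e-4 * rng.standard_normal(60)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

kappa = s[0] / s[-1]          # condition number sigma_max / sigma_min
coeffs = np.abs(U.T @ y)      # Picard coefficients |u_i^T y|
picard_ratio = coeffs / s     # should decay until the noise floor; where it
                              # starts growing, those components are
                              # noise-dominated and must be filtered out

lam, k = 1e-6, 10
f_tikhonov = s**2 / (s**2 + lam)                # smooth roll-off
f_tsvd = (np.arange(len(s)) < k).astype(float)  # hard cutoff at rank k
```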
Open Questions
- Nonlinear inverse problems: most real retrievals are $y = F(x) + \varepsilon$ with nonlinear $F$; how do the four methods extend?
- Bayesian unification: $\lambda$ ↔ prior precision (derivation sketched after this list); choosing $\lambda$ ↔ Type-II MLE/MAP; what is the tradeoff vs full MCMC over hyperparameters (see Gaussian Process Bayesian Inversion)?
- Deep-learning “regularization” (dropout, weight decay, augmentation) — formal link to classical Tikhonov?
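The $\lambda$ ↔ prior-precision correspondence above, sketched for the Gaussian case (standard textbook identity, restated here):

```latex
% Likelihood y | x ~ N(Ax, sigma^2 I),  prior x ~ N(0, tau^2 I)
\hat{x}_{\mathrm{MAP}}
  = \arg\min_x \left[ \tfrac{1}{2\sigma^2}\|Ax - y\|_2^2
                      + \tfrac{1}{2\tau^2}\|x\|_2^2 \right]
  = \arg\min_x \left[ \|Ax - y\|_2^2 + \lambda \|x\|_2^2 \right],
  \qquad \lambda = \sigma^2 / \tau^2 .
```

So $\lambda$ is exactly the noise-to-prior variance ratio, which is why choosing $\lambda$ by evidence maximization is Type-II MLE.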
Applications
Used/planned: DSCOVR retrieval, MODIS×CALIPSO joint aerosol retrieval, nuclear winter soot profile retrieval, exoplanet spin-orbit tomography. Transferable to: medical imaging (CT, MRI), geophysical prospecting, atmospheric trace-gas retrieval (CH4, CO2), astronomical image reconstruction.
Sources
- — code + comparison experiments + self-study notes
- Undergraduate Research @ Peking University & Caltech/UCR — project background
- Textbook (raw): P. C. Hansen, Discrete Inverse Problems: Insight and Algorithms, SIAM, 2010
- Original paper (raw): G. H. Golub, M. Heath, G. Wahba, “Generalized Cross-Validation as a Method for Choosing a Good Ridge Parameter,” Technometrics 21(2), 1979 (GCV)
- Lecture notes (raw): Morozov discrepancy principle
- [Pending] Further .docx study notes from Zhenyu’s research-angle LLM Wiki, to be merged in later
Related Pages
- Gaussian Process Bayesian Inversion — Bayesian dual of regularization
- MCMC & Bayesian Inversion — skill page
- Satellite Remote Sensing & Data Processing — application vehicle