
Research Note

MCMC & Bayesian Inversion

Zhenyu He · Jobs Stroustrup

Proficiency: Proficient

Description

MCMC sampling

  • emcee (affine-invariant ensemble sampler, Foreman-Mackey 2013) on real problems
  • Log-space reparameterization for scale-spanning parameters
  • Trace + corner plots for chain diagnostics
  • Practical handling of MCMC multi-process × BLAS thread collision (OMP_NUM_THREADS=1 + 7 other env vars)

Gaussian Process regression

  • RBF kernel + hierarchical MCMC sampling of hyperparameters
  • Spatial × Temporal separable kernel K_S ⊗ K_T to avoid O(N³) joint covariance
  • HEALPix spherical grid as GP spatial support (healpy)
  • Snapshot-level and pixelwise posterior covariances implemented in closed form
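The separable-kernel point rests on the Kronecker identity (K_S ⊗ K_T) vec(X) = vec(K_T X K_S) for symmetric factors, which lets one avoid ever forming the joint covariance. A minimal sketch, using 1-D toy coordinates in place of the real HEALPix/time grids:

```python
import numpy as np

def rbf(x, length):
    # squared-exponential kernel matrix on 1-D inputs (toy stand-in)
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

rng = np.random.default_rng(0)
s = np.linspace(0, 1, 5)       # spatial coordinates (Ns = 5)
t = np.linspace(0, 1, 7)       # temporal coordinates (Nt = 7)
K_S, K_T = rbf(s, 0.3), rbf(t, 0.2)

X = rng.standard_normal((t.size, s.size))   # field as an Nt x Ns matrix

# Full Kronecker product: O((Ns*Nt)^2) memory -- exactly what we avoid at scale.
full = np.kron(K_S, K_T) @ X.flatten(order="F")

# Separable identity: (K_S ⊗ K_T) vec(X) = vec(K_T @ X @ K_S)
fast = (K_T @ X @ K_S).flatten(order="F")
```

The matrix-product form touches only the Ns×Ns and Nt×Nt factors, which is what breaks the O(N³) cost of the joint covariance.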

Regularization parameter selection for inverse problems

  • L-Curve method
  • L-Curve Curvature method
  • Generalized Cross Validation (GCV) — hand-implemented from Golub 1979
  • Morozov discrepancy principle
  • All four compared on Hansen gravity-problem example and on real DSCOVR data
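A hand-rolled GCV selector in the spirit of the Golub 1979 reproduction might look like the following. This is a sketch on an assumed toy Vandermonde problem, not the project's code; the Tikhonov filter factors f_i = σ_i² / (σ_i² + λ²) come from the SVD:

```python
import numpy as np

def gcv(A, b, lambdas):
    # GCV curve G(λ) = ||A x_λ - b||² / (m - Σ f_i)²
    # with SVD filter factors f_i = σ_i² / (σ_i² + λ²)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    m = A.shape[0]
    G = []
    for lam in lambdas:
        f = s**2 / (s**2 + lam**2)
        # residual = filtered in-range part + component outside range(U)
        resid2 = np.sum(((1 - f) * beta) ** 2) + (b @ b - beta @ beta)
        G.append(resid2 / (m - f.sum()) ** 2)
    return np.array(G)

# Toy ill-conditioned system with small noise (illustrative only)
rng = np.random.default_rng(1)
A = np.vander(np.linspace(0, 1, 40), 12, increasing=True)
x_true = rng.standard_normal(12)
b = A @ x_true + 1e-3 * rng.standard_normal(40)

lambdas = np.logspace(-8, 1, 100)
lam_gcv = lambdas[np.argmin(gcv(A, b, lambdas))]
```

In practice one minimizes G on a log-spaced λ grid, as above, and compares the minimizer against the L-curve corner and the discrepancy-principle λ.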

Numerical linear algebra

  • scipy.linalg.solve(..., assume_a="pos") Cholesky instead of explicit inverse
  • np.linalg.slogdet vs log(det) for underflow safety
  • SVD + Tikhonov regularization (MATLAB csvd, etc.)
  • Matrix condition number analysis
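The first two bullets can be demonstrated in a few lines (an illustrative sketch on a random SPD matrix, not project code):

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(2)
n = 200
M = rng.standard_normal((n, n))
K = M @ M.T + n * np.eye(n)      # symmetric positive definite "covariance"
y = rng.standard_normal(n)

# Cholesky-backed solve instead of forming K^{-1} explicitly:
# faster and better conditioned than inv(K) @ y.
alpha = linalg.solve(K, y, assume_a="pos")

# log|K| via slogdet: here det(K) itself overflows float64,
# while the (sign, log-magnitude) pair stays finite.
sign, logdet = np.linalg.slogdet(K)
```

The same pattern (Cholesky solve + slogdet) is what keeps GP marginal-likelihood evaluations stable when K is large.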

Toolchain

  • Python: emcee, scipy, numpy, healpy, astropy, pandas
  • MATLAB: method verification + Hansen-book reproductions
  • Mathematica: symbolic derivations / probability formalism checks

Self-study path (worth highlighting)

In 2021, self-studied Hansen’s Discrete Inverse Problems, Bishop’s PRML, and Golub’s original 1979 GCV paper, progressively reproducing textbook and original-paper numerical experiments: the “textbook → original paper → reproduction → applied to new problem” chain.

Used In

  • Undergraduate Research @ Peking University & Caltech/UCR — MCMC Retrieval Methods project (Mar 2021 — Mar 2022)
  • — full research-code + study-notes repo
  • Transferable to: any ill-posed inverse problem (climate retrieval, medical imaging, geophysical inversion, exoplanet spin-mapping)

Methodology Concept Pages

Caltech DSCOVR 4 Original Contributions (2021-2022, Advised by Yuk Yung + King-Fai Li)

Contribution (c): Hansen Textbook σ² vs σ⁴ Bug Catch (80× Deviation)

Core achievement: as an undergraduate, using an independent SVD + filter-factor derivation, found a formula-level bug in Hansen’s classic Discrete Inverse Problems (SIAM, 2010), a foundational textbook: not a typo, but a substantive error causing numerical results to differ by ~80×.

Numerical comparison (same toy model + PC2 data):

Formula                               Toy model λ   PC2 λ    Accuracy
Hansen-student correct version (σ⁴)   0.0251        0.0575   Consistent with naked-eye elbow + theoretical value under axis image
Hansen-book erroneous version (σ²)    1.995         1.995    80× overestimate, complete failure

Innovative discovery: independently identified an L-curve axis-scale ambiguity not discussed by Hansen:

“Hansen didn’t talk about whether we should control the units of x-axis and y-axis of L-curve to be the same. In his book, he showed two L-curve figures, but both didn’t control the units to be the same.” (discovered 2022-01-08)

3-step self-correction cycle (11/29 → 12/06 → 01/08): self-discovered the formula error → hypothesized accumulation of errors → refuted that and correctly attributed it to axis-scale ambiguity — demonstrates intellectual honesty + iteration.

Physical-evidence discipline: simultaneously preserved Hansen original-version PDF (denominator σ²) + Hansen-student correct-version PDF (denominator σ⁴) side-by-side under LCurve_HansenMistake_WeRectify_Test/ directory — “inconvenient-evidence retention” spirit, corresponding to .

Community contribution: posed 3 open questions to the academic community — (1) Should L-curve axes be controlled to the same units? (2) Is a senior collaborator’s elbow λ=1e-3 conclusion a scale artifact? (3) When applying L-curve, in which cases should one tune, and how?

Contribution (b): Pre-whitening L-curve Innovation + Methodological Restraint

Technical innovation: under an RBF kernel the noise covariance Σ is non-diagonal; transform into the whitened space via the Cholesky factorization Σ = LLᵀ, replacing A → L⁻¹A and b → L⁻¹b so the problem regains standard Tikhonov form.
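A minimal sketch of the pre-whitening step, assuming a toy RBF covariance over the observation index (not the project's actual kernel or data):

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(3)
m, n = 30, 10
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)

# Non-diagonal noise covariance from an RBF kernel over the observation
# index, plus a small jitter so the Cholesky factorization succeeds.
idx = np.arange(m, dtype=float)
Sigma = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / 3.0) ** 2) + 1e-6 * np.eye(m)
b = A @ x_true + rng.multivariate_normal(np.zeros(m), 0.01 * Sigma)

# Pre-whitening: Sigma = L L^T, then work with A_w = L^{-1} A, b_w = L^{-1} b.
# In the whitened coordinates the noise is iid, so the standard Tikhonov
# L-curve machinery applies.
L = np.linalg.cholesky(Sigma)
A_w = linalg.solve_triangular(L, A, lower=True)
b_w = linalg.solve_triangular(L, b, lower=True)
```

Triangular solves are used instead of explicitly inverting L, consistent with the numerical-linear-algebra practices listed above.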

Problem diagnosis: discovered that the L-curve x/y values become non-monotonic under a non-diagonal covariance → structural limit: the L-curve corresponds to pure Tikhonov, and the non-diagonal case cannot be derived into Tikhonov form; the “modified L-curve” is overly sensitive to priors.

Independent judgment — methodological restraint (2022-01-23 group meeting verbatim):

(Declined to tune the covariance or priors to engineer a target λ: a methodological commitment.)

Methodology-maturity evidence: “I systematically diagnose the limitations of a method I myself proposed” — an undergraduate refusing the advisor’s implicit preference to engineer the desired λ value, a research-integrity signature. Corresponds to Moment 3.

Contribution (a): Hardened Balancing Principle (HBP) 90-paragraph Writeup

Theoretical framework: starting from the Tikhonov functional min_x ‖Ax − b‖² + λ²‖x‖², via the Bauer 2007 exponential concentration assumption, derive the automatic parameter-selection HBP λ_{n+} formula.

Systematic stress test:

  • Hansen gravity problem (500-instance, condition number ~2.88e28 vs reported ~1.54e5)
  • Golub 1979 Laplace (condition number reaching 2.88e28)
  • 3-point linear regression (Prof. King-Fai Li × Zhenyu Bayesian-form closed-solution cross-validation)
  • DSCOVR real PC2 time series

Conclusion: HBP performs best, requiring no prior knowledge of the noise level δ or a tuning constant κ.

Output: a 90-paragraph math-section writeup in the internal-only manuscript, equivalent in scale to a standalone technical note/tutorial, containing the full derivation from Tikhonov to HBP.

Contribution (d): Direction of Noise Vector Influences Regularization (Original Discovery)

Original discovery (2021-09-20 verbatim):

“When I did numerical tests on L-Curve, GCV etc., I found out the direction of the noise vector plays a big role in the performance of statistical criteria. I did a literature search and it seems nobody has studied this topic.”

4-method failure-mode catalog:

Method                  Failure condition
L-curve                 |x_sol| large, or near EV1
Kawahara Bayesian       |x_sol| large (5/5 fail for x_sol = [10, 1])
GCV                     near EV1 and near EV2
Discrepancy principle   always works, but introduces systematic error
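A minimal way to probe this direction dependence (a sketch, not the original 500-instance harness) is to hold the noise norm fixed and vary only its direction along the left singular vectors; here "EV1" is read as the leading singular direction:

```python
import numpy as np

rng = np.random.default_rng(4)
A = np.vander(np.linspace(0, 1, 20), 8, increasing=True)  # toy ill-conditioned system
x_sol = np.ones(8)
b_clean = A @ x_sol

# Unit-norm noise directions: leading singular direction, smallest
# singular direction, and an isotropic random direction for comparison.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
g = rng.standard_normal(20)
noise_dirs = {
    "along EV1": U[:, 0],
    "along smallest singular direction": U[:, -1],
    "isotropic random": g / np.linalg.norm(g),
}

eps = 1e-3
# Each right-hand side carries the same noise norm eps; feeding these into
# the L-curve / GCV / discrepancy selectors exposes the direction dependence.
noisy_rhs = {name: b_clean + eps * d for name, d in noise_dirs.items()}
```

Aggregating the selected λ over many such instances is the pattern the 500-instance Hansen-gravity validation follows.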

Origin (accidental discovery): while reproducing Golub 1979 GCV (condition number 2.88e28), the 4th of 5 instances yielded a completely different λ_GCV; the Hansen-gravity 500-instance validation then aggregated this behavior, and it was systematized into contribution (d). Independent discovery plus literature-gap awareness (HBP was later extended by a subsequent collaborator to PC1/PC4 clouds, forming a methodology-influence chain).

Cross-language fluency (within this single 11-month project):

Language      Task taken
IDL           OOP MCMC class (inheriting Yuk Yung lab convention)
MATLAB        Hansen-book gravity problem replication + csvd
Python        Kawahara CUDA library bootstrap (remote workstation, Anaconda) + emcee + healpy
Mathematica   Symbolic curvature derivations (calculation_4_cal_deriva.nb + calculation_5_cal_deriva.nb)
R             Metropolis-Hastings baseline for cross-validation

The sustained trajectory of 29 group_meeting pptx files (Mar 2021 → Mar 2022), together with Walker paper v3→v6 drafting (Sep–Oct 2021) and the 4 nwp_hw assignments (Oct–Dec 2021), constitutes the “3-track parallel Sep–Dec 2021” evidence of Moment 4.