Research Note
Aerosol Joint Retrieval Senior Thesis @ PKU
Basic Info
- Period: Dec 2021 — Jun 2022 (~7 months, senior-year thesis period; ~5.5 months of active research + 4 weeks of thesis writing + 2 weeks of post-defense polish)
- Role: Senior-thesis independent research (sole author, PI-supervised)
- Organization: Department of Atmospheric and Oceanic Sciences, School of Physics, Peking University (Jing Li group)
- Advisor: Jing Li (sole formal advisor, senior-thesis PI)
- Location: Beijing / NSCC-GZ Tianhe-II (login as (undergraduate Tianhe-II account))
- Senior-thesis artifact: (senior thesis final docx).docx (+ .pdf; defended 2022-06-03; 1,350 words core content / 192 paragraphs / 7 tables / 4 refs / 21 images)
Core Problem
First implementation of an iteratively coupled MODIS + CALIPSO joint aerosol-retrieval algorithm — using the aerosol extinction-coefficient profile from CALIPSO active remote sensing to replace the fixed scale-height assumption in MODIS Dark Target C5 passive remote sensing; using the column-total AOD retrieved by MODIS as a total-mass constraint (boundary condition) on CALIPSO’s multiple candidate extinction profiles; iterating until both MODIS AOD and CALIPSO extinction profile converge, supplementing each other’s weakest assumptions.
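The coupling described above can be read as a fixed-point iteration between the two retrievals. The following is a minimal sketch of that loop, not the thesis code: `modis_step` and `calipso_step` are hypothetical callables standing in for the MODIS retrieval (taking a vertical profile instead of a fixed scale height) and the CALIPSO profile selection (constrained by column AOD). The tolerance, iteration cap, and both toy stand-ins are assumptions made only so the loop runs.

```python
from typing import Callable, Tuple

import numpy as np


def column_integral(sigma: np.ndarray, z_km: np.ndarray) -> float:
    """Trapezoidal integral of an extinction profile (km^-1) over altitude (km)."""
    return float(np.sum(0.5 * (sigma[1:] + sigma[:-1]) * np.diff(z_km)))


def joint_retrieval(
    modis_step: Callable[[np.ndarray], float],
    calipso_step: Callable[[float], np.ndarray],
    first_guess_profile: np.ndarray,
    tol: float = 1e-4,       # assumed convergence tolerance
    max_iter: int = 50,      # assumed iteration cap
) -> Tuple[float, np.ndarray, bool]:
    """Alternate the two steps until AOD and profile both stop changing.

    Returns (aod, profile, converged); non-convergence is reported, not hidden.
    """
    profile = np.asarray(first_guess_profile, dtype=float)
    aod = modis_step(profile)
    for _ in range(max_iter):
        new_profile = calipso_step(aod)      # AOD constrains the profile
        new_aod = modis_step(new_profile)    # profile replaces the fixed scale height
        d_aod = abs(new_aod - aod)
        d_prof = float(np.max(np.abs(new_profile - profile)))
        aod, profile = new_aod, new_profile
        if d_aod < tol and d_prof < tol:
            return aod, profile, True
    return aod, profile, False


# --- Toy stand-ins (hypothetical, NOT the thesis retrievals) ---
Z_KM = np.linspace(0.0, 10.0, 101)


def toy_calipso_step(aod: float) -> np.ndarray:
    """Exponential profile whose column integral equals the given AOD."""
    scale_height_km = 1.0 + 0.5 * aod        # made-up dependence
    shape = np.exp(-Z_KM / scale_height_km)
    return aod * shape / column_integral(shape, Z_KM)


def toy_modis_step(profile: np.ndarray) -> float:
    """AOD that depends weakly on the profile's mean height."""
    mean_height_km = float(np.sum(profile * Z_KM) / np.sum(profile))
    return 0.3 + 0.05 * mean_height_km       # made-up sensitivity


first_guess = 0.1 * np.exp(-Z_KM / 2.0)
aod, profile, converged = joint_retrieval(toy_modis_step, toy_calipso_step, first_guess)
print(converged, round(aod, 3))
```

Returning the `converged` flag explicitly mirrors how the note treats case 2: a run that fails to converge is kept as a result rather than discarded.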
Zhenyu’s Original Contributions
Contribution 1 — First iteratively-coupled MODIS + CALIPSO joint retrieval algorithm (§4.2 self-statement):
“本研究首次实现了 MODIS 和 CALIPSO 的气溶胶联合反演算法的开发,同时该联合反演算法是将 MODIS 和 CALIOP 算法相互迭代、相互耦合的,不是孤立地进行改善,而是同时对于 MODIS 的反演 AOD 值和 CALIOP 的消光系数廓线进行改进,因而两卫星在联合反演的全程可以互相匹配、改善、验证。”
(English gloss: This study is the first to develop a joint MODIS + CALIPSO aerosol-retrieval algorithm. The algorithm iterates and couples the MODIS and CALIOP retrievals with each other rather than improving each in isolation: it simultaneously improves both the MODIS-retrieved AOD and the CALIOP extinction profile, so the two satellites can match, improve, and validate each other throughout the joint retrieval.)
→ Not an ad-hoc combination of single-satellite products, but a convergent iterative-coupling algorithm.
Contribution 2 — CALIPSO L2 non-physical bug catch + correction:
| Source | 532 nm extinction range (0-10 km) | Physical plausibility |
|---|---|---|
| Joint retrieval here (image14) | 0 – 0.2 km⁻¹, all positive | OK — physically plausible |
| CALIPSO L2 official product (image15) | -5000 to -10000 km⁻¹ abrupt negatives | NOT plausible — strongly non-physical |
→ Qualitative improvement — not just reducing small errors, but correcting non-physical artifacts in the official CALIPSO L2 product (from -10⁴ km⁻¹ negatives to a plausible 0-0.2 km⁻¹ positive range).
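A plausibility check of the kind implied by the table can be sketched as a simple mask over a 532 nm extinction profile in km⁻¹. This is an illustrative sketch only: the function name, the upper bound of 10 km⁻¹, and the sample arrays (chosen to resemble the ranges quoted in the table, not taken from the actual products) are all assumptions.

```python
import numpy as np


def flag_nonphysical(extinction_km, upper_bound_km=10.0):
    """Return a boolean mask of values that cannot be a physical extinction.

    Flags NaN/inf, negative values, and values above an assumed upper bound.
    """
    sigma = np.asarray(extinction_km, dtype=float)
    return ~np.isfinite(sigma) | (sigma < 0.0) | (sigma > upper_bound_km)


# Illustrative values only, shaped like the ranges in the table above.
joint_like = np.array([0.0, 0.05, 0.2, 0.12])
l2_like = np.array([0.08, -5000.0, 0.1, -10000.0])

print(flag_nonphysical(joint_like).any())   # False
print(int(flag_nonphysical(l2_like).sum())) # 2
```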
Contribution 3 — 4 Beijing-case validation (vs AERONET Beijing site 550 nm AOD ground truth):
| # | Date | AERONET reference AOD | Error change (vs MODIS-only baseline) |
|---|---|---|---|
| 1 | 2007-03-11 | 0.1430 | abs ↓ 0.0114, rel ↓ 7.97% |
| 2 | 2008-04-14 | 0.7051 | abs ↑ 0.4565, rel ↑ 64.74% (did not converge; reverse case preserved) |
| 3 | 2008-04-30 | 1.0797 | abs ↓ 0.0322, rel ↓ 2.98% |
| 4 | 2019-03-23 | 0.1004 | abs ↓ 0.0196, rel ↓ 19.52% |
→ 3/4 cases improve over MODIS-only: absolute AOD error reduced by 0.011-0.032 and relative error reduced by 3-20%. The largest absolute reduction is in the highest-AOD case (case 3), while the largest relative reduction is in the lowest-AOD case (case 4). The case 2 reverse result is fully preserved in the thesis (not airbrushed) and becomes a diagnosis of the algorithm’s applicability boundary (see Contribution 4).
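The relative figures in the table are consistent with dividing each absolute error change by the AERONET reference AOD. The short check below recomputes them from the table values. The sign convention (negative = error reduced) and the function name are choices made for this sketch.

```python
# (date, AERONET reference AOD, absolute error change vs MODIS-only)
cases = [
    ("2007-03-11", 0.1430, -0.0114),
    ("2008-04-14", 0.7051, +0.4565),
    ("2008-04-30", 1.0797, -0.0322),
    ("2019-03-23", 0.1004, -0.0196),
]


def relative_change_percent(abs_change, reference_aod):
    """Absolute error change expressed as a percentage of the reference AOD."""
    return 100.0 * abs_change / reference_aod


relative = {date: relative_change_percent(d, ref) for date, ref, d in cases}
for date, value in relative.items():
    print(f"{date}: {value:+.2f}%")
```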
Contribution 4 — inconvenient evidence preserved as a separate docx:
A separate docx (the supplementary inconvenient-evidence docx; filename gloss: “Hard-to-Explain or inconvenient evidence Section.docx”) preserves the full table: the CALIPSO L2 column-total AOD is closer to AERONET than the joint-retrieval profile-integrated AOD in 3 of the 4 cases and effectively tied in the fourth (case 3, where the joint retrieval is closer by only 0.018), which is disadvantageous to the thesis’s main narrative.
| Experiment | L2 vs AERONET abs err | Joint vs AERONET abs err | Closer to reference |
|---|---|---|---|
| 1 (2007-03-11) | 0.0694 | 0.1942 | L2 closer |
| 2 (2008-04-14) | 0.0917 | 1.6959 | L2 closer |
| 3 (2008-04-30) | 0.3226 | 0.3044 | Joint slightly closer (tied) |
| 4 (2019-03-23) | 0.0519 | 0.1541 | L2 closer |
→ Zhenyu did not drop this comparison dimension but instead preserved the full table in a separate docx, accepting the risk that it would be raised at the defense. See Evidence Preservation Discipline (private companion), Instance 4.
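The “profile-integrated AOD” compared above is the vertical integral of the extinction coefficient over altitude. A minimal sketch using the trapezoidal rule follows. The synthetic exponential profile (surface extinction 0.2 km⁻¹, scale height 1.5 km, 0-10 km grid) is a made-up test input, not thesis data, and the integration scheme used in the thesis is not stated in this note.

```python
import numpy as np


def profile_integrated_aod(extinction_km, altitude_km):
    """Column AOD as the trapezoidal integral of extinction (km^-1) over altitude (km)."""
    sigma = np.asarray(extinction_km, dtype=float)
    z = np.asarray(altitude_km, dtype=float)
    return float(np.sum(0.5 * (sigma[1:] + sigma[:-1]) * np.diff(z)))


# Synthetic check against the closed-form integral of an exponential profile.
z = np.linspace(0.0, 10.0, 1001)
sigma0, scale_height = 0.2, 1.5            # hypothetical values
sigma = sigma0 * np.exp(-z / scale_height)

aod_numeric = profile_integrated_aod(sigma, z)
aod_analytic = sigma0 * scale_height * (1.0 - np.exp(-10.0 / scale_height))
print(aod_numeric, aod_analytic)
```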
Contribution 5 — Failed-approach physical preservation (Instance 5 of the Evidence Preservation Discipline):
Within the 25 versions under CALIPSO_Retrieval_Changliang/, 2 failed approaches are fully preserved and explicitly annotated:
- try1_useH/ (2.5 GB; early “useH” approach, abandoned)
- _之前错误的Cdistance版本/ (920 MB; the C_distance.mlx bug version, i.e. the pre-fix version corresponding to the 2022-03-21 weekly PPT’s “spherical distance function bug + first-half-week work redo”) (folder gloss: “previous_incorrect_Cdistance_version/”)
- Top-level txt annotation: 除了try_1_useH和_之前错误的Cdistance版本_其他文件夹中都是正确的函数.txt (filename gloss: “Except for try_1_useH and previous_incorrect_Cdistance_version, all other folders contain correct functions.txt”)
→ Two-level redundant annotation (the folder name itself includes “previous incorrect”, and the top-level txt explicitly lists the wrong folders). Failed-approach physical evidence is preserved and explicitly annotated, not deleted.
Key Achievements
- First iteratively-coupled MODIS + CALIPSO joint retrieval algorithm (novel-algorithm-level senior thesis)
- CALIPSO L2 -10⁴ km⁻¹ non-physical bug catch + fix (0-10 km altitude negative spike → corrected to 0-0.2 km⁻¹ plausible positive range)
- AOD improvement in 3 of 4 Beijing cases (abs ↓ 0.011-0.032, rel ↓ 3-20% vs MODIS-only baseline, AERONET ground truth)
- Inconvenient evidence preserved in a separate docx (the supplementary inconvenient-evidence docx: the L2 column-total AOD is closer to AERONET in 3 of 4 cases, with case 3 effectively tied)
- 9-version thesis auto-diff (316 → 1,350 words in 27 days / 9 versions / 2 rounds of Jing Li edit passes / 9 substantive annotations); 316 → 1,137 words in 3 days (+821 words, about +270 words/day, the fastest content fill)
- 22 weekly group-meeting PPTs sustained over 6 months (Dec 2021 - May 2022) at a ~1 PPT/week cadence, including the Apr 11 dual-version discipline (forme for self-tracking + forteacher for presenting)
- 4-person lab scaffolding under Jing Li PI:
  - a senior lab collaborator (older labmate) — Fernald 1984 math pedagogy (Formula_derivation.docx, 24 paragraphs) + lidar-ratio LUT (a 2020 AGU Fall Meeting paper, thesis ref [10]) + CALIPSO base code
  - Li Chong (older labmate) — MODIS retrieval MATLAB code + 1 km scale-height error → 30% AOD-error quantification + critical Caltech bridge (first author of Li, C., Li, J., Dubovik, O., Zeng, Z. C., Yung, Y. L. 2020 Remote Sensing 12(9), 1524; co-author with Zhenyu’s contemporaneous Caltech DSCOVR PI Yuk L. Yung)
  - Dong Yueming — LAADS DAAC + wget MODIS L1B download 13-slide PPT (2020-09-04 onboarding tutorial)
- Mid-term commitment → final delivery full closure: at the 2021-12-14 mid-term defense Zhenyu pre-committed 4 features (correlation-coefficient improvement + RMSE reduction + fitted slope → 1 + smoother profile with no negatives), and by May 2022 the final delivery confirmed all 4 and added 1 new one (CALIPSO L2 -10⁴ non-physical correction)
5.5-month Trajectory Key Milestones
| Date | Event | Status |
|---|---|---|
| 2021-12-10 | First formal progress PPT | MODIS code returns NaN + proactively asked older labmates (honest assessment) |
| 2021-12-14 | Mid-term defense PPT (12 slides) + script | “Replace MODIS LUT entirely” ambitious framework + 4 features pre-committed |
| 2021-12-17 | Discussion PPT with Li Chong (7 slides) | Line-level + path-level MODIS NaN demonstration to older labmate |
| 2022-01-17 | Weekly report | MODIS NaN 1-month debug closure (root cause: dark-pixel algorithm + cloud/snow bright surface auto-NaN = expected behavior; diagnostic: cross-validate via MODIS L2 Mean_Reflectance_Land) |
| 2022-02-21 | Weekly report | CALIPSO single-stack closure — “Retrieval code now reads correctly; bug under testing resolved. Sigma_532, Sigma_1064, lidar-ratio profiles all obtainable” |
| 2022-03-14 | Weekly report (374 KB) | First instance of joint algorithm (2019-08-01 Beijing case) + 2 key questions: (Q1) CALIPSO vs MODIS magnitude difference, (Q2) AOD sensitivity to scale height = “false iterative equilibrium” seed |
| 2022-03-21 | Weekly report | C_distance.mlx bug fix — “spherical-distance function bug, first-half-week work redone” (corresponds to the P3 _之前错误的Cdistance版本/ preserved folder) |
| 2022-03-28 | Weekly report (17 slides) | AERONET 9-combination methodology + self-rejecting 2_2 (early-late 13-hour gap) |
| 2022-04-04 | Weekly report (15 slides) | Log-linear full-profile fitting vs 1/e method multi-method comparison added (see Multi-Method Comparison Spine (2019-2025)); 2019-08-01 case two methods give 0.66 km vs 4.1 km = 6× difference |
| 2022-04-11 | Dual-version: forme (35 slides) + forteacher (34 slides) | 8-case (2007 Jan-Apr) systematic sweep: 5 rejected + 1 non-converged + 2 successful, each rejection has a documented reason (CALIPSO NaN / single profile / too far from MODIS / AERONET missing) |
| 2022-05-02 | Weekly report (28 slides) | AERONET validation fully integrated + 4 final cases locked (3_6 / 2_1 / 4_8 / 4_9) = thesis Section 3.4 source-of-truth |
| 2022-05-07 | Thesis writing kickoff | 0507 template version (316 words) |
| 2022-06-03 | Defense + final draft | 1,350 words / 192 paragraphs / 7 tables / 4 refs / 21 images |
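The 2022-04-04 row contrasts two ways of estimating an aerosol scale height from an extinction profile. The sketch below illustrates both under the assumption that extinction decays roughly as σ(z) = σ₀·exp(−z/H): a log-linear least-squares fit over the full profile, and a two-point 1/e estimate taken as the height where extinction first drops to 1/e of its lowest-level value. The exact definitions used in the thesis are not given in this note, so the function names, the interpolation step, and the synthetic profiles are all assumptions.

```python
import numpy as np


def scale_height_loglinear(z_km, sigma):
    """Fit ln(sigma) = a - z/H over all positive, finite points; return H in km."""
    z = np.asarray(z_km, dtype=float)
    s = np.asarray(sigma, dtype=float)
    ok = np.isfinite(s) & (s > 0.0)
    slope, _intercept = np.polyfit(z[ok], np.log(s[ok]), 1)
    return -1.0 / slope


def scale_height_one_over_e(z_km, sigma):
    """Height above the lowest level where sigma first falls to sigma[0]/e."""
    z = np.asarray(z_km, dtype=float)
    s = np.asarray(sigma, dtype=float)
    if not s[0] > 0.0:
        return float("nan")
    target = s[0] / np.e
    below = np.nonzero(s <= target)[0]
    if below.size == 0:
        return float("nan")                 # never decays to 1/e within the profile
    i = below[0]
    # Linear interpolation between the two levels that bracket the target.
    frac = (s[i - 1] - target) / (s[i - 1] - s[i])
    return float(z[i - 1] + frac * (z[i] - z[i - 1]) - z[0])


z = np.linspace(0.0, 10.0, 201)

# A pure exponential: both estimators should recover H = 2 km.
pure = 0.2 * np.exp(-z / 2.0)
h_fit_pure = scale_height_loglinear(z, pure)
h_1e_pure = scale_height_one_over_e(z, pure)

# A synthetic two-layer profile: the estimators disagree.
two_layer = 0.3 * np.exp(-z / 0.5) + 0.02 * np.exp(-z / 4.0)
h_fit_two = scale_height_loglinear(z, two_layer)
h_1e_two = scale_height_one_over_e(z, two_layer)

print(h_fit_pure, h_1e_pure)
print(h_fit_two, h_1e_two)
```

The two-layer example is only meant to show why the two estimators can diverge on a non-exponential profile. It does not reproduce the 0.66 km vs 4.1 km values from the 2019-08-01 case.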
Technical Depth (resume-grade specifics)
- 5 languages integrated: MATLAB (CALIPSO + MODIS retrieval main stack) + Python (MODIS data + plotting, MPL_Extinction ref) + Bash (batch processing) + R + NCL
- ~6,400 MATLAB lines across hzy_experi folders (aggregate): 33 MODIS hzy_experi folders (each ~1,057-1,606 lines, 4-6 .m files) + 24 CALIPSO hzy_experi* folders (each ~14 mlx files) = mostly copy-paste per-experiment, ~1,500-2,000 truly unique core lines + hardcoded per-folder path/data adjustments
- 48 GB Modis_scripts + 5.7 GB CALIPSO_Retrieval_Changliang (aggregate storage), including raw HDF data + per-experiment cached .mat; the 贺震昱_code_整理/ final clean reproducible version is only 56 MB (source + 18 LUT + 2 instruction txt)
- 22 weekly PPTs / Dec 2021 - May 2022 / ~1 PPT/week sustained cadence (PPT size evolution reflects work intensity: Jan-Feb 30-70 KB code debug → Mar-Apr 400-900 KB experiment plots → Apr 11 dual-version peak)
- 9-stage experimental-design selection cascade: CALIPSO data quality / 50 km distance / 35 min time gap / MODIS-CALIPSO AOD-range overlap / 30 km MODIS pre-retrieval averaging / 28 km along-track CALIPSO (~163 profiles, ~8 s flight) averaging / AERONET availability in same time window / absorption-dominant regime (Jan-Apr) / log-linear full-profile fitting (instead of 2-point 1/e method) — multi-year satellite dataset distilled down to 4 rigorously validated successful experiments
- 18 LUT .mat files (3 bands × 6 aerosol types = MODIS Dark Target C5 forward-RT pre-computed tables, prior production by Li Chong / Chang Liang) + 12 chinaha_X.mat (months 1-12 China monthly scale-height LUT) + Table.mat (Chang Liang Angstrom ↔ lidar-ratio LUT)
- NSCC-GZ Tianhe-II HPC (login as (undergraduate Tianhe-II account), Jing Li lab account) — same physical infrastructure as the Walker project’s (undergraduate Tianhe-II account) (Ji Nie lab account) but a different lab account (cross-project HPC continuity)
- 2 PaperPass flagship trials + 1 official CNKI plagiarism check (post-knowledge-graph plagiarism-check discipline) + 2 PKU official forms (advisor review form + audit form, both created on 2022-05-03, a 1-month buffer ahead of the 2022-06-03 defense) = senior-thesis administrative discipline at PKU
- 2 GB Beijing + China OpenStreetMap shapefiles (over-prepared: only the BOUNT_line.shp Beijing-boundary subset is used; the rest is kept as future-work data preservation) + Li Wanbiao Atmospheric Physics textbook with Zhenyu’s annotations, scanned 24 MB (undergraduate-classroom textbook carried over as a senior-thesis-stage reference)
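Two of the selection-cascade criteria listed above, the 50 km distance limit and the 35 min time gap, amount to a colocation test between two observations. The sketch below implements such a test with the haversine great-circle distance. It is not the thesis’s C_distance.mlx. The function names, the spherical-Earth radius of 6371 km, and the sample coordinates and timestamps are illustrative assumptions.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical-Earth assumption


def great_circle_km(lat1_deg, lon1_deg, lat2_deg, lon2_deg):
    """Haversine great-circle distance between two lat/lon points, in km."""
    phi1, phi2 = math.radians(lat1_deg), math.radians(lat2_deg)
    dphi = phi2 - phi1
    dlam = math.radians(lon2_deg - lon1_deg)
    a = math.sin(dphi / 2.0) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def is_colocated(pos_a, time_a, pos_b, time_b,
                 max_distance_km=50.0,
                 max_time_gap=timedelta(minutes=35)):
    """True if two observations are close enough in both space and time."""
    distance_km = great_circle_km(pos_a[0], pos_a[1], pos_b[0], pos_b[1])
    time_gap = abs(time_a - time_b)
    return distance_km <= max_distance_km and time_gap <= max_time_gap


# Illustrative inputs only (not actual overpass data).
site = (39.98, 116.38)
t0 = datetime(2000, 1, 1, 5, 0)

near = is_colocated(site, t0, (40.20, 116.60), t0 + timedelta(minutes=20))
far = is_colocated(site, t0, (41.00, 116.38), t0 + timedelta(minutes=20))
late = is_colocated(site, t0, (40.20, 116.60), t0 + timedelta(minutes=50))
print(near, far, late)
```

Given the spherical-distance bug recorded for 2022-03-21, a quick sanity check for any such function is that one degree of longitude along the equator should come out near 111 km.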
Academic / Career Significance
- Component of the 3-track parallel Sep-Dec 2021 senior portfolio: Aerosol thesis drafting + Walker Circulation Dynamics @ PKU paper v3→v6 + DSCOVR Inverse Problem + Regularization Methods @ Caltech / UCR MCMC retrieval + nwp_hw4 assignments ≈ 3-4 deliverables/week sustained (a different cadence pattern from the Aerosol-only full-focus phase that followed the Feb-2022 termination of the Caltech DSCOVR paper draft, but the same sustained discipline)
- Caltech institutional bridge via Li Chong: Li Chong’s 2020 Remote Sensing paper co-author list includes Yuk L. Yung, one of the institutional pathways for Zhenyu’s Mar 2021 Caltech DSCOVR onboarding. The co-author chain is PKU Jing Li group → Li Chong paper → Caltech Yuk Yung group.
- Senior thesis 2022-06-03 defense → 2022-08 UC Berkeley enrollment timeline: thesis defended Jun 3, started PhD at the Berkeley Romps lab in September
- Aerosol research → Inez Fung methane mapping (Dec 2023 — Jun 2024) preparation training: satellite-remote-sensing discipline carries over (CALIPSO L1/L2 processing → Sentinel-5P TROPOMI CH₄ processing; iteratively coupled algorithm methodology → emission-inventory matching)
- Acknowledgments cross-project mentorship-sequence confirmation: the final-draft acknowledgments (paragraphs 178-186) list the entire mentorship network — Prof. Jing Li (4th, Aerosol PI) [detailed mentorship] (gloss: very conscientious and responsible — guided every weekly work report and patiently, in detail, answered my questions) → Prof. Ji Nie + Prof. Yongyun Hu (5th, Walker) “本科生科研期间指导我的聂绩老师、胡永云老师” (gloss: Prof. Ji Nie and Prof. Yongyun Hu who advised me during the undergraduate-research period) — explicitly confirms: Zhenyu’s undergraduate-research = the Walker project (Nov 2020 - Oct 2021), and the thesis = Aerosol (Dec 2021 - Jun 2022) are sequential rather than parallel
Skills Used
- Satellite Remote Sensing & Data Processing (MODIS L1B Dark Target C5 + CALIPSO L1 Fernald method + AERONET L2 interpolation; NASA LAADS / LARC / AERONET 3 data centers; comparison of 5 interpolation methodologies)
- Climate Physics & Atmospheric Science (Fernald 1984 lidar equation + Angstrom exponent and lidar-ratio physical constraint + Kaufman & Tanré 1998 Dark Target + Twomey/Albrecht indirect radiative forcing; absorbing-aerosol scale-height sensitivity physical motivation)
- Python Scientific Computing (MODIS data download + AERONET interpolation plotting + MPL ref code)
- MATLAB: no separate skill page yet, but ~6,400 lines across hzy_experi folders (aggregate) + the CALIPSO + MODIS dual-stack as the main implementation language are capability evidence; can be added as a MATLAB sub-section under the Python Scientific Computing skill page
Related Pages
- Undergraduate Research @ Peking University & Caltech/UCR — undergraduate-research umbrella (Aerosol section condensed to a cross-link to this page)
- Jing Li — sole formal advisor, source of reference-methodology signature
- a senior lab collaborator — older labmate, Fernald 1984 math pedagogy + CALIPSO base code
- Li Chong — older labmate, MODIS code + 1 km scale-height quantification + critical Caltech bridge via Yuk Yung paper attribution
- Dong Yueming — older labmate, MODIS download tutorial (2020-09-04)
- Yuk L. Yung — Caltech PI, connected with Jing Li via Li Chong’s paper attribution (bridge context for DSCOVR Inverse Problem + Regularization Methods @ Caltech / UCR)
- PKU advisor methodology companion (private) — Jing Li reference-methodology signature (mirror with Ji Nie Walker project (private context))
- Evidence Preservation Discipline (private companion) — Instance 4 (难以解释.docx, gloss: “hard to explain”) + Instance 5 (try1_useH/ + _之前错误的Cdistance版本/) + Instance 6 (8-case rejection cascade)
- Multi-Method Comparison Spine (2019-2025) — log-linear full-profile fitting vs 1/e 2-point multi-method; 5 numerical methods (linear extrapolation + pchip extrapolation + log fit + 1/e method + multi-AOD_c averaging) used in parallel
- Independent Judgment narrative (held until 2026-06-10) — Moment 5 artifact (Aerosol disadvantageous-evidence docx)
- Thesis source page (final docx + .pdf)
- Academic Honors @ PKU (2018-2022) — senior-thesis archive
- Walker Circulation Dynamics @ PKU — companion Walker experience page (sequential predecessor; acknowledgments explicitly confirm the sequence)
- DSCOVR Inverse Problem + Regularization Methods @ Caltech / UCR — contemporaneous Caltech project (2021.03 - 2022.03 MCMC retrieval; bridge via Li Chong / Yuk Yung paper attribution)
- Aerosol satellite remote sensing → methane mapping successor PhD project
Sources
- research_wiki/projects/aerosol_retrieval/overview.md (~46k words complete, full P1-P5 deep-dive)
- research_wiki/resume_angles/problem_solving.md §Project 4 Aerosol (if mentioned)
- research_wiki/resume_angles/cross_disciplinary.md §Section 9 Jing Li methodology (private)
- raw artifact: raw/pku项目文件/pku_research/Lijing/毕业论文写作/(senior thesis final docx).docx (+ .pdf)
- raw artifact: raw/pku项目文件/pku_research/Lijing/毕业论文写作/supplementary inconvenient-evidence section.docx (disadvantageous-evidence independent docx)
- raw artifact: raw/pku项目文件/pku_research/Lijing/code/ (48 GB Modis_scripts + 5.7 GB CALIPSO_Retrieval_Changliang + 56 MB 贺震昱_code_整理 final clean)
- raw artifact: raw/pku项目文件/pku_research/Lijing/毕业论文写作/ppt_pre合集/ (22 weekly PPTs, Dec 2021 - May 2022)