Research Note

Python Scientific Computing

Zhenyu He · Jobs Stroustrup

Proficiency

Expert

Description

Python as the primary tool for scientific computing and data analysis:

  • Processing and visualization of numerical simulation data (netCDF, HDF5)
  • Data pipeline construction (QA filtering, spatial remapping, temporal aggregation)
  • Spatial data acceleration (KD-tree indexing, gridded aggregation)
  • Statistical analysis and sensitivity experiment design
  • Implementation of inversion methods (MCMC, etc.)
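The pipeline and spatial-acceleration items above can be illustrated with a minimal sketch. It assumes NumPy and SciPy's `cKDTree`; the function name `match_and_grid`, the QA threshold, and the distance cutoff are hypothetical choices for illustration, not taken from the research code. It filters observations by a quality flag, matches each one to its nearest grid-cell centre with a KD-tree, and averages per cell.

```python
import numpy as np
from scipy.spatial import cKDTree


def match_and_grid(obs_lon, obs_lat, obs_val, obs_qa,
                   grid_lon, grid_lat, qa_min=0.5, max_dist_deg=0.5):
    """Average QA-filtered observations onto the nearest grid-cell centre.

    qa_min and max_dist_deg are illustrative defaults (assumptions).
    """
    # 1. QA filtering: keep observations at or above the quality threshold.
    keep = obs_qa >= qa_min
    lon, lat, val = obs_lon[keep], obs_lat[keep], obs_val[keep]

    # 2. KD-tree over grid-cell centres. Distances are computed in flat
    #    lon/lat degrees, which is only reasonable for small regions away
    #    from the poles and the date line.
    glon, glat = np.meshgrid(grid_lon, grid_lat)
    tree = cKDTree(np.column_stack([glon.ravel(), glat.ravel()]))
    dist, idx = tree.query(np.column_stack([lon, lat]),
                           distance_upper_bound=max_dist_deg)

    # Points with no neighbour inside the cutoff come back with dist = inf.
    matched = np.isfinite(dist)
    idx, val = idx[matched], val[matched]

    # 3. Gridded aggregation: per-cell sum and count, then the mean.
    n_cells = glon.size
    total = np.bincount(idx, weights=val, minlength=n_cells)
    count = np.bincount(idx, minlength=n_cells)
    mean = np.full(n_cells, np.nan)          # empty cells stay NaN
    np.divide(total, count, out=mean, where=count > 0)
    return mean.reshape(glon.shape), count.reshape(glon.shape)
```

The returned arrays are indexed `[lat, lon]`. For global data, one common alternative is to build the tree on 3-D unit-sphere coordinates so that distances behave sensibly near the poles.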

Used In

Research-grade signature capabilities

Aggregate scale signature

  • ~175k+ lines in total across 7 years of research: PKU Walker (27,882 lines, MATLAB + NCL), PKU Aerosol (~6,400 lines, MATLAB), Caltech DSCOVR (~3,000+ lines), and PhD (115,363 lines / 62 ipynb notebooks / 3,793 code cells). Python is the primary language, with other languages in auxiliary roles.
  • 62 ipynb notebooks in the PhD research notebook collection (UC Berkeley Romps Lab aggregate)

Multi-language fluency (context-switch discipline)

  • 5-language fluency within the 11-month Caltech DSCOVR project: IDL (Yuk Yung lab OOP MCMC class) + MATLAB (Hansen csvd) + Python (Kawahara CUDA + emcee + healpy) + Mathematica (symbolic derivations) + R (Metropolis-Hastings)
  • Fortran 77/90 during the PhD (NCAR CAM3 + DAM LES source modifications; see Numerical Model Source Code Modification)

Filename-as-version-control discipline

  • Jupyter notebook filenames encode the methodology version (e.g., enforceEpsilon_z_iteration_v3.ipynb, where the _v3 suffix marks the third iteration of the method)
  • Aerosol code directory: hzy_experi1_package_collaborator_revised_now_no_error_above_height_4 (gloss: "hzy_experi1 package, handed to collaborator, revised, now no error when elevation is over 4"; the directory name encodes the collaboration history)
  • See Inconvenient Evidence Retention Discipline (private)
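A naming convention like this can also be read back programmatically. The sketch below uses only the standard library and assumes the version always appears as a trailing `_v<integer>` on the file stem, as in the example filename above; the helper names `parse_versioned_name` and `latest_versions` are hypothetical.

```python
import re
from pathlib import Path

# Assumed convention: the stem ends in "_v<integer>", e.g. "..._v3".
_VERSION_RE = re.compile(r"^(?P<base>.+)_v(?P<version>\d+)$")


def parse_versioned_name(filename):
    """Return (base, version) for names like 'foo_v3.ipynb', else (stem, None)."""
    stem = Path(filename).stem
    match = _VERSION_RE.match(stem)
    if match is None:
        return stem, None
    return match.group("base"), int(match.group("version"))


def latest_versions(filenames):
    """Map each base name to the filename carrying its highest version."""
    latest = {}
    for name in filenames:
        base, version = parse_versioned_name(name)
        if version is None:
            continue  # unversioned files are ignored
        if base not in latest or version > latest[base][0]:
            latest[base] = (version, name)
    return {base: name for base, (_, name) in latest.items()}
```

Versions are compared as integers, so `_v10` sorts after `_v2`, which a plain alphabetical sort of filenames would get wrong.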

Research-grade Python libraries

  • emcee (Foreman-Mackey 2013) MCMC
  • healpy HEALPix spherical grid
  • scipy.linalg Cholesky-based solves (solve with assume_a="pos") + numpy.linalg.slogdet + SVD
  • KDTree spatial indexing (large-scale TROPOMI coordinate matching)
  • pandas / xarray / netCDF4 / h5py
  • matplotlib / cartopy geographic plotting
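One place where the Cholesky and log-determinant entries in the list above commonly meet is a multivariate Gaussian log-likelihood, the kind of function an MCMC sampler evaluates at each step. The following is a minimal sketch under that assumption, not the code used in the research; `gaussian_loglike` is a hypothetical name. It factors the covariance once with `scipy.linalg.cho_factor` and reuses the factor for both the linear solve and the log-determinant.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve


def gaussian_loglike(residual, cov):
    """log N(residual | 0, cov) for a symmetric positive-definite cov."""
    n = residual.size
    # Factor cov = L L^T once; reuse it for the solve and the determinant.
    c, lower = cho_factor(cov, lower=True)
    alpha = cho_solve((c, lower), residual)        # cov^{-1} @ residual
    logdet = 2.0 * np.sum(np.log(np.diag(c)))      # log|cov| from diag(L)
    return -0.5 * (residual @ alpha + logdet + n * np.log(2.0 * np.pi))


# Cross-check against a direct solve and numpy.linalg.slogdet.
cov = np.array([[2.0, 0.5], [0.5, 1.0]])
r = np.array([1.0, -1.0])
sign, ld = np.linalg.slogdet(cov)
reference = -0.5 * (r @ np.linalg.solve(cov, r) + ld + r.size * np.log(2.0 * np.pi))
assert np.isclose(gaussian_loglike(r, cov), reference)
```

Working with the log-determinant rather than the determinant itself avoids overflow or underflow when the covariance matrix is large, and `cho_factor` raises `LinAlgError` if the matrix is not positive definite, which doubles as a validity check on proposed parameters.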