
Astrostatistics & Bayesian Inference Projects

Duration: 5 months · Started: Jan 2023 · Completed: Jun 2023 · Category: astrostatistics

Project Narrative

Integrated Statistical Inference in Observational Astrophysics: From Cosmological Expansion to Dark Matter Model Discrimination

Across five computationally intensive astrophysics investigations, I developed and validated a unified statistical inference framework spanning cosmology, stellar atmospheres, exoplanet detection, and galactic dynamics. The work integrates weighted regression, full Bayesian inference via custom Markov Chain Monte Carlo (MCMC) samplers, cross-validation protocols, and information-theoretic model comparison (AIC/BIC) to rigorously test competing physical hypotheses against noisy observational data.

Rather than applying statistical tools as black-box routines, I derived likelihood functions directly from physical models including Planck radiation laws, transit geometries, Keplerian radial velocity equations, and modified gravitational dynamics. I explicitly validated convergence, posterior structure, and residual behavior, with emphasis on diagnosing model degeneracies, parameter non-convergence, and regimes where data precision is insufficient for robust model discrimination. Synthetic data generation and controlled error scaling were used to estimate the minimum information threshold required for 5-sigma separation in dark matter versus MOND analyses.

Cosmological Parameter Estimation

Weighted regression plus bootstrap validation provides a reproducible estimate of H_0 and propagated cosmological timescales.

Using Type Ia supernova distance modulus measurements, I re-estimated the Hubble constant via weighted least squares and independently verified the solution through bootstrap resampling. The governing model assumes linear expansion:

v = H_0 d

The weighted normal-equation solution was implemented to verify np.polyfit outputs analytically. Measurement uncertainties were incorporated explicitly through inverse-variance weighting, and 1000 bootstrap realizations were generated by sampling perturbations from Gaussian error distributions.
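A minimal sketch of this step in Python, assuming hypothetical arrays d (Mpc), v (km/s), and sigma_v (km/s) as stand-ins for the supernova-derived distances, velocities, and velocity uncertainties (the project's actual variable names and data loading are not shown):

```python
import numpy as np

def weighted_linear_fit(d, v, sigma_v):
    """Inverse-variance weighted solution of v = H0*d + b via the normal
    equations, an analytic cross-check on np.polyfit(d, v, 1, w=1/sigma_v)."""
    w = 1.0 / sigma_v**2
    A = np.vstack([d, np.ones_like(d)]).T         # design matrix: [distance, 1]
    cov = np.linalg.inv(A.T @ (w[:, None] * A))   # parameter covariance matrix
    H0, b = cov @ (A.T @ (w * v))                 # best-fit slope (H0) and intercept
    return H0, b, cov

def bootstrap_H0(d, v, sigma_v, n_boot=1000, seed=0):
    """Refit after perturbing velocities within their Gaussian errors."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_boot)
    for i in range(n_boot):
        samples[i] = weighted_linear_fit(d, v + rng.normal(0.0, sigma_v), sigma_v)[0]
    return samples
```

The covariance matrix returned by the normal equations is what feeds the slope-intercept correlation analysis described below.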

The resulting posterior distribution of H_0^{-1} enabled propagation to the Hubble time:

t_H = \frac{1}{H_0}

This yielded an age estimate with statistically consistent uncertainty bounds. Correlation structure between slope and intercept was analyzed via corner plots, ensuring parameter covariance was quantified rather than assumed negligible.
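Propagating the bootstrap samples to the Hubble time is then a unit conversion; the sample values below are placeholders, not the project's results:

```python
import numpy as np

KM_PER_MPC = 3.0857e19    # kilometers in one megaparsec
SEC_PER_GYR = 3.1557e16   # seconds in one gigayear

# Placeholder H0 samples; in practice these come from the bootstrap above.
H0_samples = np.random.default_rng(1).normal(70.0, 2.0, size=1000)  # km/s/Mpc

t_H = (KM_PER_MPC / H0_samples) / SEC_PER_GYR    # t_H = 1/H0, converted to Gyr
print(f"Hubble time: {t_H.mean():.2f} +/- {t_H.std():.2f} Gyr")     # ~14 Gyr
```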

Edwin Hubble's (1929) plot of distance versus redshifted recessional velocity for nearby galaxies.

Type Ia supernova sampling offers a more precise approach: these events serve as “standard candles,” and their distance moduli allow the Hubble constant H_0 to be recalculated.

Bayesian Thermal Modeling of Brown Dwarfs

A custom MCMC implementation of Planck-law fitting exposed posterior sensitivity and validated robust convergence across priors.

To estimate the effective temperature of cool brown dwarfs, I implemented a custom MCMC sampler to fit the Planck distribution to log-linear spectral flux data. The forward model was derived directly from radiative transfer:

B_\lambda(T) = \frac{2hc^2}{\lambda^5} \left( e^{\frac{hc}{\lambda k T}} - 1 \right)^{-1}

A Gaussian-error log-likelihood was used:

\ln \mathcal{L} = -\frac{1}{2} \sum_i \frac{(F_i - F_{\text{model},i})^2}{\sigma_i^2}

I evaluated uniform, log-uniform, and T^4-scaled priors to assess posterior sensitivity. Convergence was demonstrated with burn-in diagnostics, walker trace plots, and repeated-initialization robustness checks. Posterior correlation between temperature and scaling parameters was explicitly quantified. Five independent chains converging to statistically consistent medians validated both algorithmic correctness and posterior stability.
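The project used a custom sampler; as a self-contained illustration, an equivalent fit can be sketched with emcee (listed in the tools below). The prior bounds, walker count, and chain length here are illustrative assumptions, and lam, flux, and sigma are hypothetical data arrays:

```python
import numpy as np
import emcee

h, c, k_B = 6.626e-34, 2.998e8, 1.381e-23        # SI constants

def planck(lam, T):
    """Planck spectral radiance B_lambda(T); lam in meters, T in kelvin."""
    return 2.0 * h * c**2 / lam**5 / np.expm1(h * c / (lam * k_B * T))

def log_prior(theta):
    T, log_s = theta                              # temperature, log scale factor
    if 500.0 < T < 3000.0 and -20.0 < log_s < 0.0:
        return 0.0                                # uniform prior (one of the three tested)
    return -np.inf

def log_posterior(theta, lam, flux, sigma):
    lp = log_prior(theta)
    if not np.isfinite(lp):
        return -np.inf
    T, log_s = theta
    model = 10.0**log_s * planck(lam, T)          # scaled black-body model
    return lp - 0.5 * np.sum((flux - model)**2 / sigma**2)

# rng = np.random.default_rng(0)
# p0 = np.column_stack([rng.uniform(800, 2000, 32), rng.uniform(-15, -5, 32)])
# sampler = emcee.EnsembleSampler(32, 2, log_posterior, args=(lam, flux, sigma))
# sampler.run_mcmc(p0, 5000, progress=True)
# chain = sampler.get_chain(discard=1000, flat=True)   # drop burn-in before diagnostics
```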

The raw data arrive in log-linear space: flux on a logarithmic scale and wavelength on a linear scale. Plotting a black-body distribution over this interval and fitting it manually provides a first-pass estimate.

The model equation computes the log of the flux, with T_{\text{eff}} embedded in the b variable.

Transit Signal Extraction and Noise Modeling

Frequency-guided initialization combined with global Bayesian detrending improves transit depth reliability under correlated systematics.

For exoplanet transit analysis, I built a preprocessing pipeline combining robust outlier rejection (5-point moving median with 3-sigma clipping), periodogram-based frequency initialization, and MCMC detrending. The global flux model included linear drift and thermal oscillations:

F(t) = a t + b + A_1 \sin(\omega_1 t) + A_2 \sin(\omega_2 t)

before fitting the trapezoidal transit depth. Detrending was performed globally rather than on isolated transits to avoid biasing the likelihood structure.
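A sketch of the preprocessing pieces, with illustrative function names (the project's actual pipeline structure is not shown); scipy.signal.lombscargle is one way to seed the oscillation frequencies:

```python
import numpy as np
from scipy.signal import lombscargle

def reject_outliers(t, flux, window=5, nsigma=3.0):
    """5-point moving-median filter with 3-sigma clipping on the residuals."""
    pad = window // 2
    padded = np.pad(flux, pad, mode='edge')
    med = np.array([np.median(padded[i:i + window]) for i in range(flux.size)])
    resid = flux - med
    keep = np.abs(resid) < nsigma * np.std(resid)
    return t[keep], flux[keep]

def detrend_model(t, a, b, A1, w1, A2, w2):
    """Global systematics model: linear drift plus two sinusoidal thermal terms."""
    return a * t + b + A1 * np.sin(w1 * t) + A2 * np.sin(w2 * t)

def seed_frequency(t, flux, w_grid):
    """Periodogram peak as an initial guess for a dominant angular frequency."""
    power = lombscargle(t, flux - flux.mean(), w_grid)
    return w_grid[np.argmax(power)]
```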

Model adequacy was evaluated using reduced chi-square and RMS-versus-binsize scaling, verifying white-noise behavior through:

\text{RMS} \propto N^{-1/2}

Bootstrapped uncertainties were propagated through transit depth estimates, and residual histograms confirmed near-Gaussian error behavior.
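A sketch of the binning check, assuming residuals is the post-detrend residual array:

```python
import numpy as np

def rms_vs_binsize(residuals, bin_sizes):
    """RMS of time-binned residuals; pure white noise follows RMS ~ N**(-1/2)."""
    rms = []
    for n in bin_sizes:
        m = (residuals.size // n) * n             # trim to a whole number of bins
        binned = residuals[:m].reshape(-1, n).mean(axis=1)
        rms.append(binned.std())
    return np.array(rms)

# Compare against the white-noise expectation rms[0] * (bin_sizes / bin_sizes[0])**-0.5;
# departures from that curve indicate correlated (red) noise.
```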

This model was used in the MCMC algorithm to detrend the data: a linear term and an offset remove the drift in flux, and two sinusoidal terms remove the flux oscillation caused by heating and cooling of the sensors.

The measured transit depth was 0.000117 ± 0.000004 in normalized flux. The uncertainty was derived via bootstrapping with a normal distribution of errors, estimated by minimizing the residuals of linear regressions through the transit and non-transit regions.

Exoplanet Detection Significance Reassessment

Information criteria and cross-validation showed that added model complexity did not consistently generalize for Gliese 581 g.

I re-evaluated the claimed 4.7-sigma detection of Gliese 581 g using multi-planet radial velocity models and independent validation techniques. Competing models of varying planetary complexity were fit via chi-square minimization and evaluated with AIC, BIC, and 24-fold cross-validation:

\text{AIC} = 2k - 2\ln \mathcal{L}, \quad \text{BIC} = k \ln n - 2\ln \mathcal{L}
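For chi-square fits with fixed Gaussian errors, -2 ln L equals chi^2 up to an additive constant shared by all models, so the criteria reduce to a sketch like:

```python
import numpy as np

def aic_bic(chi2, k, n):
    """AIC and BIC from a chi-square fit (k parameters, n data points).
    The constant in -2 ln L cancels when comparing models on the same data."""
    return 2 * k + chi2, k * np.log(n) + chi2
```

Lower values are preferred; differences of roughly 10 or more are commonly read as strong evidence against the weaker model.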

MCMC validation of best-fit parameters revealed non-convergent phase behavior, signaling parameter degeneracy. Cross-validation indicated that higher-complexity models did not consistently outperform simpler alternatives across folds, reducing support for the originally claimed detection significance.

Once the four models have been computed, several model-comparison metrics (\chi^2 minimization, AIC, and BIC) are calculated to obtain a first estimate of the best-fitting model.

K-fold cross-validation partitions the data into k equally sized parts, using each fold in turn as a validation set while the remaining k-1 folds train the model. The 24-fold scheme used here is the most exhaustive and homogeneous partition of this dataset; a sketch follows.
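In the sketch below, fit_model and predict are hypothetical callables wrapping one candidate radial-velocity model:

```python
import numpy as np
from sklearn.model_selection import KFold

def cv_chi2(t, rv, sigma, fit_model, predict, n_splits=24, seed=0):
    """Mean held-out chi-square per point across folds; lower values indicate
    better generalization to unseen epochs."""
    scores = []
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(t):
        params = fit_model(t[train], rv[train], sigma[train])
        resid = rv[test] - predict(t[test], params)
        scores.append(np.mean((resid / sigma[test]) ** 2))
    return np.mean(scores)
```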

Dark Matter vs. MOND Model Discrimination

Synthetic error-scaling experiments identified explicit data-quality thresholds required for statistically decisive dark-matter versus MOND separation.

In a comparative study of cold dark matter halo models versus MOND disk dynamics, I fit galaxy rotation curves using chi-square minimization and MCMC validation of parameters including the critical acceleration a_{\text{crit}}. Model selection used AIC/BIC divergence analysis across galaxies.

To probe discriminative limits, I generated synthetic datasets with progressively reduced uncertainties and evaluated the minimum sample size required for 5-sigma separation between theories. The MOND acceleration law tested was:

\mu\!\left(\frac{a}{a_{\text{crit}}}\right) a = a_N

Non-convergence of a_{\text{crit}} in several systems exposed parameter insensitivity under realistic error regimes. By quantifying precision requirements (approximately 12 high-quality points for decisive separation in NGC 3198), this work moved beyond model fitting toward information-limit analysis.
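A sketch of the MOND forward model and the separation metric. The "simple" interpolating function mu(x) = x/(1+x) is an assumed choice (the project's exact form is not stated), and sqrt(delta chi^2) is the common Gaussian-equivalent approximation:

```python
import numpy as np

def mond_velocity(r, v_newton, a_crit):
    """Circular speed under MOND with mu(x) = x / (1 + x) (assumed form).
    Solving mu(a/a_crit) * a = a_N for a gives the closed form below."""
    a_N = v_newton**2 / r                                  # Newtonian acceleration
    a = 0.5 * (a_N + np.sqrt(a_N**2 + 4.0 * a_N * a_crit))
    return np.sqrt(a * r)

def separation_sigma(v_obs, err, v_model_a, v_model_b):
    """Gaussian-equivalent separation between two models via delta chi-square."""
    chi2_a = np.sum((v_obs - v_model_a)**2 / err**2)
    chi2_b = np.sum((v_obs - v_model_b)**2 / err**2)
    return np.sqrt(abs(chi2_a - chi2_b))

# Synthetic scaling loop: regenerate v_obs from one model with errors shrunk by
# successive factors and record when separation_sigma first exceeds 5.
```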

Direct comparison of competing gravitational paradigms under observed uncertainties: the disk, halo, and disk+halo models and the MOND disk model were each fit, and their performance in describing the observed data was evaluated to determine the most suitable model for each individual galaxy.

An MCMC algorithm was also used to independently validate the best-fit parameters for the MOND models, producing posterior samples for the parameters R_{\text{disk}}, M_{\text{disk}}, and a_{\text{crit}} for the NGC 3198 galaxy rotation curve.

Related Writings

Final Project: Dark Matter vs MOND Rotation-Curve Model Discrimination

Lab 4: Radial Velocity Model Selection for Gliese 581

Lab 3: Exoplanet Transit Detrending and Depth Recovery

Lab 2: Brown Dwarf Temperature Inference with MCMC

Lab 1: Hubble Constant Estimation and Hubble Time Inference

Technologies & Tools

Python · emcee · NumPy · SciPy · Pandas · Bayesian Inference · Statistical Diagnostics
