Generative Models for Physics Discovery: VAEs, Flows & Diffusion

Generative AI isn’t just for images and text. In physics it generates new particle collision events for Monte Carlo studies, samples from Bayesian posteriors, designs optimal experiments, and — most excitingly — recovers analytic physical laws from data. This cluster covers VAEs, normalizing flows, diffusion models, and symbolic regression, with full working code and real physics applications.

🌀 VAEs & Normalizing Flows ✨ Diffusion Models 🧮 Symbolic Regression 🔭 Law Discovery

AI for Physics Students — Cluster 7: Generative Models & Discovery

Cluster 6 asked how ML solves quantum physics problems — forward problems (finding ground states) and inverse problems (learning parameters from data). Cluster 7 tackles the deepest inverse problem of all: can a machine learn the laws of physics themselves? We also cover how generative models have transformed simulation-heavy fields like particle physics and cosmology.

📋 What You Will Learn
  1. Why Physics Needs Generative Models
  2. Variational Autoencoders (VAEs)
  3. Normalizing Flows for Physics
  4. Diffusion Models for Scientific Data
  5. Symbolic Regression with PySR
  6. AI Feynman: Discovering Physical Laws
  7. Simulation-Based Inference (SBI)
  8. Bayesian Optimal Experiment Design

Section 1 — Why Physics Needs Generative Models

Physics has always been about distributions. Not single measurements, but the statistical behaviour of many measurements, many particles, many configurations. Every experiment generates data from some underlying probability distribution determined by the laws of physics. The question generative models ask is: can we learn that distribution so precisely that we can sample from it at will?

The answer has profound consequences. If you can generate realistic particle collision events, you can augment your Monte Carlo simulation datasets at near-zero cost. If you can generate valid crystal structures, you can explore the materials space at scale. If you can generate gravitational wave signals, you can train detectors on rare waveforms. And if you can discover the analytic distribution underlying your data — the mathematical formula — you have discovered a physical law.

In this cluster we cover the four families of generative models that have found the deepest roots in physics, and for each one, we go beyond toy examples to show exactly where and why physicists use them.

VAEs

Continuous latent space; interpretable interpolation; fast sampling. Best for: molecular generation, anomaly detection, latent physics.

Normalizing Flows

Exact likelihood; invertible; exact density estimation. Best for: HEP event generation, posterior sampling, density ratio estimation.

Diffusion Models

Highest sample quality; slow sampling; score-based. Best for: molecular structures, climate fields, medical imaging, material synthesis.

Symbolic Regression

Recovers analytic formulas; interpretable; not strictly generative but discovers underlying structure. Best for: law discovery, equation learning.


Section 2 — Variational Autoencoders: Physics in Latent Space

The Variational Autoencoder (VAE), introduced by Kingma & Welling (2014), is built on a beautifully simple idea: learn a low-dimensional latent space that captures the essential structure of your data, and use it to generate new samples. For a physicist, this is remarkably close to identifying the order parameters of a system — the few variables that capture the macroscopic behaviour.

The VAE has two components: an encoder qφ(z|x) that maps data x to a distribution over latent variables z, and a decoder pθ(x|z) that maps z back to data space. Training maximises the ELBO (Evidence Lower BOund) — a lower bound on the log-likelihood:

VAE ELBO: log p(x) >= E[log p(x|z)] - KL(q(z|x) || p(z))

The KL term acts as a regulariser, forcing the learned latent space to be approximately Gaussian. This gives it a crucial property: you can sample from p(z) = N(0,I) and decode to get a valid new data point. For physics, this means: sample a point in latent space, decode it, and get a new particle event, crystal structure, or molecular conformation.

VAE for Particle Collision Event Generation

Python — Physics VAE: learn latent space of collision events, generate new ones
# VAE for generating particle physics events
# Input: high-level kinematic features of collision events
# Goal: learn to generate new events matching the underlying physics
import torch, torch.nn as nn
import torch.nn.functional as F
import numpy as np

class PhysicsVAE(nn.Module):
    def __init__(self, input_dim=20, hidden=256, latent=8):
        super().__init__()
        # Encoder: physics event → latent distribution
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden), nn.LeakyReLU(0.2),
            nn.BatchNorm1d(hidden),
            nn.Linear(hidden, hidden),   nn.LeakyReLU(0.2),
            nn.BatchNorm1d(hidden),
        )
        self.mu_layer     = nn.Linear(hidden, latent)
        self.logvar_layer = nn.Linear(hidden, latent)
        # Decoder: latent z → reconstructed physics event
        self.decoder = nn.Sequential(
            nn.Linear(latent, hidden),   nn.LeakyReLU(0.2),
            nn.BatchNorm1d(hidden),
            nn.Linear(hidden, hidden),   nn.LeakyReLU(0.2),
            nn.BatchNorm1d(hidden),
            nn.Linear(hidden, input_dim)
        )

    def encode(self, x):
        h   = self.encoder(x)
        return self.mu_layer(h), self.logvar_layer(h)

    def reparameterise(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std   # z ~ N(mu, sigma^2)

    def decode(self, z): return self.decoder(z)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z          = self.reparameterise(mu, logvar)
        return self.decode(z), mu, logvar

# ── VAE loss: reconstruction + KL divergence ───────────────────
def vae_loss(x_recon, x, mu, logvar, beta=1.0):
    # Reconstruction: MSE for continuous physics features
    recon_loss = F.mse_loss(x_recon, x, reduction='sum') / x.shape[0]
    # KL divergence: analytical formula for Gaussian q (sum over latent dims, mean over batch)
    kl_loss    = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()
    # beta > 1: beta-VAE — stronger disentanglement of latent variables
    # In physics: beta-VAE latent dims often correspond to physical params
    return recon_loss + beta * kl_loss

# ── Training ───────────────────────────────────────────────────
model = PhysicsVAE(input_dim=20, hidden=256, latent=8)
optim = torch.optim.Adam(model.parameters(), lr=3e-4)

for epoch in range(200):
    for x_batch in train_loader:   # DataLoader over event-feature tensors (assumed prepared)
        optim.zero_grad()
        x_recon, mu, logvar = model(x_batch)
        loss = vae_loss(x_recon, x_batch, mu, logvar, beta=4.0)
        loss.backward(); optim.step()

# ── Generate new events by sampling from prior ─────────────────
model.eval()                                    # BatchNorm layers need eval mode
with torch.no_grad():
    z_samples  = torch.randn(10000, 8)          # sample from N(0,I)
    new_events = model.decode(z_samples)        # decode to physics space

# Validate: compare generated vs real distributions
# Use: wasserstein distance, MMD, or per-feature KS tests
# Reference: Cerri et al. (2019) — VAE for HEP anomaly detection, JHEP
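The closing comments suggest per-feature KS tests and Wasserstein distances for validation. A minimal sketch of that check — the two arrays below are synthetic stand-ins for your real and VAE-generated event features:

```python
# Per-feature two-sample validation of generated vs real events
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance

rng = np.random.default_rng(0)
real_events = rng.normal(0.0, 1.0, size=(5000, 4))   # stand-in for real data
gen_events  = rng.normal(0.0, 1.1, size=(5000, 4))   # stand-in for VAE samples

for j in range(real_events.shape[1]):
    stat, pval = ks_2samp(real_events[:, j], gen_events[:, j])
    w1 = wasserstein_distance(real_events[:, j], gen_events[:, j])
    flag = "OK" if pval > 0.01 else "MISMATCH"
    print(f"feature {j}: KS={stat:.3f}  p={pval:.3g}  W1={w1:.3f}  [{flag}]")
```

Per-feature tests miss correlations between features; MMD or a classifier two-sample test catches those.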
🧠 Concept: The β-VAE and Physical Interpretability
A standard VAE (β = 1) learns a latent space that is good for reconstruction but may entangle multiple physical variables in a single latent dimension. β-VAEs (Higgins et al. 2017) increase the KL weight, forcing a more disentangled representation. In physics applications this is remarkably meaningful: for a VAE trained on galaxy images, the disentangled latent dimensions often correspond to redshift, ellipticity, size, and inclination — exactly the physical parameters an astronomer would have defined by hand. The network discovers the natural coordinates of the data.
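A standard way to probe what a latent dimension encodes is a latent traversal: hold every dimension fixed, sweep one, decode, and watch which physical feature changes. A minimal sketch — the untrained `decoder` below is a stand-in for the trained `model.decoder` of the PhysicsVAE above:

```python
import torch, torch.nn as nn

latent, out_dim = 8, 20
# Stand-in decoder; substitute your trained VAE decoder here
decoder = nn.Sequential(nn.Linear(latent, 64), nn.SiLU(), nn.Linear(64, out_dim))

@torch.no_grad()
def traverse(decoder, dim, values, latent=8):
    """Decode latent points that differ only in dimension `dim`."""
    z = torch.zeros(len(values), latent)
    z[:, dim] = torch.tensor(values)
    return decoder(z)                       # [len(values), out_dim]

sweep = traverse(decoder, dim=0, values=[-2.0, -1.0, 0.0, 1.0, 2.0])
print(sweep.shape)
# Plot each output feature against the swept value; a disentangled
# dimension changes one physical quantity while the others stay flat
```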

Section 3 — Normalizing Flows: Exact Likelihood Generative Models

Normalizing flows learn an invertible, differentiable transformation fθ from a simple base distribution (Gaussian) to the complex data distribution. They offer something VAEs cannot: exact likelihood computation. This is critical in physics, where you often need to compare the probability of two hypotheses or compute likelihood ratios for hypothesis testing.

Normalizing flow training objective: NLL = -E[ log p_Z(f(x)) + log|det J_f(x)| ]

The log-determinant of the Jacobian accounts for how the transformation stretches or compresses volume. Computing it efficiently is the core technical challenge of normalizing flows. Modern architectures such as RealNVP, Glow, and Neural Spline Flows make it tractable via coupling layers, whose triangular Jacobians allow O(d) determinant computation.

RealNVP Coupling Flow for HEP Event Generation

Python — RealNVP normalizing flow: exact likelihood, invertible event generation
# pip install nflows  (or use zuko, glasflow, normflows)
# We implement a minimal RealNVP from scratch for pedagogical clarity
import torch, torch.nn as nn
import numpy as np

# ── Affine coupling layer: the building block of RealNVP ────────
# Split x into [x1, x2]. Transform x2 using a net conditioned on x1.
# Jacobian is triangular -> determinant is product of diagonal elements
class AffineCoupling(nn.Module):
    def __init__(self, dim, mask):
        super().__init__()
        self.register_buffer('mask', mask)
        d_masked = mask.sum().item()
        # Scale and shift networks — conditioned on masked input
        self.s_net = nn.Sequential(
            nn.Linear(int(d_masked), 128), nn.Tanh(),
            nn.Linear(128, 128),            nn.Tanh(),
            nn.Linear(128, dim - int(d_masked))
        )
        self.t_net = nn.Sequential(
            nn.Linear(int(d_masked), 128), nn.Tanh(),
            nn.Linear(128, 128),            nn.Tanh(),
            nn.Linear(128, dim - int(d_masked))
        )

    def forward(self, x):
        x1 = x[:, self.mask.bool()]           # masked (unchanged) part
        x2 = x[:, ~self.mask.bool()]          # transformed part
        s  = self.s_net(x1)                    # log-scale
        t  = self.t_net(x1)                    # shift
        y2 = x2 * torch.exp(s) + t   # affine transformation
        y  = x.clone()
        y[:, ~self.mask.bool()] = y2
        log_det = s.sum(dim=1)         # log |det J| = sum of log-scales
        return y, log_det

    def inverse(self, y):
        y1 = y[:, self.mask.bool()]
        y2 = y[:, ~self.mask.bool()]
        s  = self.s_net(y1); t = self.t_net(y1)
        x2 = (y2 - t) * torch.exp(-s)
        x  = y.clone(); x[:, ~self.mask.bool()] = x2
        return x

# ── Full normalizing flow: stack of alternating coupling layers ─
class RealNVP(nn.Module):
    def __init__(self, dim=8, n_layers=6):
        super().__init__()
        self.dim = dim
        # Alternate which half gets transformed
        masks = [torch.arange(dim) % 2 == i % 2 for i in range(n_layers)]
        self.layers = nn.ModuleList([AffineCoupling(dim, m) for m in masks])

    def log_prob(self, x):
        log_det_total = torch.zeros(x.shape[0])
        z = x
        for layer in self.layers:
            z, log_det = layer(z)
            log_det_total = log_det_total + log_det
        # Base distribution: standard Gaussian
        log_pz = -0.5 * (z**2 + np.log(2*np.pi)).sum(dim=1)
        return log_pz + log_det_total    # exact log-likelihood!

    def sample(self, n):
        z = torch.randn(n, self.dim)     # sample from the Gaussian base
        for layer in reversed(self.layers):
            z = layer.inverse(z)
        return z

# ── Training: minimise negative log-likelihood ──────────────────
flow  = RealNVP(dim=8, n_layers=6)
optim = torch.optim.Adam(flow.parameters(), lr=3e-4)

for epoch in range(300):
    for x_batch in train_loader:
        optim.zero_grad()
        loss = -flow.log_prob(x_batch).mean()   # NLL
        loss.backward(); optim.step()

# Exact density: flow.log_prob(x) gives exact log p(x) for any event
# This is critical for likelihood ratios in hypothesis tests
# See Cranmer, Brehmer & Louppe (2020) for flows in simulation-based inference
# Libraries: nflows, zuko (modern, clean API), normflows
💡 Why Exact Likelihood Matters in Physics
In HEP, the standard way to compare two hypotheses (signal vs background) is the likelihood ratio test: L(H1)/L(H0). A normalizing flow gives you exact log p(x) for any event x — something a GAN cannot. This means flows can be used directly for hypothesis testing, detector unfolding, and optimal observable construction. Flows have largely replaced GANs in HEP applications for this reason.
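The likelihood-ratio workflow is just two `log_prob` calls. A sketch with two Gaussians standing in for flows trained on signal and background events (in practice, substitute two trained RealNVP models):

```python
import torch
from torch.distributions import MultivariateNormal

torch.manual_seed(0)
d = 4
# Stand-ins for flow_sig.log_prob / flow_bkg.log_prob of trained flows
p_sig = MultivariateNormal(torch.full((d,), 0.5), torch.eye(d))
p_bkg = MultivariateNormal(torch.zeros(d), torch.eye(d))

events = p_sig.sample((1000,))                          # pretend: observed events
llr = p_sig.log_prob(events) - p_bkg.log_prob(events)   # log L(H1)/L(H0)

# Neyman–Pearson: thresholding the LLR is the optimal classifier
selected = events[llr > 0.0]
print(f"selected {len(selected)}/1000 events as signal-like")
```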

Section 4 — Diffusion Models for Scientific Data

Diffusion models — the technology behind Stable Diffusion and DALL-E — have made a striking entry into physics. The core idea is elegantly physical: gradually add Gaussian noise to data until it becomes pure noise (the forward process), then train a neural network to reverse this process (the reverse process), denoising step by step.

Diffusion: x0 → xT → x0-hat. Loss = E[||eps - eps_theta(x_t, t)||^2]

The forward process is a fixed Markov chain that progressively destroys structure in the data. The reverse process is a learned neural network that predicts the noise added at each step — or equivalently, predicts the score (gradient of log-density). This score-based perspective, developed by Song et al. (2020), provides the deepest understanding of why diffusion models work and how to apply them to physics.

Denoising Score Matching for Molecular Geometry Generation

Python — Diffusion model: cosine schedule, denoising network, DDPM sampling for molecules
# Diffusion model for generating 3D molecular geometries
# Equivariant version: DiffSBDD, GeoDiff, EDM architectures
# We show the core diffusion/denoising pipeline for atom coordinates
import torch, torch.nn as nn
import numpy as np

# ── Noise schedule: cosine schedule works better than linear ────
def cosine_beta_schedule(T=1000, s=0.008):
    """Cosine noise schedule (Nichol & Dhariwal 2021) — better tail behaviour"""
    steps  = np.arange(T+1) / T + s
    alphas = np.cos(steps / (1+s) * np.pi/2)**2
    alphas = alphas / alphas[0]
    betas  = 1 - alphas[1:] / alphas[:-1]
    return np.clip(betas, 0, 0.999)

T       = 1000
betas   = cosine_beta_schedule(T)
alphas  = 1 - betas
alpha_bar = np.cumprod(alphas)    # cumulative product

# ── Forward process: add noise at timestep t ───────────────────
def forward_diffuse(x0, t):
    """Closed-form: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1-alpha_bar_t) * eps"""
    ab  = torch.from_numpy(alpha_bar).float()[t].unsqueeze(-1)  # alpha_bar_t per sample
    eps = torch.randn_like(x0)                      # Gaussian noise
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps
    return x_t, eps                                  # noisy data + added noise

# ── Denoising network: U-Net or Transformer predicts noise eps ──
class DenoisingNet(nn.Module):
    def __init__(self, data_dim=3, hidden=256, T=1000):
        super().__init__()
        # Learned per-timestep embedding (sinusoidal embeddings are a common alternative)
        self.time_embed = nn.Sequential(
            nn.Embedding(T, hidden),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden)
        )
        # Main denoising network
        self.net = nn.Sequential(
            nn.Linear(data_dim + hidden, hidden*2), nn.SiLU(),
            nn.Linear(hidden*2, hidden*2),           nn.SiLU(),
            nn.Linear(hidden*2, hidden),             nn.SiLU(),
            nn.Linear(hidden, data_dim)
        )

    def forward(self, x_t, t):
        t_emb = self.time_embed(t)                         # time context
        h     = torch.cat([x_t, t_emb], dim=-1)     # concatenate
        return self.net(h)                           # predict added noise eps

# ── Training: denoising score matching ──────────────────────────
n_atoms = 20                               # atoms per molecule (example value)
model = DenoisingNet(data_dim=3*n_atoms)   # flattened 3D coords
optim = torch.optim.AdamW(model.parameters(), lr=2e-4)

for step in range(200000):
    x0  = next(train_iter)              # batch of molecules (DataLoader iterator assumed)
    t   = torch.randint(0, T, (x0.shape[0],)) # random timesteps
    x_t, eps_true = forward_diffuse(x0, t)
    eps_pred      = model(x_t, t)
    loss = nn.MSELoss()(eps_pred, eps_true)  # predict the noise
    loss.backward(); optim.step(); optim.zero_grad()

# ── Sampling: iterative DDPM reverse process ────────────────────
@torch.no_grad()
def sample_ddpm(model, shape, T=1000):
    x = torch.randn(*shape)         # start from pure noise
    for t in reversed(range(T)):
        t_tensor = torch.full((shape[0],), t, dtype=torch.long)
        eps      = model(x, t_tensor)      # predict noise
        a, ab = alphas[t], alpha_bar[t]
        x = (1/a**0.5) * (x - (1-a)/(1-ab)**0.5 * eps)
        if t > 0:
            x = x + betas[t]**0.5 * torch.randn_like(x)
    return x    # generated molecular geometry
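DDPM sampling takes all T = 1000 network evaluations. DDIM-style sampling skips most of them with a deterministic update. A self-contained sketch — it uses its own simple linear schedule and an untrained stand-in network, so only the update rule matters, not the outputs:

```python
import numpy as np
import torch, torch.nn as nn

torch.manual_seed(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)          # simple linear schedule for the sketch
alpha_bar = np.cumprod(1 - betas)

# Untrained stand-in for a trained noise-prediction network
model = nn.Sequential(nn.Linear(3 + 1, 64), nn.SiLU(), nn.Linear(64, 3))

@torch.no_grad()
def sample_ddim(model, shape, n_steps=50):
    """Deterministic (eta=0) DDIM over a strided subset of timesteps."""
    x = torch.randn(*shape)
    ts = list(range(0, T, T // n_steps))[::-1]           # 50 steps instead of 1000
    for i, t in enumerate(ts):
        t_in = torch.full((shape[0], 1), t / T)          # crude time conditioning
        eps  = model(torch.cat([x, t_in], dim=-1))
        ab_t    = alpha_bar[t]
        ab_prev = alpha_bar[ts[i+1]] if i + 1 < len(ts) else 1.0
        x0_hat  = (x - (1 - ab_t)**0.5 * eps) / ab_t**0.5    # predicted clean sample
        x = ab_prev**0.5 * x0_hat + (1 - ab_prev)**0.5 * eps  # no fresh noise added
    return x

x = sample_ddim(model, (16, 3))
print(x.shape)
```

Because no noise is injected, the same starting noise always maps to the same sample — useful for reproducible generation and latent interpolation.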
✅ Real Applications of Diffusion in Physics
Diffusion models are now used for: molecular geometry generation (GeoDiff, EDM — generate valid 3D drug molecules), protein structure prediction extensions (RFdiffusion for protein design), climate field downscaling (generate high-resolution weather fields from coarse model output), and fast detector simulation in HEP (replacing GEANT4 for large-scale Monte Carlo studies).

Section 5 — Symbolic Regression: Discovering Physical Laws from Data

This section covers what may be the most philosophically profound application of AI in all of science: automated discovery of analytic physical laws from data. Not fitting a parameterised model. Not a black-box neural network. A symbolic expression — an actual formula — that explains the data.

Symbolic regression searches the space of all possible mathematical expressions (combinations of +, −, ×, ÷, powers, exponentials, trigonometric functions) to find the formula that best fits a dataset. This is a combinatorially enormous search space — essentially all of mathematics — which is why it requires powerful search heuristics. Modern tools like PySR use evolutionary algorithms with multi-objective optimisation (accuracy vs complexity), parallelisation, and physics-aware constraints.

Symbolic regression: argmin over expression space of (fit quality + lambda * complexity)

Recovering Physical Laws: From Kepler to the NFW Profile

Python — PySR symbolic regression: recover Kepler's law, NFW profile, and Ohm's law from data
# pip install pysr
from pysr import PySRRegressor
import numpy as np
import pandas as pd

# ── Example 1: Recover Kepler's Third Law T^2 = a^3 ─────────────
np.random.seed(42)
a_data = np.random.uniform(0.2, 40.0, 500)    # semi-major axis [AU]
T_data = a_data**1.5 + np.random.normal(0, 0.05, 500)  # T [yr] + noise

model_kepler = PySRRegressor(
    niterations      = 100,
    binary_operators = ['+', '*', '/', '-', '^'],   # PySR uses ^ for power
    unary_operators  = ['sqrt', 'log', 'exp', 'abs'],
    maxsize          = 10,          # max expression tree nodes
    populations      = 30,          # independent parallel populations
    parsimony        = 0.001,       # penalise complexity — prefer elegance
    model_selection  = 'accuracy',
    verbosity        = 0
)
model_kepler.fit(a_data.reshape(-1,1), T_data)
best = model_kepler.get_best()
print(f'Best equation: {best.equation}')
print(f'Score: {best.loss:.6f}   Complexity: {best.complexity}')
# Expected output: x0^1.5  or equivalent form

# ── Example 2: Recover dark matter NFW profile from simulation ──
# N-body simulation: measure density rho at radii r
r    = np.random.uniform(0.1, 20.0, 800)    # radius [kpc]
rs   = 5.0                                # scale radius
rho0 = 1e7                                # characteristic density
rho  = rho0 / (r/rs * (1 + r/rs)**2) + np.random.normal(0, rho0*0.05, 800)

model_nfw = PySRRegressor(
    niterations      = 200,
    binary_operators = ['+', '*', '/', '^'],
    unary_operators  = ['exp', 'log', 'sqrt'],
    maxsize          = 16,
    populations      = 50,
    verbosity        = 0
)
model_nfw.fit(r.reshape(-1,1), rho)
print(model_nfw.equations_.head(10)[['equation','loss','complexity']])
# PySR explores expression trees: rho0 / (r/rs * (1+r/rs)^2)
# With enough data and iterations it recovers the NFW form

# ── Multi-variable example: Ohm's law V = I * R ─────────────────
I_data = np.random.uniform(0.1, 5.0, 300)
R_data = np.random.uniform(10, 100, 300)
V_data = I_data * R_data + np.random.normal(0, 2.0, 300)
X = np.column_stack([I_data, R_data])
model_ohm = PySRRegressor(niterations=50, binary_operators=['*','+','-','/'])
model_ohm.fit(X, V_data)
print(model_ohm.get_best().equation)   # should give: x0 * x1
🧠 Concept: AI Feynman and the Search for Physical Laws
The AI Feynman project (Udrescu & Tegmark 2020) went further: it used a set of physics-motivated simplification strategies to discover equations in the Feynman Lectures on Physics dataset. Key insight: physical equations have special properties — dimensional consistency, separation of variables, compositional structure — that constrain the search space enormously. By checking these properties first, the algorithm reduces the effective search space by orders of magnitude before symbolic regression even begins.
💡 Where Symbolic Regression Is Used Today
Symbolic regression has found physical equations from: dark matter simulation data (NFW profiles), galaxy rotation curves (MOND-like relations), turbulent flow simulations (new scaling laws), stellar evolution tracks (main sequence relations), and particle physics collider data (new kinematic observables). The PySR paper (Cranmer 2023) documents many of these discoveries.
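The dimensional-consistency check that AI Feynman applies first can be sketched in a few lines: represent each quantity as an exponent vector over base units (M, L, T) and keep only candidate monomials whose dimensions match the target. A toy illustration (the variable set and exponent range are arbitrary choices):

```python
import numpy as np
from itertools import product

# Units as exponent vectors over (M, L, T)
UNITS = {
    'm': np.array([1, 0, 0]),    # mass
    'r': np.array([0, 1, 0]),    # distance
    'v': np.array([0, 1, -1]),   # velocity
}
TARGET = np.array([1, 2, -2])    # energy: M L^2 T^-2

# Search monomials m^a * r^b * v^c with small integer exponents
names = list(UNITS)
for exps in product(range(-2, 3), repeat=len(names)):
    dims = sum(e * UNITS[n] for e, n in zip(exps, names))
    if np.array_equal(dims, TARGET):
        term = ' * '.join(f'{n}^{e}' for n, e in zip(names, exps) if e)
        print(f'dimensionally valid: {term}')
# Only the kinetic-energy form m^1 * v^2 survives the filter
```

Every dimensionally invalid combination is pruned before any fitting happens, which is exactly how physics-aware constraints shrink the symbolic search space.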

Section 6 — Simulation-Based Inference: When You Can’t Write the Likelihood

One of the deepest problems in physics is parameter estimation when the likelihood is intractable. You have a simulator that takes parameters θ and produces data x — but there’s no analytic formula for p(x|θ). Traditional Bayesian inference requires evaluating this likelihood at every step. Simulation-Based Inference (SBI) avoids this entirely.

The key insight: even without the likelihood, you can simulate. Draw parameters from the prior, run the simulator, collect (x, θ) pairs. A neural network trained on these pairs learns to estimate the posterior p(θ|x) directly — bypassing the likelihood entirely. This is a genuine breakthrough for physics, where simulators (GEANT4, N-body codes, climate models) are often available but analytic likelihoods are not.

Python — Simulation-Based Inference: SNPE recovers physical parameters without analytic likelihood
# pip install sbi
from sbi import utils as sbi_utils
from sbi.inference import SNPE   # SNLE / SNRE are likelihood- and ratio-based alternatives
import torch
import numpy as np

# ── Define a physics simulator ──────────────────────────────────
# Example: oscillation amplitude decay experiment
# theta = [A0, gamma, omega]  (amplitude, damping, frequency)
# x = observed amplitudes at 10 time points
def physics_simulator(theta):
    A0, gamma, omega = theta[:, 0], theta[:, 1], theta[:, 2]
    t    = torch.linspace(0, 5, 10).unsqueeze(0)       # [1, 10]
    A0   = A0.unsqueeze(1); gamma = gamma.unsqueeze(1); omega = omega.unsqueeze(1)
    x    = A0 * torch.exp(-gamma * t) * torch.cos(omega * t)
    x   += 0.05 * torch.randn_like(x)               # measurement noise
    return x   # [N, 10] — 10 time-point measurements

# ── Define prior over physical parameters ──────────────────────
prior = sbi_utils.BoxUniform(
    low  = torch.tensor([0.5, 0.05, 0.5]),  # [A0_min, gamma_min, omega_min]
    high = torch.tensor([3.0, 1.0,  4.0])   # [A0_max, gamma_max, omega_max]
)

# ── Stage 1: Draw from prior and simulate ──────────────────────
theta_sim = prior.sample((20000,))     # 20k samples from prior
x_sim     = physics_simulator(theta_sim)  # simulate each

# ── Stage 2: Train neural posterior estimator (SNPE) ───────────
# SNPE learns: p(theta | x) directly via conditional density estimation
inference = SNPE(prior=prior)  # Sequential Neural Posterior Estimation
inference = inference.append_simulations(theta_sim, x_sim)
density_estimator = inference.train(training_batch_size=512)

# ── Stage 3: Given observed data, get full posterior ───────────
true_theta = torch.tensor([[1.5, 0.3, 2.0]])   # true params (unknown to model)
x_observed = physics_simulator(true_theta)

posterior = inference.build_posterior(density_estimator)
samples   = posterior.sample((5000,), x=x_observed)

print(f'A0:    {samples[:,0].mean():.3f} ± {samples[:,0].std():.3f}  (true: 1.5)')
print(f'gamma: {samples[:,1].mean():.3f} ± {samples[:,1].std():.3f}  (true: 0.3)')
print(f'omega: {samples[:,2].mean():.3f} ± {samples[:,2].std():.3f}  (true: 2.0)')

# Plot marginal posteriors to see correlations between parameters
# Compare width to prior to quantify information gain from data
# Active learning: use posterior to design the next experiment!
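The core idea behind the sbi pipeline — simulate, then compare to the observation — predates neural networks: classical rejection ABC keeps the prior draws whose simulated data land closest to the observed data. A library-free toy version of the same damped-oscillator problem (parameter names and settings mirror the example above):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 5, 10)

def simulate(A0, gamma, omega):
    """Damped oscillation measured at 10 time points, with noise."""
    return A0 * np.exp(-gamma * t) * np.cos(omega * t) + 0.05 * rng.normal(size=t.size)

x_obs = simulate(1.5, 0.3, 2.0)            # "observed" data; true theta hidden

# Rejection ABC: sample the prior, keep thetas whose simulation is near x_obs
N = 50_000
theta = np.column_stack([rng.uniform(0.5, 3.0, N),     # A0
                         rng.uniform(0.05, 1.0, N),    # gamma
                         rng.uniform(0.5, 4.0, N)])    # omega
dist = np.array([np.linalg.norm(simulate(*th) - x_obs) for th in theta])
accepted = theta[dist < np.quantile(dist, 0.01)]       # keep the best 1%

print('posterior mean:', accepted.mean(axis=0))        # should be near [1.5, 0.3, 2.0]
```

SNPE replaces the wasteful accept/reject step with a learned conditional density, so every simulation contributes to the posterior estimate — that is the efficiency gain, not a different statistical principle.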

Section 7 — Bayesian Optimal Experiment Design with ML

The most sophisticated application of generative models in experimental physics is Bayesian Optimal Experiment Design (BOED): given your current knowledge (prior), design the next experiment to maximally reduce uncertainty about the parameters you care about. This is the information-theoretic ideal — and it’s now computationally tractable thanks to neural estimators.

The quantity to maximise is the Expected Information Gain (EIG) — the expected reduction in entropy of the posterior after running an experiment with design d:

EIG(d) = Ep(x|d)[H(p(θ)) − H(p(θ|x, d))] = I(θ; x | d)
Expected Information Gain = mutual information between parameters and data, given design d
Python — Bayesian optimal experiment design: gradient ascent on Expected Information Gain
# Bayesian Optimal Experiment Design using implicit likelihood ratio
# Goal: choose measurement times t* that maximise parameter uncertainty reduction
import torch, torch.nn as nn
import numpy as np

# ── Implicit likelihood ratio estimator (NRE approach) ─────────
# Train a classifier to distinguish p(theta|x) from p(theta)*p(x)
# The log-odds gives the log-likelihood ratio: log p(x|theta)/p(x)
class LikelihoodRatioNet(nn.Module):
    def __init__(self, theta_dim=3, x_dim=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(theta_dim + x_dim, 256), nn.ReLU(),
            nn.Linear(256, 256),                 nn.ReLU(),
            nn.Linear(256, 128),                 nn.ReLU(),
            nn.Linear(128, 1)                    # log-likelihood ratio
        )

    def forward(self, theta, x):
        inp = torch.cat([theta, x], dim=-1)
        return self.net(inp).squeeze(-1)  # log r(theta, x)

# ── Estimate EIG for a candidate experiment design d ───────────
def estimate_eig(nre_model, prior, simulator, design, n_outer=1000, n_inner=50):
    """Nested Monte Carlo estimate of EIG (kept as a tensor for autograd)."""
    theta_outer = prior.sample((n_outer,))
    x_outer     = simulator(theta_outer, design)
    # EIG = E_{theta,x}[log r(theta,x)] - E_x[log E_theta'[r(theta',x)]]
    log_r_joint = nre_model(theta_outer, x_outer)
    theta_inner = prior.sample((n_outer, n_inner))          # [n_outer, n_inner, d]
    x_expand    = x_outer.unsqueeze(1).expand(-1, n_inner, -1)
    log_r_marg  = nre_model(
        theta_inner.reshape(-1, theta_inner.shape[-1]),
        x_expand.reshape(-1, x_outer.shape[-1])
    ).reshape(n_outer, n_inner).logsumexp(dim=1) - np.log(n_inner)
    return (log_r_joint - log_r_marg).mean()

# ── Optimise experiment design by gradient ascent on EIG ────────
# Assumes: nre_model has been trained on (theta, x) pairs, and the simulator
# accepts (theta, design) and is differentiable w.r.t. design
design = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # measurement times
design_opt = torch.optim.Adam([design], lr=0.05)

for step in range(200):
    design_opt.zero_grad()
    eig  = estimate_eig(nre_model, prior, physics_simulator, design)
    loss = -eig                              # maximise EIG
    loss.backward(); design_opt.step()
    if (step+1) % 50 == 0:
        print(f"Step {step+1}: EIG = {eig.item():.4f}, design = {design.detach().numpy()}")

# Optimal design tells you WHEN to measure for maximum information
# Real use: Bayesian adaptive clinical trials, telescope scheduling,
# optimal sampling of parameter space for surrogate model training

External References & Further Reading

  • Kingma & Welling (2014) — Auto-Encoding Variational Bayes. ICLR. arXiv:1312.6114. The original VAE paper.
  • Dinh et al. (2017) — Density estimation using Real NVP. ICLR. arXiv:1605.08803. RealNVP normalizing flows.
  • Song et al. (2021) — Score-Based Generative Modeling through Stochastic Differential Equations. ICLR. arXiv:2011.13456. Unified score-based / diffusion framework.
  • Cranmer (2023) — Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl. arXiv:2305.01582.
  • Udrescu & Tegmark (2020) — AI Feynman: A physics-inspired method for symbolic regression. Science Advances. arXiv:1905.11481.
  • Cranmer, Brehmer & Louppe (2020) — The frontier of simulation-based inference. PNAS. doi:10.1073/pnas.1912789117.
  • Hoogeboom et al. (2022) — Equivariant Diffusion for Molecule Generation in 3D (EDM). ICML. arXiv:2203.17003.
📋 Key Takeaways — Cluster 7
  • Four families, four niches. VAEs for latent space learning and anomaly detection. Flows when you need exact likelihood. Diffusion when sample quality is paramount. Symbolic regression when you want the formula itself.
  • Normalizing flows are the HEP standard. Exact likelihood means you can use them directly for hypothesis testing, unfolding, and likelihood ratios. This is why they replaced GANs in most physics generation tasks.
  • Diffusion models deliver the highest quality. EDM for molecules, RFdiffusion for proteins, score-matching for climate fields. Slow at inference but unmatched quality. Use DDIM or DPM-Solver for faster sampling.
  • PySR and AI Feynman recover physical laws. These are not just curve-fitting tools — they search the space of analytic expressions. Dimensional analysis + PySR is a powerful combination for experimental data.
  • SBI bypasses the intractable likelihood problem. Simulate (theta, x) pairs, train SNPE, get the full posterior for any observation. The sbi library makes this a 10-line implementation.
  • BOED closes the loop. Use your current posterior to design the experiment that maximises information gain. This is the principled answer to “what should I measure next?”
