Let's be honest: curve fitting is one of those things every physics student does a hundred times before they fully understand what they're actually doing.
You plot your data. You eyeball a trend. You fire up Excel or maybe Python, run some magic function, get a line through your points, and write down the R² value like it means something profound. Your professor nods. You move on.
But here's the thing: that process falls apart the moment your data gets complex. What happens when your data follows a distribution you can't name? When you have 10,000 points from a particle detector? When your model has 15 parameters and traditional least-squares just... breaks?
Why Traditional Curve Fitting Falls Short
Classical curve fitting is built on one core assumption: you already know the functional form of your model. You know it's exponential decay, or Gaussian, or a power law. You then use least-squares minimization to find the parameters that make your chosen function fit the data as closely as possible.
This works beautifully when the assumption holds. Newton's law of cooling? Exponential: done. Pendulum period vs. length? Square root: easy. Radioactive decay? Classic exponential again.
But modern physics doesn't always give you that luxury. Consider these real scenarios:
- Spectroscopy with overlapping peaks: 8 overlapping Lorentzian peaks, some asymmetric, with baseline drift. `scipy.optimize.curve_fit` will fail to converge or get stuck in a local minimum depending on your initial guess.
- High-energy physics detector signals: signals that mix multiple response functions, noise spectra, and systematic artifacts. The "true" functional form is a convolution of multiple physics processes.
- Condensed matter phase transitions: near critical points, scaling behavior follows power laws with exponents that depend on universality class, and the fitting is notoriously sensitive to range selection.
- Nonlinear dynamics and chaos: chaotic systems produce data that no simple analytic function can describe.
In all these cases, machine learning offers something classical methods don't: the ability to fit complex, high-dimensional relationships without requiring you to specify the functional form in advance.
The Mathematical Foundation: What "Fitting" Really Means
Before jumping into code, let's make sure the underlying mathematics is solid. This is physics; we respect the math.
The Optimization Problem
Curve fitting is, at its core, an optimization problem. Given data points $(x_i, y_i)$ with $i = 1, \ldots, N$, we want to find parameters $\theta$ of a model $f(x;\,\theta)$ that minimizes a loss function $\mathcal{L}$.
For ordinary least squares (OLS), this is:

$$\mathcal{L}(\theta) = \sum_{i=1}^{N} \left( y_i - f(x_i;\,\theta) \right)^2$$
This is the squared $L_2$ norm of the residuals. When your measurement errors are Gaussian and independent, minimizing this is mathematically equivalent to Maximum Likelihood Estimation (MLE): a beautiful connection between statistics and physics.
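This equivalence is easy to verify numerically. The sketch below (a toy one-parameter straight-line model; all variable names are illustrative) scans candidate slopes and confirms that the sum of squared residuals and the Gaussian negative log-likelihood pick out the same parameter:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
sigma = 1.0
y = 3.0 * x + rng.normal(0, sigma, x.size)   # true slope is 3

# Scan candidate slopes for the model f(x; m) = m*x
slopes = np.linspace(2.5, 3.5, 201)
sse = np.array([np.sum((y - m * x) ** 2) for m in slopes])

# Gaussian negative log-likelihood: constant + SSE / (2 sigma^2),
# i.e. a monotone function of the SSE, so the minimizers coincide
nll = 0.5 * x.size * np.log(2 * np.pi * sigma**2) + sse / (2 * sigma**2)

print("SSE minimizer:", slopes[np.argmin(sse)])
print("NLL minimizer:", slopes[np.argmin(nll)])
```

Because the NLL is a monotone transform of the SSE, both criteria select the same slope on the grid.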
But what if your errors aren't Gaussian? What if you have outliers? Then $L_1$ loss (absolute deviations) or Huber loss might serve you better. What if you want to prevent overfitting by penalizing large parameter values? Then you add a regularization term:

$$\mathcal{L}(\theta) = \sum_{i=1}^{N} \left( y_i - f(x_i;\,\theta) \right)^2 + \lambda \|\theta\|^2$$
This is Ridge Regression ($L_2$ regularization). Swap $\|\theta\|^2$ for $\|\theta\|_1$ and you get Lasso regression. These aren't just tricks; they're physically motivated. Regularization is Bayesian inference with a Gaussian prior (Ridge) or Laplacian prior (Lasso) on your parameters.
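The Ridge case is concrete enough to check directly. A minimal sketch, assuming a linear model with a synthetic design matrix (all names illustrative): the Ridge/MAP closed-form solution $(X^\top X + \lambda I)^{-1} X^\top y$ should make the gradient of the regularized loss vanish:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))             # synthetic design matrix
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + rng.normal(0, 0.1, 40)

lam = 0.5  # regularization strength; plays the role of the Gaussian prior's precision

# Closed-form Ridge / MAP estimate: (X^T X + lam I)^{-1} X^T y
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Stationarity check: gradient of ||X theta - y||^2 + lam ||theta||^2
grad = 2 * X.T @ (X @ theta_ridge - y) + 2 * lam * theta_ridge
print("gradient norm:", np.linalg.norm(grad))  # numerically zero
print("estimate:", theta_ridge)
```

With modest $\lambda$ and low noise the estimate lands close to the true parameters, slightly shrunk toward zero, exactly as the Gaussian-prior interpretation predicts.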
The Bias-Variance Tradeoff in Physics
Every model lives on the spectrum between two failure modes:
- High bias (underfitting): Your model is too simple and misses the true pattern. Think fitting a straight line to oscillatory data.
- High variance (overfitting): Your model is too complex and fits the noise along with the signal. Think fitting a degree-20 polynomial to 10 data points.
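A quick numeric sketch makes both failure modes visible (toy quadratic data; the degrees are chosen for illustration). Compare training and held-out error for an underfit line, the correct quadratic, and a wildly overfit degree-14 polynomial:

```python
import numpy as np

rng = np.random.default_rng(42)

def truth(x):
    return 2 * x**2 - x + 1            # the underlying "physics": a quadratic

x_train = np.linspace(-3, 3, 15)
y_train = truth(x_train) + rng.normal(0, 1.0, x_train.size)
x_test = np.linspace(-2.8, 2.8, 200)   # held-out points inside the same range
y_test = truth(x_test)

train_mse, test_mse = {}, {}
for deg in (1, 2, 14):
    # (np.polyfit may warn about conditioning at degree 14; that is part of the lesson)
    c = np.polyfit(x_train, y_train, deg)
    train_mse[deg] = np.mean((np.polyval(c, x_train) - y_train) ** 2)
    test_mse[deg] = np.mean((np.polyval(c, x_test) - y_test) ** 2)
    print(f"degree {deg:2d}: train MSE {train_mse[deg]:9.4f}, test MSE {test_mse[deg]:12.4f}")
```

Training error falls monotonically with degree, but the degree-14 fit (which can interpolate all 15 noisy points) pays for it on the held-out points: that gap is the bias-variance tradeoff in one table.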
Tool 1: Upgraded Classical Fitting with SciPy
Let's start with what you know and make it much better. `scipy.optimize.curve_fit` is a workhorse, but most students use only 10% of its capabilities.
Basic Setup: Damped Oscillator
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.stats import chi2

# Simulate physics data: damped oscillation with noise
np.random.seed(42)
t = np.linspace(0, 10, 100)
true_params = {'A': 2.0, 'gamma': 0.3, 'omega': 2.0, 'phi': 0.5}

def damped_oscillator(t, A, gamma, omega, phi):
    return A * np.exp(-gamma * t) * np.cos(omega * t + phi)

# Add realistic Gaussian noise
sigma = 0.15
y_data = damped_oscillator(t, **true_params) + np.random.normal(0, sigma, len(t))
```
Fitting with Proper Uncertainty Propagation
```python
# Initial parameter guess
p0 = [1.5, 0.2, 2.1, 0.3]

# Fit with absolute sigma (measurement uncertainties)
popt, pcov = curve_fit(
    damped_oscillator, t, y_data,
    p0=p0,
    sigma=sigma * np.ones(len(t)),  # measurement uncertainties
    absolute_sigma=True             # treat sigma as absolute, not relative
)

# Extract parameters and 1-sigma uncertainties
perr = np.sqrt(np.diag(pcov))
param_names = ['A', 'gamma', 'omega', 'phi']
print("Fitted Parameters:")
for name, val, err in zip(param_names, popt, perr):
    print(f"  {name} = {val:.4f} ± {err:.4f}")
```
The Chi-Squared Goodness-of-Fit Test: Never Skip This
Here's the step that separates careful physicists from sloppy ones:
```python
# Compute chi-squared
residuals = y_data - damped_oscillator(t, *popt)
chi_squared = np.sum((residuals / sigma)**2)
dof = len(t) - len(popt)  # degrees of freedom
reduced_chi_sq = chi_squared / dof

print(f"\nGoodness of Fit:")
print(f"  Chi-squared = {chi_squared:.2f}")
print(f"  Degrees of freedom = {dof}")
print(f"  Reduced chi-sq = {reduced_chi_sq:.4f}")
print(f"  P-value = {1 - chi2.cdf(chi_squared, dof):.4f}")
```
Tool 2: Polynomial Regression and the Danger of Overfitting
Polynomial regression is the simplest form of "machine learning" for curve fitting, and it's a perfect vehicle for truly understanding overfitting.
```python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Data with known quadratic behavior + noise
x = np.linspace(-3, 3, 50)
y_true = 2*x**2 - x + 1
y_noisy = y_true + np.random.normal(0, 1.5, len(x))

# Test polynomials of different degrees via cross-validation
degrees = [1, 2, 5, 10, 20]
for deg in degrees:
    model = Pipeline([
        ('poly', PolynomialFeatures(degree=deg)),
        ('ridge', Ridge(alpha=0.1))
    ])
    scores = cross_val_score(
        model, x.reshape(-1, 1), y_noisy,
        cv=5, scoring='neg_mean_squared_error'
    )
    print(f"Degree {deg:2d}: CV MSE = {-scores.mean():.4f} ± {scores.std():.4f}")
```
Running this reveals something every physics student should internalize: the degree-2 polynomial wins. The degree-20 polynomial fits the training data nearly perfectly but generalizes poorly. It has learned the noise, not the physics.
Tool 3: Gaussian Process Regression, the Bayesian Physicist's Curve Fitter
This is where things get genuinely powerful. Gaussian Process Regression (GPR) is arguably the most principled tool available for physics curve fitting, and it remains underused in the physics community relative to its power.
What Is a Gaussian Process?
A Gaussian Process (GP) is a probability distribution over functions. Instead of saying "I believe the true function is $f(x;\theta)$," a GP says: "I believe the true function is drawn from this infinite-dimensional probability distribution over all possible smooth functions."
Formally, a GP is defined by:
- A mean function $m(x)$, often taken as zero
- A kernel function $k(x, x')$, which encodes our prior beliefs about smoothness, periodicity, and length scale
Given observed data $(X, y)$, the GP gives you a full posterior distribution over functions, including:
- The posterior mean: your best estimate of the true curve
- The posterior variance: your uncertainty at every point, derived from data and prior assumptions
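The posterior formulas are compact enough to implement from scratch. This NumPy sketch (RBF kernel with a fixed length scale and noise level; names illustrative, not a production GP) computes the posterior mean and standard deviation in closed form:

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel k(a, b) = exp(-(a - b)^2 / (2 l^2))."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)

rng = np.random.default_rng(3)
X = np.linspace(0, 5, 12)                 # training inputs
noise = 1e-4                              # tiny Gaussian noise variance
y = np.sin(X) + rng.normal(0, np.sqrt(noise), X.size)

Xs = np.linspace(0, 5, 100)               # prediction grid

# Posterior mean and covariance:
#   mean = K_*^T (K + noise I)^{-1} y
#   cov  = K_** - K_*^T (K + noise I)^{-1} K_*
K = rbf_kernel(X, X) + noise * np.eye(X.size)
Ks = rbf_kernel(X, Xs)
mean = Ks.T @ np.linalg.solve(K, y)
cov = rbf_kernel(Xs, Xs) - Ks.T @ np.linalg.solve(K, Ks)
std = np.sqrt(np.clip(np.diag(cov), 0, None))

print("max |posterior mean - sin(x)|:", np.max(np.abs(mean - np.sin(Xs))))
```

Twelve points are enough to reconstruct the sine to high accuracy, and the posterior standard deviation comes out for free, which is exactly the selling point of the method.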
Implementation with scikit-learn
```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (
    RBF, Matern, RationalQuadratic, WhiteKernel, ConstantKernel as C
)

# Spectroscopy-like data: two overlapping peaks
x_data = np.linspace(0, 10, 80)
peak1 = 2.0 * np.exp(-0.5 * ((x_data - 3) / 0.5)**2)
peak2 = 1.5 * np.exp(-0.5 * ((x_data - 6) / 0.8)**2)
y_data = peak1 + peak2 + np.random.normal(0, 0.1, len(x_data))
X = x_data.reshape(-1, 1)

# Define kernel: smooth variation + noise
kernel = (
    C(1.0, (1e-3, 1e3))
    * RBF(length_scale=1.0, length_scale_bounds=(1e-2, 10.0))
    + WhiteKernel(noise_level=0.01, noise_level_bounds=(1e-5, 1e1))
)

# Fit and predict
gp = GaussianProcessRegressor(
    kernel=kernel, n_restarts_optimizer=10, normalize_y=True
)
gp.fit(X, y_data)

x_pred = np.linspace(0, 10, 500).reshape(-1, 1)
y_pred, y_std = gp.predict(x_pred, return_std=True)
print(f"Optimized kernel: {gp.kernel_}")
```
Choosing the Right Kernel for Your Physics
The kernel is where your physics knowledge enters the GP. Different kernels encode different prior beliefs:
| Kernel | Physics Use Case |
|---|---|
| RBF (Squared Exponential) | Smooth, infinitely differentiable signals. NMR spectra, smooth potentials, any signal where derivatives of all orders are finite. |
| Matérn 3/2 or 5/2 | Once or twice differentiable. Turbulence, rough surfaces, most experimental lab data; this is your default for unknown physics. |
| Periodic | Oscillatory signals, crystallographic data, any lattice or periodic system. |
| Rational Quadratic | Multi-scale variation: signals with features at multiple length scales simultaneously. |
| Linear | Linear trends, systematic drift in a detector or instrument. |
| WhiteKernel | Independent Gaussian measurement noise. Always add this to your kernel. |
You can combine kernels additively or multiplicatively. For example, a slowly varying background with a periodic oscillation:
```python
from sklearn.gaussian_process.kernels import ExpSineSquared

kernel = (
    C(1.0) * RBF(10.0)              # Slow background variation
    + C(0.5) * ExpSineSquared(      # Periodic component
        length_scale=1.0, periodicity=2*np.pi
    )
    + WhiteKernel(0.01)             # Measurement noise
)
```
Tool 4: Neural Networks as Universal Function Approximators
The Universal Approximation Theorem states that a neural network with at least one hidden layer of sufficient width can approximate any continuous function to arbitrary precision. This makes neural networks the most general curve fitters available.
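You can glimpse the theorem at work without even running a training loop. The sketch below uses a "random features" construction (one hidden layer of fixed random tanh units; only the output weights are solved, by linear least squares; a simplification of a trained network, chosen to keep the example deterministic):

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(-np.pi, np.pi, 200)
y = np.sin(x) * np.exp(-0.1 * x**2)   # a smooth, decidedly non-polynomial target

# One hidden layer of 80 *fixed* random tanh units
n_hidden = 80
W = rng.normal(0, 1.0, n_hidden)
b = rng.uniform(-np.pi, np.pi, n_hidden)
H = np.tanh(np.outer(x, W) + b)       # hidden activations, shape (200, 80)

# Only the output layer is "trained": a linear least-squares solve
w_out, *_ = np.linalg.lstsq(H, y, rcond=None)
mse = np.mean((H @ w_out - y) ** 2)
print(f"MSE of the 80-unit approximation: {mse:.2e}")
```

Even with random, untrained hidden weights, 80 tanh units span a function space rich enough to approximate the target closely; full gradient training only improves on this.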
When to Use Neural Networks for Fitting
Neural networks shine in physics when your data is large (>10,000 points), multi-dimensional, or when you need to emulate expensive numerical simulations. They are overkill for small datasets with known functional forms β use Gaussian Processes instead in those cases.
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Simulate quantum well transmission probability
def transmission_probability(E, V0, L):
    k1 = np.sqrt(np.maximum(E, 1e-10))
    k2 = np.sqrt(np.maximum(V0 - E, 1e-10))
    T = np.where(
        E < V0,
        1 / (1 + (V0**2 * np.sinh(k2*L)**2) / (4*E*(V0-E) + 1e-10)),
        1 / (1 + (V0**2 * np.sin(k1*L)**2) / (4*E*(E-V0) + 1e-10))
    )
    return np.clip(T, 0, 1)

# Generate dataset
E_vals = np.random.uniform(0.1, 5.0, 2000)
V0, L = 3.0, 2.0
T_vals = transmission_probability(E_vals, V0, L)
T_noisy = np.clip(T_vals + np.random.normal(0, 0.02, len(E_vals)), 0, 1)

# Neural network with physically meaningful output
class PhysicsNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(1, 64), nn.Tanh(),    # Tanh preferred for smooth physics functions
            nn.Linear(64, 128), nn.Tanh(),
            nn.Linear(128, 64), nn.Tanh(),
            nn.Linear(64, 1), nn.Sigmoid()  # Output bounded [0,1]: physically required
        )

    def forward(self, x):
        return self.network(x)
```
Uncertainty Quantification with MC Dropout
A neural network by default gives you a point estimate, not an uncertainty band. Physics demands uncertainties. Monte Carlo Dropout is an elegant solution: keep dropout active at inference time, run many forward passes, and treat the spread of predictions as your uncertainty.
```python
class PhysicsNetDropout(nn.Module):
    def __init__(self, dropout_rate=0.1):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(1, 64), nn.Tanh(), nn.Dropout(dropout_rate),
            nn.Linear(64, 128), nn.Tanh(), nn.Dropout(dropout_rate),
            nn.Linear(128, 64), nn.Tanh(), nn.Dropout(dropout_rate),
            nn.Linear(64, 1), nn.Sigmoid()
        )

    def forward(self, x):
        return self.network(x)

    def predict_with_uncertainty(self, x, n_samples=100):
        """Keep dropout ACTIVE during inference for uncertainty."""
        self.train()  # Critical: keep dropout on!
        predictions = torch.stack([self(x) for _ in range(n_samples)])
        mean = predictions.mean(dim=0)
        std = predictions.std(dim=0)
        return mean.detach().numpy(), std.detach().numpy()
```
Tool 5: Physics-Informed Neural Networks (PINNs)
This is the frontier. PINNs are neural networks that don't just fit data; they're constrained to obey physical laws encoded as differential equations. If you know your system follows Newton's second law, or the diffusion equation, or Maxwell's equations, you bake that directly into the loss function.
The PINN Loss Function
The total PINN loss has two components:

$$\mathcal{L} = \mathcal{L}_{\text{data}} + \lambda\,\mathcal{L}_{\text{physics}}$$

where $\mathcal{L}_{\text{physics}}$ penalizes violations of the governing equation. For a damped harmonic oscillator, that equation is $\ddot{x} + 2\gamma\,\dot{x} + \omega_0^2\,x = 0$:
```python
class DampedOscillatorPINN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, 32), nn.Tanh(),
            nn.Linear(32, 64), nn.Tanh(),
            nn.Linear(64, 32), nn.Tanh(),
            nn.Linear(32, 1)
        )
        # Physics parameters as LEARNABLE variables
        self.gamma = nn.Parameter(torch.tensor([0.5]))
        self.omega0 = nn.Parameter(torch.tensor([2.0]))

    def forward(self, t):
        return self.net(t)

    def physics_residual(self, t):
        t.requires_grad_(True)
        x = self.forward(t)
        # First derivative dx/dt via autograd
        dx_dt = torch.autograd.grad(
            x, t, grad_outputs=torch.ones_like(x), create_graph=True
        )[0]
        # Second derivative d²x/dt²
        d2x_dt2 = torch.autograd.grad(
            dx_dt, t, grad_outputs=torch.ones_like(dx_dt), create_graph=True
        )[0]
        # ODE residual (should be zero if physics is satisfied)
        residual = d2x_dt2 + 2*self.gamma*dx_dt + self.omega0**2 * x
        return residual

def pinn_loss(model, t_data, x_data, t_physics, lambda_physics=1.0):
    loss_data = nn.MSELoss()(model(t_data), x_data)
    residual = model.physics_residual(t_physics)
    loss_physics = torch.mean(residual**2)
    return loss_data + lambda_physics * loss_physics
```
Practical Guide: Choosing the Right Method
| Method | Use When… | Avoid When… |
|---|---|---|
| scipy.curve_fit | Functional form is known, data is clean, <10 parameters, fast results needed | Parameters >10, complex landscapes, no known form |
| Gaussian Process | No known functional form, uncertainty quantification critical, small-to-medium data (<10k points) | Very large datasets (>100k), real-time inference required |
| Polynomial / Spline | Need smooth interpolant, data has no obvious form but is continuous | Extrapolation (polynomial behavior is unstable outside training range) |
| Neural Networks | Large data (>10k), multi-dimensional input, surrogate modeling, fast inference | Small datasets, when uncertainty propagation is critical |
| PINNs | Governing equations are known, sparse/noisy data, inverse problem (learn parameters from data) | Equations are unknown, data is abundant and clean (standard NN is simpler) |
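The extrapolation caveat in the table deserves emphasis. A toy sketch with noiseless data: a degree-9 polynomial fits sin(x) beautifully inside the training range and explodes one period outside it:

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)                          # noiseless data, bounded by 1 everywhere

coeffs = np.polyfit(x, y, 9)           # degree-9 fit: excellent inside the range
inside_err = np.max(np.abs(np.polyval(coeffs, x) - y))

extrapolated = np.polyval(coeffs, 4 * np.pi)   # one period past the data

print(f"max error inside the range: {inside_err:.1e}")
print(f"prediction at 4*pi: {extrapolated:.1f}  (true value: {np.sin(4 * np.pi):.1f})")
```

Inside the training window the fit is accurate to a fraction of a percent; outside it, the polynomial's leading terms take over and the prediction is off by orders of magnitude.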
Common Pitfalls and How to Avoid Them
1. Not Checking Your Residuals
Always plot your residuals. Systematic patterns (oscillations, trends, funnel shapes) mean your model is wrong, regardless of what R² says.
```python
# model_prediction: your fitted model evaluated at x_data (placeholder here)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
residuals = y_data - model_prediction

ax1.scatter(x_data, residuals, s=10)
ax1.axhline(0, color='r', linestyle='--')
ax1.set_title("Residuals vs Input")

ax2.hist(residuals, bins=30, edgecolor='k')
ax2.set_title("Residual Distribution (should be Gaussian)")
plt.tight_layout()
```
2. Not Passing Measurement Uncertainties
In physics, every measurement has an uncertainty. Always pass `sigma` to `curve_fit` and set `absolute_sigma=True`. Never treat all data points as equally reliable when they're not.
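One way to see that `absolute_sigma=True` behaves as error propagation demands: with it set, the reported parameter errors scale linearly with the stated measurement uncertainty. A minimal sketch with a straight-line model (names illustrative):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 30)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, x.size)

def line(x, a, b):
    return a * x + b

# Identical data; only the *stated* uncertainty differs by a factor of 2
_, pcov1 = curve_fit(line, x, y, sigma=0.5 * np.ones(x.size), absolute_sigma=True)
_, pcov2 = curve_fit(line, x, y, sigma=1.0 * np.ones(x.size), absolute_sigma=True)

perr1 = np.sqrt(np.diag(pcov1))
perr2 = np.sqrt(np.diag(pcov2))
print("error ratio:", perr2 / perr1)  # ~[2, 2]: doubling sigma doubles the errors
```

Without `absolute_sigma=True`, SciPy rescales the covariance by the residual variance and the stated uncertainties only act as relative weights, so the ratio above would be 1.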
3. Getting Trapped in Local Minima
For complex problems, use global optimization instead of relying on a good initial guess:
```python
from scipy.optimize import differential_evolution

bounds = [(0, 5), (0, 2), (0, 5), (-np.pi, np.pi)]
result = differential_evolution(
    lambda p: np.sum((y_data - damped_oscillator(t, *p))**2),
    bounds, seed=42, tol=1e-10
)
```
4. Forgetting to Normalize Data for Neural Networks
```python
from sklearn.preprocessing import StandardScaler

# X, y: your raw inputs and targets in physical units (placeholders here)
scaler_X = StandardScaler()
scaler_y = StandardScaler()
X_norm = scaler_X.fit_transform(X)
y_norm = scaler_y.fit_transform(y.reshape(-1, 1))

# After predicting, always inverse_transform back to physical units!
y_physical = scaler_y.inverse_transform(y_predicted)
```
Complete Worked Example: Fitting Muon Lifetime Data
Let's bring everything together with a realistic experiment: fitting exponential decay data from a muon lifetime measurement, a classic undergraduate particle physics lab.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.stats import chi2

# Muon lifetime: tau = 2.197 microseconds (PDG value)
np.random.seed(0)
tau_true = 2.197
t_bins = np.linspace(0, 12, 60)
t_centers = (t_bins[:-1] + t_bins[1:]) / 2

# True signal + constant background (accidentals)
N0, bg = 5000, 50
expected = N0 * np.exp(-t_centers / tau_true) + bg
counts = np.random.poisson(expected).astype(float)
count_errors = np.sqrt(np.maximum(counts, 1))  # Poisson errors

# Model: exponential decay + constant background
def decay_model(t, N0, tau, bg):
    return N0 * np.exp(-t / tau) + bg

# Exclude empty bins to avoid division by zero
mask = counts > 0
popt, pcov = curve_fit(
    decay_model, t_centers[mask], counts[mask],
    p0=[4000, 2.0, 40],
    sigma=count_errors[mask], absolute_sigma=True
)
perr = np.sqrt(np.diag(pcov))
N0_fit, tau_fit, bg_fit = popt

# Goodness of fit
residuals = counts[mask] - decay_model(t_centers[mask], *popt)
chi_sq = np.sum((residuals / count_errors[mask])**2)
dof = mask.sum() - len(popt)
p_value = 1 - chi2.cdf(chi_sq, dof)

print("Muon Lifetime Results:")
print(f"  τ = {tau_fit:.4f} ± {perr[1]:.4f} μs")
print(f"  PDG value: {tau_true:.4f} μs")
print(f"  Deviation: {abs(tau_fit - tau_true)/perr[1]:.2f}σ")
print(f"  Reduced χ²: {chi_sq/dof:.3f}")
print(f"  P-value: {p_value:.4f}")
```
This example demonstrates every element of responsible physics fitting in one clean workflow: Poisson-appropriate error treatment, goodness-of-fit testing with chi-squared and p-value, uncertainty reporting in physically meaningful units, and comparison to the known PDG value.
Where Machine Learning for Physics Fitting Is Heading
The field is moving fast. Three trends worth watching:
Normalizing Flows: generative models that learn the full probability distribution of your data, not just the mean. Increasingly used for parameter estimation in high-dimensional physics problems like gravitational wave parameter inference.
Symbolic Regression: algorithms like PySR and the AI Feynman project that don't just fit data but try to discover the analytic formula that generated it. This is machine learning doing theoretical physics: recovering equations like Planck's law from data alone.
Differentiable Physics Simulations: physics simulators written in differentiable frameworks (JAX, PyTorch), allowing gradient-based optimization of physical parameters directly through the simulation. No surrogate needed.
The physicist of the future won't choose between traditional fitting and machine learning. They'll use a toolkit spanning the full spectrum, reaching for the right tool based on what the data and the physics demand.
Key Takeaways
- Classical fitting requires knowing the form. ML doesn't; that's its core advantage for complex physics.
- Curve fitting is optimization. Regularization = Bayesian inference. Reduced chi-squared = physical diagnostic.
- SciPy done right means: absolute sigma, chi-squared testing, residual analysis, global optimization for multi-modal landscapes.
- Gaussian Process Regression gives you full uncertainty quantification. Choose your kernel based on physics knowledge.
- Neural networks are universal approximators for large datasets. Use MC Dropout for uncertainty bands.
- PINNs bake differential equations into the loss function, learning physical parameters from data structure alone.
- Always check residuals. Always report uncertainties. Never trust R² alone.
Further Reading
- Bishop, C.M., Pattern Recognition and Machine Learning. Chapters 1-3 for the optimization and probabilistic foundations.
- Rasmussen & Williams, Gaussian Processes for Machine Learning. Free online. The definitive reference for GPR.
- Rackauckas et al., "Universal Differential Equations for Scientific Machine Learning." arXiv:2001.04385.
- Cranmer, Brehmer & Louppe, "The Frontier of Simulation-Based Inference." PNAS 2020.
- PySR documentation (astroautomata.com/PySR/): symbolic regression library built for physics researchers.
- Taylor, J.R., An Introduction to Error Analysis. The essential foundation for uncertainty propagation.
