Learning Objectives
- State the PDF of a Gaussian distribution and identify its mean and variance parameters.
- Write the PDF of a multivariate Gaussian using vector-matrix notation.
- Describe the frequentist framework: unknown \(\theta\), estimator \(\hat\theta = T(Y)\).
- Define bias, variance, and mean squared error (MSE) of an estimator.
- State the maximum likelihood (ML) estimator and derive it for a simple Gaussian example.
- Explain what it means for an estimator to be consistent, unbiased, and UMVU.
- Explain the significance of the Cramér-Rao bound.
Motivation: Estimating Unknown Quantities from Data
In many physics experiments you cannot measure a quantity of interest directly. Instead, you observe a noisy, indirect signal and must infer the underlying quantity from that signal. This is the estimation problem.
Examples are everywhere: inferring the mass of a particle from a track in a detector, estimating the distance to a star from its apparent brightness, determining the temperature of a plasma from its emission spectrum, or recovering the true intensity of a laser beam from speckle-corrupted measurements. In each case, your measurement is a random variable that depends on the unknown quantity you want to estimate.
Two broad frameworks address this problem: frequentist (the subject of this lesson) and Bayesian (Lesson 32). Both are valuable; understanding when to use each is a core skill.
Common Probability Distributions
The Gaussian (Normal) Distribution
The most widely used distribution in physics and engineering is the Gaussian distribution, written \(X \sim \mathcal{N}(\mu, \sigma^2)\). Its probability density function (PDF) is:
\[
p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right),
\]
where \(\mu\) is the mean and \(\sigma^2\) is the variance.
The Multivariate Gaussian Distribution
When \(X \in \mathbb{R}^p\) is a vector of jointly Gaussian random variables rather than a scalar, the distribution generalizes to:
\[
p(x) = \frac{1}{(2\pi)^{p/2} \, |R|^{1/2}} \exp\!\left( -\frac{1}{2} (x-\mu)^T R^{-1} (x-\mu) \right),
\]
where \(\mu \in \mathbb{R}^p\) is the mean vector and \(R \in \mathbb{R}^{p \times p}\) is the covariance matrix (symmetric positive-definite). We write \(X \sim \mathcal{N}(\mu, R)\). Note that the components of \(X\) need not be independent; their correlations are encoded in the off-diagonal entries of \(R\).
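The density formula above is straightforward to evaluate numerically. As a quick sketch in Python/NumPy (the course's worked examples use MATLAB), the following hypothetical `mvn_pdf` helper implements the formula and checks that for \(p = 1\) it reduces to the scalar Gaussian PDF:

```python
import numpy as np

def mvn_pdf(x, mu, R):
    """Multivariate Gaussian density N(mu, R) evaluated at the point x."""
    p = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(R, diff)          # (x-mu)^T R^{-1} (x-mu)
    norm = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(R))
    return np.exp(-0.5 * quad) / norm

# Sanity check: with p = 1 this must match the scalar Gaussian PDF
mu, sigma2 = 1.0, 4.0            # illustrative values, not from the lesson
x = 2.0
scalar = np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
vector = mvn_pdf(np.array([x]), np.array([mu]), np.array([[sigma2]]))
print(scalar, vector)
```

Using `np.linalg.solve` rather than explicitly inverting \(R\) is the standard numerically stable way to compute the quadratic form.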
Other distributions
Not all physical measurements are Gaussian. Counting experiments (photons, decay events) follow Poisson distributions. Some intensity measurements (e.g., laser speckle) follow exponential distributions. The estimation framework below applies to any distribution — the key is knowing (or assuming) the form of the probability density function for your measurements.
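The distributions mentioned above have distinctive statistical signatures that you can see directly by sampling. A small Python/NumPy sketch (illustrative rates, not from the lesson): a Poisson variable has variance equal to its mean, while an exponential variable has standard deviation equal to its mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Poisson: photon or decay counts with rate lam per observation window
lam = 5.0
counts = rng.poisson(lam, size=200_000)

# Exponential: e.g., fully developed speckle intensity with mean mu
mu = 2.0
intensity = rng.exponential(mu, size=200_000)

# Signatures: Poisson has mean == variance == lam;
# the exponential has standard deviation equal to its mean mu.
print(counts.mean(), counts.var())
print(intensity.mean(), intensity.std())
```

These signatures are one practical way to check whether a Gaussian model is even plausible for a given data set before applying the estimation machinery below.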
The Frequentist Framework
Throughout this lesson \(\theta\) denotes the unknown parameter we want to estimate — for example, the mean \(\mu\) of a Gaussian, the rate \(\lambda\) of an exponential distribution, or any other scalar or vector that characterizes \(p_\theta(y)\). Our job is to infer \(\theta\) from the observed data \(Y\).
Setup
In the frequentist approach, the unknown quantity \(\theta\) is treated as a fixed but unknown constant (not random). We observe data \(Y \in \mathbb{R}^n\) drawn from a distribution \(p_\theta(y)\) that depends on \(\theta\). Our goal is to design a function \(T: \mathbb{R}^n \to \Omega\) such that
\[
\hat\theta = T(Y)
\]
is a good estimate of the true \(\theta\). Because \(T\) is a function of the random vector \(Y\), the estimate \(\hat\theta\) is itself a random variable: it varies across repeated experiments.
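The fact that \(\hat\theta\) is itself random is easy to see by simulation. In this Python sketch (hypothetical values \(\theta = 3\), \(\sigma = 2\), \(n = 20\); the course's own code is MATLAB), the same estimator \(T(Y) = \text{mean}(Y)\) applied to five independent repetitions of the experiment returns five different numbers:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma = 3.0, 2.0      # true (but "unknown") mean, known noise std
n = 20

# T(Y) = sample mean; each repetition of the experiment produces a new
# data vector Y, hence a new value of the estimate theta_hat.
estimates = [rng.normal(theta, sigma, n).mean() for _ in range(5)]
print(estimates)   # five different values, all scattered around theta
```

Bias, variance, and MSE, defined next, summarize exactly this scatter of \(\hat\theta\) across repetitions.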
Bias, variance, and MSE
Three quantities characterize the quality of an estimator. Throughout, \(\bar\theta = \mathbb{E}_\theta[\hat\theta]\) denotes the expected value of the estimator when the true parameter is \(\theta\):
| Quantity | Definition | Meaning |
|---|---|---|
| Bias | \(\text{bias}_\theta = \bar\theta - \theta\) | Systematic offset: does the estimator on average give the right answer? |
| Variance | \(\text{var}_\theta = \mathbb{E}_\theta\!\left[(\hat\theta - \bar\theta)^2\right]\) | Spread of estimates around their mean: how consistent is it? |
| MSE | \(\text{MSE}_\theta = \mathbb{E}_\theta\!\left[(\hat\theta - \theta)^2\right]\) | Overall error from the true value: the quantity we most care about. |
These are related by the fundamental identity:
\[
\text{MSE}_\theta = \text{var}_\theta + \text{bias}_\theta^2 .
\]
An estimator is unbiased if \(\text{bias}_\theta = 0\) for all \(\theta\), i.e., \(\bar\theta = \theta\). An unbiased estimator's MSE equals its variance.
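The bias-variance decomposition of the MSE can be verified by Monte Carlo. This Python sketch uses a deliberately biased estimator, chosen for illustration: shrinking the sample mean by a factor of 0.9, so its bias is \(-0.1\,\theta \neq 0\). All parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, n, M = 3.0, 2.0, 20, 100_000

# Deliberately biased estimator: shrink the sample mean toward zero.
# Its expectation is 0.9*theta, so bias = -0.1*theta.
est = 0.9 * rng.normal(theta, sigma, (M, n)).mean(axis=1)

bias = est.mean() - theta                 # empirical bias
var = est.var()                           # empirical variance
mse = np.mean((est - theta) ** 2)         # empirical MSE

# The identity MSE = var + bias^2, checked over M repeated experiments:
print(mse, var + bias**2)
```

Note that the identity holds exactly for the empirical quantities (it is an algebraic fact), so the two printed numbers agree to floating-point precision, not just statistically.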
Consistency
An estimator \(\hat\theta_n\) based on \(n\) observations is consistent if it converges to the true \(\theta\) in probability as \(n \to \infty\). Consistency is the minimum property we expect of an estimator: with enough data, it should converge to the truth.
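Convergence in probability means that for any tolerance \(\epsilon > 0\), the probability \(P(|\hat\theta_n - \theta| < \epsilon)\) tends to 1 as \(n \to \infty\). A Python sketch (hypothetical \(\theta = 3\), \(\sigma = 2\), \(\epsilon = 0.25\)) estimates this probability for the sample mean at several values of \(n\):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, eps, M = 3.0, 2.0, 0.25, 20_000

# Empirical P(|theta_hat_n - theta| < eps) for growing sample size n
fracs = []
for n in (10, 100, 1000):
    est = rng.normal(theta, sigma, (M, n)).mean(axis=1)
    fracs.append(np.mean(np.abs(est - theta) < eps))
print(fracs)   # fraction within eps of theta, increasing toward 1
```

For the Gaussian sample mean this fraction can also be computed in closed form from \(\hat\theta_n \sim \mathcal{N}(\theta, \sigma^2/n)\); the simulation is just the generic way to check consistency when no closed form exists.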
The uniformly minimum variance unbiased (UMVU) estimator is the best possible unbiased estimator — it achieves the lowest variance among all unbiased estimators, for every value of \(\theta\).
The Maximum Likelihood (ML) Estimator
Definition
The most widely used frequentist estimator is the maximum likelihood (ML) estimator. The idea is simple: choose the value of \(\theta\) that makes the observed data \(Y\) most probable:
\[
\hat\theta_{\mathrm{ML}} = \arg\max_{\theta \in \Omega} p_\theta(Y) = \arg\max_{\theta \in \Omega} \log p_\theta(Y).
\]
Maximizing the log-likelihood is equivalent because the logarithm is monotone increasing, and it is usually far more convenient to work with.
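The definition can be applied directly, with no calculus, by scanning the log-likelihood over candidate values of \(\theta\) and keeping the maximizer. This brute-force Python sketch (hypothetical data with true mean 3 and known \(\sigma^2 = 4\)) finds the ML estimate of a Gaussian mean by grid search; it should land on the sample mean, up to grid resolution:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0
Y = rng.normal(3.0, np.sqrt(sigma2), size=50)   # observed data, mean "unknown"

def log_likelihood(theta):
    """Gaussian log-likelihood of the sample Y as a function of theta."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2)
                  - (Y - theta) ** 2 / (2 * sigma2))

# Brute-force ML: maximize log p_theta(Y) over a grid of candidates.
grid = np.linspace(0, 6, 60_001)
theta_ml = grid[np.argmax([log_likelihood(t) for t in grid])]
print(theta_ml, Y.mean())   # the two agree to the grid spacing of 1e-4
```

Grid search scales poorly with the dimension of \(\theta\); in practice one uses calculus (as in the derivation below) or a numerical optimizer, but the grid makes the definition concrete.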
Properties of the ML estimator
- Consistent: \(\hat\theta_{\mathrm{ML}} \to \theta\) as the number of observations \(n \to \infty\).
- Asymptotically efficient: As \(n \to \infty\), the ML estimate attains the Cramér-Rao bound — the theoretical lower bound on the variance of any unbiased estimator. No unbiased estimator can do better asymptotically.
- Not necessarily unbiased for finite \(n\), but bias vanishes as \(n\) grows.
Deriving the ML estimator: Gaussian mean (Example 2.3)
Let \(\{Y_k\}_{k=1}^n\) be i.i.d. random variables with distribution \(\mathcal{N}(\theta, \sigma^2)\), where \(\sigma^2\) is known and \(\theta\) is the unknown mean. The log-likelihood for \(n\) independent observations is:
\[
\log p_\theta(y) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{k=1}^n (y_k - \theta)^2 .
\]
Differentiating with respect to \(\theta\) and setting the result equal to zero:
\[
\frac{\partial}{\partial \theta} \log p_\theta(y) = \frac{1}{\sigma^2} \sum_{k=1}^n (y_k - \theta) = 0 .
\]
Solving for \(\hat\theta\):
\[
\hat\theta_{\mathrm{ML}} = \frac{1}{n} \sum_{k=1}^n Y_k ,
\]
the sample mean. For this Gaussian case the ML estimator is unbiased and in fact UMVU.
Recommended Video
An intuitive, visual walkthrough of maximum likelihood estimation that builds the core idea up from first principles and makes the case for why ML estimation is so central to statistics.
Worked Example — MATLAB
This example demonstrates ML estimation of the mean of a Gaussian distribution. We generate data, compute the ML estimate, and verify that it is unbiased by averaging over many realizations. The Gaussian distribution is used here for illustration — your HW 31 applies ML estimation to a different distribution (exponential).
```matlab
%% Lesson 31 — ML Estimation of Gaussian Mean (demonstration)
% True parameter: theta = 3 (the mean)
% Distribution: Y ~ N(theta, sigma^2), sigma^2 = 4 (known)
% ML estimator: theta_hat = mean(Y)
% (HW 31 uses a different distribution -- do not use this directly)
rng(42);
theta_true = 3.0;
sigma2 = 4.0;

%% Single experiment: estimate from n=20 samples
n = 20;
Y = theta_true + sqrt(sigma2) * randn(n, 1);
theta_hat = mean(Y);
fprintf('True theta = %.2f, ML estimate = %.4f\n', theta_true, theta_hat);

%% Demonstrate unbiasedness: average over M=1000 experiments
M = 1000;
n_vec = [5, 10, 20, 50, 100, 200];
mean_est = zeros(size(n_vec));
var_est = zeros(size(n_vec));
for j = 1:length(n_vec)
    estimates = zeros(M, 1);
    for i = 1:M
        Y_i = theta_true + sqrt(sigma2) * randn(n_vec(j), 1);
        estimates(i) = mean(Y_i);
    end
    mean_est(j) = mean(estimates);
    var_est(j) = var(estimates);
end

%% Plot
figure;
subplot(2,1,1);
plot(n_vec, mean_est, 'bo-', 'LineWidth', 1.5);
yline(theta_true, 'r--', 'LineWidth', 1.5);
xlabel('n (samples)'); ylabel('Mean of $\hat{\theta}$', 'Interpreter', 'latex');
title('ML Estimator Mean vs. Number of Samples');
legend({'Mean of estimates', 'True $\theta$'}, 'Interpreter', 'latex'); grid on;

subplot(2,1,2);
semilogy(n_vec, var_est, 'bs-', 'LineWidth', 1.5); hold on;
semilogy(n_vec, sigma2 ./ n_vec, 'r--', 'LineWidth', 1.5);
xlabel('n (samples)'); ylabel('$\mathrm{Var}(\hat{\theta})$ [log scale]', 'Interpreter', 'latex');
title('Variance of ML Estimator (scales as $\sigma^2/n$)', 'Interpreter', 'latex');
legend({'Empirical variance', '$\sigma^2/n$ (Cramer-Rao bound)'}, 'Interpreter', 'latex'); grid on;
```
Summary
| Concept | Key Idea |
|---|---|
| Frequentist framework | \(\theta\) is unknown but fixed. Estimate it from data \(Y\) using \(\hat\theta = T(Y)\). |
| Bias | \(\bar\theta - \theta\): systematic offset of the estimator from the truth. |
| Variance | Spread of the estimator around its mean across repeated experiments. |
| MSE | \(\text{var} + \text{bias}^2\): overall mean squared distance from the true \(\theta\). |
| Consistency | Estimator converges to \(\theta\) as \(n \to \infty\). Minimum requirement. |
| ML estimator | \(\hat\theta_{\mathrm{ML}} = \arg\max_\theta \log p_\theta(Y)\). Consistent and asymptotically efficient. |
| Cramér-Rao bound | Lower bound on variance of any unbiased estimator. ML attains it asymptotically. |
| Gaussian mean ML | Sample mean \((1/n)\sum Y_k\) — unbiased and UMVU for this case. |
References
- Bouman, C.A. Foundations of Computational Imaging, SIAM, 2022, §§2.2–2.3.1.
- Wikipedia contributors. "Estimation theory." Wikipedia, The Free Encyclopedia, 2019. (Read through the Maximum Likelihood section.)
- Scratchapixel. "Estimators." Mathematics & Physics for Computer Graphics. (Alternative reference with less mathematical notation.)