Homework Assignment

HW 34 — Linear Classification of GEO Space Objects

📘 Related: Lesson 34 📁 Data: sat_data.mat 🛠 MATLAB required

📋 Overview

The Combined Space Operations Center (CSpOC) tracks approximately 27,000 objects in Earth orbit. A key task is classifying radar and optical observations into one of four object types in geosynchronous (GEO) orbit:

  Label   Class
  1       Active Payload
  2       Inactive Payload
  3       Rocket Body
  4       Debris

In this assignment you will build and train a multi-class linear classifier using two observational features — average visual magnitude and average rotation rate — from sat_data.mat. You will train the model by minimizing a softmax cross-entropy loss using MATLAB's fminsearch, then evaluate its performance on held-out test data.

📐 Model Structure

The input \(\mathbf{X}\) is a \(2 \times N\) matrix (features as rows, objects as columns). Multiplying by the weight matrix \(\mathbf{W}\) and adding bias \(\mathbf{b}\) produces a \(4 \times N\) matrix of logits \(\mathbf{Z}\). Applying softmax column-wise converts each column to a probability distribution over the four classes. The loss \(L\) is the average cross-entropy between the predicted probabilities and the true one-hot labels.

The full parameter set is \(\boldsymbol{\theta} = (\mathbf{W}, \mathbf{b})\):

\[ \mathbf{Z} = \mathbf{W}\mathbf{X} + \mathbf{b}, \qquad P_{kn} = \frac{e^{Z_{kn}}}{\sum_{j=1}^{4} e^{Z_{jn}}}, \qquad L = -\frac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{4} Y_{kn}\log P_{kn} \]

where \(\mathbf{Y}\) is the \(4 \times N\) matrix of one-hot true labels. \(\mathbf{W}\) has \(4 \times 2 = 8\) parameters and \(\mathbf{b}\) has 4, giving 12 total parameters.

Why softmax cross-entropy instead of L2? An L2 loss (\(\|\mathbf{Y} - \mathbf{Z}\|^2\)) treats the problem like regression on one-hot targets. Softmax cross-entropy is the correct probabilistic loss for multi-class classification: it interprets the logits as log-probabilities, normalizes them to sum to 1, and applies a logarithmic penalty that grows without bound as confidence in the wrong class increases.
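As a quick numerical illustration of that unbounded penalty (a standalone sketch with made-up logits, not part of the assignment code):

```matlab
% Softmax cross-entropy for a single object whose logits strongly favor class 1.
z = [5; 0; 0; 0];           % illustrative logits
p = exp(z) / sum(exp(z));   % softmax: p(1) is close to 0.98
loss_right = -log(p(1));    % about 0.02 -- confident and correct: tiny loss
loss_wrong = -log(p(2));    % about 5.0  -- confident and wrong: large loss
```

An L2 loss would penalize both cases by a bounded squared distance; the logarithm is what makes confidently wrong predictions expensive.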

Data Loading & Exploration

  1. Load sat_data.mat. It contains two variables:
    • Y — \(300 \times 2\) matrix: column 1 is average visual magnitude, column 2 is average rotation rate.
    • ID — \(300 \times 1\) integer vector of class labels (1–4).
    Display the size and class of both variables. Transpose Y to obtain X (\(2 \times 300\)), so features are in rows and objects are in columns.
  2. Create a scatter plot of visual magnitude vs. rotation rate, color-coded by class. Label the four classes in the legend. Do the classes appear linearly separable?
  3. Convert the \(300 \times 1\) integer label vector ID into a \(4 \times 300\) one-hot matrix Y_oh. Column \(n\) should have a 1 in row ID(n) and zeros elsewhere (Figure 2). Verify the first few columns match the first few entries of ID.

    Figure 2: Example one-hot encoding for 6 objects with labels [4, 1, 3, 2, 4, 1].

    Class                obj 1   obj 2   obj 3   obj 4   obj 5   obj 6
                         ID=4    ID=1    ID=3    ID=2    ID=4    ID=1
    Row 1 (Active)         0       1       0       0       0       1
    Row 2 (Inactive)       0       0       0       1       0       0
    Row 3 (Rocket Body)    0       0       1       0       0       0
    Row 4 (Debris)         1       0       0       0       1       0
  4. Use rng(356) and randperm to randomly shuffle the 300 objects. Take the first 80% as training data and the remaining 20% as test data. Normalize each feature to zero mean and unit variance using training statistics only, then apply the same transform to the test set. After normalizing, confirm the training set mean is 0 and standard deviation is 1 for each feature.
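A minimal sketch of steps 3 and 4, assuming the variable names from the loading step (Y, ID); names such as train_idx, mu, and sigma are placeholders, and your implementation may differ:

```matlab
% Step 3: one-hot encoding -- place a 1 in row ID(n) of column n.
N = numel(ID);
Y_oh = zeros(4, N);
Y_oh(sub2ind([4, N], ID(:)', 1:N)) = 1;

% Step 4: shuffle, then split 80% train / 20% test.
rng(356);
idx = randperm(N);
Ntrain = round(0.8 * N);
train_idx = idx(1:Ntrain);
test_idx  = idx(Ntrain+1:end);

% Normalize using training statistics only (Y is 300 x 2, objects as rows).
mu    = mean(Y(train_idx, :), 1);
sigma = std(Y(train_idx, :), 0, 1);
Xtrain = (Y(train_idx, :) - mu) ./ sigma;   % Ntrain x 2
Xtest  = (Y(test_idx, :)  - mu) ./ sigma;   % Ntest  x 2
```

Keeping Xtrain and Xtest with objects as rows (N x 2) matches the input orientation the provided loss function expects.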

Loss Function

  1. The parameter vector \(\boldsymbol{\theta} \in \mathbb{R}^{12}\) packs all model parameters as: \[ \boldsymbol{\theta} = \begin{bmatrix} \mathbf{W}(:) \\ \mathbf{b} \end{bmatrix} \] where the first 8 elements are the entries of \(\mathbf{W}\) stored column-major (so reshape(theta(1:8), 4, 2) recovers \(\mathbf{W}\)) and the last 4 are \(\mathbf{b}\). The following local function is provided; copy it to the bottom of your script file. It accepts the \(12 \times 1\) parameter vector \(\boldsymbol{\theta}\), the \(N \times 2\) feature matrix (objects as rows; note this is the transpose of the \(2 \times N\) orientation used in the exploration section), and the \(4 \times N\) one-hot label matrix, and returns the scalar softmax cross-entropy loss defined in the Model Structure section above.
    function L = ce_loss(theta, X, Y_true)
    % theta  : 12x1 parameter vector [W(:); b]
    % X      : N x 2  (objects as rows, features as columns)
    % Y_true : 4 x N  one-hot label matrix
    
    N = size(X, 1);
    
    % Step 1: Unpack W (4x2) and b (4x1) from theta.
    % reshape() fills column-major, so the first 4 elements of theta become
    % column 1 of W (scores for feature 1), next 4 become column 2 (feature 2).
    W = reshape(theta(1:8), 4, 2);
    b = theta(9:12);
    
    % Step 2: Compute logit matrix Z (4 x N).
    % X' transposes X to 2xN so each column is one object's feature vector.
    % W*X' produces a 4xN matrix — each column holds 4 class scores for one object.
    % Adding b broadcasts the 4x1 bias across all N columns.
    Z = W * X' + b;
    
    % Step 3: Numerically stable softmax.
    % Subtracting the column maximum before exp() prevents overflow.
    % The shift cancels in the ratio (numerator and denominator both scale equally),
    % so the probabilities are unchanged. sum(E,1) sums down each column.
    E = exp(Z - max(Z, [], 1));
    P = E ./ sum(E, 1);               % 4 x N — each column sums to 1
    
    % Step 4: Average cross-entropy loss.
    % Y_true is one-hot, so Y_true.*log(P) selects only the log-probability
    % of the correct class for each object. We average over all N objects.
    % The 1e-10 prevents log(0) if any probability rounds to zero.
    L = -1/N * sum(sum(Y_true .* log(P + 1e-10)));
    end
  2. Create a function handle loss_fn that calls ce_loss with the training data fixed, so it depends only on theta. Initialize all parameters to zero (theta0, a 12×1 zero vector) and evaluate the initial loss. What value do you expect for a model that assigns equal probability to all four classes? Does your result agree?
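One way to fix the data arguments is an anonymous function, sketched below assuming Xtrain is the Ntrain x 2 normalized training matrix and Ytrain_oh is the corresponding 4 x Ntrain one-hot slice (both names are assumptions):

```matlab
% Capture the training data so only theta varies during optimization.
loss_fn = @(theta) ce_loss(theta, Xtrain, Ytrain_oh);

theta0 = zeros(12, 1);      % all-zero initialization
L0 = loss_fn(theta0);       % compare against your prediction for uniform probabilities
```

With all-zero parameters, every logit is 0, so the softmax assigns equal probability to all four classes for every object; check that L0 matches the value you predicted.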

Training with fminsearch

  1. Train the model by passing loss_fn and theta0 to MATLAB's fminsearch. Set MaxIter = 5000 and MaxFunEvals = 20000. Once fminsearch returns the optimal parameter vector theta_opt, recover \(\mathbf{W}_\text{opt}\) (4×2) and \(\mathbf{b}_\text{opt}\) (4×1) from theta_opt using the same unpacking scheme as the provided ce_loss function. Report the final training loss and compare it to the initial loss.
  2. Compute training set accuracy. Apply the trained model to the training features to get logit matrix \(\mathbf{Z}\), then take the max along the class dimension (dim 1) to find the predicted class index for each object. Do the same with the one-hot label matrix to recover the true class indices. Report training accuracy as a percentage.

    MATLAB hint: The class with the highest logit score is the predicted class. MATLAB's max with two output arguments returns both the maximum value and its index — here you only need the index. [~, y_pred] = max(Z, [], 1) returns a \(1{\times}N\) row vector of predicted class indices (1–4), where ~ discards the values. To recover the true class indices from the one-hot matrix, apply the same max trick, or use IDtrain directly (the integer label vector). Once you have both index vectors with matching orientations, mean(y_pred == y_true) * 100 gives accuracy as a percentage.
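The training and accuracy steps above can be sketched as follows, assuming the variable names from earlier steps (loss_fn, theta0, Xtrain, ID, train_idx); this is one possible layout, not the required solution:

```matlab
% Train with fminsearch using the iteration limits from step 1.
opts = optimset('MaxIter', 5000, 'MaxFunEvals', 20000);
[theta_opt, L_train] = fminsearch(loss_fn, theta0, opts);

% Unpack with the same column-major scheme used by ce_loss.
W_opt = reshape(theta_opt(1:8), 4, 2);
b_opt = theta_opt(9:12);

% Predicted class = row index of the largest logit in each column.
Z = W_opt * Xtrain' + b_opt;              % 4 x Ntrain logits
[~, y_pred] = max(Z, [], 1);              % 1 x Ntrain predicted labels
y_true = reshape(ID(train_idx), 1, []);   % matching orientation
train_acc = mean(y_pred == y_true) * 100;
```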

Evaluation & Analysis

  1. Now compute the test set accuracy and test cross-entropy loss. Report both values.
  2. Based on your results, should the CSpOC rely on this linear classifier to update the US space catalogue? What are the limitations of a linear model for this problem, and what would you recommend as a next step?
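The test-set evaluation in step 1 can be sketched as below, again assuming the variable names from the earlier sketches (W_opt, b_opt, Xtest, test_idx, and a 4 x Ntest one-hot slice Ytest_oh):

```matlab
% Test accuracy: same logit-argmax procedure as on the training set.
Ztest = W_opt * Xtest' + b_opt;                   % 4 x Ntest logits
[~, yp_test] = max(Ztest, [], 1);
y_true_test = reshape(ID(test_idx), 1, []);
test_acc  = mean(yp_test == y_true_test) * 100;

% Test cross-entropy loss, reusing the provided local function.
test_loss = ce_loss(theta_opt, Xtest, Ytest_oh);
```

Comparing test loss and accuracy against their training counterparts indicates whether the model generalizes or overfits, which should inform your answer to step 2.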