Project 5 — Gradient Descent Optimization

📘 Related: Lesson 29, Lesson 30 🛠 MATLAB required

🌍 Introduction

This project explores two gradient descent optimization algorithms used to minimize a two-dimensional function. It also introduces concepts related to the optimization of non-convex cost functions. These ideas tie into machine learning concepts presented later in the course.

Submission note: You are not required to complete a full write-up for this project. You only need to submit a published version of your code (in PDF format using the HW template). However, please include a paragraph that summarizes any conclusions about your experiment.

Part 1 — Gradient Descent on a Quadratic Function

1.1   Write Optimization UDFs

Write user-defined functions (UDFs) that minimize a 2D function using (a) the gradient descent algorithm with a fixed step size and (b) the steepest descent algorithm. As in the homework, the UDF inputs should include function handles for the function and its gradient, an initial guess at the minimum, a stopping threshold, and, for gradient descent only, a step size. Do not pass the true solution to your UDF.

The output should be the sequence of points visited by the algorithm, one per iteration, where the last point is the solution. In other words, the output should be an \(N \times (M+1)\) matrix, where \(N\) is the dimension of the problem (2 in this case) and \(M\) is the number of steps taken. The \(+1\) accounts for the initial guess, which did not require a step.
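As a starting point, one possible shape for the fixed-step gradient descent UDF is sketched below. The name, the stopping rule on the step length, and the iteration cap are all illustrative choices, not requirements; the handle f is accepted to match the required inputs even though this version only uses the gradient.

function x = grad_descent(f, gradf, x0, tol, a)
% GRAD_DESCENT  Minimize a function via gradient descent with fixed step a.
%   Returns x, an N x (M+1) history: the initial guess followed by every
%   point visited. The last column is the solution.

maxit = 1e4;                 % guard against divergence (large step sizes)
x = x0(:);                   % column 1 is the initial guess
for k = 1:maxit
    g  = gradf(x(:, end));   % gradient at the current point
    xn = x(:, end) - a*g;    % fixed-size step in the downhill direction
    x  = [x, xn];            %#ok<AGROW> append the new point to the history
    if norm(a*g) < tol       % stop when the step falls below the threshold
        break
    end
end
end

The steepest descent version replaces the fixed step a*g with a line search along the negative gradient direction at each iteration.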

1.2   Minimize a 2D Function

Use each algorithm to minimize the function

\[f = 4x_1^2 - 16x_1 + x_2^2 - 6x_2 + 25. \tag{1}\]

Use a starting point of \(x = (0, 0)\) for all algorithms. For the gradient descent algorithm, use a step size of \(0.1\).
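For reference, the handles for Eq. (1) and its hand-derived gradient might look like the following. The UDF name grad_descent is a placeholder for whatever you wrote in Part 1.1.

f     = @(x) 4*x(1)^2 - 16*x(1) + x(2)^2 - 6*x(2) + 25;
gradf = @(x) [8*x(1) - 16; 2*x(2) - 6];   % [df/dx1; df/dx2]

x0  = [0; 0];     % starting point for all algorithms
tol = 1e-3;       % example stopping threshold
a   = 0.1;        % fixed step size (gradient descent only)

% x = grad_descent(f, gradf, x0, tol, a);   % placeholder UDF name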

1.3   Present Your Results

Plot Eq. (1) using MATLAB's contour feature. For each algorithm, overlay the steps at each iteration on top of the contour using MATLAB's quiver function (see the example code below). Additionally, on a single plot, show the true relative error vs. iteration for each algorithm using a log scale for the \(y\)-axis (i.e., use MATLAB's semilogy). Note that the actual solution is \(x = (2, 3)\).
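A short sketch of the error plot, assuming x is the \(N \times (M+1)\) output matrix from your UDF. Note that vecnorm and the implicit expansion in x - xs require a recent MATLAB release (R2017b and R2016b, respectively); a loop over columns works on older versions.

xs  = [2; 3];                          % true solution
err = vecnorm(x - xs) ./ norm(xs);     % relative error at each iteration
figure; semilogy(0:size(x,2)-1, err, 'o-', 'LineWidth', 1)
xlabel('iteration'); ylabel('relative error')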

Quiver plot example — the code below demonstrates how to overlay algorithm steps on a contour plot using quiver. The step coordinates in this example are pre-computed; your script will pass in the output matrix from your UDFs.
%% Example: Plot Algorithm Steps with Quiver
% Maj Casey Pellizzari, 17 Dec 2018
% Reference: Gilat & Subramaniam, Numerical Methods for Engineers and
%            Scientists, 3rd Ed.

clear; clc;

% Set up the 2D function for plotting
x1v = linspace(-2, 8, 1000);   % grid vector, reused for both axes
[X1, X2] = meshgrid(x1v);
F = 4*X1.^2 - 16*X1 + X2.^2 - 6*X2 + 25;

% Algorithm steps (normally the output of your UDF)
x = [0    2.2037  1.6088  2.0399  1.9235  2.0078  1.9850  2.0015  1.9971  2.0003  1.9994; ...
     0    0.8264  2.4133  2.5749  2.8852  2.9168  2.9776  2.9837  2.9956  2.9968  2.9991];

% Compute differences for quiver arrows
d1 = [diff(x(1,:)) 0];
d2 = [diff(x(2,:)) 0];

scale = 2;  % scale arrow length for visibility

figure(1); clf;
set(gcf, 'DefaultAxesFontSize', 12)
set(gcf, 'DefaultAxesFontWeight', 'bold')
contour(X1(1,:), X2(:,1), F, 10);
hold on
quiver(x(1,:), x(2,:), d1, d2, scale, 'r', 'LineWidth', 1, 'MarkerSize', 10)
axis image          % equal axis scaling, tight to the data
axis([-2 8 -2 8])   % set limits after axis image so they are not overridden
legend('function', 'steps')
xlabel('x_1')
ylabel('x_2')
title('Plot of Algorithm Steps')

Part 2 — Non-Convex Optimization: The Griewank Function

The Griewank function is an example of a non-convex function. The second-order Griewank function is given by

\[f = 1 + \frac{1}{4000}x_1^2 + \frac{1}{4000}x_2^2 - \cos(x_1)\cos\!\left(\tfrac{1}{2}x_2\sqrt{2}\right). \tag{2}\]

While Eq. (2) has a global minimum at \(x = (0, 0)\), it also has many local minima that can trap gradient-based optimization algorithms.
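Differentiating Eq. (2) by hand gives one way to define the function and gradient handles (names match the placeholder sketch from Part 1):

f     = @(x) 1 + x(1)^2/4000 + x(2)^2/4000 ...
            - cos(x(1))*cos(x(2)*sqrt(2)/2);
gradf = @(x) [x(1)/2000 + sin(x(1))*cos(x(2)*sqrt(2)/2); ...
              x(2)/2000 + (sqrt(2)/2)*cos(x(1))*sin(x(2)*sqrt(2)/2)];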

  1. Provide a contour plot of the Griewank function over the interval \(-2 \le x_1 \le 8\) and \(-2 \le x_2 \le 8\).
  2. Using the gradient descent algorithm from Part 1, minimize Eq. (2) twice: once starting at \(x_0 = (1.6,\, 2)\) and again starting at \(x_0 = (1.6,\, 2.5)\). For both cases, plot the algorithm steps overlaid on a contour plot and plot the error vs. iteration using a log scale for the \(y\)-axis. You may show both error results on one plot.

Part 3 — Effect of Step Size

Using the same cost function from Part 2 (the Griewank function, Eq. (2)), use gradient descent to find the minimum with three different step sizes: \(a = 0.02,\ 0.5,\ 2\). In all three cases use a starting point of \(x_0 = (1.6,\, 2)\).

For each case, plot the algorithm steps overlaid on a contour plot. Also plot the error vs. iteration using a log scale for the \(y\)-axis.
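The three runs can be scripted in a loop. Note that with a large step size the iteration may bounce between points or diverge, which is one reason a cap on the iteration count inside your UDF is prudent. The UDF name below is a placeholder; f and gradf are the Griewank handles from Part 2.

a_list = [0.02, 0.5, 2];    % step sizes to compare
x0  = [1.6; 2];             % same starting point for all three cases
tol = 1e-3;
for k = 1:numel(a_list)
    % x = grad_descent(f, gradf, x0, tol, a_list(k));  % placeholder UDF
    % err = vecnorm(x);     % distance to the global minimum at (0, 0)
    % ... overlay the steps on a contour plot (see the quiver example)
    % ... semilogy(0:size(x,2)-1, err); hold on
end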