Project 5 — Gradient Descent Optimization
Introduction
This project explores two gradient descent optimization algorithms used to minimize a two-dimensional function. It also introduces concepts related to the optimization of non-convex cost functions. These ideas tie into machine learning concepts presented later in the course.
Part 1 — Gradient Descent on a Quadratic Function
1.1 Write Optimization UDFs
Write user-defined functions (UDFs) to minimize a 2D function using the gradient descent algorithm with a fixed step size, and the steepest descent algorithm. Similar to the homework, the UDF inputs should include function handles for the function and the gradient, an initial guess at the minimum, a stopping threshold, and a step size (for gradient descent only). Do not pass the true solution to your UDF.
The output should be a time-series of steps taken by the algorithm at each iteration, where the last step is the solution. In other words, the output should be an \(N \times (M+1)\) matrix where \(N\) is the dimension of the problem (2 in this case) and \(M\) is the number of steps taken. The \(+1\) accounts for the initial guess, which did not require a step.
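For reference, the fixed-step variant might be structured like the following sketch. The function name `gradDescent`, the step-length stopping rule, and the argument order are illustrative assumptions, not requirements; the `f` handle is unused here but is needed for the steepest descent variant, which chooses the step by a line search at each iteration.

```matlab
function x = gradDescent(f, gradf, x0, tol, a)
% GRADDESCENT  Sketch: minimize a 2D function by fixed-step gradient descent.
%   x - N x (M+1) matrix of iterates; x(:,1) is the initial guess and
%       x(:,end) is the solution.
x = x0(:);                          % column 1 holds the initial guess
step = inf;
while step > tol
    g = gradf(x(:,end));            % gradient at the current point
    xNew = x(:,end) - a*g;          % fixed-step update
    step = norm(xNew - x(:,end));   % stop when the step length is small
    x = [x, xNew];                  % append the new iterate
end
end
```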
1.2 Minimize a 2D Function
Use each algorithm to minimize the function
\[f = 4x_1^2 - 16x_1 + x_2^2 - 6x_2 + 25. \tag{1}\]Use a starting point of \(x = (0, 0)\) for all algorithms. For the gradient descent algorithm, use a step size of \(0.1\).
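One way to set up Eq. (1) for your UDFs is with anonymous function handles, where `x` is a 2-by-1 vector; the handle names below are illustrative.

```matlab
% Eq. (1) and its analytic gradient as anonymous handles (x is 2x1)
f     = @(x) 4*x(1)^2 - 16*x(1) + x(2)^2 - 6*x(2) + 25;
gradf = @(x) [8*x(1) - 16; 2*x(2) - 6];   % [df/dx1; df/dx2]
```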
1.3 Present Your Results
Plot Eq. (1) using MATLAB's contour feature. For each algorithm, overlay the steps at each iteration on top of the contour using MATLAB's quiver function (see the example code below). Additionally, on a single plot, show the true relative error vs. iteration for each algorithm using a log scale for the \(y\)-axis (i.e., use MATLAB's semilogy). Note that the actual solution is \(x = (2, 3)\).
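A sketch of the error plot is shown below, assuming `xGD` and `xSD` are the \(N \times (M+1)\) output matrices from your two UDFs (those variable names are placeholders).

```matlab
% True relative error vs. iteration for both algorithms on one semilogy plot
xTrue = [2; 3];                               % known minimizer of Eq. (1)
errGD = vecnorm(xGD - xTrue) / norm(xTrue);   % relative error at each iterate
errSD = vecnorm(xSD - xTrue) / norm(xTrue);
figure;
semilogy(0:numel(errGD)-1, errGD, 'o-', 0:numel(errSD)-1, errSD, 's-')
xlabel('iteration'), ylabel('true relative error')
legend('gradient descent', 'steepest descent')
```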
The example code below demonstrates the use of quiver. The step coordinates in this example are pre-computed; your script will instead pass in the output matrix from your UDFs.
%% Example: Plot Algorithm Steps with Quiver
% Maj Casey Pellizzari, 17 Dec 2018
% Reference: Gilat & Subramaniam, Numerical Methods for Engineers and
% Scientists, 3rd Ed.
clear; clc;
% Set up the 2D function for plotting
X1 = linspace(-2, 8, 1000);
[X1, X2] = meshgrid(X1);
F = 4*X1.^2 - 16*X1 + X2.^2 - 6*X2 + 25;
% Algorithm steps (normally the output of your UDF)
x = [0 2.2037 1.6088 2.0399 1.9235 2.0078 1.9850 2.0015 1.9971 2.0003 1.9994; ...
0 0.8264 2.4133 2.5749 2.8852 2.9168 2.9776 2.9837 2.9956 2.9968 2.9991];
% Compute differences for quiver arrows
d1 = [diff(x(1,:)) 0];
d2 = [diff(x(2,:)) 0];
scale = 2; % scale arrow length for visibility
figure(1); clf;
set(gcf, 'DefaultAxesFontSize', 12)
set(gcf, 'DefaultAxesFontWeight', 'bold')
contour(X1(1,:), X2(:,1), F, 10);
hold on
quiver(x(1,:), x(2,:), d1, d2, scale, 'r', 'LineWidth', 1, 'MarkerSize', 10)
axis image                 % equal aspect ratio (set before widening limits)
axis([-2 8 -2 8])
legend('function', 'steps')
xlabel('x_1')
ylabel('x_2')
title('Plot of Algorithm Steps')
Part 2 — Non-Convex Optimization: The Griewank Function
The Griewank function is an example of a non-convex function. The second-order Griewank function is given by
\[f = 1 + \frac{1}{4000}x_1^2 + \frac{1}{4000}x_2^2 - \cos(x_1)\cos\!\left(\tfrac{1}{2}x_2\sqrt{2}\right). \tag{2}\]While Eq. (2) has a global minimum at \(x = (0, 0)\), it also has many local minima that can trap gradient-based optimization algorithms.
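Eq. (2) and its analytic gradient can be set up as anonymous handles in the same way as Part 1 (handle names are illustrative; `x` is a 2-by-1 vector):

```matlab
% Eq. (2) and its gradient; note the chain-rule factor sqrt(2)/2 in df/dx2
f     = @(x) 1 + x(1)^2/4000 + x(2)^2/4000 ...
             - cos(x(1))*cos(x(2)*sqrt(2)/2);
gradf = @(x) [x(1)/2000 + sin(x(1))*cos(x(2)*sqrt(2)/2); ...
              x(2)/2000 + cos(x(1))*sin(x(2)*sqrt(2)/2)*sqrt(2)/2];
```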
- Provide a contour plot of the Griewank function over the interval \(-2 \le x_1 \le 8\) and \(-2 \le x_2 \le 8\).
- Using the gradient descent algorithm from Part 1, minimize Eq. (2) twice: once starting at \(x_0 = (1.6,\, 2)\) and again starting at \(x_0 = (1.6,\, 2.5)\). For both cases, plot the algorithm steps overlaid on a contour plot and plot the error vs. iteration using a log scale for the \(y\)-axis. You may show both error results on one plot.
Part 3 — Effect of Step Size
Using the same cost function from Part 2 (the Griewank function, Eq. (2)), use gradient descent to find the minimum with three different step sizes: \(a = 0.02,\ 0.5,\ 2\). In all three cases use a starting point of \(x_0 = (1.6,\, 2)\).
For each case, plot the algorithm steps overlaid on a contour plot. Also plot the error vs. iteration using a log scale for the \(y\)-axis.
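The three cases can be run in a loop; `gradDescent` below is a placeholder name for your Part 1 UDF, and `f`/`gradf` are assumed to be handles for Eq. (2).

```matlab
% Sketch of the step-size study
stepSizes = [0.02 0.5 2];
x0 = [1.6; 2];
for k = 1:numel(stepSizes)
    x = gradDescent(f, gradf, x0, 1e-6, stepSizes(k));
    % ...overlay x on a contour plot and compute error vs. iteration here
end
```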