HW 30 — Gradient Descent User-Defined Function
Background
Gradient descent is the foundation of virtually all modern machine learning algorithms, from logistic regression to deep neural networks. Before applying it to complex models in later lessons, you will implement it as a clean, reusable MATLAB UDF.
The update rule at each iteration is:

\[x^{k+1} = x^k - \gamma \nabla f(x^k)\]

where \(\gamma\) is the step size and \(\nabla f(x^k)\) is the gradient evaluated at the current point. The algorithm stops when the change in position is smaller than a threshold \(\epsilon\):

\[\|x^{k+1} - x^k\|_2 \leq \epsilon\]
In this assignment, you will use MATLAB's symbolic toolbox to compute the gradient analytically, convert it to an anonymous function, and then pass it to your GD UDF.
Preflight
For each of the following functions, use MATLAB's symbolic toolbox to define the variables, write an anonymous function for \(f\), compute the symbolic gradient, convert it to an anonymous function using matlabFunction, and evaluate both \(f\) and \(\nabla f\) at the origin. The final output values should be of type double, not symbolic.
- \(f_{2D}(x_1, x_2) = 2x_1 + x_2 + 2x_1 x_2 + x_1^2 + 7\)
- \(f_{3D}(x_1, x_2, x_3) = 3x_1 + 2x_2 + x_3 + 2x_1 x_2 + x_1 x_3 + 2x_2 x_3 + x_2^2/4 + x_3^2/4 - 2\)
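As a starting point, the preflight workflow for the 2D function might look like the sketch below (variable names such as f_sym and grad_f are my own choices, not prescribed by the assignment):

```matlab
% Preflight sketch for f_2D: symbolic gradient -> anonymous function
syms x1 x2
f_sym = 2*x1 + x2 + 2*x1*x2 + x1^2 + 7;

% Anonymous function for f itself (numeric, elementwise-safe)
f = @(x1, x2) 2*x1 + x2 + 2*x1.*x2 + x1.^2 + 7;

% Symbolic gradient, then convert it with matlabFunction
grad_sym = gradient(f_sym, [x1; x2]);
grad_f = matlabFunction(grad_sym, 'Vars', {x1, x2});

% Evaluate both at the origin; outputs are double, not symbolic
f0    = f(0, 0)        % 7
grad0 = grad_f(0, 0)   % [2; 1]
```

The same pattern extends to the 3D function by adding x3 to the symbolic variables and to the gradient call.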
Main Tasks
Part 1 — Gradient Descent UDF
Write a MATLAB UDF called gradient_descent with the following interface:
- Inputs: function handle f, gradient handle grad_f, initial guess vector x0, step size gamma, convergence threshold epsilon
- Outputs: minimum location xmin, minimum value fmin, number of iterations n_iter
The UDF should use the stopping criterion \(\|x^{k+1} - x^k\|_2 \leq \epsilon\).
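One possible implementation of this interface (assuming f and grad_f both accept a column vector) is:

```matlab
function [xmin, fmin, n_iter] = gradient_descent(f, grad_f, x0, gamma, epsilon)
% GRADIENT_DESCENT  Minimize f starting from x0 with fixed step size gamma.
% Stops when the 2-norm of the step falls at or below epsilon.
    x = x0;
    n_iter = 0;
    while true
        x_new = x - gamma * grad_f(x);   % gradient descent update
        n_iter = n_iter + 1;
        if norm(x_new - x) <= epsilon    % stopping criterion
            x = x_new;
            break
        end
        x = x_new;
    end
    xmin = x;
    fmin = f(x);
end
```

In practice you may also want a maximum-iteration guard so a poorly chosen gamma cannot loop forever.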
Part 2 — Apply to a 2D Function
Apply your gradient_descent UDF to minimize the following function:

\[f(x_1, x_2) = 4x_1^2 - 16x_1 + x_2^2 - 6x_2 + 25\]
Use the following parameters:
- Step size: \(\gamma = 0.02\)
- Threshold: \(\epsilon = 10^{-5}\)
- Initial guess: \(x^0 = [0;\; 0]\)
Report the minimum location and function value. Verify your answer analytically by setting \(\nabla f = 0\) and solving by hand.
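A call with these parameters might look like the following sketch (assuming the gradient_descent UDF from Part 1 and vector-input handles):

```matlab
% Objective and gradient as anonymous functions of a 2-element column vector
f      = @(x) 4*x(1)^2 - 16*x(1) + x(2)^2 - 6*x(2) + 25;
grad_f = @(x) [8*x(1) - 16; 2*x(2) - 6];

[xmin, fmin, n_iter] = gradient_descent(f, grad_f, [0; 0], 0.02, 1e-5)
```

For the hand check: setting \(8x_1 - 16 = 0\) and \(2x_2 - 6 = 0\) gives \(x^* = [2;\; 3]\) with \(f(x^*) = 0\), which the numerical result should approach.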
Part 3 — Contour Plot
Generate a contour plot of \(f(x_1, x_2)\) over the range \(x_1, x_2 \in [-2,\; 8]\). Mark the minimum found by your GD UDF with an × symbol. Include axis labels and a title.
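The plot could be built along these lines (assuming xmin holds the result of your GD run; grid density and contour count are my own choices):

```matlab
% Evaluate f on a grid over [-2, 8] x [-2, 8]
[X1, X2] = meshgrid(linspace(-2, 8, 200));
F = 4*X1.^2 - 16*X1 + X2.^2 - 6*X2 + 25;   % elementwise evaluation

contour(X1, X2, F, 20)
hold on
plot(xmin(1), xmin(2), 'x', 'MarkerSize', 12, 'LineWidth', 2)
xlabel('x_1'); ylabel('x_2')
title('Contours of f(x_1, x_2) with GD minimum')
hold off
```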
Hints
- Define \(f\) as an anonymous function of a 2-element column vector: f = @(x) 4*x(1)^2 - 16*x(1) + x(2)^2 - 6*x(2) + 25;
- The gradient of \(f\) is \(\nabla f(x) = [8x_1 - 16;\; 2x_2 - 6]\). Define it as an anonymous function as well.
- For the contour plot, use meshgrid to create a grid and evaluate \(f\) at each grid point, then use contour.
- In MATLAB, the L2 norm of a vector v is norm(v) (which defaults to the 2-norm).
- If your algorithm does not converge, try a smaller step size. If it converges very slowly, try a slightly larger one.