Differentiating a single-variable function involves answering the question: near a given point, how much does the value of the function change per unit change in the input? In the higher-dimensional setting, the question must be made more specific, since the change in output depends not only on how much the input is changed but also on the direction of the change in input.
Consider, for example, the function $f(x,y)$ which returns the altitude of the point on earth with latitude $x$ and longitude $y$. If $(x,y)$ identifies a point on a sloping hillside, then there are some directions in which $f$ increases, others in which $f$ decreases, and two directions in which $f$ neither increases nor decreases (these are the directions along the hill's contour lines, as you would see represented on a map).
Partial derivatives
The simplest directions for inquiring about the instantaneous rate of change of $f$ are those along the axes: the partial derivative $\frac{\partial f}{\partial x}(x_0,y_0)$ of a function $f(x,y)$ at a point $(x_0,y_0)$ is the slope of the graph of $f$ in the $x$-direction at the point $(x_0,y_0)$. In other words, it's the slope of the intersection of the graph of $f$ with the plane $y = y_0$. The partial derivative $\frac{\partial f}{\partial x}$ may also be denoted $f_x$.
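To see a partial derivative in action, here is a minimal sketch in Python (the function $f(x,y) = x^2 y$ is a hypothetical example, not one from the text) that estimates $f_x$ and $f_y$ with centered finite differences:

```python
def f(x, y):
    # hypothetical example function: f(x, y) = x^2 * y
    return x**2 * y

def partial_x(f, x0, y0, h=1e-6):
    # centered difference: vary x by h while holding y fixed
    return (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)

def partial_y(f, x0, y0, h=1e-6):
    # vary y while holding x fixed
    return (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)

print(partial_x(f, 1.0, 2.0))  # ≈ 4.0, since f_x = 2xy
print(partial_y(f, 1.0, 2.0))  # ≈ 1.0, since f_y = x^2
```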
Exercise Consider the function $f$ whose graph is shown. Determine the sign of $f_x$ and the sign of $f_y$ at the indicated point.
Solution. If we increase $x$ a little while holding $y$ constant, then $f$ decreases. Therefore, $f_x < 0$ at that point. If we increase $y$ a little while holding $x$ constant, then $f$ increases. Therefore, $f_y > 0$ at that point.
Graphically, the partial derivative with respect to $x$ at a point is equal to the slope of the trace of the graph in the vertical plane $y = \text{constant}$ passing through that point. Similarly, the partial derivative with respect to $y$ at a point is equal to the slope of the trace of the graph in the vertical plane $x = \text{constant}$ passing through that point.
We can partial-differentiate multiple times, and it turns out that the order in which we apply these partial differentiation operations doesn't matter, as long as the resulting second-order partial derivatives are continuous: $f_{xy} = f_{yx}$. This fact is called Clairaut's theorem.
Exercise Consider the function $f(x,y) = x^2y^3$. Show that differentiating with respect to $x$ and then with respect to $y$ gives the same result as differentiating with respect to $y$ and then with respect to $x$.
Solution. The partial derivative of $f$ with respect to $x$ is $2xy^3$, and the derivative of that with respect to $y$ is $6xy^2$. The partial derivative of $f$ with respect to $y$ is $3x^2y^2$, and the derivative of that with respect to $x$ is $6xy^2$. Therefore, the conclusion of Clairaut's theorem is satisfied in this case.
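The same computation can be checked symbolically; here is a sketch using SymPy with the exercise's function:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y**3  # the function from the exercise

f_xy = sp.diff(f, x, y)  # differentiate with respect to x, then y
f_yx = sp.diff(f, y, x)  # differentiate with respect to y, then x

print(f_xy)  # 6*x*y**2
print(f_yx)  # 6*x*y**2, matching f_xy as Clairaut's theorem predicts
```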
Differentiability
A single-variable function $f$ is differentiable at a point $a$ if and only if its graph looks increasingly like that of a non-vertical line when zoomed increasingly far in. In other words, $f$ is differentiable at $a$ if and only if there's a linear function $L(x) = f(a) + m(x-a)$ such that $\frac{f(x) - L(x)}{x - a}$ goes to 0 as $x \to a$.
Likewise, a function $f$ of two variables is said to be differentiable at a point $(a,b)$ if its graph looks like a plane when you zoom in sufficiently around the point; that is, $f$ is differentiable at $(a,b)$ if
$$\lim_{(x,y) \to (a,b)} \frac{f(x,y) - \left(c_0 + c_1(x-a) + c_2(y-b)\right)}{\sqrt{(x-a)^2 + (y-b)^2}} = 0$$
for some real numbers $c_0$, $c_1$, and $c_2$. If such a linear function $c_0 + c_1(x-a) + c_2(y-b)$ exists, then its coefficients are necessarily $c_0 = f(a,b)$, $c_1 = f_x(a,b)$, and $c_2 = f_y(a,b)$.
So, the equation of the plane tangent to the graph of a differentiable function $f$ at the point $(a, b, f(a,b))$ is given by
$$z = f(a,b) + f_x(a,b)(x-a) + f_y(a,b)(y-b).$$
This equation says how $f$ behaves for values of $(x,y)$ very close to $(a,b)$: the output changes by the $x$-change $x-a$ times $f$'s sensitivity to changes in $x$ (namely $f_x(a,b)$), plus the $y$-change $y-b$ times $f$'s sensitivity to changes in $y$ (namely $f_y(a,b)$).
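A quick numerical check of this tangent-plane approximation (a sketch in Python; the function and the point $(a,b) = (1,2)$ are hypothetical choices for illustration):

```python
def f(x, y):
    return x**2 * y  # hypothetical example function

a, b = 1.0, 2.0
fx, fy = 2*a*b, a**2  # exact partials of x^2*y: f_x = 2xy, f_y = x^2

def tangent_plane(x, y):
    # z = f(a,b) + f_x(a,b)(x - a) + f_y(a,b)(y - b)
    return f(a, b) + fx*(x - a) + fy*(y - b)

# near (a, b) the plane tracks f closely
print(f(1.01, 2.02))              # 2.060602...
print(tangent_plane(1.01, 2.02))  # 2.06
```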
Gradient
Once we know how a differentiable function changes in the coordinate-axis directions, we can succinctly express how it changes in any direction: we form the gradient $\nabla f = \begin{bmatrix} f_x \\ f_y \end{bmatrix}$ of $f$ by putting all of the partial derivatives of the function together into a vector. Then, for any unit vector $\mathbf{u}$, the rate of change of $f$ in the direction $\mathbf{u}$ is equal to $\nabla f \cdot \mathbf{u}$.
Since $\nabla f \cdot \mathbf{u} = |\nabla f| \cos \theta$, where $\theta$ is the angle between $\nabla f$ and $\mathbf{u}$, the direction of the gradient is the direction in which $f$ increases most rapidly. The direction opposite to the gradient is the direction of maximum decrease, and the directions orthogonal to these are the ones in which $f$ is constant.
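The claim that the gradient points in the direction of fastest increase can be checked numerically. A sketch (assuming NumPy; the function is again the hypothetical $f(x,y) = x^2 y$, whose gradient at $(1,2)$ is $[4, 1]$):

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 * y  # hypothetical example function

p = np.array([1.0, 2.0])
grad = np.array([4.0, 1.0])  # gradient of f at p: [2xy, x^2]
h = 1e-6

# compare the numerical rate of change in several unit directions u
# with the dot product grad . u; the largest occurs along grad/|grad|
for t in np.linspace(0, 2*np.pi, 8, endpoint=False):
    u = np.array([np.cos(t), np.sin(t)])
    rate = (f(p + h*u) - f(p - h*u)) / (2*h)
    print(f"theta={t:.2f}  measured={rate:7.3f}  grad.u={grad @ u:7.3f}")
```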
Exercise Suppose that $f$ is a differentiable function at the point $P$ and that its instantaneous rates of change in the directions of the unit vectors $\mathbf{u}$ and $\mathbf{v}$ are known. Show that if $\mathbf{u}$ and $\mathbf{v}$ are not parallel, then it is always possible to infer $f$'s rates of change in the coordinate-axis directions.
Solution. The problem stipulates that we are given equations of the form
$$u_1 f_x + u_2 f_y = c_1$$
$$v_1 f_x + v_2 f_y = c_2$$
for some numbers $c_1$ and $c_2$. This system may be written in matrix form as
$$\begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix} \begin{bmatrix} f_x \\ f_y \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.$$
Since $\mathbf{u}$ and $\mathbf{v}$ are not parallel, they span $\mathbb{R}^2$. Therefore, the matrix is invertible, and the solution of the system is
$$\begin{bmatrix} f_x \\ f_y \end{bmatrix} = \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix}^{-1} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.$$
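In code, this inference is a single call to a linear solver. A sketch assuming NumPy, with made-up directions and rates:

```python
import numpy as np

u = np.array([1.0, 0.0])          # first unit direction
v = np.array([2**-0.5, 2**-0.5])  # second unit direction, not parallel to u
c = np.array([3.0, 2**0.5])       # hypothetical measured rates of change

A = np.vstack([u, v])             # rows are the direction vectors
grad = np.linalg.solve(A, c)      # solves A [f_x, f_y]' = [c_1, c_2]'
print(grad)                       # inferred [f_x, f_y] = [3., -1.]
```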
Exercise Consider a differentiable function $f$ from $\mathbb{R}^2$ to $\mathbb{R}$ and a point where $f$ is differentiable with nonzero gradient. The number of directions in which $f$ increases maximally from that point is ____. The number of directions in which $f$ decreases maximally from that point is ____. The number of directions in which $f$ remains approximately constant is ____.
Solution. $f$ increases maximally in the direction of its gradient and decreases maximally in the opposite direction. It remains approximately constant in the two directions orthogonal to its gradient.
Exercise Consider a differentiable function $f$ from $\mathbb{R}^3$ to $\mathbb{R}$ and a point where $f$ is differentiable with nonzero gradient. The number of directions in which $f$ increases maximally from that point is ____. The number of directions in which $f$ decreases maximally from that point is ____. The number of directions in which $f$ remains approximately constant is ____.
Solution. $f$ increases maximally in the direction of its gradient and decreases maximally in the opposite direction. It remains approximately constant in the plane of directions orthogonal to its gradient. Since a plane contains infinitely many directions, the number of directions in which $f$ remains approximately constant is infinite.
Second-order differentiation
We can take the notion of a gradient, which measures the linear change of a function, up a degree. The Hessian of a function $f: \mathbb{R}^n \to \mathbb{R}$ is defined to be the matrix
$$\mathbf{H}(\mathbf{x}) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}.$$
The best quadratic approximation of $f$ at the origin is
$$f(\mathbf{x}) \approx f(\mathbf{0}) + (\nabla f(\mathbf{0}))' \mathbf{x} + \frac{1}{2} \mathbf{x}' \mathbf{H}(\mathbf{0}) \mathbf{x}.$$
The same is true at points $\mathbf{a}$ other than the origin if we evaluate the gradient and Hessian at $\mathbf{a}$ instead of $\mathbf{0}$ and if we replace $\mathbf{x}$ with $\mathbf{x}-\mathbf{a}$.
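Here is a sketch (using SymPy; the function $e^x \cos y$ is a hypothetical example) that assembles the quadratic approximation at the origin from the gradient and Hessian:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.exp(x) * sp.cos(y)  # hypothetical example function

grad = sp.Matrix([sp.diff(f, x), sp.diff(f, y)])
H = sp.hessian(f, (x, y))  # matrix of second-order partials

X = sp.Matrix([x, y])
at0 = {x: 0, y: 0}
quad = (f.subs(at0)
        + (grad.subs(at0).T * X)[0]
        + sp.Rational(1, 2) * (X.T * H.subs(at0) * X)[0])

print(sp.expand(quad))  # 1 + x + x**2/2 - y**2/2
```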
Exercise Suppose that $a, b, c, d, e$, and $g$ are real numbers and that $f(x,y) = a + bx + cy + dx^2 + exy + gy^2$. Show that the quadratic approximation of $f$ at the origin is equal to $f$.
Solution. The gradient of $f$ evaluated at the origin is $[b, c]$, so the linear part of the approximation is
$$f(0,0) + f_x(0,0)\,x + f_y(0,0)\,y = a + bx + cy.$$
The Hessian is $\begin{bmatrix} 2d & e \\ e & 2g \end{bmatrix}$, so the quadratic terms in the quadratic approximation are
$$\frac{1}{2} \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 2d & e \\ e & 2g \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = dx^2 + exy + gy^2.$$
Adding the linear and quadratic parts gives $a + bx + cy + dx^2 + exy + gy^2$, which is exactly $f(x,y)$.
We can combine the ideas of quadratic approximation and diagonalization to gain sharp insight into the shape of a function's graph at a point where the gradient is zero. Since the Hessian matrix $H$ is symmetric by Clairaut's theorem, the spectral theorem implies that it is orthogonally diagonalizable.
With $VDV'$ as the diagonalization of $H$, the quadratic term in the quadratic approximation becomes
$$\frac{1}{2} \mathbf{x}' H \mathbf{x} = \frac{1}{2} \mathbf{x}' V D V' \mathbf{x} = \frac{1}{2} (V'\mathbf{x})' D\, (V'\mathbf{x}).$$
Since the components of $V'\mathbf{x}$ are the coordinates of $\mathbf{x}$ with respect to the basis given by the columns $\mathbf{v}_1, \ldots, \mathbf{v}_n$ of $V$, the quadratic term may be written as
$$\frac{1}{2}\left(\lambda_1\tilde{x}_1^2 + \lambda_2\tilde{x}_2^2 + \cdots + \lambda_n\tilde{x}_n^2\right),$$
where $[\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n]$ is the vector of coordinates of $[x_1, x_2, \ldots, x_n]$ with respect to the basis given by the columns of $V$, and $\lambda_1, \ldots, \lambda_n$ are the diagonal entries of $D$ (the eigenvalues of $H$).
Writing the quadratic approximation of $f$ in the form $\frac{1}{2}(\lambda_1\tilde{x}_1^2 + \lambda_2\tilde{x}_2^2 + \cdots + \lambda_n\tilde{x}_n^2)$ is powerful because it presents the changes in $f$ as a sum of $n$ separate changes, each of which is as simple as the parabola $y = ax^2$.
If $\lambda_1$ is positive, then the graph of $f$ is shaped like an up-turned parabola along the $\mathbf{v}_1$ axis. If it's negative, then the graph of $f$ is shaped like a down-turned parabola along that axis.
Exercise Consider a point $(x_1, \ldots, x_n)$ where $f$ has zero gradient and a Hessian with eigenvalues $\lambda_1, \ldots, \lambda_n$.
If all of the eigenvalues are positive, then $f$ is ____ at $(x_1, \ldots, x_n)$ than at nearby points.
If all of the eigenvalues are negative, then $f$ is ____ at $(x_1, \ldots, x_n)$ than at nearby points.
If some eigenvalues are positive and some are negative, then $f$ increases as you move away from $(x_1, \ldots, x_n)$ in some directions and ____ in other directions.
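This classification is easy to carry out in code. A sketch assuming NumPy, with a made-up Hessian:

```python
import numpy as np

H = np.array([[2.0, 1.0],
              [1.0, 3.0]])  # hypothetical Hessian at a point of zero gradient

eigvals = np.linalg.eigvalsh(H)  # eigvalsh is appropriate for symmetric matrices
print(eigvals)  # [1.38..., 3.61...]

if np.all(eigvals > 0):
    print("local minimum")   # f is smaller here than at nearby points
elif np.all(eigvals < 0):
    print("local maximum")   # f is larger here than at nearby points
elif np.any(eigvals > 0) and np.any(eigvals < 0):
    print("saddle point")    # f increases in some directions, decreases in others
else:
    print("inconclusive")    # a zero eigenvalue: the second-order test is silent
```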
In addition to helping distinguish local minima, local maxima, and saddle points, the diagonalized Hessian can also help us recognize ravines in the graph of $f$. This idea arises in the context of numerical optimization methods for deep learning.
Exercise Suppose that $f: \mathbb{R}^2 \to \mathbb{R}$ has zero gradient at a given point, and suppose that its Hessian matrix at that point has eigenvalues $\lambda_1$ and $\lambda_2$. How can you recognize, based on the values of $\lambda_1$ and $\lambda_2$, whether the graph of $f$ is ravine-shaped?
Solution. If $\lambda_1$ and $\lambda_2$ are both positive, with one close to zero and the other very large, then the graph of $f$ will be ravine-shaped. That's because the steep increase in one direction corresponds to one of the eigenvalues being very large, and the shallow increase in the orthogonal direction is indicated by the other eigenvalue being very small.
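For instance, the hypothetical function $f(x,y) = x^2 + 100y^2$ has a ravine along the $x$-axis, and its (constant) Hessian exhibits exactly this eigenvalue pattern:

```python
import numpy as np

# Hessian of f(x, y) = x^2 + 100*y^2: one small and one large eigenvalue
H = np.array([[2.0,   0.0],
              [0.0, 200.0]])

eigvals = np.linalg.eigvalsh(H)
print(eigvals)                      # [  2. 200.]
print(max(eigvals) / min(eigvals))  # 100.0: a large ratio signals a ravine
```

The ratio of the largest to the smallest eigenvalue is the condition number of the Hessian; when it is large, gradient descent zigzags across the ravine and converges slowly, which is why this situation matters for the optimization methods mentioned above.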