Derivatives for Machine Learning
1. Why Do We Need Derivatives in Machine Learning?
In machine learning, our primary goal is to optimize a model by minimizing a loss function. This loss function measures how "wrong" our model's predictions are.
The Core Idea: Derivatives tell us the slope of the loss function. By knowing the slope, we can determine the direction to adjust our model's parameters (weights and biases) to reduce the loss. This iterative process is called Gradient Descent.
Optimization Loop
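The optimization loop can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical one-parameter loss `(w - 3)^2`; the names `loss`, `loss_grad`, and `gradient_descent` and the learning rate are choices made for this example only.

```python
def loss(w):
    # Hypothetical one-parameter loss with its minimum at w = 3.
    return (w - 3.0) ** 2


def loss_grad(w):
    # Derivative of the loss above: d/dw (w - 3)^2 = 2 (w - 3).
    return 2.0 * (w - 3.0)


def gradient_descent(w0, learning_rate=0.1, steps=100):
    w = w0
    for _ in range(steps):
        # Move against the slope: a positive slope pushes w down,
        # a negative slope pushes w up.
        w -= learning_rate * loss_grad(w)
    return w


w_final = gradient_descent(w0=0.0)
print(w_final, loss(w_final))
```

Each step shrinks the distance to the minimum by a constant factor here, so `w_final` ends up very close to 3.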

2. What is a Derivative?
A derivative measures the instantaneous rate of change of a function with respect to one of its variables — it is the slope of the tangent line at a specific point.
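- Notation: $f'(x)$ or $\frac{df}{dx}$
- Limit Definition: $f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$

As the gap $h$ shrinks toward zero, the secant line through the points at $x$ and $x + h$ approaches the tangent line at $x$.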

The derivative at a point is the slope of the tangent line at that point.
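The limit definition can be explored numerically. The sketch below assumes an example function `f(x) = x^2` (whose exact derivative is `2x`) and a helper named `difference_quotient`; both are illustrative choices, not part of the original notes.

```python
def f(x):
    # Example function; its exact derivative is 2x.
    return x ** 2


def difference_quotient(func, x, h):
    # Slope of the secant line through (x, func(x)) and (x + h, func(x + h)).
    return (func(x + h) - func(x)) / h


x0 = 3.0
for h in (1.0, 0.1, 0.01, 0.001):
    # As h shrinks, the secant slope approaches the tangent slope f'(3) = 6.
    print(h, difference_quotient(f, x0, h))
```

For this function the secant slope works out to `6 + h`, so the printed values close in on 6 as `h` gets smaller.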
3. Differentiability: When Can We Find a Derivative?
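A function $f$ is differentiable at a point $x = a$ if the limit of the difference quotient exists there, that is, if $f'(a)$ is defined. Differentiability fails wherever the graph has no single, finite tangent slope.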
Three common failure cases
| Discontinuity | Sharp Corner / Cusp | Vertical Tangent |
|---|---|---|
| The function has a jump or hole — no single tangent exists. | Slope changes abruptly — left-hand and right-hand slopes differ. | The tangent line is vertical — slope is infinite (undefined). |
ML Note: This is why activation functions like ReLU (which has a corner at 0) require special handling.
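The corner in ReLU can be seen by comparing one-sided slopes at 0. This is a small sketch; the helper `one_sided_slope` is hypothetical, and returning 0 for the gradient at exactly `x = 0` is one common convention, assumed here rather than taken from the text.

```python
def relu(x):
    return max(0.0, x)


def one_sided_slope(func, x, h):
    # h > 0 gives the right-hand slope, h < 0 gives the left-hand slope.
    return (func(x + h) - func(x)) / h


left_slope = one_sided_slope(relu, 0.0, -1e-6)
right_slope = one_sided_slope(relu, 0.0, 1e-6)
print(left_slope, right_slope)  # the two sides disagree, so no derivative at 0


def relu_grad(x):
    # Assumed convention: pick 0 as the slope at the corner x == 0.
    return 1.0 if x > 0 else 0.0
```

Because the left-hand and right-hand slopes differ, training code has to pick a value at the corner; any choice between the two one-sided slopes is a valid subgradient.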
4. First and Second-Order Derivatives
4.1 First-Order Derivative — Slope & Direction
Represents the slope of the tangent to the function at a point, indicating how rapidly the function is changing.
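The first derivative $f'(x)$ gives the sign and size of the slope at $x$:
| Condition | Meaning |
|---|---|
| $f'(x) > 0$ | Function is increasing |
| $f'(x) < 0$ | Function is decreasing |
| $f'(x) = 0$ | Critical point, indicating a possible local maximum, minimum, or saddle point |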
ML Context: Gradient descent moves in the opposite direction of $f'(x)$ (the gradient, in multiple dimensions) to descend toward a minimum.
Imagine you are in a car driving down the highway.
If you look at your speedometer, it might say you are going exactly 60 miles per hour right at that exact split-second. That is exactly what a derivative is. It is a measurement of how fast something is changing right now. In math, if we have a graph showing how far you've traveled over time, the derivative tells us the exact steepness (or slope) of that graph at one specific point.
Simple definition: A derivative is your exact speed at a specific moment.
4.2 Second-Order Derivative — Curvature & Concavity
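The second derivative $f''(x)$ measures how the slope itself is changing, that is, the curvature of the function.
| Condition | Meaning | Shape |
|---|---|---|
| $f''(x) > 0$ | Concave Up (convex) | Bowl |
| $f''(x) < 0$ | Concave Down (concave) | Hill |
| $f''(x) = 0$ | Concavity may be changing | Possible inflection point |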
Imagine you are driving 60 miles per hour, and suddenly you stomp on the gas pedal to pass a truck. You feel yourself get pushed back into your seat. Your speed is changing. You are accelerating.
If the first derivative is your speed, the second derivative is your acceleration. It tells us how fast your speed is changing.
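The speed and acceleration analogy can be checked with finite differences. The position function below (distance `5 t^2` after `t` seconds) is a made-up example, and the helper names and step size `h` are assumptions for illustration.

```python
def position(t):
    # Hypothetical distance travelled after t seconds.
    return 5.0 * t ** 2


def first_derivative(func, t, h=1e-3):
    # Central difference approximation of the slope (speed).
    return (func(t + h) - func(t - h)) / (2 * h)


def second_derivative(func, t, h=1e-3):
    # Central difference approximation of the curvature (acceleration).
    return (func(t + h) - 2 * func(t) + func(t - h)) / (h ** 2)


speed = first_derivative(position, 2.0)
acceleration = second_derivative(position, 2.0)
print(speed, acceleration)
```

For this position function the exact speed at `t = 2` is 20 and the acceleration is a constant 10; the numerical estimates should land very close to those values.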
ML Context: Second-order methods (e.g., Newton's method) use $f''(x)$ (the Hessian, in multiple dimensions) to take smarter steps but are expensive to compute at scale.
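To make the "smarter steps" idea concrete, the sketch below compares five steps of gradient descent with five steps of Newton's method in one dimension. The function `e^x - 2x` (minimum at `x = ln 2`), the starting point, and the learning rate are all assumptions chosen for the illustration.

```python
import math


def f_prime(x):
    # First derivative of the assumed function f(x) = e^x - 2x.
    return math.exp(x) - 2.0


def f_second(x):
    # Second derivative; always positive, so f is convex.
    return math.exp(x)


def gradient_descent(x, learning_rate=0.1, steps=5):
    for _ in range(steps):
        x -= learning_rate * f_prime(x)  # fixed step size
    return x


def newton(x, steps=5):
    for _ in range(steps):
        x -= f_prime(x) / f_second(x)  # step size scaled by curvature
    return x


minimum = math.log(2.0)
gd_error = abs(gradient_descent(0.0) - minimum)
newton_error = abs(newton(0.0) - minimum)
print(gd_error, newton_error)
```

Newton's method divides the slope by the curvature, so it takes large steps where the function is flat and small steps where it bends sharply. In many dimensions that division becomes a linear solve against the Hessian, which is where the cost comes from.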
4.3 Concave vs. Convex Functions
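Convex Function (bowl-shaped, $\cup$)
- The graph lies below or on the chord joining any two points on the curve.
- $f''(x) \ge 0$ everywhere (when $f$ is twice differentiable).
- Any local minimum is also a global minimum, which makes convex functions ideal for optimization.
Concave Function (hill-shaped, $\cap$)
- The graph lies above or on the chord joining any two points on the curve.
- $f''(x) \le 0$ everywhere (when $f$ is twice differentiable).
- Any local maximum is also a global maximum.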
ML Context: For convex loss functions (e.g., MSE with linear models, logistic loss), every local minimum is a global minimum, so gradient descent with a suitably small learning rate heads toward the global minimum instead of getting trapped. Deep neural network loss surfaces are generally non-convex, making optimization harder.
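The chord property gives a simple numerical sanity check for convexity. The helper `lies_below_chords` below is a hypothetical name; it only samples a grid of points, so passing it is evidence of convexity on that grid, not a proof.

```python
import math


def lies_below_chords(func, points, weights=(0.25, 0.5, 0.75), tol=1e-9):
    """Check the chord inequality f(t*a + (1-t)*b) <= t*f(a) + (1-t)*f(b) on samples."""
    for a in points:
        for b in points:
            for t in weights:
                x = t * a + (1 - t) * b
                chord_height = t * func(a) + (1 - t) * func(b)
                if func(x) > chord_height + tol:
                    return False  # the graph rises above a chord: not convex
    return True


def squared_error(w):
    # Convex example: a parabola.
    return (w - 1.0) ** 2


def wavy(w):
    # Non-convex example on this range.
    return math.sin(w)


points = [i / 2 for i in range(-8, 9)]  # -4.0, -3.5, ..., 4.0
print(lies_below_chords(squared_error, points))
print(lies_below_chords(wavy, points))
```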
5. Critical Points: Maxima, Minima, and Saddle Points
The goal of optimization is to find the minimum of the loss function. Critical points (where $f'(x) = 0$) are the candidates for that minimum, so it helps to tell the different types apart.
| Type | Description |
|---|---|
| Local Minimum | Lowest point in a neighborhood |
| Local Maximum | Highest point in a neighborhood |
| Global Minimum | Absolute lowest over entire domain |
| Global Maximum | Absolute highest over entire domain |
| Saddle Point | Critical point that is neither min nor max |
| Inflection Point | Concavity changes sign |
ML Challenge: Gradient descent can get "stuck" in a local minimum or slow down near saddle points — both common in high-dimensional loss surfaces. Techniques like momentum, Adam, and learning rate schedules help escape these.
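The "stuck in a local minimum" behaviour is easy to reproduce in one dimension. The sketch below uses plain gradient descent (no momentum) on an assumed non-convex function `x^4 - 3x^2 + x`, which has two minima of different depth; the function, starting points, and learning rate are illustrative choices.

```python
def loss(x):
    # Assumed non-convex loss with two minima of different depth.
    return x ** 4 - 3 * x ** 2 + x


def loss_grad(x):
    return 4 * x ** 3 - 6 * x + 1


def gradient_descent(x, learning_rate=0.01, steps=2000):
    for _ in range(steps):
        x -= learning_rate * loss_grad(x)
    return x


from_left = gradient_descent(-2.0)   # ends in the deeper minimum
from_right = gradient_descent(2.0)   # ends in the shallower minimum
print(from_left, loss(from_left))
print(from_right, loss(from_right))
```

Both runs stop at a point where the slope is zero, but only the run started on the left reaches the lower loss. Which minimum is found depends entirely on the starting point.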
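6. Worked Example
Step 1: Find the first and second derivatives, $f'(x)$ and $f''(x)$.
Step 2: Find critical points (set $f'(x) = 0$).
Step 3: Classify critical points using the sign of $f''(x)$.
| Critical Point | Sign of $f''(x)$ | Classification |
|---|---|---|
| Left critical point | Negative | Local Maximum |
| Right critical point | Positive | Local Minimum |
Step 4: Increasing / Decreasing intervals
Using the sign of $f'(x)$ on each interval:
| Interval | Sign of $f'(x)$ | Behaviour |
|---|---|---|
| Left of the local maximum | Positive | Increasing |
| Between the local maximum and the local minimum | Negative | Decreasing |
| Right of the local minimum | Positive | Increasing |
Step 5: Concavity and inflection point
Set $f''(x) = 0$ and check the sign of $f''(x)$ on either side:
| Interval | Sign of $f''(x)$ | Concavity |
|---|---|---|
| Left of the inflection point | Negative | Concave Down |
| Right of the inflection point | Positive | Concave Up |
Inflection point at the value of $x$ where $f''(x) = 0$, which lies between the two critical points.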
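The five steps can be run end to end on a concrete function. The cubic below, `f(x) = x^3 - 6x^2 + 9x + 1`, is an assumed example chosen for illustration and is not necessarily the function the original worked example used; the derivatives are written out by hand.

```python
import math


def f(x):
    # Assumed example function (not taken from the original notes).
    return x ** 3 - 6 * x ** 2 + 9 * x + 1


# Step 1: first and second derivatives, computed by hand.
def f_prime(x):
    return 3 * x ** 2 - 12 * x + 9


def f_second(x):
    return 6 * x - 12


# Step 2: solve 3x^2 - 12x + 9 = 0 with the quadratic formula.
a, b, c = 3.0, -12.0, 9.0
root = math.sqrt(b * b - 4 * a * c)
critical_points = sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


# Step 3: classify each critical point by the sign of f''.
def classify(x):
    curvature = f_second(x)
    if curvature > 0:
        return "local minimum"
    if curvature < 0:
        return "local maximum"
    return "inconclusive"


classification = {x: classify(x) for x in critical_points}

# Step 4: sign of f' at one test point inside each interval.
test_points = [0.0, 2.0, 4.0]
behaviour = ["increasing" if f_prime(x) > 0 else "decreasing" for x in test_points]

# Step 5: f''(x) = 6x - 12 = 0 gives the inflection point.
inflection_point = 12.0 / 6.0

print(critical_points, classification, behaviour, inflection_point)
```

For this cubic the critical points are `x = 1` (local maximum) and `x = 3` (local minimum), the function increases, decreases, then increases again, and the concavity flips at `x = 2`.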
7. Summary: Why Derivatives are Essential for ML
| Concept | Role in ML |
|---|---|
| First Derivative | Gives direction & magnitude of slope — drives gradient descent |
| Gradient ($\nabla f$) | Vector of partial derivatives; tells us how to update all weights |
| Second Derivative | Reveals curvature — used in advanced optimizers (Newton's method) |
| Convexity | Convex loss surfaces guarantee a global minimum |
| Chain Rule | Powers backpropagation — makes deep network training feasible |
| Critical Points | Identify minima, maxima, and saddle points in the loss surface |
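The gradient and chain rule rows can be illustrated with a single sigmoid neuron. This is a minimal sketch under assumed choices (a squared-error loss, one input, hypothetical parameter values); it applies the chain rule by hand and compares the result with a finite-difference estimate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, target):
    prediction = sigmoid(w * x + b)
    return (prediction - target) ** 2


def loss_gradient(w, b, x, target):
    # Chain rule: dL/dw = dL/dprediction * dprediction/dz * dz/dw
    z = w * x + b
    prediction = sigmoid(z)
    dloss_dpred = 2.0 * (prediction - target)
    dpred_dz = prediction * (1.0 - prediction)  # derivative of the sigmoid
    dloss_dz = dloss_dpred * dpred_dz
    # dz/dw = x and dz/db = 1, giving one partial derivative per parameter.
    return dloss_dz * x, dloss_dz


# Hypothetical parameter and data values for the check.
w, b, x, target = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = loss_gradient(w, b, x, target)

# Finite-difference estimates of the same partial derivatives.
h = 1e-6
numeric_w = (loss(w + h, b, x, target) - loss(w - h, b, x, target)) / (2 * h)
numeric_b = (loss(w, b + h, x, target) - loss(w, b - h, x, target)) / (2 * h)
print(grad_w, numeric_w)
print(grad_b, numeric_b)
```

The pair `(grad_w, grad_b)` is the gradient of the loss with respect to the two parameters. Backpropagation applies the same chain-rule bookkeeping layer by layer in a deep network.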
8. Questions and Answers
- A data scientist is analyzing a function f(x) and wants to determine if it is differentiable at a certain point. Which of the following conditions must be met for a function to be differentiable at a point?
  - The function must be continuous at the point.
  - The function must have a defined slope at the point.
  - The limit of the difference quotient as x approaches the point must exist.

  Explanation
  - Differentiability of a function at a point means that the function has a defined slope at that point. The slope is given by the derivative of the function at that point.
  - To determine if a function is differentiable at a certain point, we check the following conditions:
    - The function must be continuous at the point. This means that the value of the function at the point is defined, and the limit of the function as x approaches the point exists and equals the value of the function at the point.
    - The function must have a defined slope at the point. This means that the limit of the difference quotient as x approaches the point must exist. The difference quotient is given by (f(x) - f(a))/(x - a), where a is the point of interest.
  - If both of the above conditions are met, then the function is differentiable at the point. Continuity alone is not enough: f(x) = |x| is continuous at x = 0 but has a sharp corner there, so it has no defined slope at that point.
- Consider the function $f(x)$. Which of the following statements is true?

  The function has a local minimum at x = 3.

  Explanation
  - To find the minimum of the function, we take the derivative $f'(x)$ and set it to zero.
  - Solving for x gives two critical points, one of which is x = 3.
  - We then evaluate the second derivative $f''(x)$ at each of these points to determine whether they correspond to a minimum, maximum, or saddle point.
    - At the other critical point, $f''(x) < 0$, which means that it is a local maximum.
    - At x = 3, $f''(3) > 0$, which means that it is a local minimum.
- Given a function $f(x)$, what are the critical point(s) of this function?

  x = -1

  Explanation
  - To find critical points, we calculate the derivative of the function and solve for x such that $f'(x) = 0$. In this case, setting the derivative equal to zero and solving yields x = -1 as the critical point.


