Derive Gradient Descent for a Univariate Linear Regression Model

I think whoever starts learning Machine Learning checks out the Machine Learning class by Andrew Ng, and I was no exception. But the mathematical equations and all the formulae made me realize how little of my +2 maths I remembered. I was stuck at each and every step. Then again, it opened up a whole new world of Machine Learning.

In one of the videos, there is a derivation of the Gradient Descent equation for the Linear Regression model. It took me some time and help to figure it out, and now I must write it down. So, this post is all about deriving the Gradient Descent equation for the univariate Linear Regression model. A univariate linear regression model is represented by a straight-line equation as below:

ŷ  = θ0 + θ1x

A better fit of the straight line ensures a better prediction of the data. The best fit of this straight-line equation is obtained by determining the optimum values of θ0 and θ1, and these optimum values are derived using the Gradient Descent equation.
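The model above can be sketched in a few lines of code. This is a minimal illustration, not from the original post; the function name and example values are my own:

```python
def predict(x, theta0, theta1):
    """Return the prediction y_hat = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Example: with theta0 = 1.0 and theta1 = 0.5, the input x = 2.0
# predicts 1.0 + 0.5 * 2.0 = 2.0
print(predict(2.0, 1.0, 0.5))
```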

So, let the fun begin step by step.

Step 1:

The generalized Gradient Descent equation is as below:

θj = θj – α * ∂⁄∂θj J(θ0,θ1)

where j = 0, 1 represents the parameter index number,
J(θ0,θ1) is the Cost Function, and
α is a constant called the learning rate.

Step 2:

J(θ0,θ1) = 1/(2m) * ∑(ŷi – yi)²

where m = Total number of examples in the Training Dataset.

ŷi = the predicted value for the iᵗʰ example = θ0 + θ1xi

yi = the actual value of y for the iᵗʰ example

∑ = the sum over i = 1 to m
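The cost function can be sketched directly from these definitions. This is a minimal illustration with made-up example data (not from the post):

```python
def cost(xs, ys, theta0, theta1):
    """Mean squared error cost J(theta0, theta1) = 1/(2m) * sum((y_hat_i - y_i)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Example data lying exactly on y = 2x, so theta0 = 0, theta1 = 2 gives zero cost.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
print(cost(xs, ys, 0.0, 2.0))  # 0.0
```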

Now, let's substitute J(θ0,θ1) into the above equation:
θj = θj – α * ∂⁄∂θj (1/(2m) * ∑(ŷi – yi)²)

Step 3:

The next step is to replace ŷi:
ŷi = θ0 + θ1xi
θj = θj – α * ∂⁄∂θj (1/(2m) * ∑(θ0 + θ1xi – yi)²)

Step 4:

Next comes the derivative part. Since α and 1/(2m) are constants, the constant multiple rule lets us pull 1/(2m) out of the derivative, and the sum rule (the derivative of a sum is the sum of the derivatives) lets us move the derivative inside the summation.

Thus, the above equation is modified as below:
θj = θj – α/(2m) * ∑ ∂⁄∂θj (θ0 + θ1xi – yi)²

Step 5:

Now, we apply the chain rule together with the power rule, d⁄dx uⁿ = n * uⁿ⁻¹ * du⁄dx:

θj = θj – α/(2m) * ∑ 2(θ0 + θ1xi – yi)^(2-1) * ∂⁄∂θj(θ0 + θ1xi – yi)
   = θj – α/m * ∑ (θ0 + θ1xi – yi) * ∂⁄∂θj(θ0 + θ1xi – yi)

Step 6:

Now we can calculate the updates for θ0 and θ1.
Let's start with θ0:
θ0 = θ0 – α/m * ∑(θ0 + θ1xi – yi) * ∂⁄∂θ0(θ0 + θ1xi – yi)

Since all the elements except θ0 in the above derivative are treated as constants with respect to θ0,

∂⁄∂θ0(θ0 + θ1xi – yi) = 1

Thus,
θ0 = θ0 – α/m * ∑(θ0 + θ1xi – yi)
   = θ0 – α/m * ∑(ŷi – yi)

Now we can calculate θ1:
θ1 = θ1 – α/m * ∑(θ0 + θ1xi – yi) * ∂⁄∂θ1(θ0 + θ1xi – yi)

Since θ0 and yi are constants with respect to θ1, the derivative is as below:
∂⁄∂θ1(θ0 + θ1xi – yi) = xi

Thus,
θ1 = θ1 – α/m * ∑(θ0 + θ1xi – yi) * xi
   = θ1 – α/m * ∑(ŷi – yi) * xi

Now, we have derived the update rules for both θ0 and θ1:
θ0 = θ0 – α/m * ∑(ŷi – yi)
θ1 = θ1 – α/m * ∑(ŷi – yi) * xi

Note that in practice, both parameters must be updated simultaneously, i.e., both sums are computed from the same current values of θ0 and θ1 before either parameter is changed.

Thank You!
