I think whoever starts to learn Machine Learning checks out the Machine Learning class by Andrew Ng, and I was no exception. But the mathematical equations and all the formulae made me realize how little of my +2 maths I remember. I was stuck at each and every step. But then again, it opened up a whole new world of Machine Learning.
In one of the videos, there is a deduction of the Gradient Descent equation for the Linear Regression Model. It took me some time and help to figure that out and now I must write it down. So, this post is all about deriving the Gradient Descent Equation for the univariate Linear Regression model. A univariate linear regression model is represented by a straight line equation as below:
ŷ = θ0 + θ1x
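In code, this hypothesis is a one-line function. Here is a minimal Python sketch (the function and variable names are mine, not from the course):

```python
def predict(theta0, theta1, x):
    """Univariate linear regression hypothesis: y-hat = theta0 + theta1 * x."""
    return theta0 + theta1 * x

print(predict(1.0, 2.0, 3.0))  # 1 + 2*3 = 7.0
```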
So, let the fun begin step by step.
Step1:
The generalized Gradient Descent equation is as below:
θj = θj – α * ∂⁄∂θj J(θ0, θ1)
where j = 0, 1 represents the feature index number,
J(θ0, θ1) is the Cost Function, and
α is a constant called the learning rate.
Step2:
J(θ0, θ1) = 1/(2m) * ∑(ŷi – yi)²
where m = the total number of examples in the Training Dataset,
ŷi = the predicted value for the iᵗʰ example = θ0 + θ1xi,
yi = the actual value of y for the iᵗʰ example, and
∑ = the sum for i = 1 to m.
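Translated directly into code, the cost function above looks like this. A minimal Python sketch (names are my own, not from the course):

```python
def cost(theta0, theta1, xs, ys):
    """Mean squared error cost: J = 1/(2m) * sum((theta0 + theta1*x_i - y_i)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# A perfect fit (here the data lies exactly on y = 2x) gives zero cost:
print(cost(0.0, 2.0, [1, 2, 3], [2, 4, 6]))  # 0.0
```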
Now, let’s replace J(θ0, θ1) in the above equation:
θj = θj – α * ∂⁄∂θj (1/(2m) * ∑(ŷi – yi)²)
Step3:
The next step is to replace ŷi.
ŷi = θ0 + θ1xi
θj = θj – α * ∂⁄∂θj (1/(2m) * ∑(θ0 + θ1xi – yi)²)
Step4:
Next comes the derivative part. Since α/(2m) is a constant with respect to θj, and the derivative of a sum is the sum of the derivatives (the constant multiple rule and the sum rule), the derivative operator can be moved inside the summation.
Thus, the above equation is modified as below:
θj = θj – α/(2m) * ∑ ∂⁄∂θj (θ0 + θ1xi – yi)²
Step5:
Now, we apply the power rule together with the chain rule: ∂⁄∂θ u² = 2u * ∂u⁄∂θ.
θj = θj – α/(2m) * ∑ 2(θ0 + θ1xi – yi) * ∂⁄∂θj(θ0 + θ1xi – yi)
= θj – α/m * ∑ (θ0 + θ1xi – yi) * ∂⁄∂θj(θ0 + θ1xi – yi)
The factor of 2 from the power rule cancels the 2 in the denominator.
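The cancellation of the 2 in this step is easy to get wrong, so here is a quick numerical sanity check of the resulting gradient against a central finite difference (the names and the sample data are my own, not from the course):

```python
def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = 1/(2m) * sum((theta0 + theta1*x_i - y_i)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def grad_theta1(theta0, theta1, xs, ys):
    """Analytic dJ/dtheta1 = 1/m * sum((theta0 + theta1*x_i - y_i) * x_i)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) * x for x, y in zip(xs, ys)) / m

xs, ys = [1.0, 2.0, 3.0], [2.0, 3.0, 5.0]  # arbitrary sample data
eps = 1e-6
analytic = grad_theta1(0.5, 1.0, xs, ys)
numeric = (cost(0.5, 1.0 + eps, xs, ys) - cost(0.5, 1.0 - eps, xs, ys)) / (2 * eps)
print(abs(analytic - numeric) < 1e-5)  # True: the factor of 2 cancelled correctly
```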
Step6:
Now we can calculate θ0 and θ1 values.
Let’s start with θ0
θ0 = θ0 – α/ m * ∑(θ0 + θ1xi – yi)∂⁄∂θ0(θ0 + θ1xi – yi)
Since every term except θ0 in the expression is a constant with respect to θ0,
∂⁄∂θ0(θ0 + θ1xi – yi) = 1
Thus,
θ0 = θ0 – α/m * ∑(θ0 + θ1xi – yi)
= θ0 – α/m * ∑(ŷi – yi)
Now we can calculate θ1
θ1 = θ1 – α/ m * ∑(θ0 + θ1xi – yi)∂⁄∂θ1(θ0 + θ1xi – yi)
Since θ0 and yi are constants with respect to θ1, only the θ1xi term survives differentiation:
∂⁄∂θ1(θ0 + θ1xi – yi) = xi
Thus,
θ1 = θ1 – α/m * ∑(θ0 + θ1xi – yi) * xi
= θ1 – α/m * ∑(ŷi – yi) * xi
Now, we have derived the update rules for both θ0 and θ1:
θ0 = θ0 – α/m * ∑(ŷi – yi)
θ1 = θ1 – α/m * ∑(ŷi – yi) * xi
These two updates are repeated until convergence, and θ0 and θ1 must be updated simultaneously: compute both gradients with the current parameter values before changing either one.
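Putting the two derived update rules together gives the full training loop. A minimal Python sketch with a simultaneous update (the function name, learning rate, iteration count, and sample data are my own choices, not from the course):

```python
def gradient_descent(xs, ys, alpha=0.1, iters=1000):
    """Batch gradient descent for univariate linear regression,
    using the two update rules derived above with a simultaneous
    update of theta0 and theta1."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        # Prediction errors (y-hat_i - y_i) with the CURRENT parameters:
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                              # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m   # dJ/dtheta1
        # Simultaneous update: both gradients computed before either change.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Data generated from y = 3 + 2x should recover theta0 ≈ 3, theta1 ≈ 2:
t0, t1 = gradient_descent([0, 1, 2, 3], [3, 5, 7, 9])
print(round(t0, 3), round(t1, 3))  # 3.0 2.0
```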
Thank You!