Regression Algorithms and Examples

Linear Regression

Linear regression assumes a linear relationship between two or more variables. That means, if we draw this relationship in a two-dimensional space (between two variables), we get a straight line.
Y = 𝛉1 + 𝛉2X
where X is the explanatory variable,
      Y is the dependent variable,
      𝛉2 is the slope of the line, and 𝛉1 is the intercept (the value of Y when X = 0).

Disadvantage:  

It assumes that a single straight line is appropriate as a summary of the data.

Example:
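Below is a minimal Python sketch of fitting a simple linear regression with scikit-learn's LinearRegression; the synthetic data and variable names are illustrative, not from the original post.

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: Y is roughly 2 + 3*X plus noise
rng = np.random.RandomState(0)
X = rng.rand(100, 1) * 10               # explanatory variable
Y = 2 + 3 * X.ravel() + rng.randn(100)  # dependent variable

model = LinearRegression()
model.fit(X, Y)

print("Intercept (theta1):", model.intercept_)
print("Slope (theta2):", model.coef_[0])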

Ordinary Least Squares (OLS)

Ordinary Least Squares (OLS) is one of the methods for finding the best-fit line in linear regression. For each data point, it calculates the difference between the predicted and the actual value and squares it; the sum of these squared errors over all data points is a single number that measures how well the model fits the data.

Next, the parameters of the model are adjusted so that this sum of squared errors is minimized, until no further improvement is possible.

Disadvantage:

  • Sensitive to outliers.
  • Assumes the errors (residuals) are normally distributed.

Example:

OLS Model using Boston Housing Dataset
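A minimal sketch of an OLS fit in Python using statsmodels. Since the Boston Housing dataset has been removed from recent scikit-learn releases, the California housing dataset is substituted here purely for illustration; sm.OLS minimizes the sum of squared residuals exactly as described above.

import statsmodels.api as sm
from sklearn.datasets import fetch_california_housing

# Load the data (downloaded on first use)
housing = fetch_california_housing()
X = sm.add_constant(housing.data)   # add the intercept column
y = housing.target

# Ordinary Least Squares: minimizes the sum of squared residuals
ols_model = sm.OLS(y, X).fit()
print(ols_model.summary())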

Polynomial Regression

For most datasets, the data cannot be summarized by a straight line; fitting one anyway results in underfitting. To overcome underfitting, we need to increase the complexity of the model. So, instead of representing the data with a straight line, we can generate a higher-order equation by adding powers of the original features as new features.
Thus, Y = 𝛉1 + 𝛉2X can be transformed into
           Y = 𝛉1 + 𝛉2X + 𝛉3X²


Disadvantage

  • The presence of one or two outliers in the data can seriously affect the results of the nonlinear analysis.
  • There are fewer model validation tools for the detection of outliers in the nonlinear regression model.

Example:
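A minimal Python sketch of the transformation above: scikit-learn's PolynomialFeatures adds X² as a new feature, and an ordinary linear model is then fit on the expanded features. The data here is synthetic and purely illustrative.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data that a single straight line would underfit
rng = np.random.RandomState(0)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
Y = 1 + 2 * X.ravel() + 0.5 * X.ravel() ** 2 + rng.randn(100)

# degree=2 adds X^2 as a new feature, so the model is Y = theta1 + theta2*X + theta3*X^2
poly_model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                           LinearRegression())
poly_model.fit(X, Y)

print("Coefficients (theta2, theta3):", poly_model.named_steps["linearregression"].coef_)
print("Intercept (theta1):", poly_model.named_steps["linearregression"].intercept_)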

Gradient Descent Regression

The Gradient Descent Regression algorithm uses gradient descent to optimize the model. The algorithm aims to minimize the value of the cost function J(𝛉1, 𝛉2), typically the mean of the squared errors between the predicted and actual values. We start with some initial values of 𝛉1 and 𝛉2 and keep updating them until we reach the minimum of J(𝛉1, 𝛉2), i.e. the best fit for the line that passes through the data points.

Disadvantage:

If the learning rate for gradient descent is too large, the updates can overshoot the minimum and the algorithm may never converge (or even diverge). If it is too small, convergence can be impractically slow.

Example:

Gradient Descent Model in R
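The linked example is in R; for readers following along in Python, here is a rough sketch of the same idea, batch gradient descent on a simple linear model with synthetic data (the learning rate, iteration count, and data are illustrative assumptions).

import numpy as np

# Synthetic data: Y is roughly theta1 + theta2 * X
rng = np.random.RandomState(0)
X = rng.rand(200)
Y = 4 + 3 * X + rng.randn(200) * 0.5

theta1, theta2 = 0.0, 0.0   # initial guesses
alpha = 0.1                 # learning rate
m = len(X)

for _ in range(2000):
    Y_pred = theta1 + theta2 * X
    error = Y_pred - Y
    # Gradients of J(theta1, theta2) = (1/2m) * sum(error^2)
    grad1 = error.sum() / m
    grad2 = (error * X).sum() / m
    theta1 -= alpha * grad1
    theta2 -= alpha * grad2

print("theta1:", theta1, "theta2:", theta2)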

Decision Tree Regression

A decision tree is mostly used for classification problems, but decision trees can also be applied to regression problems using the DecisionTreeRegressor class (in scikit-learn).
For example, a decision tree can be used to fit a sine curve from noisy observations; the tree then learns a piecewise-constant approximation of the sine curve from local regions of the data.
We can see that if the maximum depth of the tree (controlled by the max_depth parameter) is set too high, the decision tree learns the training data in too fine detail and picks up the noise, i.e. it overfits.

Disadvantage: 

Overfitting is one of the most practical difficulties for decision tree models. This problem can be mitigated by using random forests.

Example: Decision Tree Regression Model using Boston Housing Dataset
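Since the Boston Housing dataset is no longer shipped with recent scikit-learn releases, here is instead a minimal sketch of the sine-curve experiment described above, comparing a shallow and a deep DecisionTreeRegressor; the data is synthetic and the depths are illustrative.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Noisy observations of a sine curve
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))   # add noise to every 5th point

# A shallow tree gives a coarse but stable fit; a very deep tree fits the noise
shallow = DecisionTreeRegressor(max_depth=2).fit(X, y)
deep = DecisionTreeRegressor(max_depth=10).fit(X, y)

X_test = np.arange(0.0, 5.0, 0.01).reshape(-1, 1)
print("Shallow tree (max_depth=2):", shallow.predict(X_test)[:5])
print("Deep tree (max_depth=10):  ", deep.predict(X_test)[:5])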

Applications of Regression: 

Predicting stock prices, house prices, sales, etc.

to be continued…
