Linear Regression is a supervised Machine Learning algorithm. It is used to predict continuous values.

###### What is Regression?

When the output is neither categorical (like yes/no or 0/1) nor a cluster label (like c0, c1, etc.) but a continuous value (like temperature or height), the problem is a regression problem. The **Linear Regression** algorithm finds the ‘Best Fit Line’ by estimating its equation.

The Best Fit Line does not necessarily pass through every data point; instead, it approximates them as closely as possible. For categorical data, we draw a line that separates the data points into two classes, and to select the best line we use the **margin** as the metric and try to **maximize** it. In Linear Regression, we use the **Sum of Squared Errors (SSE)** as the metric and try to **minimize** it to get the ‘Best Fit Line’.

Let the equation of this line be:

**y = mx + c**

Here *m* is the slope and *c* is the intercept of the line. The *x* and *y* values are the data points and cannot be changed. To get the **Best Fit Line**, we need to find the values of the slope and the intercept. Many different lines are possible depending on the values of the slope (*m*) and the intercept (*c*); linear regression returns the one with the least sum of squared errors.
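To make the idea of comparing candidate lines concrete, here is a minimal sketch that scores a few lines by their SSE and keeps the lowest. The toy data, the candidate (slope, intercept) pairs, and the function name `sum_of_squared_errors` are assumptions made up for illustration.

```python
def sum_of_squared_errors(xs, ys, m, c):
    """Return the SSE of the line y = m*x + c over the data points."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data lying roughly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

# Hypothetical candidate lines as (slope, intercept) pairs.
candidates = [(1.0, 2.0), (2.0, 1.0), (3.0, 0.0)]

# Score every candidate and keep the one with the smallest SSE.
sse_by_line = {
    (m, c): sum_of_squared_errors(xs, ys, m, c) for m, c in candidates
}
best_line = min(sse_by_line, key=sse_by_line.get)

print(best_line)  # (2.0, 1.0)
```

Linear regression does the same thing in principle, except that it searches over all possible slopes and intercepts rather than a short hand-picked list.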

The same idea applies when there is more than one input variable. This is called Multiple Linear Regression. Remember, a linear regression in two dimensions is a straight line, in three dimensions it is a plane, and in more than three dimensions it is a hyperplane. With input variables x1, x2, …, xn, slopes m1, m2, …, mn, and intercept c, the equation becomes:

**y = c + m1x1 + m2x2 + m3x3 + … + mnxn**
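As a small illustration, a multiple linear regression prediction is just the intercept plus a weighted sum of the input variables. The function name `predict` and the numbers below are hypothetical, chosen only to show the arithmetic.

```python
def predict(features, slopes, intercept):
    """Evaluate intercept + sum(slope_i * feature_i) for one data point."""
    if len(features) != len(slopes):
        raise ValueError("need exactly one slope per feature")
    return intercept + sum(m * x for m, x in zip(slopes, features))


# Hypothetical model with three input variables.
slopes = [0.5, -1.0, 2.0]
intercept = 4.0

# 4.0 + 0.5*1.0 - 1.0*2.0 + 2.0*3.0
print(predict([1.0, 2.0, 3.0], slopes, intercept))  # 8.5
```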

###### Regression Evaluation Metrics

Regression evaluation metrics are used to measure the error made by our model. A **metric** helps us compare our current model with a constant baseline and tells us how much better our model is.

For regression algorithms, three evaluation metrics are commonly used:

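**Mean Absolute Error** (MAE) is the mean of the absolute values of the errors. With n data points, actual values $y_i$, and predicted values $\hat{y}_i$, it is calculated as:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$$

**Mean Squared Error** (MSE) is the mean of the squared errors. It is calculated as:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

**Root Mean Squared Error** (RMSE) is the square root of the mean of the squared errors:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$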
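The three metrics can be sketched in a few lines of plain Python. The function names and the sample values of `y_true` and `y_pred` are assumptions for illustration, not output from a real model.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error."""
    return math.sqrt(mean_squared_error(y_true, y_pred))


# Hypothetical actual and predicted values; the errors are 1, 0, -1, -3.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 12.0]

print(mean_absolute_error(y_true, y_pred))      # 1.25
print(mean_squared_error(y_true, y_pred))       # 2.75
print(root_mean_squared_error(y_true, y_pred))  # about 1.658
```

Note how the single large error (3) affects MSE and RMSE more than MAE, because squaring weights large errors more heavily.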

###### Method to find Best Fit Line in Linear Regression

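**Ordinary Least Squares Method**

For a line y = mx + c, the slope and intercept that minimize the sum of squared errors are:

$$m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad c = \bar{y} - m\,\bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the mean values of x and y respectively.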
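Here is a minimal sketch of the univariate calculation, assuming the standard closed-form estimates: the slope is the sum of (x - mean x)(y - mean y) divided by the sum of (x - mean x) squared, and the intercept is mean y minus slope times mean x. The function name `fit_ols` and the toy data are hypothetical.

```python
def fit_ols(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Numerator: how x and y vary together around their means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: how x varies around its own mean.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_ols(xs, ys)
print(slope, intercept)  # 2.0 1.0
```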

In this simple form, the Ordinary Least Squares method is easy to understand and cheap to compute, but it only covers univariate datasets, which contain a single X variable and a single Y variable. A multivariate dataset contains a single Y variable and multiple X variables, so for **Multiple Linear Regression** we need another method, such as “Gradient Descent”.

**Gradient Descent**

The main function of gradient descent is to minimize the cost function. It is similar to finding the best direction in which to take a step downhill. Using the gradient descent algorithm, we search for the minimum of the cost function by repeatedly adjusting the parameters theta 0 (the intercept) and theta 1 (the slope) until they converge.

To find the minimum, keep updating the values of theta 0 and theta 1. In other words, repeat the update step until convergence.

Each step updates both parameters simultaneously:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1), \qquad j = 0, 1$$

where J is the cost function and alpha ($\alpha$) is the learning rate, that is, how big a step to take downhill.
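The update loop can be sketched as follows for a single input variable. This is an illustration under stated assumptions, not a definitive implementation: the cost function is assumed to be half the mean squared error, and the function name, learning rate, iteration count, and toy data are all hypothetical choices.

```python
def gradient_descent(xs, ys, alpha=0.05, n_iters=5000):
    """Fit y = theta0 + theta1 * x by gradient descent on the full dataset.

    Assumed cost: J = (1 / 2n) * sum((theta0 + theta1 * x - y) ** 2).
    """
    theta0, theta1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iters):
        # Prediction errors under the current parameters.
        errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of J with respect to theta0 and theta1.
        grad0 = sum(errors) / n
        grad1 = sum(e * x for e, x in zip(errors, xs)) / n
        # Step downhill; both gradients were computed before either update.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

theta0, theta1 = gradient_descent(xs, ys)
print(round(theta0, 4), round(theta1, 4))
```

On this toy data the parameters approach an intercept of 1 and a slope of 2. A learning rate that is too large can make the updates diverge, and one that is too small makes convergence slow.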

There are three types of Gradient Descent algorithms:

**Batch Gradient Descent:** Sums over all training examples at each step.

**Stochastic Gradient Descent (SGD):** Uses one training sample at each iteration instead of summing over the whole dataset at every step.

**Mini-Batch Gradient Descent:** Similar to SGD, but it uses n samples instead of one at each iteration.
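The three variants differ only in how many samples feed each update, so one function with a `batch_size` parameter can sketch all of them. As before, this is an illustration under assumptions: the function name, learning rate, epoch count, fixed random seed, and toy data are hypothetical, and the cost is assumed to be half the mean squared error over the batch.

```python
import random


def minibatch_gradient_descent(xs, ys, batch_size, alpha=0.05,
                               n_epochs=3000, seed=0):
    """Fit y = theta0 + theta1 * x, updating once per batch of samples.

    batch_size == len(xs) gives batch gradient descent,
    batch_size == 1 gives stochastic gradient descent,
    anything in between gives mini-batch gradient descent.
    """
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    theta0, theta1 = 0.0, 0.0
    for _ in range(n_epochs):
        rng.shuffle(indices)  # visit the samples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [(theta0 + theta1 * xs[i]) - ys[i] for i in batch]
            grad0 = sum(errors) / len(batch)
            grad1 = sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            theta0 -= alpha * grad0
            theta1 -= alpha * grad1
    return theta0, theta1


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

batch_fit = minibatch_gradient_descent(xs, ys, batch_size=len(xs))
sgd_fit = minibatch_gradient_descent(xs, ys, batch_size=1)
mini_fit = minibatch_gradient_descent(xs, ys, batch_size=2)
```

On this noise-free toy data all three settings end up close to an intercept of 1 and a slope of 2. On real, noisy data the stochastic and mini-batch variants take cheaper but noisier steps than the batch variant.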