Thursday, 1 June 2017

Regression in machine learning

In this post I will first explain regression; in the next post I will give the code for Simple Linear Regression.

Regression: Regression is a statistical process for estimating the relationships among variables.

In this post, regression is divided into two categories, Linear Regression and Logistic Regression, as shown in the figure below.

Linear Regression: In statistics, linear regression is an approach for modelling the relationship between a scalar dependent variable (say Y) and one or more explanatory (independent) variables.
Linear regression is further divided into two categories: Simple Linear Regression and Multiple Linear Regression.

Simple Linear Regression: In Simple Linear Regression there is one dependent variable and one independent variable.
The formula for simple linear regression is:

Y = β0+β1.X

Where Y is the dependent variable
β0 is the bias or Y-intercept
β1 is the slope
X is the independent variable

To find β1:      β1 = ∑(Xi-X̄)(Yi-Ȳ) / ∑(Xi-X̄)²

To find β0:      β0 = Ȳ-β1.X̄
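
The two formulas above can be sketched in a few lines of Python. This is only a minimal illustration, not the full code promised for the next post: the function name fit_simple_linear_regression and the sample points are made up for this sketch, and it uses plain Python lists rather than any library.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (beta0, beta1) for the line Y = beta0 + beta1 * X."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)

    # Numerator of the slope formula: sum of (Xi - Xbar)(Yi - Ybar)
    sum_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator of the slope formula: sum of (Xi - Xbar)^2
    sum_xx = sum((x - x_mean) ** 2 for x in xs)
    if sum_xx == 0:
        raise ValueError("all X values are identical, so the slope is undefined")

    beta1 = sum_xy / sum_xx
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


# Hypothetical points that lie exactly on the line Y = 1 + 2X
beta0, beta1 = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print("beta0 =", beta0, "beta1 =", beta1)
```

Because the sample points sit exactly on a line, the function should recover that line's intercept and slope; with noisy data it returns the least-squares fit instead.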

Example: Let's take the example shown in the table below.

Investment(X) Revenue(Y)
10 20
15 22
20 35
25 40

The table above shows investment and revenue. For example, if we invest 10 Lac, our revenue is 20 Lac.
Using the formulas given above, we want to find what our revenue will be if we invest 30 Lac.

First we find mean(X) and mean(Y):

mean(X) = X̄ = (10+15+20+25)/4 = 17.5

mean(Y) = Ȳ = (20+22+35+40)/4 = 29.25

X    Y    (Xi-X̄)   (Yi-Ȳ)   (Xi-X̄)(Yi-Ȳ)   (Xi-X̄)²
10   20   -7.5     -9.25    69.375         56.25
15   22   -2.5     -7.25    18.125         6.25
20   35   2.5      5.75     14.375         6.25
25   40   7.5      10.75    80.625         56.25

∑(Xi-X̄)(Yi-Ȳ) = 182.5

∑(Xi-X̄)² = 125

β1 = ∑(Xi-X̄)(Yi-Ȳ) / ∑(Xi-X̄)²
β1 = 182.5/125
β1 = 1.46

β0 = Ȳ-β1.X̄
β0 = 29.25 - [(1.46)*(17.5)]
β0 = 3.7

Y = (3.7) + (1.46)(X)
For X = 30:
Y = (3.7) + (1.46)(30)
Y = 47.5
So for an investment of 30 Lac, the predicted revenue is 47.5 Lac.
I will give the code for simple linear regression, with an explanation, in the next post.
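
As a quick check of the arithmetic above, here is a short Python sketch that rebuilds the table row by row and then makes the prediction for 30 Lac. The variable names are chosen only for this illustration, and it is a sanity check of the hand calculation rather than the fuller code planned for the next post.

```python
investment = [10, 15, 20, 25]  # X, in Lac
revenue = [20, 22, 35, 40]     # Y, in Lac

x_mean = sum(investment) / len(investment)
y_mean = sum(revenue) / len(revenue)
print("mean(X) =", x_mean, "mean(Y) =", y_mean)

sum_xy = 0.0  # running total of (Xi - Xbar)(Yi - Ybar)
sum_xx = 0.0  # running total of (Xi - Xbar)^2
for x, y in zip(investment, revenue):
    dx = x - x_mean
    dy = y - y_mean
    sum_xy += dx * dy
    sum_xx += dx * dx
    # One row of the table: X, Y, deviations, product, squared deviation
    print(x, y, dx, dy, dx * dy, dx * dx)

beta1 = sum_xy / sum_xx
beta0 = y_mean - beta1 * x_mean
predicted = beta0 + beta1 * 30  # revenue for an investment of 30 Lac

print("beta1 =", round(beta1, 2))
print("beta0 =", round(beta0, 2))
print("predicted revenue =", round(predicted, 2))
```

The rounded values it prints should match the hand calculation: a slope of 1.46, an intercept of 3.7, and a predicted revenue of 47.5 Lac.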

Multiple Linear Regression: In Multiple Linear Regression there is one dependent variable and more than one independent variable.
The formula for multiple linear regression is:

Y = β0+β1.X1+β2.X2+β3.X3+...+βn.Xn

Where Y is the dependent variable
β0 is the bias or Y-intercept
β1,β2,β3,...,βn are the slopes (one coefficient per independent variable)
X1,X2,X3,...,Xn are the independent variables
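
To make the formula concrete, here is a minimal Python sketch that evaluates it for one set of inputs. The function name predict_multiple_linear and the coefficient values are hypothetical, chosen only for illustration; the sketch shows how a prediction is computed once the coefficients are known, not how they are estimated from data.

```python
def predict_multiple_linear(beta0, betas, xs):
    """Evaluate Y = beta0 + beta1*X1 + beta2*X2 + ... + betan*Xn."""
    if len(betas) != len(xs):
        raise ValueError("need exactly one coefficient per independent variable")
    return beta0 + sum(b * x for b, x in zip(betas, xs))


# Hypothetical coefficients and inputs, not fitted from any real data
beta0 = 1.0
betas = [2.0, 0.5, -1.0]  # beta1, beta2, beta3
xs = [3.0, 4.0, 2.0]      # X1, X2, X3

y = predict_multiple_linear(beta0, betas, xs)
print(y)  # 1.0 + 6.0 + 2.0 - 2.0 = 7.0
```

With a single coefficient in betas, the same function reduces to the simple linear regression formula Y = β0+β1.X.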

Logistic Regression: Logistic regression is a regression model in which the dependent variable is categorical. This post covers the binary case, where the dependent variable takes one of two values, 0 and 1, representing outcomes such as pass/fail, win/loss, or healthy/sick.

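The formula for logistic regression with a single independent variable is given below, where P is the probability that the dependent variable equals 1 and e is the base of the natural logarithm:
P = e^(β0+β1.X) / [1 + e^(β0+β1.X)]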

The formula for logistic regression with multiple independent variables is given below:
P = e^(β0+β1.X1+β2.X2+...+βn.Xn) / [1 + e^(β0+β1.X1+β2.X2+...+βn.Xn)]
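
The logistic formula can be sketched in Python as follows. The function name logistic_probability and the coefficient values are hypothetical and only illustrate how P is evaluated for given coefficients; estimating the coefficients from data is a separate step not shown here. The sketch assumes P is the probability of the outcome coded as 1.

```python
import math


def logistic_probability(beta0, betas, xs):
    """Return P = e^z / (1 + e^z), where z = beta0 + beta1*X1 + ... + betan*Xn."""
    if len(betas) != len(xs):
        raise ValueError("need exactly one coefficient per independent variable")
    z = beta0 + sum(b * x for b, x in zip(betas, xs))
    if z >= 0:
        # Same value as e^z / (1 + e^z), rewritten so exp() cannot overflow
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients: a single independent variable
p_single = logistic_probability(-4.0, [0.1], [50.0])
# Hypothetical coefficients: two independent variables
p_multi = logistic_probability(-1.0, [0.5, 0.25], [2.0, 4.0])
print(p_single, p_multi)
```

Whatever the inputs, the result always lies between 0 and 1, which is what lets it be read as a probability. A common convention (an assumption here, not something stated in this post) is to predict class 1 when P is at least 0.5.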