**Statistical Correlation and Regression**

**Correlation & Regression**

In this part, we discuss **Correlation & Regression**: the relationship between variables and the statistical process of estimating such a relationship. The topics covered are:

- Correlation
- Measures of Correlation
- Correlation Co-efficient
- Rank Correlation
- Regression Analysis

**Statistical Correlation**

**Statistical Correlation** refers to the relationship between two or more related variables in a statistical experiment. Two variables are said to be correlated if a change in the value of one variable brings about a change in the value of the other (e.g. Height & Weight, Price & Demand, Income & Savings).

**Types of Correlation**

**Positive or Negative Correlation:** According to the direction of movement, correlation may be classified as positive or negative.

- **Positive Correlation:** When the values of the two variables move in the same direction, correlation is said to be positive.
- **Negative Correlation:** If the values of the variables move in opposite directions, so that an increase (or decrease) in one variable is accompanied by a decrease (or increase) in the other, correlation is said to be negative.

**Simple, Partial or Multiple Correlation:** According to the number of variables involved, correlation may be classified as simple, partial or multiple.

- **Simple Correlation:** Only two variables are studied.
- **Multiple Correlation:** The relationship among three or more variables is studied.
- **Partial Correlation:** More than two variables are involved, but correlation is studied between two of them only, the other variables being assumed constant.

**Linear or Non-linear Correlation:** According to the nature of the change in the ratio of the variables, correlation may be classified as linear or non-linear.

- **Linear Correlation:** If the ratio of change between the variables is uniform, the correlation is linear.
- **Non-linear Correlation:** The correlation is said to be non-linear (curvilinear) if, corresponding to a unit change in one variable, the other variable changes at a fluctuating rate.

**Measures of Statistical Correlation**

**Statistical correlation may be measured** (i.e. the degree of relationship between the variables) by various methods, such as:

1. Scatter Diagram Method
2. Karl Pearson's Coefficient of Correlation
3. Spearman's Rank Coefficient of Correlation
4. Coefficient of Concurrent Deviations

**Degree of Correlation**

The following chart shows the relative degree of correlation

| Degree of correlation | Positive | Negative |
| --- | --- | --- |
| Perfect correlation | 1 | -1 |
| Very high degree of correlation | 0.9 or more | -0.9 or more |
| Fairly high degree of correlation | 0.75 to 0.9 | -0.75 to -0.9 |
| Moderate degree of correlation | 0.6 to 0.75 | -0.6 to -0.75 |
| Only the possibility of correlation | 0.3 to 0.6 | -0.3 to -0.6 |
| Possibly no correlation | Less than 0.3 | Between -0.3 and 0 |
| No correlation | 0 | 0 |

**Scatter Diagram Method**

A **Scatter Diagram** is a tool for analyzing the relationship between two variables. One variable is plotted on the horizontal axis and the other on the vertical axis; the pattern of the plotted points graphically shows the relationship.

– For each pair of x and y values, a dot (or point) is plotted; thus the number of dots equals the number of observations.

– If these plotted dots (or points) show some trend, either upward or downward, then the two variables (x and y) are said to be correlated (otherwise not correlated).

– The relationship is expressed through the value of ‘r’, which must lie between -1 and +1.

**Measure of Relationship**

-If the trend of the points is upward, moving from the lower left-hand corner to the upper right-hand corner, the correlation is positive; r approaches +1, and r = +1 when the points lie exactly on a rising straight line. (r is the coefficient of correlation) (Fig a)

-If the movement is reversed, i.e. the dots run from the upper left-hand corner to the lower right-hand corner, the correlation is negative; r approaches -1 (Fig b)

-If no trend is observed, it indicates the absence of correlation (r = 0) (Fig c)

**Karl Pearson’s method of Co-efficient of Correlation**

**Karl Pearson's Method** is used to compute the coefficient of correlation; the extent of the correlation is expressed algebraically, in several formulae, and in numerical terms.

Pearson's coefficient of correlation is represented by ‘r’, which lies between -1 and +1.

**Assumptions**

Pearson's Coefficient of Correlation is based upon some assumptions. 1. A large number of independent causes operate in both series so as to produce a normal distribution. 2. The forces so operating are related in a causal way. 3. The relationship between the two series is linear. ‘r’ can be computed in various ways, depending upon the choice of the user.

The Table shows the values of Co-efficient of Correlation revealing the respective Degree of Correlation

| Result | Degree of Correlation |
| --- | --- |
| ± 1 | Perfect correlation |
| ± 0.90 or more | Very high degree of correlation |
| ± 0.75 to under ± 0.90 | Fairly high degree of correlation |
| ± 0.50 to under ± 0.75 | Moderate degree of correlation |
| ± 0.25 to under ± 0.50 | Low degree of correlation |
| Under ± 0.25 | Very low degree of correlation |
| 0 | No correlation |

**Co-efficient of Correlation Computation :** **Direct method**

The **direct method** is used where the given values of the variables are small, or where all the values can be reduced to small size by a change of scale or origin.

Formula of coefficient of correlation: Direct Method

r = (NΣxy − Σx·Σy) / √{(NΣx^{2} − (Σx)^{2}) × (NΣy^{2} − (Σy)^{2})}

where Σx = sum of variable x; Σy = sum of variable y; Σxy = sum of the products of x and y; Σx^{2} = sum of the squares of the values of x; Σy^{2} = sum of the squares of the values of y; N = number of observations.

**Ex: **Compute co-efficient of correlation for following data, using Pearson’s direct method.

| Marks in English | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- |
| Marks in Statistics | 6 | 7 | 8 | 9 | 10 |

The following table shows the computation of values and theirs sums

| Marks in English X | X^{2} | Marks in Statistics Y | Y^{2} | XY |
| --- | --- | --- | --- | --- |
| 1 | 1 | 6 | 36 | 6 |
| 2 | 4 | 7 | 49 | 14 |
| 3 | 9 | 8 | 64 | 24 |
| 4 | 16 | 9 | 81 | 36 |
| 5 | 25 | 10 | 100 | 50 |
| ΣX = 15 | ΣX^{2} = 55 | ΣY = 40 | ΣY^{2} = 330 | ΣXY = 130 |

r = (5 × 130 − 15 × 40) / √{(5 × 55 − 15^{2}) × (5 × 330 − 40^{2})} = 50 / √(50 × 50) = 50/50 = 1

Result of co-efficient of correlation being + 1, it shows that correlation between the two variables is perfectly positive.
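To cross-check the arithmetic, the direct-method formula can be sketched in Python (the function name is illustrative, not part of any library):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation, direct method:
    r = (N*Sxy - Sx*Sy) / sqrt((N*Sxx - Sx^2) * (N*Syy - Sy^2))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    sxx = sum(a * a for a in xs)
    syy = sum(b * b for b in ys)
    num = n * sxy - sx * sy
    den = sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return num / den

english = [1, 2, 3, 4, 5]
statistics = [6, 7, 8, 9, 10]
print(pearson_r(english, statistics))  # 1.0
```

The sums it computes (ΣX = 15, ΣXY = 130, etc.) match the table above, giving r = 1.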

**Co-efficient of Correlation Computation :** **Assumed Mean**

The **assumed mean (short-cut) method** is preferred when it is not possible to get the arithmetic averages of both variables in whole or round numbers. Under this method, the deviations of the values of each variable are taken from an assumed average.

**Formula of coefficient of correlation: Using Assumed Mean**

r = (NΣdxdy − Σdx·Σdy) / √{(NΣdx^{2} − (Σdx)^{2}) × (NΣdy^{2} − (Σdy)^{2})}

where dx = deviation of x from its assumed mean (i.e. x − assumed mean of the x series); dy = deviation of y from its assumed mean; Σdx, Σdy = sums of the deviations of the x and y series from their assumed means; Σdx^{2}, Σdy^{2} = sums of the squares of those deviations; Σdxdy = sum of the products of the deviations of the x and y series; N = number of observations.

**Ex.** Compute Karl Pearson’s co-efficient of correlation taking 79 and 132 as the average for Rainfall and Rice production variable respectively , by short cut method.

| Rainfall | 61 | 68 | 79 | 59 | 69 | 96 | 89 | 78 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Rice production | 108 | 123 | 136 | 107 | 112 | 156 | 137 | 125 |

The following table shows the computation of values and theirs sums

| Rainfall | dx (dev. from 79) | dx^{2} | Rice production | dy (dev. from 132) | dy^{2} | dx·dy |
| --- | --- | --- | --- | --- | --- | --- |
| 61 | -18 | 324 | 108 | -24 | 576 | 432 |
| 68 | -11 | 121 | 123 | -9 | 81 | 99 |
| 79 | 0 | 0 | 136 | 4 | 16 | 0 |
| 59 | -20 | 400 | 107 | -25 | 625 | 500 |
| 69 | -10 | 100 | 112 | -20 | 400 | 200 |
| 96 | 17 | 289 | 156 | 24 | 576 | 408 |
| 89 | 10 | 100 | 137 | 5 | 25 | 50 |
| 78 | -1 | 1 | 125 | -7 | 49 | 7 |
| Total (N = 8) | Σdx = -33 | Σdx^{2} = 1335 | | Σdy = -52 | Σdy^{2} = 2348 | Σdxdy = 1696 |

r = (8 × 1696 − (−33) × (−52)) / √{(8 × 1335 − 33^{2}) × (8 × 2348 − 52^{2})} = 11852 / √(9591 × 16080) = 0.95 approx.

As r ≈ 0.95, there is a high degree of positive correlation between the two variables.
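As a quick check of the short-cut computation, here is a minimal Python sketch (illustrative function name) of the assumed-mean formula:

```python
from math import sqrt

def pearson_r_assumed_mean(xs, ys, ax, ay):
    """Pearson's r via deviations from assumed means ax and ay (short-cut method)."""
    n = len(xs)
    dx = [v - ax for v in xs]
    dy = [v - ay for v in ys]
    num = n * sum(a * b for a, b in zip(dx, dy)) - sum(dx) * sum(dy)
    den = sqrt((n * sum(a * a for a in dx) - sum(dx) ** 2)
               * (n * sum(b * b for b in dy) - sum(dy) ** 2))
    return num / den

rainfall = [61, 68, 79, 59, 69, 96, 89, 78]
rice = [108, 123, 136, 107, 112, 156, 137, 125]
print(round(pearson_r_assumed_mean(rainfall, rice, 79, 132), 2))  # 0.95
```

The result is identical for any choice of assumed means, since the formula corrects for the offset.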

**Spearman’s Rank Correlation**

**Spearman's Rank Correlation** is a nonparametric measure of statistical dependence between two variables. It identifies whether two variables are related by a monotonic function (i.e., when one increases, so does the other, or vice versa).

**Co-efficient of Rank correlation computation process:**

- Assign ranks to the items of the two series (if not already given)
- Find the differences of the ranks (d)
- Square these differences (d^{2}) and obtain their sum (Σd^{2})

Formula of Spearman's Rank Correlation:

r = 1 − 6Σd^{2} / {n(n^{2} − 1)}, where n = number of pairs of observations.

The value of this coefficient ranges between +1 and -1.

- If r = +1, there is complete agreement in the order of the ranks, and the ranks are in the same direction.
- If r = -1, there is complete agreement in the order of the ranks, but they are in opposite directions.

If the difference of ranks in each pair is zero, then from the above formula we get r = 1.

**Ex.** Ten students in a voice contest are ranked by three judges in the following order:

| 1^{st} Judge | 1 | 6 | 5 | 10 | 3 | 2 | 4 | 9 | 7 | 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2^{nd} Judge | 3 | 5 | 8 | 4 | 7 | 10 | 2 | 1 | 6 | 9 |
| 3^{rd} Judge | 6 | 4 | 9 | 8 | 1 | 2 | 3 | 10 | 5 | 7 |

Use the method of rank-correlation to judge which pair of judges have the nearest approach to common liking in voice.

| 1^{st} Judge | 2^{nd} Judge | 3^{rd} Judge | d (1,2) | d (2,3) | d (1,3) | d^{2} (1,2) | d^{2} (2,3) | d^{2} (1,3) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 3 | 6 | -2 | -3 | -5 | 4 | 9 | 25 |
| 6 | 5 | 4 | 1 | 1 | 2 | 1 | 1 | 4 |
| 5 | 8 | 9 | -3 | -1 | -4 | 9 | 1 | 16 |
| 10 | 4 | 8 | 6 | -4 | 2 | 36 | 16 | 4 |
| 3 | 7 | 1 | -4 | 6 | 2 | 16 | 36 | 4 |
| 2 | 10 | 2 | -8 | 8 | 0 | 64 | 64 | 0 |
| 4 | 2 | 3 | 2 | -1 | 1 | 4 | 1 | 1 |
| 9 | 1 | 10 | 8 | -9 | -1 | 64 | 81 | 1 |
| 7 | 6 | 5 | 1 | 1 | 2 | 1 | 1 | 4 |
| 8 | 9 | 7 | -1 | 2 | 1 | 1 | 4 | 1 |
| Total | | | | | | 200 | 214 | 60 |

The rank correlations are computed as follows:

r_{12} (1^{st} & 2^{nd} Judges) = 1 − (6 × 200)/(10 × 99) = 1 − 1.212 = -0.212

r_{23} (2^{nd} & 3^{rd} Judges) = 1 − (6 × 214)/(10 × 99) = 1 − 1.297 = -0.297

r_{13} (1^{st} & 3^{rd} Judges) = 1 − (6 × 60)/(10 × 99) = 1 − 0.364 = +0.636

Since r_{13} is the highest, the 1^{st} and 3^{rd} judges have the nearest approach to a common liking in voice.
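The three rank correlations can be verified with a short Python sketch (illustrative function name):

```python
def spearman_r(r1, r2):
    """Spearman's rank correlation: r = 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n * n - 1))

j1 = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
j2 = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]
j3 = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]
print(round(spearman_r(j1, j2), 3))  # -0.212 (judges 1 & 2)
print(round(spearman_r(j2, j3), 3))  # -0.297 (judges 2 & 3)
print(round(spearman_r(j1, j3), 3))  # 0.636  (judges 1 & 3)
```

The largest coefficient, for judges 1 and 3, confirms the conclusion of the worked example.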

**Rank Correlation – Problems**

**Rank Correlation computation where Actual Ranks are not given.**

Compute the rank coefficient of correlation for the marks obtained by 8 students in Mathematics and History papers.

| Marks in Mathematics | 15 | 20 | 28 | 12 | 40 | 60 | 20 | 80 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Marks in History | 40 | 30 | 50 | 30 | 20 | 10 | 30 | 60 |

The Table shows the computation details

| Marks in Mathematics (X) | Rank | Marks in History (Y) | Rank | Difference d | d^{2} |
| --- | --- | --- | --- | --- | --- |
| 15 | 2 | 40 | 6 | -4 | 16 |
| 20 | 3.5 | 30 | 4 | -0.5 | 0.25 |
| 28 | 5 | 50 | 7 | -2 | 4 |
| 12 | 1 | 30 | 4 | -3 | 9 |
| 40 | 6 | 20 | 2 | 4 | 16 |
| 60 | 7 | 10 | 1 | 6 | 36 |
| 20 | 3.5 | 30 | 4 | -0.5 | 0.25 |
| 80 | 8 | 60 | 8 | 0 | 0 |
| Total | | | | | Σd^{2} = 81.50 |

As shown in the computation below, after adjusting for tied ranks the coefficient works out to zero, indicating no correlation between the two series.

For equal ranks, some adjustment in the above formula is required.

Add (m^{3} − m)/12 to Σd^{2} for each group of tied ranks, where m = number of items whose ranks are common.

Here,

The item 20 is repeated 2 times in the X-series, so m = 2 for the X-series; 30 is repeated 3 times in the Y-series, so m = 3 for the Y-series.

r = 1 − 6 × {81.5 + (2^{3} − 2)/12 + (3^{3} − 3)/12} / (8^{3} − 8) = 1 − [6 × (81.5 + 0.5 + 2)] / 504 = 1 − (6 × 84)/504 = 1 − 504/504 = 1 − 1 = 0
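The tie-corrected computation can be sketched as follows. This minimal illustration (the function is hypothetical, not a library routine) takes the already-computed Σd² and the tie-group sizes rather than ranking the raw marks:

```python
def spearman_r_tied(d2_sum, n, tie_sizes):
    """Spearman's r with tie correction:
    add (m^3 - m)/12 to sum(d^2) for each group of m tied ranks."""
    corrected = d2_sum + sum((m ** 3 - m) / 12 for m in tie_sizes)
    return 1 - 6 * corrected / (n * (n * n - 1))

# Marks example: sum(d^2) = 81.5, n = 8,
# one pair tied in the X-series (m = 2), one triple in the Y-series (m = 3)
print(spearman_r_tied(81.5, 8, [2, 3]))  # 0.0
```

The correction (0.5 + 2) raises Σd² to exactly 84, making the fraction 504/504 and hence r = 0.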

**Rank Correlation – Problems**

**Compute rank correlation coefficient between the following two series X and Y**

| X | 68 | 64 | 70 | 60 | 54 | 67 | 76 | 63 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Y | 87 | 71 | 63 | 78 | 84 | 58 | 50 | 40 |

Computation Table

| Rank of X | Rank of Y | d = Rank of X − Rank of Y | d^{2} |
| --- | --- | --- | --- |
| 3 | 1 | 2 | 4 |
| 5 | 4 | 1 | 1 |
| 2 | 5 | -3 | 9 |
| 7 | 3 | 4 | 16 |
| 8 | 2 | 6 | 36 |
| 4 | 6 | -2 | 4 |
| 1 | 7 | -6 | 36 |
| 6 | 8 | 2 | 4 |
| n = 8 | | | Σd^{2} = 110 |

Rank correlation coefficient: r = 1 − 6Σd^{2}/{n(n^{2} − 1)} = 1 − (6 × 110)/(8 × 63) = 1 − 660/504 = -0.31 approx.

The correlation between X and Y is negative.

**Concurrent Deviations**

**Concurrent Deviation** is a very simple, if crude, method of finding correlation, applicable when the magnitudes of the two variables are not relevant and only the directions of change matter.

The concurrent deviations method involves attaching a positive sign to an x-value (except the first) if it exceeds the previous value, and a negative sign if it is less than the previous value. The same is done for the y-series. A deviation in the x-value and the corresponding deviation in the y-value are said to be concurrent if both have the same sign.

Denoting the number of concurrent deviations by c and the total number of deviations by m (one less than the number of pairs of x and y values), the coefficient of concurrent deviations is given by:

r_{c} = ± √{± (2c − m)/m}

If (2c-m) >0, then we take the positive sign both inside and outside the radical sign.

If (2c-m) <0, we consider the negative sign both inside and outside the radical sign.

Like Pearson’s correlation coefficient and Spearman’s rank correlation coefficient, the coefficient of concurrent deviations also lies between – 1 and 1, both inclusive.

**Ex.** Find the coefficient of concurrent deviations from the following data.

| Year | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Price | 24 | 27 | 29 | 22 | 34 | 37 | 38 | 41 |
| Demand | 34 | 33 | 34 | 29 | 28 | 27 | 25 | 22 |

Computation of Coefficient of Concurrent Deviations

| Year | Price | Sign of dev. from previous (a) | Demand | Sign of dev. from previous (b) | Product of deviations (ab) |
| --- | --- | --- | --- | --- | --- |
| 2000 | 24 | | 34 | | |
| 2001 | 27 | + | 33 | – | – |
| 2002 | 29 | + | 34 | + | + |
| 2003 | 22 | – | 29 | – | + |
| 2004 | 34 | + | 28 | – | – |
| 2005 | 37 | + | 27 | – | – |
| 2006 | 38 | + | 25 | – | – |
| 2007 | 41 | + | 22 | – | – |

Here, m = number of pairs of deviations = 7, and c = number of positive signs in the product-of-deviations column = number of concurrent deviations = 2.

r_{c} = −√{−(2c − m)/m} = −√{−(4 − 7)/7} = −√(3/7) = -0.65 approx.

[Since (2c − m) = -3 < 0, we take the negative sign both inside and outside the radical sign.]

Thus there is a negative correlation between price and demand.
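The whole procedure can be sketched in Python (illustrative function; ties between consecutive values, which do not occur in this data, are not handled):

```python
from math import sqrt, copysign

def concurrent_deviation_r(xs, ys):
    """Coefficient of concurrent deviations: r = +/- sqrt(+/- (2c - m) / m)."""
    sx = [1 if b > a else -1 for a, b in zip(xs, xs[1:])]  # signs of x deviations
    sy = [1 if b > a else -1 for a, b in zip(ys, ys[1:])]  # signs of y deviations
    m = len(sx)                                   # number of pairs of deviations
    c = sum(1 for a, b in zip(sx, sy) if a == b)  # concurrent deviations
    t = 2 * c - m
    # sign outside the radical matches the sign of (2c - m)
    return copysign(sqrt(abs(t) / m), t)

price = [24, 27, 29, 22, 34, 37, 38, 41]
demand = [34, 33, 34, 29, 28, 27, 25, 22]  # series as used in the worked table
print(round(concurrent_deviation_r(price, demand), 2))  # -0.65
```

With m = 7 and c = 2 this reproduces r_c = −√(3/7) ≈ -0.65, as in the example.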

**Regression Analysis**

**Regression Analysis** is a statistical process for estimating the relationships among variables.

**Regression Analysis Types**

**Simple and Multiple:** Classified by the number of variables involved.

- **Simple:** The relationship between two variables only.
- **Multiple:** The relationship among more than two variables.

**Linear and Non-linear:** Classified by the shape obtained when the values are plotted on a graph.

- **Linear:** A straight line depicts a linear relationship.
- **Non-linear:** A curved line depicts a non-linear relationship.

**Total and Partial:** Classified by how the effect of multiple variables on one another is studied.

- **Total:** The effect of all the important variables on one another is studied.
- **Partial:** The effect of one or two relevant variables is studied, the others being held constant.

**Regression Analysis Methods**

- **Graphical Method:** The observations (pairs of x, y values) are plotted as points, which are then joined to obtain the regression line.
- **Algebraic Method:** Linear equations are developed from the observed data.
- **Normal Equation Method:** The line of best fit (e.g. Y on X) is obtained by solving simple linear algebraic (normal) equations.
- **Deviation from Actual Means:** The two regression equations are developed in a modified form from the deviations of the values from their respective actual means.

**Simple and Multiple Regression Analysis**

**Simple Regression Analysis**

A simple regression analysis is one confined to only two variables (e.g. Price and Demand). The value of one variable is estimated on the basis of the value of the other variable.

The variable whose values are estimated is called the dependent, regressed or explained variable; the variable used as the basis for estimating it is called the independent, regressing or explanatory variable.

The functional relationship between two variables X & Y can be expressed as

Y= f(X).

**Ex:** If expenditure on sales promotion has an effect on the volume of sales, then sales promotion is the independent variable and sales the dependent variable. Here Sales is denoted by Y and Sales Promotion by X.

**Multiple Regression Analysis**

The relationship is established among more than two related variables at a time, say X, Y, Z (e.g. sales, price and income of the people).

In such analysis, the value of one variable is estimated on the basis of the other remaining variables. One variable is made dependent and the other variables independent.

The functional relationship is expressed as

Y = f(X,Z) or X = f(Y,Z) or Z = f(X,Y)

**Linear and Non-linear Regression Analysis**

Regression Analysis may also be classified as **Linear and Non-linear Regression Analysis**.

**Linear Regression Analysis**

A linear regression analysis is one, which gives rise to a straight line when the data relating to the two variables are plotted on a graph paper.

In the simplest terms, the linear relationship is mathematically represented by the equation of a straight line:

Y = a + bX

A model is linear when each term is either a constant or the product of a parameter and a predictor variable. A linear equation is constructed by adding these terms, expressed in the basic form:

Response = constant + parameter * predictor + … + parameter * predictor

Y = b_{0} + b_{1}X_{1} + b_{2}X_{2} + … + b_{k}X_{k}

If two variables have linear relationship with each other, a change in the value of the independent variable by one unit causes a constant change in the values of the dependent variable.

Linear regression analysis enables to study the average change in the value of the dependent variable for any given value of the independent variable.

The linear relationship is preferred due to its simplicity and better prediction.

**Non-linear Regression Analysis**

While a linear equation has one basic form, nonlinear equations can take many different forms. If the equation doesn’t meet the criteria for a linear equation, it’s nonlinear. Unlike linear regression, these functions can have more than one parameter per predictor variable.

A non-linear regression analysis graphically depicts a curved line when the data relating to the variables are plotted on a graph. The regression will be a function involving terms of higher order, like Y = X^{2}, Y = X^{3}, etc.

**Total and Partial Regression Analysis**

Regression Analysis may also be classified as **Total and Partial Regression Analysis.**

**Total Regression Analysis**

A total regression analysis is made to study the effect of all Important variables on one another.

Ex. When the effects of sales promotion expenditure, individual income and the price of goods on the volume of sales are measured, it is a case of total regression analysis.

Regression equation takes the following forms like that of a multiple regression analysis:

S = f(A, I,P), X = f(Y,Z,P) etc.

Total regression analysis is usually made in the fields of business and economics, where the values of a variable are affected by a multiplicity of causes.

**Partial Regression Analysis**

Where many variables are involved, Total Regression Analysis studies the effect of all the important variables on one another, while Partial Regression Analysis studies the effect of one or two relevant variables on another variable, keeping the other variables constant.

The equation of such a regression takes the following form:

Y = f(X but not of Z and P);

S = f(sales promotion but not of price and individual income).

**Graphical Method of Regression Analysis**

**Regression Analysis may be graphically** represented through a scatter diagram, drawn by plotting every observation as a dot. The dependent variable is shown on the y-axis and the independent variable on the x-axis.

The dots are connected to draw the regression lines, depicting the best mean value of one variable corresponding to the mean values of the other.

The line of best fit in the scatter diagram is used to summarise the data.

**Ex. **Using the scatter diagram method draw the two regression lines associated with the following data both separately and jointly:

| X | 80 | 100 | 120 | 80 | 40 | 100 | 140 | 100 | 110 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Y | 60 | 60 | 100 | 70 | 60 | 80 | 100 | 80 | 70 |

**Algebraic Method of Regression Analysis**

**Regression Analysis may be algebraically** represented through Normal Equation Method.

**Normal Equation Method**

The line of best fit for Y on X (i.e. the regression line of Y on X) is obtained by finding the values of Y for any two (preferably the extreme) values of X through the linear equation Y = a + bX,

where a and b are two constants whose values are found by solving the two normal equations ΣY = Na + bΣX and ΣXY = aΣX + bΣX^{2}, where X and Y represent the given values of the two variables.

The line of best fit for X on Y (i.e. the regression line of X on Y) is obtained through the linear equation X = a + bY,

where the constants a and b are determined by solving the two normal equations ΣX = Na + bΣY and ΣXY = aΣY + bΣY^{2}.

**Ex.** Find the regression equations of X on Y and of Y on X for the following two series X and Y.

| X | 16 | 21 | 26 | 23 | 28 | 24 | 17 | 22 | 21 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Y | 33 | 38 | 50 | 39 | 52 | 47 | 35 | 43 | 41 |

Computation Table

| x | y | x^{2} | y^{2} | xy |
| --- | --- | --- | --- | --- |
| 16 | 33 | 256 | 1089 | 528 |
| 21 | 38 | 441 | 1444 | 798 |
| 26 | 50 | 676 | 2500 | 1300 |
| 23 | 39 | 529 | 1521 | 897 |
| 28 | 52 | 784 | 2704 | 1456 |
| 24 | 47 | 576 | 2209 | 1128 |
| 17 | 35 | 289 | 1225 | 595 |
| 22 | 43 | 484 | 1849 | 946 |
| 21 | 41 | 441 | 1681 | 861 |
| Σx = 198 | Σy = 378 | Σx^{2} = 4476 | Σy^{2} = 16222 | Σxy = 8509 |

Regression equation of x on y: (x = a + by)

Σx = Na + bΣy … (i)

Σxy = aΣy + bΣy^{2} … (ii)

Putting the values in (i), we get

198 = 9a + 378b … (iii)

Putting the values in (ii), we get,

8509 = 378a + 16222b … (iv)

So, 74844 = 3402a + 142884b … (v) [multiplying (iii) by 378]

and, 76581 = 3402a + 145998b … (vi) [multiplying (iv) by 9]

So, 1737 = 3114b … (vii) [(vi) − (v)], or b = 1737/3114 ≈ 0.56

Putting the value of b in (iii), we get 198 = 9a + 378 × 0.56, or 198 = 9a + 211.68

Or 9a = -13.68, or a = -13.68/9 = -1.52

Regression equation of x on y : (x = a + by)

or x = -1.52 + .56y, or x = .56y -1.52

Regression equation of y on x: (y = a + bx)

Σy = Na + bΣx … (i)

Σxy = aΣx + bΣx^{2} … (ii)

Putting the values in (i), we get,

378 = 9a + 198b … (iii)

Putting the values in (ii), we get,

8509 = 198a + 4476b … (iv)

So, 74844 = 1782a + 39204b .. (v) [(iii) x 198]

and 76581 = 1782a + 40284b … (vi) [ (iv) x 9]

So, 1737 = 1080b … (vii) [(vi) − (v)], or b = 1737/1080 ≈ 1.61

Putting the value of b in (iii), we get 378 = 9a + (198 × 1.61), or 9a = 378 − 318.78 = 59.22, or a = 59.22/9 = 6.58

Regression equation of y on x: y = a + bx, or y = 1.61x + 6.58
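Both regression lines can be verified by solving the normal equations programmatically. A sketch (illustrative function name) follows; note the intercepts differ in the second decimal from the hand computation above, which rounded b to two places before computing a:

```python
def fit_line(xs, ys):
    """Solve the normal equations for y = a + b*x in closed form:
    sum(y) = N*a + b*sum(x);  sum(xy) = a*sum(x) + b*sum(x^2)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(v * v for v in xs)
    sxy = sum(a * b for a, b in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b

x = [16, 21, 26, 23, 28, 24, 17, 22, 21]
y = [33, 38, 50, 39, 52, 47, 35, 43, 41]

a_yx, b_yx = fit_line(x, y)   # regression of Y on X
a_xy, b_xy = fit_line(y, x)   # regression of X on Y
print(round(a_yx, 2), round(b_yx, 2))  # 6.62 1.61  (b unrounded before computing a)
print(round(a_xy, 2), round(b_xy, 2))  # -1.43 0.56
```

The slopes 1737/1080 ≈ 1.61 and 1737/3114 ≈ 0.56 agree exactly with steps (vii) of the worked solution.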

**Deviation from Actual Means**

**Deviation from Actual Means** is computed using two regression equations (X on Y and Y on X), developed in a modified form from the deviation figures of the two variables from their respective actual Means, rather than their actual values.

**Regression equation of X on Y:** X = X̄ + b_{xy}(Y − Ȳ), or X − X̄ = b_{xy}(Y − Ȳ)

**Regression equation of Y on X:** Y = Ȳ + b_{yx}(X − X̄), or Y − Ȳ = b_{yx}(X − X̄)

where X, Y = given values of the variables; X̄ = arithmetic mean of variable X; Ȳ = arithmetic mean of variable Y; r = correlation coefficient;

b_{xy} = regression coefficient of X on Y = r σ_{x}/σ_{y}, and b_{yx} = regression coefficient of Y on X = r σ_{y}/σ_{x}

**Ex. : Regression Analysis – Deviation from Actual Means**

Using the method of deviations from the actual Means, find: 1. the two regressions equations,

2. The correlation coefficient, 3. The most probable value of Y when X = 30

| X | 25 | 28 | 35 | 32 | 31 | 36 | 29 | 38 | 34 | 32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Y | 43 | 46 | 49 | 41 | 36 | 32 | 31 | 30 | 33 | 39 |

**Computation Table**

| X | Y | x = X − 32 | y = Y − 38 | x^{2} | y^{2} | xy |
| --- | --- | --- | --- | --- | --- | --- |
| 25 | 43 | -7 | 5 | 49 | 25 | -35 |
| 28 | 46 | -4 | 8 | 16 | 64 | -32 |
| 35 | 49 | 3 | 11 | 9 | 121 | 33 |
| 32 | 41 | 0 | 3 | 0 | 9 | 0 |
| 31 | 36 | -1 | -2 | 1 | 4 | 2 |
| 36 | 32 | 4 | -6 | 16 | 36 | -24 |
| 29 | 31 | -3 | -7 | 9 | 49 | 21 |
| 38 | 30 | 6 | -8 | 36 | 64 | -48 |
| 34 | 33 | 2 | -5 | 4 | 25 | -10 |
| 32 | 39 | 0 | 1 | 0 | 1 | 0 |
| ΣX = 320 | ΣY = 380 | Σx = 0 | Σy = 0 | Σx^{2} = 140 | Σy^{2} = 398 | Σxy = -93 |

**Regression equation of X on Y**

X = X̄ + r (σ_{x}/σ_{y}) (Y − Ȳ)

Putting the values, we get X̄ and Ȳ as follows:

X̄ = ΣX/n = 320/10 = 32; Ȳ = ΣY/n = 380/10 = 38

Putting the values, we get σ_{x} and σ_{y} as follows:

σ_{x} = √(Σx^{2}/n) = √(140/10) = 3.742 approx.

σ_{y} = √(Σy^{2}/n) = √(398/10) = 6.309 approx.

Putting the values, we get r as follows:

r = Σxy / (n σ_{x} σ_{y}) = -93 / (10 × 3.742 × 6.309) = -0.394 approx.

Putting the respective values, we get the regression equation of X on Y as:

X = 32 + (-0.394) × (3.742/6.309) × (Y − 38) = 32 + [-0.2337 × (Y − 38)]

= 32 + 8.8806 − 0.2337Y = 40.8806 − 0.2337Y

So, the regression equation of X on Y is: X = 40.8806 − 0.2337Y

**Regression equation of Y on X**

Y = Ȳ + r (σ_{y}/σ_{x}) (X − X̄)

Or, Y = 38 + (-0.394) × (6.309/3.742) × (X − 32) = 38 − 0.6643(X − 32) = 38 + 21.2576 − 0.6643X = 59.2576 − 0.6643X

So, the regression equation of Y on X is: Y = 59.2576 − 0.6643X

The most probable value of Y when X = 30 is: Y = 59.2576 − 0.6643 × 30 = 39.33 approx.
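The deviation-from-means computations can be verified with a Python sketch (illustrative function name). Since σ = √(Σx²/n), the coefficients reduce to b_yx = Σxy/Σx² and b_xy = Σxy/Σy²:

```python
from math import sqrt

def regression_via_deviations(xs, ys):
    """Regression coefficients from deviations about the actual means:
    b_yx = Sxy/Sxx, b_xy = Sxy/Syy, r = Sxy/sqrt(Sxx*Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [v - mx for v in xs]
    dy = [v - my for v in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return sxy / sxx, sxy / syy, sxy / sqrt(sxx * syy)

X = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
Y = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]
b_yx, b_xy, r = regression_via_deviations(X, Y)
print(round(b_yx, 4), round(b_xy, 4), round(r, 3))  # -0.6643 -0.2337 -0.394
print(round(38 + b_yx * (30 - 32), 2))              # 39.33 (most probable Y at X = 30)
```

These reproduce b_yx = -93/140, b_xy = -93/398 and r ≈ -0.394 from the table totals.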

**Coefficient of Regression**

**Coefficient of Regression** determines the amount by which one variable changes for a unit change in the other variable.

Coefficient of regression of X on Y = b_{xy} = r σ_{x}/σ_{y}

Coefficient of regression of Y on X = b_{yx} = r σ_{y}/σ_{x}

where r = coefficient of correlation, σ_{x} = standard deviation of series x, σ_{y} = standard deviation of series y.

The coefficients of regression are also given by:

b_{xy} = (NΣXY − ΣX·ΣY) / (NΣY^{2} − (ΣY)^{2}) and b_{yx} = (NΣXY − ΣX·ΣY) / (NΣX^{2} − (ΣX)^{2})

where X = the given values of the X variable, Y = the given values of the Y variable, N = number of pairs of observations. The same formulae hold with deviations dx, dy from assumed means in place of X and Y; all other symbols carry the meanings given above.

Ex. Find Coefficient of Regression of X on Y and of Y on X

| Sales Promotion Exp. X (in thousands) | 11 | 8 | 9 | 5 | 8 | 9 | 20 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Sales Y (in lacs) | 10 | 8 | 6 | 5 | 9 | 7 | 11 |

Computation Details

| X | Y | x = X − 10 | y = Y − 8 | x^{2} | y^{2} | xy |
| --- | --- | --- | --- | --- | --- | --- |
| 11 | 10 | 1 | 2 | 1 | 4 | 2 |
| 8 | 8 | -2 | 0 | 4 | 0 | 0 |
| 9 | 6 | -1 | -2 | 1 | 4 | 2 |
| 5 | 5 | -5 | -3 | 25 | 9 | 15 |
| 8 | 9 | -2 | 1 | 4 | 1 | -2 |
| 9 | 7 | -1 | -1 | 1 | 1 | 1 |
| 20 | 11 | 10 | 3 | 100 | 9 | 30 |
| ΣX = 70 | ΣY = 56 | Σx = 0 | Σy = 0 | Σx^{2} = 136 | Σy^{2} = 28 | Σxy = 48 |

Since ΣX = 70 and ΣY = 56, X̄ = 70/7 = 10 and Ȳ = 56/7 = 8; the assumed means coincide with the actual means, so Σx = Σy = 0.

Regression coefficient of X on Y = b_{xy} = Σxy/Σy^{2} = 48/28 = 1.71

Regression coefficient of Y on X = b_{yx} = Σxy/Σx^{2} = 48/136 = 0.353
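A sketch of the assumed-mean computation of the two regression coefficients (illustrative names; the general N-based formula is used, which reduces to Σxy/Σy² and Σxy/Σx² here because Σdx = Σdy = 0):

```python
def regression_coefficients(xs, ys, ax, ay):
    """b_xy and b_yx from deviations about assumed means ax, ay:
    b_xy = (N*Sdxdy - Sdx*Sdy) / (N*Sdy2 - Sdy^2), and symmetrically for b_yx."""
    n = len(xs)
    dx = [v - ax for v in xs]
    dy = [v - ay for v in ys]
    sdx, sdy = sum(dx), sum(dy)
    sdxdy = sum(a * b for a, b in zip(dx, dy))
    sdx2 = sum(a * a for a in dx)
    sdy2 = sum(b * b for b in dy)
    bxy = (n * sdxdy - sdx * sdy) / (n * sdy2 - sdy ** 2)
    byx = (n * sdxdy - sdx * sdy) / (n * sdx2 - sdx ** 2)
    return bxy, byx

promo = [11, 8, 9, 5, 8, 9, 20]
sales = [10, 8, 6, 5, 9, 7, 11]
bxy, byx = regression_coefficients(promo, sales, 10, 8)
print(round(bxy, 2), round(byx, 3))  # 1.71 0.353
```

This matches the hand computation: b_xy = 48/28 and b_yx = 48/136.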
