Statistical Correlation and Regression

Correlation & Regression

In this part, we discuss Correlation & Regression, which describe the relationship between variables and the statistical process of estimating such relationships, covering:

  • Correlation
  • Measures of Correlation
  • Correlation Co-efficient
  • Rank Correlation
  • Regression Analysis

Statistical Correlation

Statistical Correlation refers to the relationship between two or more related variables in a statistical experiment. Two variables are said to be correlated if a change in the value of one variable brings about a change in the value of the other (e.g. Height & Weight, Price & Demand, Income & Savings).

Types of Correlation

  • Positive or Negative Correlation: By direction of movement, correlation may be classified as positive or negative.
  • Positive Correlation: When the values of the two variables move in the same direction, the correlation is said to be positive.
  • Negative Correlation: If the values of the variables move in opposite directions, so that an increase (or decrease) in one variable is accompanied by a decrease (or increase) in the other, the correlation is said to be negative.
  • Simple, Partial or Multiple Correlation: By the number of variables involved, correlation may be classified as simple, partial or multiple.
  • Simple Correlation: In simple correlation, there are only two variables.
  • Multiple Correlation: In multiple correlation, the relationship among three or more variables is studied.
  • Partial Correlation: In partial correlation, more than two variables are involved, but correlation is studied between only two of them, the other variables being assumed constant.
  • Linear or Non-linear Correlation: According to the nature of change in the ratio of the variables, correlation may be classified as linear or non-linear.
  • Linear Correlation: If the ratio of change between the variables is uniform, the correlation between them is linear.
  • Non-linear Correlation: The correlation is said to be non-linear (curvilinear) if, corresponding to a unit change in the value of one variable, the other variable changes at a fluctuating rate.

Measures of Statistical Correlation

Statistical correlation (i.e. the degree of relationship between the variables) may be measured by various methods, such as: 1. Scatter Diagram Method, 2. Karl Pearson's Coefficient of Correlation, 3. Spearman's Rank Coefficient of Correlation, 4. Coefficient of Concurrent Deviations.

Degree of Correlation

The following chart shows the relative degree of correlation

Degree of correlation                      Positive        Negative
Perfect correlation                        1               -1
Very high degree of correlation            0.9 or more     -0.9 or more
Sufficiently high degree of correlation    0.75 to 0.9     -0.75 to -0.9
Moderate degree of correlation             0.6 to 0.75     -0.6 to -0.75
Only the possibility of correlation        0.3 to 0.6      -0.3 to -0.6
Possibly no correlation                    less than 0.3   more than -0.3
No correlation                             0               0

Scatter Diagram Method

Scatter Diagram is a tool for analyzing relationships between two variables. One variable is plotted on the horizontal axis and the other on the vertical axis. The pattern of the plotted points can graphically show relationship patterns.

Positive correlation

– For each pair of x and y values, a dot (or point) is plotted; the number of dots thus equals the number of observations.

– If the plotted dots (or points) show some trend, either upward or downward, the two variables (x and y) are said to be correlated (otherwise not correlated).

– The relationship is expressed through the value of 'r', which must lie between -1 and +1.

Measure of Relationship

-If the trend of the points is upward, moving from the lower left-hand corner to the upper right-hand corner, the correlation is positive (r > 0, reaching r = +1 when the points lie exactly on a rising straight line; r is the coefficient of correlation) (Fig a)

-If the movement is the reverse, i.e. the dots run from the upper left-hand corner to the lower right-hand corner, the correlation is negative (r < 0, with r = -1 for a perfectly falling straight line) (Fig b)

-If no trend is observed, it indicates the absence of correlation (r = 0) (Fig c)

Karl Pearson’s method of Co-efficient of Correlation

Karl Pearson's method is used to compute the coefficient of correlation; the extent of the correlation is expressed through algebraic formulae, in numerical terms.

Pearson's coefficient of correlation is represented by 'r', which lies between -1 and +1.

Assumptions

Pearson's Coefficient of Correlation is based upon some assumptions: 1. A large number of independent causes are operating in each series, producing a normal distribution. 2. The forces so operating are related in a causal way. 3. The relationship between the two series is linear. 'r' can be computed in various ways, depending upon the choice of the user.

The Table shows the values of Co-efficient of Correlation revealing the respective Degree of Correlation

Co-efficient of Correlation
Result                       Degree of Correlation
±1                           Perfect correlation
±0.90 or more                Very high degree of correlation
≥ ±0.75 and < ±0.90          Fairly high degree of correlation
≥ ±0.50 and < ±0.75          Moderate degree of correlation
≥ ±0.25 and < ±0.50          Low degree of correlation
less than ±0.25              Very low degree of correlation
0                            No correlation

Co-efficient of Correlation Computation : Direct method

The direct method is used where the given values of the variables are small, or where all the values can be reduced to a small size by a change of scale or origin.

Formula of coefficient of correlation : Direct Method

\displaystyle r=\frac{{N\sum{{xy}}-\sum{x}\sum{y}}}{{\sqrt{{N\sum{{{{x}^{2}}}}-{{{\left( {\sum{x}} \right)}}^{2}}.}}\sqrt{{N\sum{{{{y}^{2}}}}-{{{\left( {\sum{y}} \right)}}^{2}}}}}}

Where, \displaystyle \sum{{}}x = sum of the values of variable x, \displaystyle \sum{{}}y = sum of the values of variable y, \displaystyle \sum{{}}xy = sum of the products of variables x and y, \displaystyle \sum{{}}x² = sum of the squares of the values of variable x, \displaystyle \sum{{}}y² = sum of the squares of the values of variable y, and N = number of observations.

Ex: Compute co-efficient of correlation for following data, using Pearson’s direct method.

Marks in English:    1  2  3  4  5
Marks in Statistics: 6  7  8  9  10

The following table shows the computation of the values and their sums

Marks in English (X)   X²    Marks in Statistics (Y)   Y²     XY
1                      1     6                         36     6
2                      4     7                         49     14
3                      9     8                         64     24
4                      16    9                         81     36
5                      25    10                        100    50
∑X = 15   ∑X² = 55   ∑Y = 40   ∑Y² = 330   ∑XY = 130

\displaystyle r=\frac{{N\sum{{xy}}-\sum{x}\sum{y}}}{{\sqrt{{N\sum{{{{x}^{2}}}}-{{{\left( {\sum{x}} \right)}}^{2}}.}}\sqrt{{N\sum{{{{y}^{2}}}}-{{{\left( {\sum{y}} \right)}}^{2}}}}}}= \displaystyle \frac{{5(130)-15\times 40}}{{\sqrt{{5\times 55-{{{(15)}}^{2}}}}.\sqrt{{5\times 330-{{{(40)}}^{2}}}}}}= \displaystyle \frac{{650-600}}{{\sqrt{{275-225}}.\sqrt{{1650-1600}}}}=\frac{{50}}{{\sqrt{{50\times 50}}}}=\frac{{50}}{{50}}=1

The coefficient of correlation being +1 shows that the correlation between the two variables is perfectly positive.
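The direct-method computation above can be checked with a short Python sketch (an illustrative addition, not part of the original worked example):

```python
from math import sqrt

def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation, direct method."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))   # sum of products xy
    sx2 = sum(a * a for a in x)              # sum of squared x values
    sy2 = sum(b * b for b in y)              # sum of squared y values
    num = n * sxy - sx * sy
    den = sqrt(n * sx2 - sx ** 2) * sqrt(n * sy2 - sy ** 2)
    return num / den

english = [1, 2, 3, 4, 5]
statistics = [6, 7, 8, 9, 10]
print(round(pearson_r(english, statistics), 6))  # 1.0
```

Rounding is used because the square roots introduce tiny floating-point error even when r is exactly 1.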

Co-efficient of Correlation Computation : Assumed Mean

The assumed-mean (short-cut) method is preferred when it is not possible to get the arithmetic averages of both variables in whole or round numbers. Under this method, the deviations of the values of each variable are taken from an assumed average.

Formula of coefficient of correlation : Using Assumed Mean

\displaystyle r=\frac{{N\sum{{dxdy}}-\left[ {\left( {\sum{{dx}}} \right)\times \left( {\sum{{dy}}} \right)} \right]}}{{\sqrt{{N\sum{{d{{x}^{2}}}}-{{{\left( {\sum{{dx}}} \right)}}^{2}}}}.\sqrt{{N\sum{{d{{y}^{2}}}}-{{{\left( {\sum{{dy}}} \right)}}^{2}}}}}}

Where, dx = deviation of x from its assumed mean (i.e. x − assumed mean of the x series), dy = deviation of y from its assumed mean (i.e. y − assumed mean of the y series), \displaystyle \sum{{}}dx = sum of deviations of the x series from its assumed mean, \displaystyle \sum{{}}dy = sum of deviations of the y series from its assumed mean, \displaystyle \sum{{}}dx² = sum of squares of the deviations of the x series from its assumed mean, \displaystyle \sum{{}}dy² = sum of squares of the deviations of the y series from its assumed mean, \displaystyle \sum{{}}dxdy = sum of the products of the deviations of the x and y series from their assumed means.

Ex. Compute Karl Pearson's coefficient of correlation by the short-cut method, taking 79 and 132 as the assumed averages for the Rainfall and Rice production variables respectively.

Rainfall:    61   68   79   59   69   96   89   78
Rice Prodn:  108  123  136  107  112  156  137  125

The following table shows the computation of the values and their sums

Rainfall   dx (dev. from 79)   dx²    Rice production   dy (dev. from 132)   dy²    dx dy
61         -18                 324    108               -24                  576    432
68         -11                 121    123               -9                   81     99
79         0                   0      136               4                    16     0
59         -20                 400    107               -25                  625    500
69         -10                 100    112               -20                  400    200
96         17                  289    156               24                   576    408
89         10                  100    137               5                    25     50
78         -1                  1      125               -7                   49     7
Total (N = 8):   ∑dx = -33   ∑dx² = 1335   ∑dy = -52   ∑dy² = 2348   ∑dx dy = 1696

\displaystyle r=\frac{{N\sum{{dxdy}}-\left[ {\left( {\sum{{dx}}} \right)\times \left( {\sum{{dy}}} \right)} \right]}}{{\sqrt{{N\sum{{d{{x}^{2}}}}-{{{\left( {\sum{{dx}}} \right)}}^{2}}}}.\sqrt{{N\sum{{d{{y}^{2}}}}-{{{\left( {\sum{{dy}}} \right)}}^{2}}}}}}=\displaystyle \frac{{8\times 1696-\left( {-33\times -52} \right)}}{{\sqrt{{\left( {8\times 1335} \right)-{{{\left( {-33} \right)}}^{2}}}}.\sqrt{{\left( {8\times 2348} \right)-{{{\left( {-52} \right)}}^{2}}}}}}\displaystyle =\frac{{13568-1716}}{{\sqrt{{\left( {10680-1089} \right)\times \left( {18784-2704} \right)}}}}=\frac{{11852}}{{\sqrt{{9591\times 16080}}}}\displaystyle =\frac{{11852}}{{12418.66}}=0.95

As r = 0.95, there is a very high degree of positive correlation between the two variables.
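The short-cut computation can be reproduced in Python (an illustrative sketch using the example's data and assumed means):

```python
from math import sqrt

def pearson_r_shortcut(x, y, ax, ay):
    """Pearson's r from deviations about assumed means ax and ay (short-cut method)."""
    n = len(x)
    dx = [v - ax for v in x]                 # deviations of x from assumed mean
    dy = [v - ay for v in y]                 # deviations of y from assumed mean
    num = n * sum(a * b for a, b in zip(dx, dy)) - sum(dx) * sum(dy)
    den = (sqrt(n * sum(a * a for a in dx) - sum(dx) ** 2)
           * sqrt(n * sum(b * b for b in dy) - sum(dy) ** 2))
    return num / den

rainfall = [61, 68, 79, 59, 69, 96, 89, 78]
rice = [108, 123, 136, 107, 112, 156, 137, 125]
print(round(pearson_r_shortcut(rainfall, rice, 79, 132), 2))  # 0.95
```

Any assumed means give the same r; 79 and 132 merely keep the intermediate numbers small.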

Spearman’s Rank Correlation

Spearman's Rank Correlation is a nonparametric measure of statistical dependence between two variables. It enables us to identify whether two variables are related by a monotonic function (i.e. when one increases, so does the other, or vice versa).

Co-efficient of Rank correlation computation process:

  • Assign ranks to the items of the two series (if they are not given)
  • Find the differences of the ranks (d)
  • Square these differences (d²)

Formula of Spearman’s Rank Correlation

\displaystyle r=1-\frac{{6\left( {\sum{{{{d}^{2}}}}} \right)}}{{{{n}^{3}}-n}}

, where n = number of pairs of observations.

The value of this coefficient ranges between +1 and -1. If r = +1, there is complete agreement in the order of the ranks, in the same direction. If r = -1, there is complete agreement in the order of the ranks, but in opposite directions.

If the difference of ranks in each pair is zero, then from the above formula we get r = 1.

Ex. Ten students in a voice contest are ranked by three judges in the following order:

1st Judge:  1  6  5  10  3   2   4  9   7  8
2nd Judge:  3  5  8  4   7   10  2  1   6  9
3rd Judge:  6  4  9  8   1   2   3  10  5  7

Use the method of rank-correlation to judge which pair of judges have the nearest approach to common liking in voice.

Ranks given by            Differences (d)          Squares of differences (d²)
1st   2nd   3rd           (i)    (ii)   (iii)      (i)    (ii)   (iii)
1     3     6             -2     -3     -5         4      9      25
6     5     4             1      1      2          1      1      4
5     8     9             -3     -1     -4         9      1      16
10    4     8             6      -4     2          36     16     4
3     7     1             -4     6      2          16     36     4
2     10    2             -8     8      0          64     64     0
4     2     3             2      -1     1          4      1      1
9     1     10            8      -9     -1         64     81     1
7     6     5             1      1      2          1      1      4
8     9     7             -1     2      1          1      4      1
Total (∑d²)                                        200    214    60

[(i) = 1st − 2nd, (ii) = 2nd − 3rd, (iii) = 1st − 3rd]

The rank correlations are computed as follows:

r12 (1st & 2nd Judges) = \displaystyle 1-\frac{{6\left( {\sum{{{{d}^{2}}}}} \right)}}{{{{n}^{3}}-n}}=1-\frac{{6\times 200}}{{{{{10}}^{3}}-10}}=1-\frac{{1200}}{{990}}=1-1.212=-0.212
r23 (2nd & 3rd Judges) = \displaystyle 1-\frac{{6\times 214}}{{{{{10}}^{3}}-10}}=1-\frac{{1284}}{{990}}=1-1.297=-0.297
r13 (1st & 3rd Judges) = \displaystyle 1-\frac{{6\times 60}}{{{{{10}}^{3}}-10}}=1-\frac{{360}}{{990}}=1-0.364=+0.636

Since r13 is the only positive coefficient, the 1st and 3rd judges have the nearest approach to a common liking in voice.
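The three coefficients can be verified with a small Python sketch of the rank-correlation formula (illustrative, not from the original text):

```python
def rank_r(rank_a, rank_b):
    """Spearman's rank correlation: r = 1 - 6*sum(d^2)/(n^3 - n)."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n ** 3 - n)

j1 = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
j2 = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]
j3 = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]
print(round(rank_r(j1, j2), 3))  # -0.212
print(round(rank_r(j2, j3), 3))  # -0.297
print(round(rank_r(j1, j3), 3))  # 0.636
```

The largest (and only positive) value, r13, confirms that judges 1 and 3 agree most closely.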

Rank Correlation – Problems

Rank correlation computation where actual ranks are not given.

Ex. Compute the rank coefficient of correlation for the marks obtained by 8 students in Mathematics and History papers.

Marks in Mathematics:  15  20  28  12  40  60  20  80
Marks in History:      40  30  50  30  20  10  30  60

The Table shows the computation details

Marks in Mathematics (X)   Rank   Marks in History (Y)   Rank   Difference d   d²
15                         2      40                     6      -4             16
20                         3.5    30                     4      -0.5           0.25
28                         5      50                     7      -2             4
12                         1      30                     4      -3             9
40                         6      20                     2      4              16
60                         7      10                     1      6              36
20                         3.5    30                     4      -0.5           0.25
80                         8      60                     8      0              0
Total                                                                          81.50

For equal (tied) ranks, some adjustment in the above formula is required.

Add \displaystyle \frac{1}{{12}}\left( {{{m}^{3}}-m} \right) to \displaystyle \sum{{}}d² for each group of tied items, where m = number of items whose ranks are common.

Here, the mark 20 is repeated 2 times in the X-series, i.e. m = 2 in the X-series (correction (2³ − 2)/12 = 0.5), and 30 is repeated 3 times in the Y-series, so m = 3 in the Y-series (correction (3³ − 3)/12 = 2).

r = 1 – [6 × (81.5 + 0.5 + 2)] / 504 = 1 – [(6 × 84)/504] = 1 – (\displaystyle \frac{{504}}{{504}}) = 1 – 1 = 0

The value of the coefficient being zero indicates that there is no correlation between the two series.
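The whole tied-ranks procedure (average ranks plus the (m³ − m)/12 correction) can be sketched in Python (an illustrative addition):

```python
def average_ranks(values):
    """Ascending ranks; tied values share the average of the ranks they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def tie_correction(values):
    """Sum of (m^3 - m)/12 over each group of m tied values."""
    return sum((values.count(v) ** 3 - values.count(v)) / 12 for v in set(values))

def spearman_tied(x, y):
    """Spearman's r with the tie adjustment added to sum(d^2)."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    d2 += tie_correction(x) + tie_correction(y)
    return 1 - 6 * d2 / (n ** 3 - n)

maths = [15, 20, 28, 12, 40, 60, 20, 80]
history = [40, 30, 50, 30, 20, 10, 30, 60]
print(spearman_tied(maths, history))  # 0.0
```

For this data the corrections are 0.5 (the two 20s) and 2 (the three 30s), giving 1 − 6×84/504 = 0 exactly.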

Rank Correlation – Problems

Compute rank correlation coefficient between the following two series X and Y

X:  68  64  70  60  54  67  76  63
Y:  87  71  63  78  84  58  50  40

Computation Table

Rank of X   Rank of Y   d (rank of X − rank of Y)   d²
3           1           2                           4
5           4           1                           1
2           5           -3                          9
7           3           4                           16
8           2           6                           36
4           6           -2                          4
1           7           -6                          36
6           8           -2                          4
n = 8                                               ∑d² = 110

Rank correlation coefficient

\displaystyle r=1-\frac{{6\sum{{{{d}^{2}}}}}}{{n\left( {{{n}^{2}}-1} \right)}}=1-\frac{{6\times 110}}{{8\times 63}}=1-\frac{{660}}{{504}}=1-1.31=-0.31

As r = -0.31, the correlation between X and Y is negative.

Concurrent Deviations

Concurrent Deviation is a very simple, rough-and-ready method of finding correlation, used when only the direction of change of the two variables, not their magnitude, is relevant.

The concurrent deviations method involves attaching a positive sign to an x-value (except the first) if it is more than the previous value, and a negative sign if it is less than the previous value. The same is done for the y-series. The deviation in an x-value and the corresponding y-value are said to be concurrent if both deviations have the same sign.

Denoting the number of concurrent deviations by c and the total number of deviations by m (which is one less than the number of pairs of x and y values), the coefficient of concurrent deviations is given by

\displaystyle {{r}_{c}}=\pm \sqrt{{\pm \frac{{\left( {2c-m} \right)}}{m}}}

If (2c-m) >0, then we take the positive sign both inside and outside the radical sign.

If (2c-m) <0, we consider the negative sign both inside and outside the radical sign.

Like Pearson’s correlation coefficient and Spearman’s rank correlation coefficient, the coefficient of concurrent deviations also lies between – 1 and 1, both inclusive.

Ex. Find the coefficient of concurrent deviations from the following data.

Year:    2000  2001  2002  2003  2004  2005  2006  2007
Price:   24    27    29    22    34    37    38    41
Demand:  34    33    34    29    28    27    25    22

Computation of Coefficient of Concurrent Deviations

Year   Price   Sign of dev. from prev. (a)   Demand   Sign of dev. from prev. (b)   Product of deviations (ab)
2000   24                                    34
2001   27      +                             33       -                             -
2002   29      +                             34       +                             +
2003   22      -                             29       -                             +
2004   34      +                             28       -                             -
2005   37      +                             27       -                             -
2006   38      +                             25       -                             -
2007   41      +                             22       -                             -

Here, m = number of pairs of deviations = 7, and c = number of positive signs in the product-of-deviations column = number of concurrent deviations = 2

\displaystyle {{r}_{c}}=\pm \sqrt{{\pm \frac{{\left( {2c-m} \right)}}{m}}}\displaystyle =\pm \sqrt{{\pm \frac{{\left( {4-7} \right)}}{7}}}\displaystyle =-\sqrt{{\frac{3}{7}}}=-0.65

[Since \displaystyle \frac{{\left( {2c-m} \right)}}{m}=-\frac{3}{7} is negative, we take the negative sign both inside and outside the radical sign]

Thus there is a negative correlation between price and demand.
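The sign-counting procedure translates directly into Python (an illustrative sketch; it assumes no two consecutive values are equal, so each deviation is strictly + or −):

```python
from math import sqrt

def concurrent_deviation_r(x, y):
    """Coefficient of concurrent deviations from the signs of successive changes."""
    m = len(x) - 1  # number of pairs of deviations
    # a deviation pair is concurrent when both series move in the same direction
    c = sum((x[i] > x[i - 1]) == (y[i] > y[i - 1]) for i in range(1, len(x)))
    inner = (2 * c - m) / m
    # the sign outside the radical follows the sign of (2c - m)/m
    return sqrt(inner) if inner >= 0 else -sqrt(-inner)

price = [24, 27, 29, 22, 34, 37, 38, 41]
demand = [34, 33, 34, 29, 28, 27, 25, 22]
print(round(concurrent_deviation_r(price, demand), 2))  # -0.65
```

Here c = 2 (years 2002 and 2003) and m = 7, reproducing the hand computation above.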

Regression Analysis

Regression Analysis is a statistical process for estimating the relationships among variables.

Regression Analysis Types

  • Simple and Multiple: classified by the number of variables whose relationship is described
  • Simple: finds the relationship between 2 variables only
  • Multiple: finds the relationship among more than two variables
  • Linear and Non-linear: classified by the shape obtained when the values are plotted on a graph
  • Linear: a straight line depicts a linear relationship
  • Non-linear: a curved line depicts a non-linear relationship
  • Total and Partial: classified by how the effect of multiple variables on one another is studied
  • Total: studies the effect of all the important variables on one another
  • Partial: studies the effect of one or two important, relevant variables, keeping the others constant

Regression Analysis Methods

  • Graphical Method: the observed data (x, y values) are plotted as points, and the regression line is drawn through them
  • Algebraic Method: linear equations are developed from the observed data
  • Normal Equation Method: the line of best fit (e.g. Y on X) is obtained from simple linear algebraic equations
  • Deviation from Actual Means: the two regression equations are developed in a modified form from the deviations of the values from their respective means

Simple and Multiple Regression Analysis

Simple Regression Analysis

A simple regression analysis is one which is confined to only two variables (e.g. Price and Demand). The value of one variable is estimated on the basis of the value of the other variable.

The variable whose values are estimated is called dependent, regressed or explained variable and the variable used as the basis of finding the value of the other variable is called the independent, regressing or explanatory variable.

The functional relationship between two variables X & Y can be expressed as

Y= f(X).

Ex: If the expenditure on sales promotion can have some effect on the volume of sales, then sales promotion will be the independent variable and sales will be the dependent variable. Here Sales is denoted by Y and Sales Promotion is denoted by X

Multiple Regression Analysis

The relationship is made among more than two related variables at a time say, X,Y, Z (like Sales, Price and income of the people).

In such analysis, the value of one variable is estimated on the basis of the other remaining variables. One variable is made dependent and the other variables independent.

The functional relationship is expressed as

Y = f(X,Z)      or    X = f(Y,Z)   or Z = f(X,Y)

Linear and Non- linear Regression Analysis

Regression Analysis may also be classified as Linear and Non- linear Regression Analysis.

Linear Regression Analysis

A linear regression analysis is one, which gives rise to a straight line when the data relating to the two variables are plotted on a graph paper.

In simplest term, The linear relationship is mathematically represented by the equation of a straight line

Y = a + bX

A model is linear when each term is either a constant or the product of a parameter and a predictor variable. A linear equation is constructed by adding the results for each term.

Expressed by basic form:

Response = constant + parameter * predictor + … + parameter * predictor

Y = bo + b1X1 + b2X2 + … + bkXk

If two variables have linear relationship with each other, a change in the value of the independent variable by one unit causes a constant change in the values of the dependent variable.

Linear regression analysis enables to study the average change in the value of the dependent variable for any given value of the independent variable.

The linear relationship is preferred due its simplicity and better prediction.
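The constant-change property described above can be shown in a tiny sketch (the intercept 5 and slope 2 are arbitrary, illustrative numbers):

```python
def predict(x, a=5.0, b=2.0):
    """Linear model Y = a + bX with illustrative constants a and b."""
    return a + b * x

# every unit increase in X changes Y by exactly b, whatever the starting X
print(predict(11) - predict(10))    # 2.0
print(predict(101) - predict(100))  # 2.0
```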

Non-linear Regression Analysis  

While a linear equation has one basic form, nonlinear equations can take many different forms. If the equation doesn’t meet the criteria for a linear equation, it’s nonlinear. Unlike linear regression, these functions can have more than one parameter per predictor variable.

A non-linear regression analysis graphically depicts a curved line when the data relating to the variables are plotted on a graph paper. The regression will be a function involving terms of higher order, like Y = X², Y = X³, etc.

Total and Partial Regression Analysis

Regression Analysis may also be classified as Total and Partial Regression Analysis.

Total Regression Analysis

 A total regression analysis is made to study the effect of all Important variables on one another.

Ex. When the effects of sales promotion expenditure, individual income, and the price of the goods on the volume of sales are all measured together, it is a case of total regression analysis.

Regression equation takes the following forms like that of a multiple regression analysis:

S = f(A, I,P), X = f(Y,Z,P) etc.

Total regression analysis is usually made in the field of business and economics, where the values of a variable are affected by a multiplicity of causes.

Partial Regression Analysis

Where there is a multiplicity of variables, the effect of all the important variables on one another is considered in total regression analysis, while partial regression analysis studies the effect of one or two relevant variables on another variable (keeping the other variables constant).

The equation of such a regression takes the following form

Y = f(X but not of Z and P);

S = f(sales promotion but not of price and individual income).

Graphical Method of Regression Analysis

Regression Analysis may be graphically represented through a scatter diagram, drawn by plotting every observation by a dot. The dependent variables are shown on y-axis and independent variables on x-axis.

The dots are connected to draw regression lines, depicting the best mean value of one variable corresponding to the given values of the other.

The line of best fit in the  scatter diagram is used to summarise the data.

Ex. Using the scatter diagram method draw the two regression lines associated with the following data both separately and jointly:

X:  80  100  120  80  40  100  140  100  110
Y:  60  60   100  70  60  80   100  80   70

Algebraic Method of Regression Analysis

Regression Analysis may be algebraically represented through Normal Equation Method.

Normal Equation Method

The line of best fit for Y on X (i.e. the regression line of Y on X) is obtained by finding the values of Y for any two (preferably the extreme) values of X through the linear equation Y = a + bX,

Where, a and b are two constants, whose values are found by solving the two normal equations \displaystyle {\sum{{}}}Y = Na + b\displaystyle {\sum{{}}}X and \displaystyle {\sum{{}}}XY = a\displaystyle {\sum{{}}}X + b\displaystyle {\sum{{}}}X², where X and Y represent the given values of the X and Y variables respectively.

Line of the best fit for X on Y (i.e. the regression line of X on Y) through the linear equation X = a + bY

where, the values of the two constants a and b are determined by solving the two normal equations \displaystyle {\sum{{}}}X = Na + b\displaystyle {\sum{{}}}Y and \displaystyle {\sum{{}}}XY = a\displaystyle {\sum{{}}}Y + b\displaystyle {\sum{{}}}Y²

Ex. Find the regression equations of x on y and of y on x for the following two series X and Y

X:  16  21  26  23  28  24  17  22  21
Y:  33  38  50  39  52  47  35  43  41

Computation Table

x     y     x²     y²      xy
16    33    256    1089    528
21    38    441    1444    798
26    50    676    2500    1300
23    39    529    1521    897
28    52    784    2704    1456
24    47    576    2209    1128
17    35    289    1225    595
22    43    484    1849    946
21    41    441    1681    861
∑x = 198   ∑y = 378   ∑x² = 4476   ∑y² = 16222   ∑xy = 8509

Regression equation of x on y : (x = a + by)

\displaystyle {\sum{{}}}x = Na + b\displaystyle {\sum{{}}}y     … (i)

\displaystyle {\sum{{}}}xy = a\displaystyle {\sum{{}}}y + b\displaystyle {\sum{{}}}y2  … (ii)

Putting the values in (i), we get

198 = 9a + 378b  … (iii)

Putting the values in (ii), we get,

8509 = 378a + 16222b  … (iv)

So,  74844 = 3402a + 142884b … (v) [multiplying (iii) by 378]

and, 76581 = 3402a + 145998b … (vi) [multiplying (iv) by 9]

So, 1737 = 3114b … (vii) [(vi) – (v)], or b = \displaystyle \frac{{1737}}{{3114}} = .56

Putting the value of b in (i), we get 198 = 9a + 378 x(.56), or 198 = 9a + 211.68

Or  9a = -13.68, or a= \displaystyle -\frac{{13.68}}{9}= -1.52

Regression equation  of x on y : (x = a + by)

or x = -1.52 + .56y, or x = .56y -1.52

Regression equation of y on x: (y = a + bx)

\displaystyle {\sum{{}}}y = Na + b\displaystyle {\sum{{}}}x   … (i)

\displaystyle {\sum{{}}}xy = a\displaystyle {\sum{{}}}x + b\displaystyle {\sum{{}}}x2 … (ii)

Putting the values in (i), we get,

378 = 9a + 198b  … (iii)

Putting the values in (ii), we get,

8509 = 198a + 4476b … (iv)

So,  74844 = 1782a + 39204b  .. (v) [(iii) x 198]

and 76581 = 1782a + 40284b … (vi) [ (iv) x 9]

So, 1737 = 1080b [(vi) – (v)], or b= \displaystyle \frac{{1737}}{{1080}} =1.61

Putting the value of b in (iii), we get 378 = 9a + (198 x 1.61), or 9a = 378 – 318.78, or 9a = 59.22, or a = 6.58

Regression equation of y on x: y = a + bx, or y = 1.61x + 6.58
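Solving the pair of normal equations reduces to the closed form b = (N∑xy − ∑ind∑dep)/(N∑ind² − (∑ind)²) and a = (∑dep − b∑ind)/N, which a short Python sketch can verify (illustrative; the intercepts it yields differ slightly from the hand-worked a values, because the example rounds b to two decimals before solving for a):

```python
def fit_line(dep, ind):
    """Solve the two normal equations for the line dep = a + b*ind."""
    n = len(dep)
    b = ((n * sum(u * v for u, v in zip(ind, dep)) - sum(ind) * sum(dep))
         / (n * sum(u * u for u in ind) - sum(ind) ** 2))
    a = (sum(dep) - b * sum(ind)) / n
    return a, b

x = [16, 21, 26, 23, 28, 24, 17, 22, 21]
y = [33, 38, 50, 39, 52, 47, 35, 43, 41]

a_xy, b_xy = fit_line(x, y)  # regression of x on y
a_yx, b_yx = fit_line(y, x)  # regression of y on x
print(round(b_xy, 2), round(b_yx, 2))  # 0.56 1.61
```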

Deviation from Actual Means

Deviation from Actual Means is computed using two regression equations (X on Y and Y on X), developed in a modified form from the deviation figures of the two variables from their respective actual Means, rather than their actual values.

Regression equation of X on Y : X = \displaystyle \overline{X} + bxy ( Y –\displaystyle \overline{Y} ) or X – \displaystyle \overline{X} = bxy ( Y – \displaystyle \overline{Y} )

Regression equation of Y on X : Y = \displaystyle \overline{Y}+ byx ( X – \displaystyle \overline{X} ) or Y – \displaystyle \overline{Y}= byx ( X – \displaystyle \overline{X} )

where, X= given value of variable,  Y= given value of variable, \displaystyle \overline{X} = arithmetic average of variable X,  \displaystyle \overline{Y} = arithmetic average of variable Y, r is correlation co-efficient

bxy = regression coefficient of X on Y = r\displaystyle \sigma x/\displaystyle \sigma y; byx = regression coefficient of Y on X = r\displaystyle \sigma y/\displaystyle \sigma x

Ex. : Regression Analysis – Deviation from Actual Means

Using the method of deviations from the actual means, find: 1. the two regression equations, 2. the correlation coefficient, 3. the most probable value of Y when X = 30

X:  25  28  35  32  31  36  29  38  34  32
Y:  43  46  49  41  36  32  31  30  33  39

Computation Table

X     Y     x (X − 32)   y (Y − 38)   x²    y²     xy
25    43    -7           5            49    25     -35
28    46    -4           8            16    64     -32
35    49    3            11           9     121    33
32    41    0            3            0     9      0
31    36    -1           -2           1     4      2
36    32    4            -6           16    36     -24
29    31    -3           -7           9     49     21
38    30    6            -8           36    64     -48
34    33    2            -5           4     25     -10
32    39    0            1            0     1      0
∑X = 320   ∑Y = 380   ∑x = 0   ∑y = 0   ∑x² = 140   ∑y² = 398   ∑xy = -93

Regression equation of X on Y

\displaystyle X=\overline{X}+r\displaystyle \sigma x/\displaystyle \sigma y\displaystyle \left( {Y-\overline{Y}} \right)

Putting the values, we get the value of \displaystyle \overline{X} & \displaystyle \overline{Y} as follows

\displaystyle \overline{X}=\frac{{\sum{X}}}{N}=\frac{{320}}{{10}}=32 \displaystyle \overline{Y}=\frac{{\sum{Y}}}{N}=\frac{{380}}{{10}}=38

Putting the values, we get the value of \displaystyle \sigma x & \displaystyle \sigma y, as follows

\displaystyle \sigma x=\displaystyle \sqrt{{\frac{{\sum{{{{x}^{2}}}}}}{N}}}=\sqrt{{\frac{{140}}{{10}}}}=3.74 (approx.)

\displaystyle \sigma y=\displaystyle \sqrt{{\frac{{\sum{{{{y}^{2}}}}}}{N}}}=\sqrt{{\frac{{398}}{{10}}}}=6.31 (approx.)

Putting the values, we get the value of r, as follows

r=\displaystyle {\sum{{xy}}}/N\displaystyle \sigma x\displaystyle \sigma y\displaystyle =\frac{{-93}}{{10\times 3.74\times 6.31}}=\frac{{-93}}{{235.99}}=-0.394

Putting the respective values, we get  the Regression equation of X on Y, as :

X = 32 + (-0.394) x \displaystyle \frac{{3.74}}{{6.31}} x (Y – 38) = 32 + [-0.2337 x (Y – 38)]

= 32 + 8.8806 – 0.2337Y = 40.8806 – 0.2337Y

So, the Regression equation of X on Y is : X= 40.8806 – 0.2337Y

Regression equation of Y on X

Y=\displaystyle \overline{Y}+r\displaystyle \sigma y/\displaystyle \sigma x\displaystyle \left( {X-\overline{X}} \right)

\displaystyle Y=38+\left( {-0.394} \right)\times \frac{{6.31}}{{3.74}}\left( {X-32} \right)

Or, Y = 38 – 0.6643 (X – 32) = 38 + 21.2576 – 0.6643X = 59.2576 – 0.6643X

So, the Regression equation of Y on X is Y = 59.2576 – 0.6643X
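Both regression coefficients can be reproduced in a few lines, using the identities bxy = ∑xy/∑y² and byx = ∑xy/∑x² on deviations about the actual means (an illustrative sketch):

```python
def regression_coefficients(x, y):
    """b_xy and b_yx from deviations about the actual means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    # b_xy = sum(xy)/sum(y^2),  b_yx = sum(xy)/sum(x^2)
    return sxy / sum(b * b for b in dy), sxy / sum(a * a for a in dx)

X = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
Y = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]
b_xy, b_yx = regression_coefficients(X, Y)
print(round(b_xy, 4), round(b_yx, 4))  # -0.2337 -0.6643
# most probable Y when X = 30: Y = 38 + b_yx * (30 - 32)
print(round(38 + b_yx * (30 - 32), 2))  # 39.33
```

This also answers part 3 of the exercise: the most probable value of Y at X = 30 is about 39.33.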

Coefficient of Regression

Coefficient of Regression determines the value by which one variable increases for a unit increase in other variable.

Coefficient of regression of X on Y = bxy = r\displaystyle \sigma x/\displaystyle \sigma y

Coefficient of regression of Y on X = byx = r\displaystyle \sigma y/\displaystyle \sigma x

Where, r = coefficient of correlation, \displaystyle \sigma x = standard deviation of series x, \displaystyle \sigma y = standard deviation of series y

The co-efficient of Regression is also given by

\displaystyle {{b}_{{xy}}}=\frac{{N\sum{{XY-\sum{{X.\sum{Y}}}}}}}{{N\sum{{{{Y}^{2}}-{{{\left( {\sum{Y}} \right)}}^{2}}}}}} \displaystyle {{b}_{{yx}}}=\frac{{N\sum{{XY-\sum{{X.\sum{Y}}}}}}}{{N\sum{{{{X}^{2}}-{{{\left( {\sum{X}} \right)}}^{2}}}}}}

Where, X = the given value of X variable, Y = the given value of Y variable, N = number of pairs of observation. All other factors carry the same meanings as given above

Ex. Find Coefficient of Regression of  X on Y and of Y on X

Sales Promotion Exp. X (Thousands):  11  8  9  5  8  9  20
Sales Y (Lacs):                      10  8  6  5  9  7  11

Computation Details

X     Y     x (X − 10)   y (Y − 8)   x²    y²    xy
11    10    1            2           1     4     2
8     8     -2           0           4     0     0
9     6     -1           -2          1     4     2
5     5     -5           -3          25    9     15
8     9     -2           1           4     1     -2
9     7     -1           -1          1     1     1
20    11    10           3           100   9     30
∑X = 70   ∑Y = 56   ∑x = 0   ∑y = 0   ∑x² = 136   ∑y² = 28   ∑xy = 48

So,\displaystyle \overline{X}=\frac{{\sum{X}}}{N}=\frac{{70}}{7}=10, \displaystyle \overline{Y}=\frac{{\sum{Y}}}{N}=\frac{{56}}{7}=8

Regression Co-efficient  of X on Y = bxy = \displaystyle \frac{{\sum{{xy}}}}{{\sum{{{{y}^{2}}}}}}=\frac{{48}}{{28}} = 1.71

Regression Co-efficient  of Y on X = byx = \displaystyle \frac{{\sum{{xy}}}}{{\sum{{{{x}^{2}}}}}}=\frac{{48}}{{136}}=.353
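The two coefficients can be checked with the same deviation identities (an illustrative sketch; 10 and 8 are the actual means here, since ∑x = ∑y = 0 in the table above):

```python
promo = [11, 8, 9, 5, 8, 9, 20]  # sales promotion expenditure X
sales = [10, 8, 6, 5, 9, 7, 11]  # sales Y

dx = [v - 10 for v in promo]     # deviations from the mean of X
dy = [v - 8 for v in sales]      # deviations from the mean of Y
sxy = sum(a * b for a, b in zip(dx, dy))

b_xy = sxy / sum(b * b for b in dy)  # 48/28
b_yx = sxy / sum(a * a for a in dx)  # 48/136
print(round(b_xy, 2), round(b_yx, 3))  # 1.71 0.353
```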
