**Statistical Theoretical Distribution**

**Statistical Theoretical Distributions** are deduced mathematically from certain assumptions, rather than obtained from actual observations or experiments.

Types of theoretical distributions commonly used in statistical analysis:

- Binomial Distribution, due to James Bernoulli
- Poisson Distribution, due to S. D. Poisson
- Normal Distribution, due to De Moivre

**Importance of Theoretical Distribution:**

- **Estimate of nature and trend of frequency distribution**: On the basis of a theoretical frequency distribution, the nature and trend of a frequency distribution can be estimated under certain assumptions and conditions.
- **Basis of logical decisions**: The risk and uncertainty of an event can be analysed on the basis of a theoretical distribution in order to take logical decisions.
- **Forecasting**: It provides a base for prediction, projection and forecasting.
- **Test of Sampling**: It serves as a benchmark against which the actual frequency distribution and its deviations are compared.

**Binomial Distribution**

**Binomial Distribution** is useful where there are only two outcomes (e.g. success or failure, good or defective, hit or miss, yes or no).

**Binomial Probability Function**

Binomial probability distribution gives the probability of obtaining exactly x successes and (n − x) failures in n trials.

The probability of x successes in a given number of trials is given by

$f(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$ … (1), for x = 0, 1, 2, …, n [the random variable x is an integer]

where p = constant probability of success in a single trial, q = 1 − p (as p + q = 1, q is the probability of failure), n = number of trials, and x = number of successes in n trials (x ≤ n).

The terms of f(x) are the successive terms of the binomial expansion of $(q+p)^{n}$, i.e. $q^{n} + {}^{n}C_{1}\, p\, q^{n-1} + {}^{n}C_{2}\, p^{2} q^{n-2} + \dots + p^{n}$. This is why it is known as the Binomial distribution.
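As a minimal sketch of formula (1), the snippet below evaluates the binomial probability function with Python's standard library. The function name `binomial_pmf` and the choice n = 4, p = ½ are illustrative assumptions, not part of the original notes.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    if not 0 <= x <= n:
        return 0.0
    q = 1.0 - p  # probability of failure in a single trial
    return comb(n, x) * p**x * q**(n - x)

# The terms f(0), ..., f(n) are the terms of (q + p)^n, so they add up to 1.
terms = [binomial_pmf(x, 4, 0.5) for x in range(5)]
print(terms)
print(sum(terms))
```

Because p + q = 1, the printed total should be 1 for any valid n and p (up to floating-point rounding).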

**Binomial Distribution Properties**

**Binomial Distribution is a discrete probability distribution**, where the random variable x (i.e. the number of successes) assumes the values 0, 1, 2, …, n, where n is finite and x ≤ n.

Mean = np, variance = npq, s.d. $\sigma = \sqrt{\text{variance}} = \sqrt{npq}$.

Skewness = $\dfrac{q-p}{\sqrt{npq}}$, Kurtosis = $\dfrac{1-6pq}{npq}$, where p + q = 1.

Skewness is positive for p < ½, negative for p > ½ and zero for p = ½. The value of x which has the maximum probability is the mode of the distribution.

**Assumptions of Binomial Distribution**

1. Each trial has two mutually exclusive possible outcomes, i.e. success or failure.
2. Each trial is independent of other trials.
3. The probability of a success remains constant from trial to trial.
4. For example, the probability of getting a head in a toss of a coin is ½, and this must remain the same in successive tosses.
5. The number of trials is fixed.

**Sequence of p and q**

The general form of the binomial distribution is the expansion of $(p+q)^{n}$, in which the number of successes is written in descending order. If the number of successes is written in ascending order, then $(q+p)^{n}$ is expanded.

**Binomial expansion of events**

**Rules for coefficients of binomial expansion**

1. The first term is $p^{n}$.
2. The second term is ${}^{n}C_{1}\, p^{n-1} q$.
3. In each succeeding term the power of p is reduced by 1 and the power of q is increased by 1.
4. The coefficient of any term is found by multiplying the coefficient of the preceding term by the power of p in that term, and dividing the product by one more than the power of q in that preceding term.
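The coefficient rule can be sketched in a few lines of Python. This is only an illustration of rule 4; the function name `binomial_coefficients` is an assumption introduced here.

```python
def binomial_coefficients(n: int) -> list[int]:
    """Coefficients of (p + q)^n, built term by term from the preceding term."""
    coeffs = [1]               # the first term p^n has coefficient 1
    power_p, power_q = n, 0    # powers of p and q in the current term
    for _ in range(n):
        # next coefficient = previous coefficient * power of p / (power of q + 1)
        coeffs.append(coeffs[-1] * power_p // (power_q + 1))
        power_p -= 1
        power_q += 1
    return coeffs

print(binomial_coefficients(4))  # [1, 4, 6, 4, 1]
```

The integer division is exact at every step, because each intermediate value is itself a binomial coefficient.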

**Binomial Distribution – Problems**

Ex. 4 coins are tossed simultaneously. What is the probability of getting (i) 2 heads (ii) at least 2 heads and (iii) at least one head.

The random experiment consists in tossing 4 coins and observing the number of heads. Let occurrence of heads be treated as success.

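p = probability of getting a head = ½, q = 1 − ½ = ½.

The value of p is constant for each coin and the trials are all independent.

$f(x) = {}^{n}C_{x}\, p^{x} q^{n-x} = {}^{4}C_{x} \left(\tfrac{1}{2}\right)^{x} \left(\tfrac{1}{2}\right)^{4-x}$, since n = 4.

(i) Probability of getting 2 heads: putting x = 2, we get $f(2) = {}^{4}C_{2} \left(\tfrac{1}{2}\right)^{2} \left(\tfrac{1}{2}\right)^{4-2} = 6 \times \tfrac{1}{4} \times \tfrac{1}{4} = \tfrac{3}{8}$

(ii) At least 2 heads means 2 or more heads, i.e. 2 or 3 or 4 heads.

So, probability of at least two heads = f(2) + f(3) + f(4)

$= {}^{4}C_{2} \left(\tfrac{1}{2}\right)^{2} \left(\tfrac{1}{2}\right)^{2} + {}^{4}C_{3} \left(\tfrac{1}{2}\right)^{3} \left(\tfrac{1}{2}\right)^{1} + {}^{4}C_{4} \left(\tfrac{1}{2}\right)^{4} \left(\tfrac{1}{2}\right)^{0}$

$= 6 \times \tfrac{1}{4} \times \tfrac{1}{4} + 4 \times \tfrac{1}{8} \times \tfrac{1}{2} + 1 \times \tfrac{1}{16} \times 1 = \tfrac{3}{8} + \tfrac{1}{4} + \tfrac{1}{16} = \tfrac{11}{16}$

(iii) At least one head means anything except no heads, so the probability is $1 - f(0) = 1 - \left(\tfrac{1}{2}\right)^{4} = \tfrac{15}{16}$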
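The three parts of the four-coin example can be checked with exact fractions. The following is a small sketch, assuming fair coins as in the example; the helper name `coin_pmf` is hypothetical.

```python
from fractions import Fraction
from math import comb

def coin_pmf(x: int, n: int) -> Fraction:
    """Exact probability of x heads in n tosses of a fair coin."""
    return Fraction(comb(n, x), 2**n)

exactly_two = coin_pmf(2, 4)
at_least_two = sum(coin_pmf(x, 4) for x in range(2, 5))
at_least_one = 1 - coin_pmf(0, 4)   # complement of "no heads"
print(exactly_two, at_least_two, at_least_one)  # 3/8 11/16 15/16
```

Using `Fraction` instead of floats keeps the results in the same form as the hand calculation.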

**Binomial Distribution – Problems**

Ex. 1 : Six coins are tossed. Find probability of more than 4 heads

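Let p be the probability of success, i.e. of getting a head, so p = ½.

So, p = ½ (constant probability), q = 1 − p = 1 − ½ = ½, n = 6.

The probability function is

$f(x) = {}^{n}C_{x}\, p^{x} q^{n-x} = {}^{6}C_{x} \left(\tfrac{1}{2}\right)^{x} \left(\tfrac{1}{2}\right)^{6-x}$

More than 4 heads means 5 or 6 heads.

So probability $= f(5) + f(6) = {}^{6}C_{5} \left(\tfrac{1}{2}\right)^{5} \left(\tfrac{1}{2}\right)^{6-5} + {}^{6}C_{6} \left(\tfrac{1}{2}\right)^{6} \left(\tfrac{1}{2}\right)^{6-6}$

$= 6 \times \tfrac{1}{2^{5}} \times \tfrac{1}{2} + 1 \times \tfrac{1}{2^{6}} = \tfrac{7}{2^{6}} = \tfrac{7}{64}$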

**Binomial Distribution – Problems**

Ex. 1 : Given the probability of defective screws is 1/6.

Find the following for the binomial distribution of defective screws in a total of 180:

(i) the mean (ii) the s.d. (iii) moment coefficient of skewness.

Here n = 180, p = 1/6, q = 1 − 1/6 = 5/6.

(i) Mean of binomial distribution = np = 180 × 1/6 = 30

(ii) s.d. $\sigma = \sqrt{npq} = \sqrt{180 \times \tfrac{1}{6} \times \tfrac{5}{6}} = \sqrt{25} = 5$

(iii) Moment coefficient of skewness $= \dfrac{q-p}{\sqrt{npq}} = \dfrac{\tfrac{5}{6} - \tfrac{1}{6}}{5} = \dfrac{4/6}{5} = \dfrac{2}{15}$ (or 0.1333)
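The screw example can be reproduced with a short helper that applies the formulas mean = np, s.d. = √(npq) and skewness = (q − p)/√(npq). This is a sketch only; `binomial_summary` is a name introduced here for illustration.

```python
from math import isclose, sqrt

def binomial_summary(n: int, p: float) -> dict:
    """Mean, standard deviation and moment coefficient of skewness."""
    q = 1.0 - p
    sd = sqrt(n * p * q)
    return {"mean": n * p, "sd": sd, "skewness": (q - p) / sd}

summary = binomial_summary(180, 1 / 6)
print(summary)
# Compare with the hand-computed values 30, 5 and 2/15.
print(isclose(summary["mean"], 30), isclose(summary["sd"], 5),
      isclose(summary["skewness"], 2 / 15))
```

The comparison uses `isclose` because 1/6 is not exactly representable as a float.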

Ex. 2 : The incidence of occupational disease in an industry is such that the workmen have a 20% chance of suffering from it. What is the probability that out of six workmen, 4 or more will contract the disease?

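Probability that a worker will suffer from the disease (p) = 20/100 = 1/5; q = 1 − 1/5 = 4/5; n = 6

$f(x) = {}^{6}C_{x}\, p^{x} q^{6-x}$ for x = 4, 5, 6 (as x ≥ 4)

$f(4) + f(5) + f(6) = {}^{6}C_{4} \left(\tfrac{1}{5}\right)^{4} \left(\tfrac{4}{5}\right)^{2} + {}^{6}C_{5} \left(\tfrac{1}{5}\right)^{5} \left(\tfrac{4}{5}\right) + {}^{6}C_{6} \left(\tfrac{1}{5}\right)^{6} \left(\tfrac{4}{5}\right)^{0}$

$= \dfrac{1}{5^{6}} \left({}^{6}C_{4} \cdot 4^{2} + {}^{6}C_{5} \cdot 4 + 1\right) = \dfrac{1}{15625} (240 + 24 + 1) = \dfrac{265}{15625}$ (or about 0.017)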

**Binomial Distribution – Problems**

Ex. 1 : The arithmetic mean of a binomial distribution is 6 and the S.D. is 4. Is this calculation correct?

Here $\bar{X}$ = np = 6 and s.d. = $\sqrt{npq}$ = 4, so npq = 4² = 16 and q = (npq)/(np) = 16/6 = 2.67 (i.e. q > 1).

As p + q = 1, q cannot exceed 1. So the calculation is not correct.
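The same consistency check can be written as a small function: divide the variance by the mean to get q, and reject the pair if q does not lie strictly between 0 and 1. This is a hedged sketch; the function name and the second test case (mean 30, s.d. 5) are assumptions chosen for illustration.

```python
def binomial_parameters(mean: float, sd: float):
    """Recover (n, p, q) from mean = np and sd = sqrt(npq), or None if impossible."""
    q = sd**2 / mean            # npq / np
    if not 0 < q < 1:
        return None             # no binomial distribution has these moments
    p = 1 - q
    return mean / p, p, q

print(binomial_parameters(6, 4))    # None, because q = 16/6 > 1
print(binomial_parameters(30, 5))
```

For mean 30 and s.d. 5 the recovered values should be close to n = 180, p = 1/6 and q = 5/6.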

Ex. 2 : The incidence of occupational disease in an industry is such that the workmen have a 10% chance of suffering from it. What is the probability that out of 5 workmen, 3 or more will contract the disease?

n = 5 and p = probability of a workman suffering from the disease = 10% = 0.1. So q = 1 − 0.1 = 0.9.

$f(x) = {}^{5}C_{x}\, (0.1)^{x} (0.9)^{5-x}$, for x = 0, 1, 2, …, 5.

The probability that 3 or more workmen will contract the disease is P(x ≥ 3)

= f(3) + f(4) + f(5)

$= {}^{5}C_{3} (0.1)^{3} (0.9)^{5-3} + {}^{5}C_{4} (0.1)^{4} (0.9)^{5-4} + {}^{5}C_{5} (0.1)^{5}$

= (10 × 0.001 × 0.81) + (5 × 0.0001 × 0.9) + (1 × 0.00001) = 0.0081 + 0.00045 + 0.00001 = 0.0086
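Both occupational-disease examples are upper-tail probabilities of a binomial variable, so one helper can evaluate them exactly. The sketch below assumes only the figures stated in the two examples; `binomial_upper_tail` is a hypothetical name.

```python
from fractions import Fraction
from math import comb

def binomial_upper_tail(k: int, n: int, p: Fraction) -> Fraction:
    """P(X >= k) for n trials with success probability p."""
    q = 1 - p
    return sum(comb(n, x) * p**x * q**(n - x) for x in range(k, n + 1))

tail_20_percent = binomial_upper_tail(4, 6, Fraction(1, 5))    # 4 or more of 6
tail_10_percent = binomial_upper_tail(3, 5, Fraction(1, 10))   # 3 or more of 5
print(tail_20_percent, float(tail_20_percent))
print(tail_10_percent, float(tail_10_percent))
```

The first result is 265/15625 in lowest terms (about 0.017) and the second is about 0.0086.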

**Poisson Distribution**

Binomial distribution cannot be applied where n cannot be estimated. In such cases, **Poisson Distribution** is applicable.

Poisson distribution is defined by the probability function.

$f(x) = \dfrac{e^{-m} m^{x}}{x!}$, for x (number of successes) = 0, 1, 2, 3, …

with $\sum_{x=0}^{\infty} f(x) = 1$ (as the total probability must be unity).

If the value of the parameter m is known, the distribution is completely known. The value of m generally lies between 0.1 and 10.

**Properties of Poisson Distribution:**

- **Discrete distribution**: Like the binomial distribution, it is a discrete probability distribution, i.e. occurrences can be described by a discrete random variable.
- **Main parameter**: The main parameter is the mean (m), which is equal to np, i.e. m = np.
- **Form**: It is a positively skewed distribution.

**Assumption of Poisson Distribution:**

- The occurrences of events are independent, i.e. the occurrence of an event in an interval of time or space does not affect the probability of a second occurrence of the event in the same (or any other) interval.
- The probability of a single occurrence of the event in a given interval is proportional to the length of the interval.
- The probability of occurrence of more than one event in a very small interval is negligible.

**Examples of Poisson distribution:**

- The number of telephone calls received at a particular switchboard per minute during a certain hour of the day.
- The number of deaths per day in a district or town in one year by a disease.
- The number of cars passing a certain point per minute.
- The number of persons born deaf and dumb per year in a city.
- The number of typographical errors per page.
- The number of printing errors per page.
- The number of defective blades in a pack.

**Poisson Distribution Computation**

**Poisson Distribution is a discrete distribution**. The probability of exactly 0, 1, 2, …, n successes can be found in the following steps.

Step 1 : Find the arithmetic mean of the observed data, denoted m, i.e. $\bar{X} = m$

Step 2 : Compute the value of $e^{-m}$ (e = 2.7183, the base of natural logarithms)

$e^{-m} = \dfrac{1}{e^{m}} = \dfrac{1}{(2.7183)^{m}} = \dfrac{1}{\text{antilog}(m \times \log 2.7183)} = \dfrac{1}{\text{antilog}(0.4343 \times m)}$

Step 3 : Compute the probability of 0, 1, 2, …, n successes using the Poisson distribution

$P(x) = e^{-m} \cdot \dfrac{m^{x}}{x!}$, or $P(r) = e^{-m} \cdot \dfrac{m^{r}}{r!}$, where x or r = number of successes (0, 1, 2, …, n), e = 2.7183, and m = $\bar{X}$ = arithmetic mean
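The three steps can be sketched in Python as follows. The observed counts in `observations` are invented purely for illustration, and `poisson_pmf` is a hypothetical helper name. The sketch also compares `exp(-m)` with the log-table route of Step 2.

```python
from math import exp, factorial

def poisson_pmf(x: int, m: float) -> float:
    """P(X = x) for a Poisson variable with mean m."""
    return exp(-m) * m**x / factorial(x)

observations = [0, 1, 0, 2, 1, 0, 3, 1]        # hypothetical data, not from the text
m = sum(observations) / len(observations)      # Step 1: arithmetic mean
e_minus_m = exp(-m)                            # Step 2: direct evaluation
log_table_value = 1 / 10 ** (0.4343 * m)       # Step 2: 1 / antilog(0.4343 * m)
probabilities = [poisson_pmf(x, m) for x in range(4)]  # Step 3

print(m, round(e_minus_m, 4), round(log_table_value, 4))
print([round(p, 4) for p in probabilities])
```

The two Step 2 values agree to about four decimal places, which is the precision a four-figure log table provides.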

**Poisson Distribution – Problems**

Ex. A random variable x follows a Poisson distribution with parameter 2. Find the probabilities that x assumes the values (i) 0, 1, 3, (ii) less than 3, (iii) at least 2

(given $e^{-2}$ = 0.1353).

Here, m = 2.

(i) $f(x) = e^{-m} \cdot \dfrac{m^{x}}{x!} = e^{-2} \cdot \dfrac{2^{x}}{x!}$, for x = 0, 1, 3

$f(0) = e^{-2} \cdot \dfrac{2^{0}}{0!} = e^{-2} = 0.1353$

$f(1) = e^{-2} \cdot \dfrac{2^{1}}{1!} = 2e^{-2} = 0.2706$

$f(3) = e^{-2} \cdot \dfrac{2^{3}}{3!} = \dfrac{8e^{-2}}{6} = \dfrac{4}{3} e^{-2} = 0.1804$

(ii) Less than 3 means x = 0 or 1 or 2.

$f(0) + f(1) + f(2) = 0.1353 + 0.2706 + \dfrac{e^{-2} \cdot 2^{2}}{2!} = 0.1353 + 0.2706 + (0.1353 \times 2)$

= 0.1353 + 0.2706 + 0.2706 = 0.6765.

(iii) At least 2 means 2 or 3 or 4 or more.

$f(2) + f(3) + f(4) + \dots = 1 - \{f(0) + f(1)\} = 1 - (0.1353 + 0.2706) = 1 - 0.4059 = 0.5941$.
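Parts (ii) and (iii) can be checked numerically. The sketch below uses the unrounded value of e⁻², so its results differ from the hand calculation in the fourth decimal place; the helper name is an assumption.

```python
from math import exp, factorial

def poisson_pmf(x: int, m: float) -> float:
    """P(X = x) for a Poisson variable with mean m."""
    return exp(-m) * m**x / factorial(x)

m = 2
less_than_3 = sum(poisson_pmf(x, m) for x in range(3))        # x = 0, 1, 2
at_least_2 = 1 - (poisson_pmf(0, m) + poisson_pmf(1, m))      # complement of x = 0, 1
print(round(less_than_3, 4), round(at_least_2, 4))
```

Taking the complement in part (iii) avoids summing the infinite tail f(2) + f(3) + f(4) + … directly.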

**Poisson Distribution – Problems**

Ex. 1 : If a random variable x follows a Poisson distribution such that P(x = 1) = P(x = 2), find P(x = 0).

$f(x) = e^{-m} \cdot \dfrac{m^{x}}{x!}$. So, $P(x=1) = f(1) = e^{-m} \cdot \dfrac{m^{1}}{1!} = m e^{-m}$

$P(x=2) = f(2) = e^{-m} \cdot \dfrac{m^{2}}{2!} = \dfrac{m^{2} e^{-m}}{2}$

As given in the problem, f(1) = f(2), so $m e^{-m} = \dfrac{m^{2} e^{-m}}{2}$, or 1 = m/2, or m = 2.

So, $f(0) = e^{-2} \cdot \dfrac{2^{0}}{0!} = e^{-2}$

Ex. 2 : One tenth per cent of the blades produced by the blade manufacturing factory turn out to be defective. The blades are supplied in packets of 20. Use Poisson distribution to calculate the approximate number of packets containing (i) no defective (ii) one defective and (iii) two defective blades respectively in a consignment of 4,00,000 packets.

Let the occurrence of a defective blade be a success. Here, p = (1/10)% = 1/1000, n = 20, m = np = 20 × (1/1000) = 0.02

$f(x) = e^{-m} \cdot \dfrac{m^{x}}{x!} = e^{-0.02} \cdot \dfrac{(0.02)^{x}}{x!}$, for x = 0, 1, 2 defective blades

Expected number of packets with x defective blades $= 4{,}00{,}000 \times \dfrac{e^{-0.02} (0.02)^{x}}{x!}$

For x = 0: $4{,}00{,}000 \times \dfrac{e^{-0.02} (0.02)^{0}}{0!} = 4{,}00{,}000 \times e^{-0.02}$ = 4,00,000 × 0.9802 = 3,92,080

For x = 1: $4{,}00{,}000 \times \dfrac{e^{-0.02} (0.02)^{1}}{1!}$ = 4,00,000 × 0.9802 × 0.02 = 7,842 (approx.)

For x = 2: $4{,}00{,}000 \times \dfrac{e^{-0.02} (0.02)^{2}}{2!}$ = 4,00,000 × 0.9802 × 0.0002 = 78 (approx.)
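A short sketch of the same calculation in Python, using the figures given in the example (4,00,000 packets of 20 blades, defect rate 1/1000). The variable names are illustrative.

```python
from math import exp, factorial

packets = 400_000            # 4,00,000 packets in the consignment
m = 20 * (1 / 1000)          # mean defectives per packet, m = np

# Expected number of packets with 0, 1 and 2 defective blades
expected = [packets * exp(-m) * m**x / factorial(x) for x in range(3)]
print([round(e) for e in expected])
```

The first figure comes out about one packet lower than the hand calculation, because the code does not round e^(−0.02) to 0.9802.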

**Poisson Distribution – Problems**

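Ex. Printing mistakes per page committed by a press follow a Poisson distribution. Find the expected frequencies for the following distribution of printing mistakes:

| Mistakes per page (x) | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| Frequency (f) | 40 | 30 | 20 | 15 | 10 | 5 |

(Value of $e^{-1.5}$ = 0.22313)

Here, Mean = [(40 × 0) + (30 × 1) + (20 × 2) + (15 × 3) + (10 × 4) + (5 × 5)] / 120 = 180/120 = 1.5

$P(0) = e^{-1.5} = 0.22313$, $P(1) = e^{-1.5} \times 1.5 = 0.22313 \times 1.5 = 0.33470$

$P(2) = e^{-1.5} \cdot \dfrac{(1.5)^{2}}{2!} = 0.25$, $P(3) = e^{-1.5} \cdot \dfrac{(1.5)^{3}}{3!} = 0.13$,

$P(4) = e^{-1.5} \cdot \dfrac{(1.5)^{4}}{4!} = 0.05$, $P(5) = e^{-1.5} \cdot \dfrac{(1.5)^{5}}{5!} = 0.01$

Expected frequency $= \dfrac{N e^{-m} m^{x}}{x!}$, where N = 120 is the total frequency.

Putting in the values, the expected frequencies for x = 0, 1, 2, 3, 4, 5 are approximately 26.8, 40.2, 30.1, 15.1, 5.6 and 1.7.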
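Fitting a Poisson distribution to observed frequencies can be sketched as below. The frequencies 40, 30, 20, 15, 10 and 5 for 0 to 5 mistakes are taken from the mean calculation in the example; the variable names are assumptions.

```python
from math import exp, factorial

mistakes = [0, 1, 2, 3, 4, 5]          # mistakes per page (x)
observed = [40, 30, 20, 15, 10, 5]     # observed frequency (f) for each x

total = sum(observed)                                            # N
mean = sum(x * f for x, f in zip(mistakes, observed)) / total    # m
expected = [total * exp(-mean) * mean**x / factorial(x) for x in mistakes]

print(total, mean)
print([round(e, 1) for e in expected])
```

The expected frequencies do not add up to exactly 120, since a Poisson variable can also take values above 5.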

**Poisson Distribution – Problems**

The number of accidents in a year attributed to bus drivers in a city follows a Poisson distribution with mean 3. Out of 3,000 bus drivers, find the number of drivers with (i) no accident in a year and (ii) at least 3 accidents in a year.

[$e^{-3}$ = 0.0498]

Here m = 3, $P(x = r) = e^{-m} \cdot \dfrac{m^{r}}{r!}$

(i) Probability of no accidents = P(0) = $e^{-3}$ = 0.0498.

So, number of drivers with no accidents = 3000 × 0.0498 = 149

(ii) Probability of at least 3 accidents = P(x ≥ 3) = 1 − P(x < 3) = 1 − [P(0) + P(1) + P(2)]

$= 1 - \left[e^{-m} + e^{-m} m + \dfrac{e^{-m} m^{2}}{2!}\right] = 1 - e^{-m} \left[1 + m + \dfrac{m^{2}}{2!}\right]$

$= 1 - e^{-3} \left[1 + 3 + \dfrac{3^{2}}{2!}\right] = 1 - e^{-3} \left[1 + 3 + \dfrac{9}{2}\right] = 1 - (e^{-3} \times 8.5) = 1 - (0.0498 \times 8.5)$

= 1 − 0.4233 = 0.5767

So, number of bus drivers with at least 3 accidents in a year = 3000 × 0.5767 = 1730
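As a hedged check of this example, the sketch below computes both counts directly. "At least 3 accidents" is taken as the complement of 0, 1 or 2 accidents; the names used are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x: int, m: float) -> float:
    """P(X = x) for a Poisson variable with mean m."""
    return exp(-m) * m**x / factorial(x)

drivers, m = 3000, 3
p_no_accident = poisson_pmf(0, m)
# At least 3 accidents = 1 - P(0) - P(1) - P(2)
p_at_least_3 = 1 - sum(poisson_pmf(x, m) for x in range(3))

print(round(drivers * p_no_accident), round(drivers * p_at_least_3))
```

With m = 3 this gives roughly 149 drivers with no accident and roughly 1,730 drivers with three or more.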

**Normal Distribution**

**Normal Distribution** is a continuous probability distribution in which the relative frequencies of a continuous variable are distributed according to the normal probability law. It is a symmetrical distribution in which the frequencies are distributed evenly about the mean of the distribution.

Normal distribution is defined by the probability density function:

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\bar{X})^{2}}{2\sigma^{2}}}$, for $-\infty < x < +\infty$, where $\bar{X}$ = Mean, σ = Standard deviation, e (base of natural logarithm) = 2.7183, π = 3.1416.

Normal distribution in its standard normal variate (S.N.V.) form is given by:

$\phi(z) = \dfrac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}$, for $-\infty < z < +\infty$, where z = S.N.V. = $(X - \bar{X})/\sigma$.

The mean of Z is zero and the standard deviation of Z is 1.

In a normal distribution, the quartiles Q1 and Q3 are equidistant from the median (M). Due to this property, Q3 − M = M − Q1.

**Normal Distribution Properties**

- **Bell Shaped**: The normal curve is perfectly symmetrical and bell shaped about the mean. This implies that if we fold the curve along its vertical axis at the center, the two halves would coincide.
- **Continuous Distribution**: Normal distribution is a distribution of continuous variables. For this reason, it is called a continuous probability distribution.
- **Parameters of Distribution**: The two main parameters of the normal distribution are the Mean ($\bar{X}$) and the Standard Deviation (S.D.). The entire distribution can be known from these two parameters.
- **Relationship between M.D. and S.D.**: In a normal distribution, the mean deviation (M.D.) is approximately 4/5 times the standard deviation, i.e. M.D. ≈ (4/5) × S.D.

**Normal Curve in Statistics**

**Normal Curve** is the most prominent probability distribution model used in statistics. The normal curve is a bell-shaped, perfectly symmetric curve centered on the mean, which is equal to its median and mode.

The equation of the normal curve depends on the Mean ($\bar{X}$ or μ) and the Standard Deviation (σ).

For different values of μ and σ, different normal curves are obtained. Since μ and σ can assume an infinite number of values, it is impracticable to tabulate the area under the curve for different values of μ and σ.

For the sake of convenience, the standard normal curve or unit normal curve is constructed with μ = 0 and standard deviation = 1. The given value of the normal variate is then transformed into standard units by the Z-transformation formula

$Z = \dfrac{X - \bar{X}}{\sigma}$, where Z = z-transformation, $\bar{X}$ (or μ) = arithmetic mean of the population, X = value of the observation, σ = S.D. of the distribution

Ex. Find the area under the Standard Normal Curve between z = 0 and z = 1.8, and between z = 0 and z = 1.85.

In the Standard Normal Table, the value for the area between z = 0 and z = 1.8 is 0.4641. Similarly, the value for the area between z = 0 and z = 1.85 is 0.4678.

Hence, the required areas are 0.4641 and 0.4678 respectively.

In the first case, 0.4641 represents the probability that z lies between 0 and 1.8, i.e. P(0 ≤ z ≤ 1.8) = 0.4641, and in the second case, P(0 ≤ z ≤ 1.85) = 0.4678.
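Where a printed table is not at hand, the same areas can be obtained from the error function in Python's standard library. This is a sketch; the function name `area_0_to_z` is an assumption.

```python
from math import erf, sqrt

def area_0_to_z(z: float) -> float:
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return 0.5 * erf(z / sqrt(2.0))

print(round(area_0_to_z(1.8), 4), round(area_0_to_z(1.85), 4))
```

The identity used is P(0 ≤ Z ≤ z) = ½·erf(z/√2), which matches the table values to four decimal places.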

**Statistical Normal Curve – Problems**

Ex. 1 : Find the area under the Standard Normal Curve between z = −1.67 and z = 0.

By symmetry, the area between z = −1.67 and z = 0 equals the area between z = 0 and z = 1.67.

From the table of areas under the Standard Normal Curve, the corresponding number is 0.4525, which is the required area. So P(−1.67 ≤ z ≤ 0) = 0.4525.

Ex. 2 : Find the area under the Standard Normal Curve between z = 0.82 and z = 1.96. This area cannot be read off directly, so it is broken up as follows:

Required area = (area between z = 0 and z = 1.96) − (area between z = 0 and z = 0.82) = 0.4750 − 0.2939 = 0.1811. So P(0.82 ≤ z ≤ 1.96) = 0.1811.

Ex. 3 : Find the area under the Standard Normal Curve between z = −0.75 and z = 2.04.

Required area = (area between z = −0.75 and z = 0) + (area between z = 0 and z = 2.04) = (area between z = 0 and z = 0.75) + (area between z = 0 and z = 2.04) = 0.2734 + 0.4793 = 0.7527. So P(−0.75 ≤ z ≤ 2.04) = 0.7527.
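All three examples reduce to a difference of cumulative probabilities. A minimal sketch, assuming the helper names `normal_cdf` and `area_between`:

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """P(Z <= z) for a standard normal variable Z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def area_between(z_low: float, z_high: float) -> float:
    """Area under the standard normal curve between z_low and z_high."""
    return normal_cdf(z_high) - normal_cdf(z_low)

print(round(area_between(-1.67, 0.0), 4))
print(round(area_between(0.82, 1.96), 4))
print(round(area_between(-0.75, 2.04), 4))
```

Working with the cumulative function removes the need to treat negative z values, or intervals that do not start at 0, as special cases.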

**Statistical Normal Curve – Problems**

Ex. A normal curve has $\bar{x}$ = 20 and σ = 4. Find the probability that x assumes a value between 16.8 and 27.6.

$Z = \dfrac{x - \bar{x}}{\sigma}$, where z = standard normal variate corresponding to x

For x1 = 16.8, z1 = (16.8 − 20)/4 = −3.2/4 = −0.8, as $\bar{x}$ = 20 and σ = 4

For x2 = 27.6, z2 = (27.6 − 20)/4 = 7.6/4 = 1.9

Now P(16.8 ≤ x ≤ 27.6) = P(−0.8 ≤ z ≤ 1.9) = (area between z = −0.8 and z = 0) + (area between z = 0 and z = 1.9) = (area between z = 0 and z = 0.8) + (area between z = 0 and z = 1.9) = 0.2881 + 0.4713 = 0.7594.

**Statistical Normal Curve – Problems**

Ex. How many male workers in a factory have a salary between (i) Rs.800 and 1360, and (ii) more than Rs.1440 if the mean salary is Rs.1000 and s.d. is Rs.200 and number of workers is 20,000, if the salary of the workers is assumed to be normally distributed.

At first we are to find standard normal variate corresponding to given variates

(i) For x1 = 800, z1 = (800 − 1000)/200 = −200/200 = −1, as $\bar{x}$ = 1000 and σ = 200

For x2 = 1360, z2 = (1360 − 1000)/200 = 360/200 = 1.8

Now P(800 ≤ x ≤ 1360) = P(−1 ≤ z ≤ 1.8) = (area between z = −1 and z = 0) + (area between z = 0 and z = 1.8) = (area between z = 0 and z = 1) + (area between z = 0 and z = 1.8) = 0.3413 + 0.4641 = 0.8054

i.e., 80.54% of the total workers have a salary between Rs.800 and Rs.1360

Number of workers getting a salary between Rs.800 and Rs.1360 = 0.8054 × 20,000 = 16,108

(ii) For x = 1440, z = (1440 − 1000)/200 = 440/200 = 2.2

Now P(x ≥ 1440) = P(z ≥ 2.2) = (area under the standard normal curve to the right of z = 2.2)

= (area to the right of z = 0) − (area between z = 0 and z = 2.2) = 0.5000 − 0.4861 = 0.0139

So, 1.39% of the total workers have a salary of more than Rs.1440, and the number of such workers = 0.0139 × 20,000 = 278
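The salary example can be reproduced without a table by standardizing inside the cumulative function. The sketch below uses the figures stated in the example; the function and variable names are illustrative.

```python
from math import erf, sqrt

def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(X <= x) for a normal variable with the given mean and s.d."""
    z = (x - mean) / sd                      # Z-transformation
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

workers, mean, sd = 20_000, 1000.0, 200.0
share_between = normal_cdf(1360, mean, sd) - normal_cdf(800, mean, sd)
share_above = 1.0 - normal_cdf(1440, mean, sd)

print(round(workers * share_between), round(workers * share_above))
```

Rounded to whole workers, the counts should agree with the table-based answers of 16,108 and 278.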

**Statistical Normal Curve – Problems**

Ex. The income distribution of Engineers of a company was found to follow normal distribution. The average income of an Engineer was Rs.14,000. The standard deviation of the income of Engineers was Rs. 2,500. If there were 484 Engineers drawing salary above Rs. 15750, how many Engineers were there in the company?

[The area under standard normal curve between 0 and 0.7 is 0.2580]

$z = \dfrac{x - \bar{x}}{\sigma} = \dfrac{15750 - 14000}{2500} = 0.7$

Using the normal distribution, P(Z ≥ 0.7) = 0.5 − 0.2580 = 0.242

So, the probability that an Engineer draws a salary of Rs. 15750 or more is 0.242

The number of Engineers in the company = 484 / 0.242 = 2000
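Here the unknown is the total head count, so the tail probability is used as a divisor. A brief sketch under the figures given in the example, with `upper_tail` as a hypothetical helper:

```python
from math import erf, sqrt

def upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal variable Z."""
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

z = (15750 - 14000) / 2500          # standard normal variate for Rs. 15750
total_engineers = 484 / upper_tail(z)
print(z, round(total_engineers))
```

The unrounded tail probability is slightly below 0.242, so the raw quotient is a little above 2000 before rounding.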

**Statistical Normal Curve – Problems**

Ex. In a sample of 2,000 items, the mean weight and standard deviation are 40 and 20 kilograms respectively. Assuming the distribution to be normal, find the number of items weighing between 20 and 80 kilograms.

$z = \dfrac{x - \bar{x}}{\sigma}$. Here $\bar{X}$ = 40 and σ = 20. So, for x = 20, z = (20 − 40)/20 = −1

The area under the standard normal curve between the mean and z = −1 is 0.3413

For x = 80, z = (80 − 40)/20 = 2

The area under the standard normal curve between the mean and z = 2 is 0.4772

So, the probability of an item weighing between 20 and 80 kilograms is

P(20 ≤ x ≤ 80) = P(−1 ≤ z ≤ 2) = P(0 ≤ z ≤ 1) + P(0 ≤ z ≤ 2) = 0.3413 + 0.4772 = 0.8185

Number of items weighing between 20 and 80 kilograms = 2000 × 0.8185 = 1637

**Chi Square Distribution**

**Chi-square distribution** (Χ² distribution) is a continuous probability distribution used in Statistical hypothesis tests.

If X1, …, Xk are independent, standard normal random variables, then the sum of their squares

$Q = \sum_{i=1}^{k} X_{i}^{2}$

is distributed according to the chi-square distribution with k degrees of freedom. This is usually denoted as $Q \sim \chi^{2}(k)$ or $Q \sim \chi^{2}_{k}$.

The chi-square distribution has one parameter: k, a positive integer that specifies the number of degrees of freedom (i.e. the number of $X_{i}$'s).

Additivity : From the definition of the chi-square distribution, it follows that the sum of independent chi-square variables is also chi-square distributed. Specifically, if $\{X_{i}\}_{i=1}^{n}$ are independent chi-square variables with $\{k_{i}\}_{i=1}^{n}$ degrees of freedom respectively, then $Y = X_{1} + X_{2} + \dots + X_{n}$ is chi-square distributed with $k_{1} + k_{2} + \dots + k_{n}$ degrees of freedom.
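The definition and the additivity property can be illustrated with a small simulation. This is only a rough sketch: it relies on the standard fact (not stated above) that a chi-square variable with k degrees of freedom has mean k, and the seed, sample size and helper name are arbitrary assumptions.

```python
import random

def chi_square_draw(k: int, rng: random.Random) -> float:
    """One draw of Q: the sum of squares of k independent standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(0)      # fixed seed so the run is repeatable
draws = 20_000

# Draws with 5 degrees of freedom, and sums of independent draws with 2 and 3.
direct = [chi_square_draw(5, rng) for _ in range(draws)]
added = [chi_square_draw(2, rng) + chi_square_draw(3, rng) for _ in range(draws)]

mean_direct = sum(direct) / draws
mean_added = sum(added) / draws
print(round(mean_direct, 2), round(mean_added, 2))
```

Both sample means should come out close to 5, consistent with 2 + 3 = 5 degrees of freedom. Matching means are a sanity check rather than a proof that the two distributions coincide.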

**T Distribution in Statistics**

The **T-Distribution** is a theoretical probability distribution. T-distribution depicts the set of observations mostly falling close to the mean, the rest of the observations making up the tails on either side.

T distribution is symmetrical, bell-shaped, and similar to the standard normal curve. It differs from the standard normal curve in the way that it has an additional parameter, called Degrees of Freedom, which changes its shape

Degrees of Freedom : Degrees of freedom, usually symbolized by df, (which can be any real number greater than zero (0.0)), is a parameter of t distribution. Setting the value of df defines a particular member of the family of t distributions. A member of the family of t distributions with a smaller df has more area in the tails of the distribution than one with a larger df.

**Effect of df on the t distribution**

The smaller the df, the flatter the shape of the distribution, resulting in greater area in the tails of the distribution.

**Relationship to the Normal Curve**

The t distribution looks similar to the normal curve. As the df increase, the t distribution approaches the standard normal distribution (μ = 0.0, σ = 1.0).

The standard normal curve is a special case of the t distribution when df = ∞. The t distribution approaches the standard normal distribution relatively quickly.

**F-Distribution in Statistics**

Named after Ronald Fisher, **F Distribution** is used to compare the spread or scattering of the members of two observed random samples, as a test of whether the samples have the same variability.

The F distribution is obtained by taking the ratio of two chi-square variables, each divided by its number of degrees of freedom.

The statistic $F = \dfrac{u/u_{1}}{v/v_{1}}$ has an F distribution with $(u_{1}, v_{1})$ degrees of freedom, where u and v are independently distributed chi-squared variables with $u_{1}$ and $v_{1}$ degrees of freedom respectively.

From the definition of the t distribution, the square of a t statistic may be written as:

$t^{2} = \dfrac{z^{2}/1}{v/v_{1}}$, where $z^{2}$, being the square of a standard normal variable, has a chi-squared distribution with 1 degree of freedom.

Thus the square of a t variable with $v_{1}$ degrees of freedom is an F variable with $(1, v_{1})$ degrees of freedom, that is: $t^{2} \sim F(1, v_{1})$
