Statistical Theoretical Distribution

Statistical theoretical distributions are deduced mathematically from certain assumptions; they are not obtained from actual observations or experiments.

Types of theoretical distributions commonly used in statistical analysis

  • Binomial Distribution due to James Bernoulli
  • Poisson Distribution due to S. D. Poisson
  • Normal Distribution due to De Moivre

Importance of Theoretical Distribution:

  • Estimate of nature and trend of frequency distribution: On the basis of theoretical frequency distribution, the nature and trend of frequency distribution can be estimated under certain assumptions and conditions.
  • Basis of logical decisions: Risk and uncertainty of an event can be analysed on the basis of theoretical distribution for taking logical decisions.
  • Forecasting: It provides a basis for prediction, projection and forecasting.
  • Test of Sampling: It serves as a benchmark against which actual frequency distributions and their deviations are compared.

Binomial Distribution

Binomial Distribution is useful where each trial has only two possible outcomes (e.g. success or failure, good or defective, hit or miss, yes or no).

Binomial Probability Function
Binomial probability distribution gives the probability of obtaining exactly x successes and (n − x) failures in n trials.

The probability of x successes in n trials is given by
$f(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$ .. (1) for x = 0, 1, 2, .., n [the random variable x is an integer]
where p = constant probability of success in a single trial, q = 1 − p (as p + q = 1, q is the probability of failure), n = number of trials, and x = number of successes in n trials ($x \le n$).

The terms of f(x) are the successive terms of the binomial expansion of $(q+p)^n$, i.e. $q^n + {}^{n}C_{1}\, p\, q^{n-1} + {}^{n}C_{2}\, p^{2} q^{n-2} + \dots + p^n$. Hence it is known as the Binomial distribution.

Binomial Distribution Properties

Binomial Distribution is a discrete probability distribution in which the random variable x (i.e. the number of successes) assumes the values 0, 1, 2, .., n, where n is finite and $x \le n$.
Mean $= np$, variance $= npq$, s.d. $\sigma = \sqrt{\text{variance}} = \sqrt{npq}$. Skewness $= (q-p)/\sqrt{npq}$,
Kurtosis $= (1-6pq)/(npq)$, where $p+q=1$.
Skewness is positive for p < ½, negative for p > ½ and zero for p = ½. The value of x which has the maximum probability is the mode of the distribution.

Assumptions of Binomial Distribution

1. Each trial has two mutually exclusive possible outcomes, i.e. success or failure.
2. Each trial is independent of the other trials.
3. The probability of a success remains constant from trial to trial.
4. For example, the probability of getting a head in a toss of a coin is ½; this must remain the same in successive tosses.
5. The number of trials is fixed.

Sequence of p and q
The general form of the binomial distribution is the expansion of $(p+q)^n$, in which the terms are written in descending order of the number of successes. If the terms are to be written in ascending order of the number of successes, $(q+p)^n$ is expanded instead.

Binomial expansion of events

Rules for coefficients of binomial expansion
1. The first term is $p^n$.
2. The second term is ${}^{n}C_{1}\, p^{n-1} q$.
3. In each succeeding term the power of p is reduced by 1 and the power of q is increased by 1.
4. The coefficient of any term is found by multiplying the coefficient of the preceding term by the power of p in that term and dividing the product so obtained by one more than the power of q in that preceding term.
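The coefficient rule above can be sketched in a few lines of Python. This is only an illustration; the function name `binomial_coefficients` is an assumption made for the example and does not come from the text.

```python
def binomial_coefficients(n):
    """Coefficients of the expansion of (p + q)^n, built term by term."""
    coeffs = [1]                 # the first term p^n has coefficient 1
    power_p, power_q = n, 0      # powers of p and q in the current term
    while power_p > 0:
        # Multiply by the power of p, divide by one more than the power of q.
        coeffs.append(coeffs[-1] * power_p // (power_q + 1))
        power_p -= 1
        power_q += 1
    return coeffs


print(binomial_coefficients(4))
```

For n = 4 this gives the coefficients 1, 4, 6, 4, 1 of $p^4$, $p^3q$, $p^2q^2$, $pq^3$ and $q^4$.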

Binomial Distribution – Problems

Ex. 4 coins are tossed simultaneously. What is the probability of getting (i) 2 heads, (ii) at least 2 heads and (iii) at least one head?

The random experiment consists of tossing 4 coins and observing the number of heads. Let the occurrence of a head be treated as a success.
p = probability of getting a head = ½, q = 1 − ½ = ½.
The value of p is constant for each coin and the trials are all independent.
$f(x) = {}^{n}C_{x}\, p^{x} q^{n-x} = {}^{4}C_{x}\, (1/2)^{x} (1/2)^{4-x}$. Here n = 4.
(i) Probability of getting 2 heads: putting x = 2, we get $f(2) = {}^{4}C_{2}\, (1/2)^{2} (1/2)^{4-2} = 6 \times (1/4) \times (1/4) = 3/8$
(ii) At least 2 heads means 2 or more heads, i.e. 2 or 3 or 4 heads.
So, probability of at least two heads $= f(2)+f(3)+f(4)$
$= {}^{4}C_{2}\, (1/2)^{2} (1/2)^{4-2} + {}^{4}C_{3}\, (1/2)^{3} (1/2)^{4-3} + {}^{4}C_{4}\, (1/2)^{4} (1/2)^{0}$
$= 6 \times (1/4) \times (1/4) + 4 \times (1/8) \times (1/2) + 1 \times (1/16) \times 1 = 3/8 + 1/4 + 1/16 = 11/16$
(iii) At least one head means every outcome except no heads. So, probability of at least one head $= 1 - f(0) = 1 - (1/2)^{4} = 15/16$
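As a cross-check of the four-coin example, the binomial probability function can be evaluated with exact fractions. This is a minimal sketch; the helper name `binomial_pmf` is an assumption made for the illustration.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials with success probability p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


half = Fraction(1, 2)
p_two = binomial_pmf(2, 4, half)                                    # exactly 2 heads
p_at_least_two = sum(binomial_pmf(x, 4, half) for x in range(2, 5))  # 2, 3 or 4 heads
p_at_least_one = 1 - binomial_pmf(0, 4, half)                        # complement of no heads

print(p_two, p_at_least_two, p_at_least_one)
```

Using `Fraction` keeps the results as exact ratios (3/8, 11/16 and 15/16) instead of rounded decimals.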

Binomial Distribution – Problems

Ex. 1 : Six coins are tossed. Find the probability of getting more than 4 heads.
Let getting a head be treated as a success, with probability p = ½.
So, p = ½ (constant probability), q = 1 − p = 1 − ½ = ½, n = 6.

The probability function is
$f(x) = {}^{n}C_{x}\, p^{x} q^{n-x} = {}^{6}C_{x}\, (1/2)^{x} (1/2)^{6-x}$
More than 4 heads means 5 or 6 heads.
So probability $= f(5) + f(6) = {}^{6}C_{5}\, (1/2)^{5} (1/2)^{6-5} + {}^{6}C_{6}\, (1/2)^{6} (1/2)^{6-6}$
$= [6 \times (1/2^{5}) \times (1/2)] + 1 \times (1/2^{6}) = 7/2^{6} = 7/64$

Binomial Distribution – Problems

Ex. 1 : Given the probability of defective screws is 1/6.
Find the following for the binomial distribution of defective screws in a total of 180:
(i) the mean (ii) the s.d. (iii) moment coefficient of skewness.

Here n = 180, p = 1/6, q = 1 − 1/6 = 5/6.
(i) Mean of binomial distribution $= np = 180 \times 1/6 = 30$
(ii) s.d. $\sigma = \sqrt{npq} = \sqrt{180 \times (1/6) \times (5/6)} = \sqrt{25} = 5$
(iii) Moment coefficient of skewness $= (q-p)/\sqrt{npq} = (5/6 - 1/6)/5 = (4/6)/5 = 2/15$ (or 0.1333)
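The three summary measures of the defective-screws example follow directly from n and p. A small sketch, assuming a hypothetical helper named `binomial_summary`:

```python
from math import sqrt


def binomial_summary(n, p):
    """Return (mean, standard deviation, moment coefficient of skewness)."""
    q = 1 - p
    mean = n * p
    sd = sqrt(n * p * q)
    skewness = (q - p) / sd
    return mean, sd, skewness


mean, sd, skewness = binomial_summary(180, 1 / 6)
print(round(mean, 4), round(sd, 4), round(skewness, 4))
```

With n = 180 and p = 1/6 the values agree with the hand calculation (about 30, 5 and 0.1333), up to floating-point rounding.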

Ex. 2 : The incidence of occupational disease in an industry is such that the workmen have a 20% chance of suffering from it. What is the probability that out of six workmen, 4 or more will contract the disease?
Probability that a worker will suffer from the disease, p = 20/100 = 1/5; q = 1 − 1/5 = 4/5; n = 6.
$f(x) = {}^{6}C_{x}\, p^{x} q^{6-x}$ for x = 4, 5, 6 (as $x \ge 4$)
$f(4) + f(5) + f(6) = {}^{6}C_{4}\, (1/5)^{4} (4/5)^{2} + {}^{6}C_{5}\, (1/5)^{5} (4/5) + {}^{6}C_{6}\, (1/5)^{6} (4/5)^{0}$
$= (1/5^{6}) \times ({}^{6}C_{4} \cdot 4^{2} + {}^{6}C_{5} \cdot 4 + 1) = (1/15625) \times (240 + 24 + 1) = 265/15625$ (or about 0.017)

Binomial Distribution – Problems

Ex. 1 : The arithmetic mean of a binomial distribution is 6 and the S.D. is 4. Is this calculation correct?
Here $\bar{X} = np = 6$ and s.d. $= \sqrt{npq} = 4$, so $npq = 4^{2} = 16$. Then $q = (npq)/(np) = 16/6 = 2.67$ (i.e. q > 1).
As p + q = 1, q cannot exceed 1. So the calculation is not correct.

Ex. 2 : The incidence of occupational disease in an industry is such that the workmen have a 10% chance of suffering from it. What is the probability that out of 5 workmen, 3 or more will contract the disease?
n = 5 and p = probability of a workman suffering from the disease = 10% = 0.1. So q = 1 − 0.1 = 0.9.
$f(x) = {}^{5}C_{x}\, (0.1)^{x} (0.9)^{5-x}$, for x = 0, 1, 2, .., 5.
The probability that 3 or more workmen will contract the disease is $P(x \ge 3)$
$= f(3) + f(4) + f(5)$
$= {}^{5}C_{3}\, (0.1)^{3} (0.9)^{5-3} + {}^{5}C_{4}\, (0.1)^{4} (0.9)^{5-4} + {}^{5}C_{5}\, (0.1)^{5}$
$= (10 \times 0.001 \times 0.81) + (5 \times 0.0001 \times 0.9) + (1 \times 0.00001) = 0.0081 + 0.00045 + 0.00001 = 0.0086$ (approx.)
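Both workmen examples (3 or more out of 5 with p = 0.1, and 4 or more out of 6 with p = 0.2) are upper-tail probabilities. The sketch below sums the relevant terms; the name `binomial_upper_tail` is an assumption made for the illustration.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for a binomial variable with n trials and success probability p."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


three_or_more_of_five = binomial_upper_tail(3, 5, 0.1)
four_or_more_of_six = binomial_upper_tail(4, 6, 0.2)
print(round(three_or_more_of_five, 5), round(four_or_more_of_six, 5))
```

The second value equals 265/15625, which is about 0.017.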

Poisson Distribution

The binomial distribution cannot be applied where n cannot be estimated. In such cases, the Poisson distribution is applicable.
The Poisson distribution is defined by the probability function
$f(x) = e^{-m} m^{x} / x!$, for x (no. of successes) = 0, 1, 2, 3, ..

with $\sum_{x=0}^{\infty} f(x) = 1$ (as the total probability must be unity).

If the value of the parameter m is known, the distribution is completely known. The value of m generally lies between 0.1 and 10.

Properties of Poisson Distribution:

  • Discrete distribution: Like binomial distribution it is also a discrete probability distribution i.e. occurrences can be described by a random variable.
  • Main parameter: The main parameter is mean (m) which is equal to np i.e. m = np.
  • Form: It is a positively skewed distribution.

Assumption of Poisson Distribution:

  • The occurrences of events are independent, i.e. the occurrence of an event in an interval of time or space does not affect the probability of a second occurrence of the event in the same (or any other) interval.
  • The probability of a single occurrence of the event in a given interval is proportional to the length of the interval.
  • The probability of occurrence of more than one event in a very small interval is negligible.

Examples of Poisson distribution:
  • The number of telephone calls received at a particular switchboard per minute during a certain hour of the day.
  • The number of deaths per day in a district or town in one year by a disease.
  • The number of cars passing a certain point per minute.
  • The number of persons born deaf and dumb per year in a city.
  • The number of typographical errors per page.
  • The number of printing errors per page.
  • The number of defective blades in a pack.

Poisson Distribution Computation

Poisson Distribution is a discrete distribution. The probability of exactly 0, 1, 2, .., n successes can be found in the following steps:
Step 1 : Find the arithmetic mean of the observed data, denoted as m, i.e. $\bar{X} = m$
Step 2 : Compute the value of $e^{-m}$ (e = 2.7183, the base of natural logarithms)
$e^{-m} = 1/e^{m} = 1/(2.7183)^{m} = 1/\text{antilog}(m \times \log 2.7183) = 1/\text{antilog}(0.4343 \times m)$
Step 3 : Compute the probability of 0, 1, 2, .., n successes using the Poisson distribution
$P(x) = e^{-m} m^{x} / x!$, or $P(r) = e^{-m} m^{r} / r!$, where x or r = no. of successes = 0, 1, 2, .., n, e = 2.7183 and m = $\bar{X}$ = arithmetic mean
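The three steps can be sketched in Python, where `math.exp` replaces the antilog-table lookup of Step 2. The function name `poisson_pmf` and the choice m = 2 are assumptions made purely for illustration.

```python
from math import exp, factorial


def poisson_pmf(x, m):
    """P(exactly x successes) for a Poisson distribution with mean m."""
    return exp(-m) * m**x / factorial(x)


m = 2  # Step 1: the mean, taken here as a given illustrative value
less_than_3 = sum(poisson_pmf(x, m) for x in range(3))    # P(0) + P(1) + P(2)
at_least_2 = 1 - (poisson_pmf(0, m) + poisson_pmf(1, m))  # complement rule

print(round(poisson_pmf(0, m), 4), round(less_than_3, 4), round(at_least_2, 4))
```

Because `exp(-m)` is not rounded to four decimals first, the results can differ in the last digit or two from values worked out by hand with a rounded $e^{-m}$.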

Poisson Distribution – Problems

Ex. A random variable x follows a Poisson distribution with parameter 2. Find the probabilities that x assumes (i) the values 0, 1, 3, (ii) a value less than 3, (iii) a value of at least 2
(given $e^{-2}$ = .1353).

Here, m = 2.
(i) $f(x) = e^{-m} m^{x}/x! = e^{-2}\, 2^{x}/x!$, for x = 0, 1, 3
$f(0) = e^{-2}\, 2^{0}/0! = (e^{-2} \times 1)/1 = e^{-2} = 0.13534$
$f(1) = e^{-2}\, 2^{1}/1! = (e^{-2} \times 2)/1 = 2e^{-2} = 0.27068$
$f(3) = e^{-2}\, 2^{3}/3! = (e^{-2} \times 8)/6 = e^{-2} \times (4/3) = 0.1804$

(ii) Less than 3 means x = 0 or 1 or 2.
$f(0) + f(1) + f(2) = .1353 + .2706 + e^{-2}\, 2^{2}/2! = .1353 + .2706 + (.1353 \times 2)$
$= .1353 + .2706 + .2706 = .6765$.
(iii) At least 2 means 2 or 3 or 4 or more.
$f(2) + f(3) + f(4) + \dots = 1 - \{f(0) + f(1)\} = 1 - (.1353 + .2706) = 1 - .4059 = .5941$.

Poisson Distribution – Problems

Ex. 1 : If a random variable x follows a Poisson distribution such that P(x = 1) = P(x = 2), find P(x = 0).

$f(x) = e^{-m} m^{x}/x!$. So, $P(x=1) = f(1) = e^{-m} m^{1}/1! = m e^{-m}$
$P(x=2) = f(2) = e^{-m} m^{2}/2! = m^{2} e^{-m}/2$
Now, as given in the problem, f(1) = f(2), so $m e^{-m} = m^{2} e^{-m}/2$. Or, 1 = m/2, or m = 2.
So, $f(0) = e^{-2}\, 2^{0}/0! = e^{-2}$

Ex. 2 : One-tenth of one per cent of the blades produced by a blade manufacturing factory turn out to be defective. The blades are supplied in packets of 20. Use the Poisson distribution to calculate the approximate number of packets containing (i) no defective, (ii) one defective and (iii) two defective blades respectively in a consignment of 4,00,000 packets.

Let the occurrence of a defective blade be a success. Here, p = (1/10)% = 1/1000, n = 20, m = np = 20 x (1/1000) = .02
Probability of x defective blades in a packet: $f(x) = e^{-m} m^{x}/x! = e^{-0.02} (.02)^{x}/x!$, for x = 0, 1, 2
Expected number of packets with x defective blades = 4,00,000 x $e^{-0.02} (.02)^{x}/x!$
For x = 0: 4,00,000 x $e^{-0.02} (.02)^{0}/0!$ = 4,00,000 x $e^{-0.02}$ = 4,00,000 x .9802 = 3,92,080
For x = 1: 4,00,000 x $e^{-0.02} (.02)^{1}/1!$ = 4,00,000 x $e^{-0.02}$ x .02
= 4,00,000 x .9802 x .02 = 7842 (approx.)
For x = 2: 4,00,000 x $e^{-0.02} (.02)^{2}/2!$ = (4,00,000 x $e^{-0.02}$ x .0004) / 2
= 4,00,000 x .9802 x .0002 = 78 (approx.)
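The packet counts can be reproduced by scaling each Poisson probability by the consignment size. In this sketch the function name `expected_packets` is an assumption, and 4,00,000 (four lakh) is written as `400_000`.

```python
from math import exp, factorial

PACKETS = 400_000            # 4,00,000 packets in the consignment
MEAN = 20 * (1 / 1000)       # m = np, mean number of defective blades per packet


def expected_packets(defects):
    """Expected number of packets containing exactly `defects` defective blades."""
    return PACKETS * exp(-MEAN) * MEAN**defects / factorial(defects)


for defects in range(3):
    print(defects, round(expected_packets(defects)))
```

The count for zero defectives comes out about one packet lower than 3,92,080, because the hand calculation rounds $e^{-0.02}$ to .9802 before multiplying.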

Poisson Distribution – Problems

Ex. Printing mistakes per page committed by a press follow a Poisson distribution. Find the expected frequencies for the following distribution of printing mistakes:

Mistakes per page (x): 0, 1, 2, 3, 4, 5
Frequency (f): 40, 30, 20, 15, 10, 5

(Value of $e^{-1.5}$ = 0.22313)

Here, Mean = [{(0 x 40) + (1 x 30) + (2 x 20) + (3 x 15) + (4 x 10) + (5 x 5)} / 120] = 180/120 = 1.5
$P(0) = e^{-1.5} = 0.22313$, $P(1) = e^{-1.5} \times 1.5 = 0.22313 \times 1.5 = 0.33470$
$P(2) = e^{-1.5} (1.5)^{2}/2! = 0.25$, $P(3) = e^{-1.5} (1.5)^{3}/3! = 0.13$,
$P(4) = e^{-1.5} (1.5)^{4}/4! = 0.05$, $P(5) = e^{-1.5} (1.5)^{5}/5! = 0.01$

Expected Frequency $= N e^{-m} m^{x} / x!$, where N = 120 is the total frequency.
Putting in the values of N and P(x) gives the expected frequency for each x.
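A short sketch of the fitting procedure follows. The observed frequencies (40, 30, 20, 15, 10 and 5 for 0 to 5 mistakes) are read off from the mean calculation in the text; the variable names are assumptions made for the illustration.

```python
from math import exp, factorial

# Observed frequencies for x = 0..5 mistakes per page, as used in the mean calculation
observed = {0: 40, 1: 30, 2: 20, 3: 15, 4: 10, 5: 5}

total = sum(observed.values())                           # N
mean = sum(x * f for x, f in observed.items()) / total   # m

# Expected frequency for each x: N * e^(-m) * m^x / x!
expected = {x: total * exp(-mean) * mean**x / factorial(x) for x in observed}

for x, freq in expected.items():
    print(x, round(freq, 1))
```

The expected frequencies need not add up to exactly N, since the Poisson distribution also assigns a small probability to more than 5 mistakes.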

Poisson Distribution – Problems

The number of accidents in a year attributed to bus drivers in a city follows a Poisson distribution with mean 3. Out of 3,000 bus drivers, find the number of drivers with (i) no accident in a year and (ii) more than 3 accidents in a year.
[$e^{-3}$ = 0.0498]
Here m = 3, $P(x = r) = e^{-m} m^{r}/r!$
(i) Probability of no accident $= P(0) = e^{-3} = .0498$.
So, number of drivers with no accident = 3000 x .0498 = 149
(ii) Probability of more than 3 accidents $= P(x > 3) = 1 - P(x \le 3) = 1 - [P(0) + P(1) + P(2) + P(3)]$
$= 1 - [e^{-m} + e^{-m} m + e^{-m} m^{2}/2! + e^{-m} m^{3}/3!] = 1 - e^{-m} [1 + m + m^{2}/2! + m^{3}/3!]$
$= 1 - e^{-3} [1 + 3 + 3^{2}/2! + 3^{3}/3!] = 1 - e^{-3} [1 + 3 + 9/2 + 9/2] = 1 - (e^{-3} \times 13) = 1 - (.0498 \times 13)$
$= 1 - 0.6474 = 0.3526$
So, number of bus drivers with more than 3 accidents in a year = 3000 x 0.3526 = 1058
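The complement sum 1 − [P(0) + P(1) + P(2) + P(3)] evaluated in the worked solution is the probability of more than 3 accidents. A minimal sketch of that calculation, with `poisson_cdf` as an assumed helper name:

```python
from math import exp, factorial


def poisson_cdf(k, m):
    """P(X <= k) for a Poisson variable with mean m."""
    return sum(exp(-m) * m**x / factorial(x) for x in range(k + 1))


drivers, m = 3000, 3
no_accident = drivers * poisson_cdf(0, m)            # drivers with x = 0
more_than_three = drivers * (1 - poisson_cdf(3, m))  # drivers with x > 3

print(round(no_accident), round(more_than_three))
```

For "3 or more" accidents the complement would instead stop at P(2), i.e. `1 - poisson_cdf(2, m)`.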

Normal Distribution

Normal Distribution is a continuous probability distribution in which the relative frequencies of a continuous variable are distributed according to the normal probability law. It is a symmetrical distribution in which the frequencies are distributed evenly about the mean of the distribution.

Normal distribution is defined by the probability density function:

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\bar{X})^{2}/(2\sigma^{2})}$

for $-\infty < x < +\infty$, where $\bar{X}$ = Mean, $\sigma$ = Standard deviation, e (base of natural logarithm) = 2.7183, $\pi$ = 3.1416.

Normal distribution in its standard normal variate (S.N.V.) form is given by:

$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}$, for $-\infty < z < +\infty$, where z = S.N.V. $= (X - \bar{X})/\sigma$.

The mean of Z is zero and the standard deviation of Z is 1.
In a normal distribution, the quartiles $Q_1$ and $Q_3$ are equidistant from the median M, so $Q_3 - M = M - Q_1$.

Normal Distribution Properties

  • Bell Shaped : The normal curve is perfectly symmetrical and bell shaped about mean. This implies that if we fold the curve along its vertical axis at the center, the two halves would coincide.
  • Continuous Distribution: Normal distribution is a distribution of continuous variables. For this reason, it is called continuous probability distribution.
  • Parameters of Distribution: The two main parameters of the normal distribution are the Mean ($\bar{X}$) and the Standard Deviation (S.D.). The entire distribution can be known from these two parameters.
  • Relationship between M.D. and S.D. : In a normal distribution, the mean deviation (M.D.) is approximately 4/5 times the standard deviation, i.e., M.D. = (4/5) x S.D. (approx.)

Normal Curve in Statistics

Normal Curve is the most prominent probability distribution model used in statistics. The normal curve is a bell-shaped, perfectly symmetric curve centered on the mean, which is equal to its median and mode.

The equation of the normal curve depends on the Mean ($\bar{X}$ or $\mu$) and the Standard Deviation ($\sigma$).
For different values of $\mu$ and $\sigma$, different normal curves are obtained. Since $\mu$ and $\sigma$ can assume an infinite number of values, it is impracticable to tabulate the area under the curve for every combination of $\mu$ and $\sigma$.
For the sake of convenience, the standard normal curve or unit normal curve is constructed with $\mu = 0$ and standard deviation = 1. The given value of the normal variate is then transformed into standard units by the Z-transformation formula

$Z = (X - \bar{X})/\sigma$, where Z = standard normal variate, $\bar{X}$ (or $\mu$) = arithmetic mean of the population, X = value of the observation, $\sigma$ = S.D. of the distribution

Ex. Find the area under the Standard Normal Curve between z = 0 & z = 1.8 and between z = 0 & z = 1.85.
In the Standard Normal Table, the value in row 1.8, column .00 is 0.4641. Similarly, the value for z = 1.85 (row 1.8, column .05) is 0.4678.
Hence, the required areas are respectively 0.4641 and 0.4678.

In the first case, 0.4641 represents the probability that z lies between 0 and 1.8, i.e. $P(0 \le z \le 1.8) = 0.4641$, and in the second case $P(0 \le z \le 1.85) = 0.4678$.
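The table lookup can be checked numerically: the area under the standard normal curve between 0 and z equals $\frac{1}{2}\,\mathrm{erf}(z/\sqrt{2})$. The helper name `area_zero_to` below is an assumption made for this sketch.

```python
from math import erf, sqrt


def area_zero_to(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return 0.5 * erf(z / sqrt(2))


print(round(area_zero_to(1.8), 4), round(area_zero_to(1.85), 4))
```

Rounded to four decimals, these match the tabulated areas 0.4641 and 0.4678.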

Statistical Normal Curve – Problems

Ex. 1 : Find the area under the Standard Normal Curve between z = −1.67 and z = 0.
By symmetry, the area between z = −1.67 & z = 0 is equal to the area between z = 0 & z = 1.67.
From the table of areas under the Standard Normal Curve, the corresponding value is 0.4525, which is the required area. So $P(-1.67 \le z \le 0) = 0.4525$.

Ex. 2 : Find the area under the Standard Normal Curve between z = 0.82 and z = 1.96. This area cannot be read from the table directly, so we break it up as follows:
Required area = (area between z = 0 and z = 1.96) − (area between z = 0 and z = 0.82) = 0.4750 − 0.2939 = 0.1811. So $P(0.82 \le z \le 1.96) = 0.1811$.

Ex. 3 : Find the area under the Standard Normal Curve between z = −0.75 and z = 2.04.
Required area = (area between z = −0.75 and z = 0) + (area between z = 0 and z = 2.04) = (area between z = 0 and z = 0.75) + (area between z = 0 and z = 2.04) = .2734 + .4793 = 0.7527. So $P(-0.75 \le z \le 2.04) = 0.7527$.

Statistical Normal Curve – Problems

Ex. A normal curve has $\bar{x}$ = 20 and $\sigma$ = 4. Find the probability that x assumes a value between 16.8 and 27.6.
$z = (x - \bar{x})/\sigma$, where z = standard normal variate corresponding to x, $\bar{x}$ = 20 and $\sigma$ = 4
$x_1 = 16.8$, $z_1 = (16.8 - 20)/4 = -3.2/4 = -0.8$
$x_2 = 27.6$, $z_2 = (27.6 - 20)/4 = 7.6/4 = 1.9$

Now $P(16.8 \le x \le 27.6) = P(-0.8 \le z \le 1.9)$ = (area between z = −0.8 and z = 0) + (area between z = 0 and z = 1.9) = (area between z = 0 and z = 0.8) + (area between z = 0 and z = 1.9) = .2881 + .4713 = 0.7594.
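The same probability can be obtained without a table by standardising inside a cumulative distribution function. This is a sketch under the assumption that the helper names `normal_cdf` and `normal_between` are free to choose.

```python
from math import erf, sqrt


def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal variable with the given mean and standard deviation."""
    z = (x - mean) / sd                   # Z-transformation
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_between(a, b, mean, sd):
    """P(a <= X <= b) as a difference of two cumulative probabilities."""
    return normal_cdf(b, mean, sd) - normal_cdf(a, mean, sd)


probability = normal_between(16.8, 27.6, 20, 4)
print(round(probability, 4))
```

Working with cumulative probabilities avoids splitting the interval at z = 0, which is needed only because the printed table lists areas from 0 to z.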

Statistical Normal Curve – Problems

Ex. How many male workers in a factory have a salary (i) between Rs.800 and Rs.1360, and (ii) more than Rs.1440, if the mean salary is Rs.1000, the s.d. is Rs.200, the number of workers is 20,000 and the salary of the workers is assumed to be normally distributed?

First we find the standard normal variates corresponding to the given values, with $\bar{x}$ = 1000 and $\sigma$ = 200.

(i) $x_1 = 800$, $z_1 = (800 - 1000)/200 = -200/200 = -1$
$x_2 = 1360$, $z_2 = (1360 - 1000)/200 = 360/200 = 1.8$
Now $P(800 \le x \le 1360) = P(-1 \le z \le 1.8)$ = (area between z = −1 and z = 0) + (area between z = 0 and z = 1.8) = (area between z = 0 and z = 1) + (area between z = 0 and z = 1.8) = .3413 + .4641 = .8054
i.e., 80.54% of the total workers have a salary between Rs.800 and Rs.1360
Number of workers getting a salary between Rs.800 and Rs.1360 = .8054 x 20,000 = 16108
(ii) For x = 1440, z = (1440 − 1000)/200 = 440/200 = 2.2
Now $P(x \ge 1440) = P(z \ge 2.2)$ = (area under the standard normal curve to the right of z = 2.2)
= (area to the right of z = 0) − (area between z = 0 and z = 2.2) = .5000 − .4861 = 0.0139
So, 1.39% of the total workers have a salary of more than Rs.1440, and the number of such workers = .0139 x 20,000 = 278
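The salary example can be cross-checked with `statistics.NormalDist` from the Python standard library (available from Python 3.8). The variable names are assumptions made for this sketch.

```python
from statistics import NormalDist

salary = NormalDist(mu=1000, sigma=200)   # mean Rs.1000, s.d. Rs.200
workers = 20_000

# (i) share of workers between Rs.800 and Rs.1360, scaled to a head count
between = workers * (salary.cdf(1360) - salary.cdf(800))
# (ii) share of workers above Rs.1440, scaled to a head count
above = workers * (1 - salary.cdf(1440))

print(round(between), round(above))
```

The rounded head counts agree with the table-based results of 16108 and 278.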

Statistical Normal Curve – Problems

Ex. The income distribution of Engineers of a company was found to follow a normal distribution. The average income of an Engineer was Rs.14,000. The standard deviation of the income of Engineers was Rs.2,500. If there were 484 Engineers drawing a salary above Rs.15,750, how many Engineers were there in the company?

[The area under the standard normal curve between 0 and 0.7 is 0.2580]

$z = (x - \bar{x})/\sigma = (15750 - 14000)/2500 = 1750/2500 = 0.7$
Using the normal distribution, $P(Z \ge 0.7) = .5 - .2580 = .242$
So, the probability that an Engineer draws a salary of Rs.15,750 or more is 0.242
The number of Engineers in the company = 484 / .242 = 2000
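This example runs the usual calculation backwards: a known head count in the upper tail is divided by the tail probability to recover the total. A minimal sketch using `statistics.NormalDist`, with assumed variable names:

```python
from statistics import NormalDist

income = NormalDist(mu=14_000, sigma=2_500)   # mean and s.d. of income in Rs.

share_above = 1 - income.cdf(15_750)          # P(income > 15750), i.e. P(Z > 0.7)
total_engineers = 484 / share_above           # 484 engineers make up this share

print(round(share_above, 3), round(total_engineers))
```

The unrounded tail probability is slightly below 0.242, so the quotient is a little above 2000 before rounding.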

Statistical Normal Curve – Problems

Ex. In a sample of 2,000 items, the mean weight and standard deviation are 40 and 20 kilograms respectively. Assuming the distribution to be normal, find the number of items weighing between 20 and 80 kilograms.
$z = (x - \bar{x})/\sigma$. Here $\bar{x}$ = 40, $\sigma$ = 20. So, for x = 20, z = (20 − 40)/20 = −1

The area under the standard normal curve between the mean and z = −1 is 0.3413
For x = 80, z = (80 − 40)/20 = 2
The area under the standard normal curve between the mean and z = 2 is 0.4772
So, the probability of an item weighing between 20 and 80 kilograms is
$P(20 \le x \le 80) = P(-1 \le z \le 2) = P(0 \le z \le 1) + P(0 \le z \le 2) = 0.3413 + 0.4772 = 0.8185$
Number of items weighing between 20 and 80 kilograms is 2000 x 0.8185 = 1637

Chi Square Distribution

Chi-square distribution ($\chi^{2}$ distribution) is a continuous probability distribution used in statistical hypothesis tests.
If $X_1, \dots, X_k$ are independent, standard normal random variables, then the sum of their squares

$Q = \sum_{i=1}^{k} X_i^{2}$

is distributed according to the chi-square distribution with k degrees of freedom. This is usually denoted as $Q \sim \chi^{2}(k)$ or $Q \sim \chi^{2}_{k}$.
The chi-square distribution has one parameter: k, a positive integer that specifies the number of degrees of freedom (i.e. the number of $X_i$'s).
Additivity : From the definition of the chi-square distribution, it follows that the sum of independent chi-square variables is also chi-square distributed. Specifically, if $X_1, \dots, X_n$ are independent chi-square variables with $k_1, \dots, k_n$ degrees of freedom, respectively, then $Y = X_1 + X_2 + \dots + X_n$ is chi-square distributed with $k_1 + k_2 + \dots + k_n$ degrees of freedom.

T Distribution in Statistics

The T-Distribution is a theoretical probability distribution. T-distribution depicts the set of observations mostly falling close to the mean, the rest of the observations making up the tails on either side. 

T distribution is symmetrical, bell-shaped, and similar to the standard normal curve. It differs from the standard normal curve in the way that it has an additional parameter, called Degrees of Freedom, which changes its shape

Degrees of Freedom : Degrees of freedom, usually symbolized by df, (which can be any real number greater than zero (0.0)), is a parameter of t distribution. Setting the value of df defines a particular member of the family of t distributions. A member of the family of t distributions with a smaller df has more area in the tails of the distribution than one with a larger df.

Effect of df on the t distribution

The smaller the df, the flatter the shape of the distribution, resulting in greater area in the tails of the distribution.
Relationship to the Normal Curve
The t distribution looks similar to the normal curve. As the df increases, the t distribution approaches the standard normal distribution ($\mu$ = 0.0, $\sigma$ = 1.0).
The standard normal curve is a special case of the t distribution when df = $\infty$. The t distribution approaches the standard normal distribution relatively quickly.

F-Distribution in Statistics

Named after Ronald Fisher, the F distribution is used to compare the spread or scatter of two observed random samples, as a test of whether the samples have the same variability.
The F distribution is obtained by taking the ratio of two chi-square variables, each divided by its number of degrees of freedom.

The statistic $F = (u/u_1)/(v/v_1)$ has an F distribution with $(u_1, v_1)$ degrees of freedom, where u and v are independently distributed chi-squared variables with $u_1$ and $v_1$ degrees of freedom, respectively.

From the definition of the t distribution, the square of a t statistic may be written as:
$t^{2} = (z^{2}/1)/(v/v_1)$, where $z^{2}$, being the square of a standard normal variable, has a chi-squared distribution with 1 degree of freedom.
Thus the square of a t variable with $v_1$ degrees of freedom is an F variable with $(1, v_1)$ degrees of freedom, that is: $t^{2} \sim F(1, v_1)$
