Statistics Kolmogorov Smirnov Test – lesscss

By | February 2, 2019

Statistics – Kolmogorov Smirnov Test

This test is used in situations where a comparison has to be made
between an observed sample distribution and a theoretical
distribution.

K-S One Sample Test

This test is used as a test of goodness of fit and is ideal when
the size of the sample is small. It compares the observed
cumulative distribution function of a variable with that of a
specified distribution. The null hypothesis assumes no difference
between the observed and theoretical distributions, and the value
of the test statistic ‘D’ is calculated as:

Formula

$D = \max |F_o(X) - F_r(X)|$

Where −

  • ${F_o(X)}$ = Observed cumulative frequency distribution of a
    random sample of n observations.

  • and ${F_o(X) = \frac{k}{n}}$ = (No. of observations ≤
    X)/(Total no. of observations).

  • ${F_r(X)}$ = The theoretical cumulative frequency distribution.
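
The definition above can be sketched in Python for a raw (unbinned) sample. This is a minimal illustration, not a reference implementation: the function names `ks_one_sample_d` and `uniform_cdf`, the choice of a uniform theoretical distribution, and the three data points are all assumptions made for the example. Because the observed cumulative distribution is a step function, the sketch checks the gap on both sides of each jump.

```python
def ks_one_sample_d(sample, theoretical_cdf):
    """Return D = max |F_o(x) - F_r(x)| for a raw sample (hypothetical helper)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for k, x in enumerate(xs, start=1):
        f_r = theoretical_cdf(x)
        # F_o jumps from (k-1)/n to k/n at x, so compare F_r with both levels.
        d = max(d, abs(k / n - f_r), abs(f_r - (k - 1) / n))
    return d


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1] (illustrative choice only)."""
    return min(max(x, 0.0), 1.0)


sample = [0.1, 0.4, 0.7]  # hypothetical data, not from the article
d = ks_one_sample_d(sample, uniform_cdf)
print(round(d, 4))
```

Any other theoretical distribution can be tested by passing a different CDF function in place of `uniform_cdf`.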

The critical value of ${D}$ is read from the K-S table for the
one-sample test.

Acceptance Criteria: If the calculated value is less than the
critical value, accept the null hypothesis.

Rejection Criteria: If the calculated value is greater than the
critical value, reject the null hypothesis.

Example

Problem Statement:

In a study of various streams of a college, 60 students, with an
equal number of students drawn from each stream, were interviewed
and their intention to join the Drama Club of the college was
noted.

                    B.Sc.   B.A.   B.Com   M.A.   M.Com
No. in each class     5      9      11      16     19

It was expected that 12 students from each class would join the
Drama Club. Use the K-S test to find whether there is any
difference among student classes with regard to their intention of
joining the Drama Club.

Solution:

${H_o}$: There is no difference among students of different
streams with respect to their intention of joining the drama
club.

We develop the cumulative frequencies for observed and
theoretical distributions.

          No. of students interested in joining
Streams   Observed (O)   Theoretical (T)   ${F_O(X)}$   ${F_T(X)}$   ${|F_O(X)-F_T(X)|}$
B.Sc.          5               12            5/60         12/60          7/60
B.A.           9               12            14/60        24/60          10/60
B.Com         11               12            25/60        36/60          11/60
M.A.          16               12            41/60        48/60          7/60
M.Com         19               12            60/60        60/60          0/60
Total        n=60

Test statistic ${|D|}$ is calculated as:

$D = \max |F_O(X) - F_T(X)| = \frac{11}{60} = 0.183$

The table value of D at 5% significance level is given by

${D_{0.05} = \frac{1.36}{\sqrt{n}} = \frac{1.36}{\sqrt{60}} \approx 0.176}$

Since the calculated value of D is greater than the critical
value, we reject the null hypothesis and conclude that there is a
difference among students of different streams in their intention
of joining the Drama Club.
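
The arithmetic of this worked example can be checked with a short Python sketch. The observed and expected counts and the 5% critical value formula 1.36/√n are taken from the example; the variable names are illustrative choices.

```python
from itertools import accumulate
from math import sqrt

# Counts per stream: B.Sc., B.A., B.Com, M.A., M.Com (from the example).
observed = [5, 9, 11, 16, 19]
expected = [12, 12, 12, 12, 12]
n = sum(observed)  # total number of students

# Cumulative frequencies for the observed and theoretical distributions.
cum_obs = list(accumulate(observed))
cum_exp = list(accumulate(expected))

# Absolute difference of the cumulative proportions for each stream.
diffs = [abs(o - e) / n for o, e in zip(cum_obs, cum_exp)]

d = max(diffs)            # test statistic D
d_crit = 1.36 / sqrt(n)   # critical value at the 5% significance level
reject = d > d_crit       # reject H_o when D exceeds the critical value

print(f"D = {d:.3f}, critical value = {d_crit:.3f}, reject H_o: {reject}")
```

The largest gap occurs at the third stream (B.Com), where the cumulative counts are 25 and 36.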

K-S Two Sample Test

When there are two independent samples instead of one, the K-S
two sample test can be used to test the agreement between the two
cumulative distributions. The null hypothesis states that there
is no difference between the two distributions. The D-statistic
is calculated in the same manner as in the K-S One Sample Test.

Formula

${D = \max |F_{n_1}(X) - F_{n_2}(X)|}$

Where −

  • ${n_1}$ = Number of observations in the first sample.

  • ${n_2}$ = Number of observations in the second sample.

When the two cumulative distributions show a large maximum
deviation ${|D|}$, this indicates a difference between the two
sample distributions.
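
As a rough sketch of how the two-sample statistic could be computed from raw data, the Python below evaluates both observed cumulative distributions at every observed value and keeps the largest gap. The function name `ks_two_sample_d` and the two small samples are hypothetical and not part of the original text.

```python
from bisect import bisect_right


def ks_two_sample_d(sample_1, sample_2):
    """Return D = max |F_n1(x) - F_n2(x)| for two samples (hypothetical helper)."""
    xs1, xs2 = sorted(sample_1), sorted(sample_2)
    n1, n2 = len(xs1), len(xs2)
    d = 0.0
    # Both cumulative distributions are step functions, so the largest
    # gap can only occur at one of the observed values.
    for x in xs1 + xs2:
        f1 = bisect_right(xs1, x) / n1  # proportion of sample 1 that is <= x
        f2 = bisect_right(xs2, x) / n2  # proportion of sample 2 that is <= x
        d = max(d, abs(f1 - f2))
    return d


a = [1, 2, 3, 4]  # hypothetical first sample
b = [3, 4, 5, 6]  # hypothetical second sample
d = ks_two_sample_d(a, b)
print(d)
```

For these two samples the gap is largest at x = 2, where half of the first sample but none of the second has been observed.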

When ${n_1 = n_2}$ and both are ≤ 40, the critical value of D is
read from the K-S table for the two sample case. When ${n_1}$
and/or ${n_2}$ > 40, the K-S table for large samples of the
two sample test should be used. The null hypothesis is accepted
if the calculated value is less than the table value, and
rejected if it is greater.
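
The large-sample table itself is not reproduced here. As an assumption, the sketch below uses the commonly quoted 5% large-sample approximation, D = 1.36 · √((n1 + n2)/(n1 · n2)), in place of a table lookup; the function names and the sample sizes are hypothetical.

```python
from math import sqrt


def ks_two_sample_critical_5pct(n1, n2):
    """Approximate 5% critical value of D for large samples (assumed formula)."""
    return 1.36 * sqrt((n1 + n2) / (n1 * n2))


def accept_null(d, d_crit):
    """Decision rule from the text: accept H_o when D is below the critical value."""
    return d < d_crit


# Hypothetical sample sizes above the threshold of 40.
d_crit = ks_two_sample_critical_5pct(50, 50)
print(round(d_crit, 3))
print(accept_null(0.20, d_crit), accept_null(0.30, d_crit))
```

For small samples the tabulated values should be preferred, since the approximation is only intended for the large-sample case.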

Thus the use of either of these nonparametric tests helps a
researcher to test the significance of their results when the
characteristics of the target population are unknown or no
assumptions have been made about them.
