How are sampling distributions generated using the empirical sampling approach?

Updated: 11/1/2022

Wiki User

11y ago

Depending on how many distinct samples are possible, you either enumerate all possible samples or simulate drawing a large number of random samples. In either case you compute the statistic for each sample, and the distribution of those values is (or approximates) the sampling distribution of the statistic.
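For the "all possible samples" case, a minimal sketch might look like the following. The population values and the sample size of 2 are hypothetical, chosen only so the enumeration stays small, and sampling is assumed to be without replacement with every sample equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical small population, chosen only for illustration.
population = [1, 2, 3, 4, 5]
sample_size = 2

# Enumerate every possible sample drawn without replacement.
samples = list(combinations(population, sample_size))

# Compute the statistic (here, the sample mean) for each sample.
means = [Fraction(sum(s), sample_size) for s in samples]

# Every sample is assumed equally likely, so the relative frequency of
# each value of the statistic is its exact sampling probability.
counts = Counter(means)
sampling_distribution = {
    m: Fraction(c, len(samples)) for m, c in sorted(counts.items())
}

for m, p in sampling_distribution.items():
    print('%.1f %.2f' % (float(m), float(p)))
```

This only works when the number of possible samples is small. The count grows very quickly with the population and sample sizes, which is why simulation is the usual route.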

Here's a Python script that demonstrates how to do this in a simple form. Suppose you want to experiment with the sampling distribution of the sample mean for samples of size 5 drawn from a normal population with mean 0 and standard deviation 1. There are, of course, mathematical results that establish the exact sampling distribution of this statistic. Let's pretend we don't know that. Usually it's necessary to generate a very large number of samples to establish a sampling distribution. For the purposes of this exercise the code generates only 20.

from scipy.stats import norm

N = norm(0., 1.)  ## create source of N(0,1) deviates
x_bars = []  ## create place to hold sample x_bar values
for s in range(20):  ## run experiment 20 times
    x_bars.append(sum(N.rvs(5)) / 5.)  ## make & store sample x_bar values
x_bars.sort()  ## form the so-called order statistics for the sample
for i, x_bar in enumerate(x_bars):  ## print empirical distribution function
    print('%.2f %.2f' % ((i + 1) / 20., x_bar))

When I ran this code I got the following:

0.05 -1.65
0.10 -0.28
0.15 -0.25
0.20 -0.10
0.25 -0.09
0.30 -0.08
0.35 0.01
0.40 0.05
0.45 0.10
0.50 0.12
0.55 0.13
0.60 0.19
0.65 0.41
0.70 0.46
0.75 0.50
0.80 0.55
0.85 0.67
0.90 0.79
0.95 0.94
1.00 1.05

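If you plot these with the sample means (the right-hand column) on the x-axis and the cumulative proportions (the left-hand column) on the y-axis, you will get the so-called empirical distribution function, a step function that rises by 1/20 at each sample value. The sample values can also be used in a variety of ways, such as a histogram or a kernel density estimate, to estimate the probability density of the sampling distribution.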
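As a rough sketch of what this looks like with many more samples, the code below repeats the experiment 10,000 times and builds both an empirical distribution function and a simple histogram density estimate. It uses the standard library's random.gauss in place of scipy.stats.norm so that it has no dependencies. The seed, the number of samples, the bin range, and the helper names edf and histogram_density are all assumptions made for illustration.

```python
import random
from bisect import bisect_right
from statistics import stdev

rng = random.Random(12345)  # arbitrary fixed seed so the run is repeatable
n_samples = 10000           # assumed number of simulated samples
sample_size = 5             # same sample size as in the answer

# Simulate many samples from N(0, 1) and keep each sample mean, sorted.
x_bars = sorted(
    sum(rng.gauss(0.0, 1.0) for _ in range(sample_size)) / sample_size
    for _ in range(n_samples)
)


def edf(x):
    """Proportion of simulated sample means less than or equal to x."""
    return bisect_right(x_bars, x) / len(x_bars)


def histogram_density(lo=-1.5, hi=1.5, n_bins=12):
    """Return (bin centre, estimated density) pairs over [lo, hi)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for xb in x_bars:
        if lo <= xb < hi:
            k = min(int((xb - lo) / width), n_bins - 1)
            counts[k] += 1
    # Dividing by (n * width) makes the bar areas sum to at most 1.
    return [
        (lo + (k + 0.5) * width, c / (len(x_bars) * width))
        for k, c in enumerate(counts)
    ]


for x in (-0.5, 0.0, 0.5):
    print('%.2f %.3f' % (x, edf(x)))
print('standard deviation of sample means: %.3f' % stdev(x_bars))
```

The printed standard deviation should land close to 1/sqrt(5), about 0.447, which is the known exact value for the mean of 5 draws from N(0,1). That gives a quick check that the simulated sampling distribution is behaving as expected.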