WEBVTT
00:00:00.880 --> 00:00:08.840
For the normal distribution shown, approximately what percent of data points lie in the shaded region?
00:00:10.120 --> 00:00:15.920
Remember, the total area under the curve representing a normal distribution is 100 percent.
00:00:17.120 --> 00:00:28.080
In a normal distribution, roughly 68 percent of the data set lies within one standard deviation of the mean.
00:00:29.120 --> 00:00:30.320
That’s this group.
00:00:31.080 --> 00:00:36.920
About 95 percent are within two standard deviations of the mean.
00:00:37.560 --> 00:00:38.800
That’s this group.
00:00:39.600 --> 00:00:44.920
We often treat any data point outside this group as an outlier.
00:00:45.480 --> 00:01:01.320
So anything below the mean minus two standard deviations, or above the mean plus two standard deviations, is treated as an outlier.
00:01:02.200 --> 00:01:13.600
Now roughly 99.7 percent of the data set lies within three standard deviations of the mean.
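
NOTE A minimal sketch (an addition, not part of the original lesson) that checks the rounded 68, 95, and 99.7 percent figures against the exact normal distribution using only Python's standard library. It assumes a standard normal variable Z, for which the probability of lying within k standard deviations of the mean is erf(k / sqrt(2)). The helper name within_k_sd is hypothetical.
```python
import math
def within_k_sd(k: float) -> float:
    """Percent of a normal distribution lying within k standard deviations of the mean."""
    # P(|Z| < k) = erf(k / sqrt(2)) for a standard normal Z, scaled to a percentage
    return 100 * math.erf(k / math.sqrt(2))
for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.2f}%")
```
The printed values, about 68.27, 95.45, and 99.73 percent, are what the lesson rounds to 68, 95, and 99.7 percent.
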
00:01:14.360 --> 00:01:17.160
That’s this area.
00:01:18.480 --> 00:01:23.520
Remember, we said that the total area under the curve is 100 percent.
00:01:24.120 --> 00:01:37.280
So to find the percentage of the data set that lies more than three standard deviations from the mean, we’ll subtract 99.7 percent from 100 percent.
00:01:38.120 --> 00:01:46.040
100 minus 99.7 is 0.3.
00:01:46.520 --> 00:01:57.240
So 0.3 percent of the data set lies outside three standard deviations of the mean.
00:01:58.080 --> 00:02:06.600
Since the shaded region is only one of the two tails, and the normal curve is symmetric, we’ll divide 0.3 by two.
00:02:07.400 --> 00:02:15.680
0.3 divided by two is 0.15.
00:02:16.160 --> 00:02:25.520
So 0.15 percent of data points lie in the shaded region.
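
NOTE A minimal sketch (an addition, not part of the original lesson) of the same two steps in Python: subtract the within-three-SD percentage from 100 to get both tails, then halve it because the normal curve is symmetric and the shaded region is a single tail. The helper name one_tail_percent is hypothetical. Plugging in the rounded 99.7 reproduces the 0.15 percent answer, while the exact normal value gives about 0.135 percent, which is why the answer is stated as approximate.
```python
import math
def one_tail_percent(within_percent: float) -> float:
    """Percent of data in one tail, given the percent inside the symmetric band."""
    outside = 100 - within_percent  # both tails together
    return outside / 2              # symmetry: each tail holds half
print(round(one_tail_percent(99.7), 2))
exact_within = 100 * math.erf(3 / math.sqrt(2))  # exact percent within 3 SD
print(round(one_tail_percent(exact_within), 3))
```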