Sample variance

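Set up
We have a random variable $X$ with unknown mean $\mu$ and variance $\sigma^2$. We have taken a sample from the distribution: $x_1, x_2, \dots, x_n$. The sample has mean $\bar{x}$. We want to estimate $\sigma^2$ using the sample.

Question
Why do we use $\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$ as an estimate for $\sigma^2$, instead of $\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$? i.e. why divide by $n-1$ instead of by $n$? Dividing by $n$ is what we usually do when calculating variance...

A suggestion...
Let $s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$ and let $\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$. Instead of thinking that $\hat{\sigma}^2 = \frac{n}{n-1}s^2$, think about it as $\hat{\sigma}^2 = s^2 + \frac{\hat{\sigma}^2}{n}$ (an equivalent statement). Why?

Play with the app below to get a feel for what it is doing. Click "sample" to take different samples. Then read on...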
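As a rough stand-in for repeatedly clicking "sample", the sketch below draws many small samples and averages both candidate estimates. The normal distribution, the values mu = 10 and sigma = 2, the sample size n = 5, and the names `spread_about` and `simulate` are all assumptions chosen for illustration; none of them come from the app.

```python
import random


def mean(values):
    return sum(values) / len(values)


def spread_about(values, centre):
    """Mean of the squared distances of the values from a given centre."""
    return mean([(v - centre) ** 2 for v in values])


def simulate(n=5, trials=20000, mu=10.0, sigma=2.0, seed=0):
    """Average the divide-by-n and divide-by-(n-1) estimates over many samples."""
    rng = random.Random(seed)
    divide_by_n = []
    divide_by_n_minus_1 = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        s2 = spread_about(sample, mean(sample))      # divide by n
        divide_by_n.append(s2)
        divide_by_n_minus_1.append(s2 * n / (n - 1))  # divide by n - 1
    return mean(divide_by_n), mean(divide_by_n_minus_1)


avg_n, avg_n_minus_1 = simulate()
print("divide by n:    ", round(avg_n, 2))
print("divide by n - 1:", round(avg_n_minus_1, 2))
```

With these assumed settings the true variance is 4. The divide-by-$n$ average should land near $\frac{n-1}{n}\cdot 4 = 3.2$, while the divide-by-$(n-1)$ average should land near 4. The exact figures depend on the seed.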
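We want an estimate for the population variance: $\sigma^2$. This relates to the green lines: how the $x_i$ vary around $\mu$. Let $\hat{\sigma}^2$ be our estimate for $\sigma^2$.

If we knew $\mu$, we could use $\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2$ (the mean of the squares of the green lines). But we don't know $\mu$.

We have $s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$ (the mean of the squares of the blue lines). This is something we can calculate: we have values for $\bar{x}$ and for each $x_i$. This is the sample variance and does give a measure of how the $x_i$ vary around $\bar{x}$.

We also know that if the mean of our sample is treated as a random variable, $\bar{X}$, then its variance is given by the expression: $\mathrm{Var}(\bar{X}) = \dots = \frac{\sigma^2}{n}$ (the "..." is left as an exercise). So if we're estimating $\sigma^2$ as $\hat{\sigma}^2$, then an estimate for $\mathrm{Var}(\bar{X})$ is $\frac{\hat{\sigma}^2}{n}$. This relates to the purple line: how $\bar{x}$ varies around $\mu$.

Putting it together...
The green arrows are equivalent to following the purple arrow and then the blue arrows: how the $x_i$ vary around $\mu$ depends on how $\bar{x}$ varies around $\mu$, and then how the $x_i$ vary around $\bar{x}$. So roughly, $\hat{\sigma}^2 = \frac{\hat{\sigma}^2}{n} + s^2$. (... the maths checks out on this; again, an exercise, with a starting point given below*.)

Solving this equation for $\hat{\sigma}^2$ gives: $\hat{\sigma}^2 = \frac{n}{n-1}s^2$. i.e. $\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$.

*The detail
To consider algebraically why $\hat{\sigma}^2$ should satisfy $\hat{\sigma}^2 = s^2 + \frac{\hat{\sigma}^2}{n}$, start here: $\sum_{i=1}^{n}(x_i-\mu)^2 = \sum_{i=1}^{n}\big((x_i-\bar{x})+(\bar{x}-\mu)\big)^2$, aiming at: $\sigma^2 = E[S^2] + \frac{\sigma^2}{n}$, where $S^2$ is $s^2$ treated as a random variable. Recall that, as each $X_i$ is sampled independently, $\mathrm{Cov}(X_i, X_j) = 0$ for $i \neq j$.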
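The "green equals purple then blue" picture can be checked numerically on a single sample. This is a minimal sketch, not a proof: mu is treated as known only so that the green and purple quantities can be computed, and the distribution, seed and variable names are assumptions made for illustration.

```python
import math
import random

rng = random.Random(1)
mu = 10.0  # assumed known here only so the check can be carried out
sample = [rng.gauss(mu, 2.0) for _ in range(6)]
n = len(sample)
x_bar = sum(sample) / n

# Mean squared distance of the x_i from mu (green lines).
green = sum((x - mu) ** 2 for x in sample) / n
# Mean squared distance of the x_i from x_bar (blue lines): this is s^2.
blue = sum((x - x_bar) ** 2 for x in sample) / n
# Squared distance of x_bar from mu (purple line).
purple = (x_bar - mu) ** 2

# For any one sample, green splits exactly into purple plus blue.
print(math.isclose(green, purple + blue))

# The divide-by-(n-1) estimate satisfies sigma_hat^2 = s^2 + sigma_hat^2 / n.
sigma_hat_sq = n / (n - 1) * blue
print(math.isclose(sigma_hat_sq, blue + sigma_hat_sq / n))
```

The first check holds for every sample, not just on average. The averaging only matters when the purple term, which needs the unknown $\mu$, is replaced by its estimate $\frac{\hat{\sigma}^2}{n}$.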