Average & sum of squares, var, and stdev
- Walter M. Stroup
Mean and Sum of Squares
This applet illustrates minimizing the sum of the areas of the squares whose side lengths are the distances from the point G (green X) to each x-value. Moving G left and right produces a trace graph. The trace graph shows the value of the sum of the areas of the squares (the graph is centered at the absolute value of the x-coordinate of G, just to make it easier to see). When the trace graph is at a minimum, the sum of the areas is at a minimum. This happens when the x-coordinate of G is the same as the mean (average) of the x-values of the points A, B, C, D, E, & F. The mean can therefore be thought of as the value where the sum of the squares (ssq) is at a minimum.

var = ssq / N, where N is the number of points. As G is moved, the value of var changes. sd = sqrt(var), the square root of var. When the x-coordinate of G is at the mean, var is the VARIANCE and sd is the STANDARD DEVIATION of the x-values of A, B, C, D, E, & F.

Things to try:
Move (some of) the points A, B, C, D, E, & F around. Then move G. You may have to zoom out to see the graph of the sum of the areas of the squares.
What changes? What stays the same?
At what value is the graph of the sum of the squares a minimum? Is it the same value as before? If not, how is the value related to what was illustrated earlier?
What interpretations do var and sd now have relative to the new locations of the points?
[Note: you can refresh this page to start over.]

What does least squares regression do?
https://www.geogebra.org/m/B7JtA6Mg
Try minimizing the Sum of (the areas of) Squares by moving F and G.
Turn on the "Least Squares" regression line (LSRL). [related to ordinary least squares]
Move F and G to align with the LSRL and note how the Sum of Squares is less (a minimum).
Regression minimizes the sum of the areas of the squares formed by the vertical distances from the points to a line.
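The ideas above can also be checked numerically. The following is a minimal Python sketch, not part of the applets: the x- and y-values are made-up stand-ins for the points A to F, and the function names are hypothetical. It scans candidate positions for G to find where ssq is smallest, computes var and sd at the mean, and then compares the sum of squared vertical distances for the least squares line against two nudged lines.

```python
import math

def sum_of_squares(xs, g):
    """Sum of the areas of the squares with side |x - g| for each x-value."""
    return sum((x - g) ** 2 for x in xs)

def residual_ssq(xs, ys, slope, intercept):
    """Sum of squared vertical distances from each point to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Assumed coordinates standing in for the points A..F (not from the applet).
xs = [1.0, 2.0, 4.0, 5.0, 7.0, 11.0]
ys = [2.0, 3.0, 5.0, 4.0, 8.0, 10.0]
n = len(xs)

# Part 1: slide G along the x-axis in steps of 0.01 and keep the best position.
candidates = [i / 100 for i in range(0, 1201)]
best_g = min(candidates, key=lambda g: sum_of_squares(xs, g))

mean = sum(xs) / n
ssq = sum_of_squares(xs, mean)  # ssq with G placed at the mean
var = ssq / n                   # divides by N, as in the text above
sd = math.sqrt(var)

print("G with the smallest ssq:", best_g)
print("mean:", mean, "ssq:", ssq, "var:", var, "sd:", sd)

# Part 2: least squares line through (xs, ys) using the closed-form formulas.
y_mean = sum(ys) / n
s_xy = sum((x - mean) * (y - y_mean) for x, y in zip(xs, ys))
slope = s_xy / ssq              # ssq at the mean is the sum of (x - mean)^2
intercept = y_mean - slope * mean

fit_ssq = residual_ssq(xs, ys, slope, intercept)
print("least squares line ssq:", fit_ssq)
print("steeper line ssq:", residual_ssq(xs, ys, slope + 0.1, intercept))
print("shifted line ssq:", residual_ssq(xs, ys, slope, intercept + 0.5))
```

With these assumed values, the scan settles on the mean of the x-values, and both nudged lines give a larger sum of squares than the least squares line. Note that var here divides by N, matching the definition above; many statistics packages divide by N - 1 by default when computing a sample variance.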