# Capturing Variability

Last week I introduced the five most critical topics I felt a professor of applied statistics could impart to her students (https://statisticalsage.wordpress.com/2012/06/10/the-most-critical-concepts-in-applied-statistics-treating-students-like-family/):

1. Making Sense of Variability
2. Capturing Variability
3. Normal Distribution
4. Sampling Distribution of the Means and Standard Error
5. Understanding Hypothesis Testing

Earlier this week, I discussed how to prepare students to understand that statistics is really just about making sense of variability, and that statistics is one tool for providing us with knowledge. Yet it isn’t enough just to help students understand that statistics helps us make sense of variability. They have to see how that happens. Many (e.g., Rumsey, 2002, http://www.amstat.org/publications/jse/v10n3/rumsey2.html) have stated that simply having students complete calculations is not enough to help them master the material, especially when using computational formulas … several textbooks in applied statistics have taken to presenting only definitional formulas, and some authors point out the benefit of definitional formulas over computational formulas (e.g., Katarina Guttmannova, Alan Shields, and John Caruso, 2005; for a review of their work, see this prior blog post, https://statisticalsage.wordpress.com/2010/09/07/concepts-or-computations/).

I, too, fall squarely in this camp … it isn’t enough to talk to students about what statistics can do; I want them to be able to read formulas like they can read a sentence, and I start by helping them understand how we capture variability. I begin with the deviation, that is, the observation (X) minus the sample mean (Xbar). Remember, students aren’t used to math functions having names … so help them understand that X - Xbar and “deviation” are just two ways of saying the same thing. The deviation is the basis of many calculations in statistics … how far is a specific observation from the mean? What is the average of that difference?

Obviously, before students can be introduced to the variance or standard deviation, they must first understand the deviation, conceptually and then computationally. To do this, I run a class activity that gets students to understand how deviations work. I start by talking to students about the concept of variability, that is, how spread out the data are.

First, I have the students who live on campus stand up, so we can see the difference between students living on and off campus (typically the students left sitting are commuter students). Each standing student then pairs up with someone sitting, and each pair gets a measure, in miles, of how “spread out” they are. We take those numbers and find the average “spread outedness.” Then we do it again. This time the students who live off campus stand, grouped by the county in which they live. They pair up with someone from the same county, and all on-campus students pair with each other. Again, they obtain their distance from each other in miles and we find the average “spread outedness.” For my students, this number is always smaller, so I conclude they must live closer together now. The students laugh, and we discuss how to do this better … the students conclude we need a stable, common location for everyone, and they typically select a location on campus. We repeat the task one more time, this time with every student reporting how “deviated,” or far, they live from that location on campus.

By physically moving and pairing up during the first two exercises, students are better able to comprehend the concept of deviation, that is, that the individual observation must be “paired up with” something to get the deviation. However, since the pairings can change, thus changing their deviations, they recognize that we need a “centralized” mean to obtain a consistent deviation. Once we find everyone’s deviation, the natural progression is to find the average deviation.
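For anyone who wants to show the same idea on a projector after the activity, here is a rough Python sketch. The home locations, the pairings, and the campus reference point are all made up for illustration (they are not data from my class), and I have simplified everyone to living along a single road.

```python
# Hypothetical positions (in miles) of six students' homes along one road.
homes = [0, 1, 2, 10, 25, 40]

def mean_pair_gap(pairs):
    """Average distance in miles between the two members of each pair."""
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

# Round 1 (assumed pairing): nearby students paired with far-away students.
far_pairs = [(0, 40), (1, 25), (2, 10)]
# Round 2 (assumed pairing): students paired with their nearest neighbours.
near_pairs = [(0, 1), (2, 10), (25, 40)]

print(mean_pair_gap(far_pairs))   # 24.0
print(mean_pair_gap(near_pairs))  # 8.0

# Round 3: everyone measures from one shared reference point, so the
# answer no longer depends on who happens to be paired with whom.
campus = 0  # hypothetical agreed-upon location
from_campus = [abs(h - campus) for h in homes]
print(from_campus)                # [0, 1, 2, 10, 25, 40]
```

Same students, same homes, yet the average “spread outedness” drops from 24 to 8 miles just by changing the pairings. Only the third round gives each student a number that stays put.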

I write out the symbol for the mean and for the deviation, and I ask students to create the formula for the average deviation … let the symbols speak to what you are doing. Then they plot out a plan (sum the deviations and divide by N), but, of course, this doesn’t work, as the sum of the deviations always equals zero … this set of discovery activities guides students to squaring the deviations before finding the average of the squared deviations … that is, the variance.
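If you want to let students check their plan by machine once they have tried it by hand, a minimal Python sketch might look like the following. The five scores are hypothetical, and the variable names are mine; the steps simply mirror what the students work out at the board.

```python
# Hypothetical scores; any small data set works.
scores = [2, 4, 4, 6, 9]
n = len(scores)

mean = sum(scores) / n                    # Xbar = 25 / 5 = 5.0
deviations = [x - mean for x in scores]   # X - Xbar for each score
print(deviations)                         # [-3.0, -1.0, -1.0, 1.0, 4.0]

# The students' first plan: average the deviations. The sum is zero.
print(sum(deviations))                    # 0.0

# The fix: square each deviation, then average the squares.
squared = [d ** 2 for d in deviations]    # [9.0, 1.0, 1.0, 1.0, 16.0]
ss = sum(squared)                         # sum of squares, SS = 28.0
variance = ss / n                         # average squared deviation
print(variance)                           # 5.6
```

The negative and positive deviations cancel exactly, which is why the first plan always returns zero no matter what data the students bring.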

This activity results in about 80% of the students creating the formula for the variance on their own … I don’t give it to them, they create it (of course, not everyone sees it, and that’s OK, too). It doesn’t take long, with proper questioning, to get students to realize that when we squared the deviations, we also squared the units … and squared units make no sense. Thus, we simply have to take the square root of the variance for the units to make sense and to get us back to an average deviation (instead of an average squared deviation) … thus, students “discover” the standard deviation.
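The units argument can be made concrete with a short Python sketch. The distances below are hypothetical (I picked them so the arithmetic comes out clean), and the unit labels in the printed strings are only there to make the point about squared miles.

```python
import math

# Hypothetical distances from campus, in miles.
miles = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(miles)

mean = sum(miles) / n                                # 40 / 8 = 5.0 miles
variance = sum((x - mean) ** 2 for x in miles) / n   # 32 / 8 = 4.0
sd = math.sqrt(variance)                             # back to plain miles

print(f"variance = {variance} square miles")         # variance = 4.0 square miles
print(f"standard deviation = {sd} miles")            # standard deviation = 2.0 miles
```

“Four square miles from campus” means nothing to a student; “about two miles from the mean” does.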

Let me tell you … anything in statistics that you can create yourself becomes much less threatening than a formula for the standard deviation thrown up on the chalkboard.

For those of you who have made use of discovery learning and the Socratic method of questioning, this should be pretty easy to implement. For those of you who haven’t … it’s worth trying. It’s not going to work well every time, and just like learning a new way to control a soccer ball, it takes practice before it becomes natural.

What the standard deviation is communicating to students has now become clearer … it’s, roughly, the average deviation of the scores from the sample mean … thus, it’s a measure of the “spreadoutedness” of the data, just like the activity in class.

We can now get into the difference between the variance and standard deviation that describe a population (or a sample, as a descriptive statistic) and the variance and standard deviation used to infer the population parameter from the sample statistic. This concept is very nicely illustrated in Chapter 5 of Kiess and Green’s (2010) Statistical Concepts for the Behavioral Sciences 4/e, and I encourage you all to read over it.
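As a quick illustration of the two versions, here is a hedged Python sketch. The data are hypothetical and the variable names are my own; I am assuming the usual convention of dividing the sum of squares by N for the descriptive version and by N - 1 for the inferential version, and I cross-check against Python’s built-in `statistics` module.

```python
import statistics

# Hypothetical sample of distances, in miles.
miles = [3.0, 5.0, 7.0, 9.0, 11.0]
n = len(miles)

mean = sum(miles) / n                       # 35 / 5 = 7.0
ss = sum((x - mean) ** 2 for x in miles)    # sum of squares = 40.0

# Descriptive (population) version: divide SS by N.
pop_var = ss / n                            # 8.0
pop_sd = pop_var ** 0.5

# Inferential (sample used to estimate the population) version:
# divide SS by N - 1.
samp_var = ss / (n - 1)                     # 10.0
samp_sd = samp_var ** 0.5

print(pop_var, samp_var)                    # 8.0 10.0

# The standard library computes the same two quantities.
print(statistics.pvariance(miles), statistics.variance(miles))
```

The sum of squares is identical in both cases; only the divisor changes, which is a handy point to make when students ask why there are “two” standard deviations.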

So, though students are calculating deviations, sums of squares, variances, and standard deviations, the calculations are clearly the easiest and shortest part of this lesson, and I believe (though I have no empirical evidence) that conducting hand calculations helps students master the material conceptually … as they get to see the deviations over and over again.

Don’t forget to assign homework! My homework comes from Green and Sandry’s (2010) Exercises and Assignments for Students. It’s available for free for faculty at http://www.pearsonhighered.com/educator/product/Statistical-Concepts-for-the-Behavioral-Sciences/9780205626243.page#tabbed. The assignments are in Chapters 4 and 5. Two are on the sum of squares, one is on finding the population variance and standard deviation, and one is on finding the sample (used to infer the population) variance and standard deviation.

Just to review, to help students to Capture Variability I:

1. Conduct the Deviation Activity, thrice — students pair with each other to maximize deviation (guided by professor); students pair with each other to minimize deviation (guided by professor); students provide individual distance from an agreed upon centralized location.
2. Find the average deviation.
3. Brainstorm methods for “fixing” the zero.
4. Find the Sum of the Squared Deviations (SS).
5. Find the average SS (variance).
6. Identify how to make this measure more conceptually meaningful (professors, focus on the problem with interpreting squared units)
7. Find population variance and standard deviation.
8. Conceptually identify the idea of inferring variability of a population from a sample.
9. Show and practice how to calculate the inferential variance and standard deviation.
10. Assign Homework.

Time well spent in this area results in much less time being spent in future chapters on z-scores, t-scores, correlation, and regression.

As always, I am on the lookout for other examples.

Bonnie