## Teaching hypothesis, design, analysis & inference as one thing

“So, the question I would like to pose for the sages and anyone else interested in commenting … for a first semester undergraduate applied statistics class … what are the most critical student learning outcomes that have to be mastered?”

First let me just comment on the blog Bonnie just posted today: I think her list of core concepts is excellent, and I agree that those concepts (all of which have to do with the ever-present error inherent in all our observations and measurements) should certainly be taught in the introductory statistics course.  Nevertheless, let me introduce a different perspective.

When I first saw the question posed by Bonnie  (reproduced above) I thought the answer would be an easy one to write. It turns out it is not quite so easy. My problem is that I see all the parts of the application of statistics as parts of an integrated whole. So my answer will appear to be a daunting one.

I hope that students can take away an appreciation (mastery would be too much to ask at this level) of how we use data to make inferences about the behavior under study. My typical homework problems were not, except for some initial ones, about calculations: finding means, t or F values, and p values. Rather, an experimenter’s hypotheses would be stated along with how she collected the data to test those hypotheses, and (relatively simple) data would then be listed. The question posed was: what can you conclude from these data, and especially what can you conclude about the hypotheses? The appropriate way to answer such problems was to present the means and to interpret what the pattern tells us, with the statistical test of significance to guide us as to which differences we could attribute to the independent variable.

I understand that this is asking a lot of the students, but just getting statistics from data sets bores the heck out of me, and I don’t see why it would not be equally boring to the students. A few weeks into the semester we would be into the Analysis of Variance (Keppel’s book does a wonderful job facilitating early introduction of AOV), and the course especially emphasized factorial designs in which interpretation of patterns of means with the assistance of significance testing becomes, for me at least, most challenging and most interesting. The logic of the interplay of hypothesis, design, data, statistical analysis and inference is to me all one thing.
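The kind of interpretive work described above can be sketched concretely. The following is a minimal illustration with invented data: a hypothetical 2×2 factorial design with two scores per cell (the factor names and numbers are made up for the example). The point is the order of operations a student follows — look at the pattern of cell means first, then the marginal means for each main effect, then the interaction contrast — with the significance test serving only afterwards as a guide to which of these differences can be trusted.

```python
# Hypothetical 2x2 factorial data: two scores per cell.
# Factors A and B and all values are invented for illustration.
cells = {
    ("A1", "B1"): [4, 6],
    ("A1", "B2"): [5, 7],
    ("A2", "B1"): [8, 10],
    ("A2", "B2"): [13, 15],
}

def mean(xs):
    return sum(xs) / len(xs)

# Step 1: the pattern of cell means is the thing to interpret.
cell_means = {k: mean(v) for k, v in cells.items()}

# Step 2: marginal means, one per level of each factor (main effects).
a_means = {a: mean([m for (fa, _), m in cell_means.items() if fa == a])
           for a in ("A1", "A2")}
b_means = {b: mean([m for (_, fb), m in cell_means.items() if fb == b])
           for b in ("B1", "B2")}

# Step 3: the interaction contrast -- does the effect of B
# differ across the levels of A?
b_effect_at_a1 = cell_means[("A1", "B2")] - cell_means[("A1", "B1")]
b_effect_at_a2 = cell_means[("A2", "B2")] - cell_means[("A2", "B1")]
interaction = b_effect_at_a2 - b_effect_at_a1
```

With these made-up numbers, B makes a small difference at A1 and a much larger one at A2, so the interaction contrast is nonzero; whether that pattern can be attributed to the independent variables rather than to chance is exactly the question the F tests of the analysis of variance would then address.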

Such an integrated concept, satisfying to me, may or may not be an asset when applied to teaching the first undergraduate course in statistics.

## Data without error isn’t

Just a few thoughts after reading Schmidt’s Detecting and Correcting the Lies That Data Tell (2010) (see Bonnie’s October 14 post for link).  In it Schmidt argues, presenting clarifying examples, that accurate interpretation of collected data suffers from “ … researchers’ continued reliance on the use of statistical significance testing in data analysis and interpretation and the failure to correct for the distorting effects of sampling error, measurement error, and other artifacts.” [from the Abstract]  Schmidt suggests the use of meta-analysis, improved by estimating and then eliminating those “distorting effects.”

Schmidt’s applications to meta-analysis are elegant, with a beauty (to me) similar to that of structural equation modeling; in both, the constructs of interest are distinguished from the errors necessarily attached to our measurements, and the two are estimated independently.  This is valuable work providing a powerful tool for theory testing.  But it also makes me uneasy.  As we all know, sampling and measurement errors are always present in collected data.  So when we get an estimate of an effect size after stripping away the intrinsic error, what does it mean?  Schmidt presents as “the truth” what the results would look like if the data were different from what the data really are.  I am reminded of what an editor once wrote to my co-author and me, criticizing an analysis we had done on transformed scores.  He said he wanted to see what the subjects did, not what the experimenters did.  We thought he had a point.

Schmidt also argues against the use of statistical significance testing, citing a number of ways it has led to misinterpretations.  I agree with him about the misinterpretations, and I agree with Bonnie (see her recent blog here) about what to do about those misguided uses of a significance test – don’t do that!  But I do not agree that therefore significance testing should be abandoned.  Meta-analysis is not necessarily appropriate for all research questions and studies.  For a stand-alone study in which a researcher claims her independent variable has shown an effect, it is not unreasonable to ask for some evidence that the obtained difference is unlikely to have resulted by chance (i.e., from the effects of those pesky sampling and measurement errors).  Good experimental design attempts to establish a cause-and-effect conclusion by eliminating all other “rival hypotheses.”  The statistical significance test simply assesses the likelihood of the rival hypothesis of “chance.”
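One transparent way to see what “assessing the rival hypothesis of chance” amounts to is a randomization (permutation) test. The sketch below uses invented scores for two small groups: if group labels were arbitrary, how often would a relabeling produce a mean difference at least as large as the one observed? The resulting proportion is the probability of the obtained difference under chance alone.

```python
import itertools

# Invented scores for two small groups (labels are hypothetical).
treatment = [12, 14, 15, 16]
control = [9, 10, 11, 13]

def mean(xs):
    return sum(xs) / len(xs)

observed = mean(treatment) - mean(control)

# Under the "chance" rival hypothesis, the group labels carry no
# information: enumerate every way of assigning 4 of the 8 pooled
# scores to "treatment" and count how often the resulting mean
# difference is at least as large as the observed one.
pooled = treatment + control
n = len(treatment)
count = total = 0
for idx in itertools.combinations(range(len(pooled)), n):
    grp = [pooled[i] for i in idx]
    rest = [pooled[i] for i in range(len(pooled)) if i not in idx]
    total += 1
    if mean(grp) - mean(rest) >= observed:
        count += 1

p_value = count / total  # one-tailed probability under chance alone
```

A small p-value here says only that “chance” is an implausible rival explanation for the obtained difference; the rest of the causal argument still rests on the experimental design having ruled out the other rivals.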

Filed under Statistical Hypothesis Testing