# Category Archives: Hypothesis Testing

## THE most critical concepts in applied statistics: Treating students like family

There is nothing like having a child preparing to learn statistics to get a mother to focus on … what are THE most critical concepts in applied statistics? I’ll be honest; I’m not basing this posting on research because, sadly, no such research exists. It is, instead, based on my experience in teaching and research, coupled with the reality that I only have a few hours to cover the most important material with my son and the sons and daughters of a few of my dearest friends. You see, they are all preparing to take a math statistics class either this summer or this fall. We all want our children to understand math stats in the larger context of applied statistics.

In this posting, I will cover the outlines of what I have deemed most critical; then, over the course of the next few weeks, I will detail the lessons, activities, and homework assignments. Each session is equivalent to one week’s worth of work during a typical semester for the type of students I teach. As with everything … there may be some variability in how much time it takes to cover this material depending on your class size and student type.

Session #1: Making Sense of Variability

• Introduction to Epistemology — the four ways of knowing, with a focus on the dance between rationalism (forming hypotheses) and empiricism (gathering observations in the form of data).
• 4 Uses of Statistics: Describe, Infer, Test Hypotheses, Find Associations
• Introduction to research methods (just the experiment, and appropriate terms).
• Brief review of mean, median, and mode

Session #2: Capturing Variability

• Conceptually understanding variability (deviation) and the sum of squares
• Finding the Sum of Squares
• Obtaining the average Sum of Squares — the variance
• Understanding why we need the standard deviation (as it makes conceptual sense, where the variance doesn’t)
• Population Variance and Standard Deviation and Sample Variances and Standard Deviations used to infer the population
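As a rough illustration of the Session #2 ideas, here is a minimal Python sketch that walks from deviations to the sum of squares, the variance, and the standard deviation. The five scores are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math

scores = [4, 8, 6, 5, 7]  # hypothetical scores, chosen for easy arithmetic

n = len(scores)
mean = sum(scores) / n

# Deviation: how far each score sits from the mean
deviations = [x - mean for x in scores]

# Sum of squares: squaring keeps negative and positive deviations from cancelling
sum_of_squares = sum(d ** 2 for d in deviations)

# Average sum of squares: divide by n when the scores ARE the population,
# by n - 1 when the scores are a sample used to infer the population
population_variance = sum_of_squares / n
sample_variance = sum_of_squares / (n - 1)

# The standard deviation returns us to the original units of measurement
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(f"mean={mean}, SS={sum_of_squares}")
print(f"population: variance={population_variance}, sd={population_sd:.3f}")
print(f"sample:     variance={sample_variance}, sd={sample_sd:.3f}")
```

Because the variance is in squared units, the square root at the end is what makes the result interpretable on the original scale.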

Session #3: Normal Distribution

• Review population vs. sample/ parameter vs. statistic
• Normal Distribution as a type of a population
• Properties of the Normal Distribution
• Area under the curve of a normal distribution
• Z-scores as a means of identifying location of an observation on the normal distribution
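For Session #3, a short Python sketch can show how a z-score locates an observation on the normal distribution and how that location translates into area under the curve. The population mean, standard deviation, and observation below are hypothetical; `NormalDist` comes from Python's standard `statistics` module.

```python
from statistics import NormalDist

mu, sigma = 100, 15   # hypothetical population mean and standard deviation
x = 130               # hypothetical observation

# z-score: distance from the mean in standard deviation units
z = (x - mu) / sigma

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area under the curve below the observation
area_below = standard_normal.cdf(z)

# Area within one standard deviation of the mean
area_within_1sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

print(f"z = {z}")
print(f"proportion below x: {area_below:.4f}")
print(f"proportion within 1 sd of the mean: {area_within_1sd:.4f}")
```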

Session #4: Sampling Distribution of the Means and Standard Error

• Conceptually understanding a sampling distribution
• Exploring the variability in sample mean and understanding why
• Sampling Distribution and the Central Limit Theorem
• Standard Error of the Mean (actual and estimated)
• Introduction to the z-test as a means of finding the location of a sample mean on the sampling distribution of the means
• Comparing and Contrasting the Normal Distribution with the Sampling Distribution of the Means
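One way to make the Session #4 ideas concrete is a small simulation: draw many samples from the same population, keep each sample mean, and compare the spread of those means with the standard error formula. This is only a sketch; the population parameters, the sample size, and the number of samples are all assumptions made for the illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

mu, sigma = 50, 10    # hypothetical population mean and standard deviation
n = 25                # size of each sample
num_samples = 5000    # how many samples we draw

# Build the sampling distribution of the means by brute force
sample_means = []
for _ in range(num_samples):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(statistics.mean(sample))

# Spread of the simulated sample means
observed_se = statistics.pstdev(sample_means)

# Standard error of the mean when sigma is known: sigma / sqrt(n)
theoretical_se = sigma / n ** 0.5

print(f"mean of the sample means: {statistics.mean(sample_means):.2f}")
print(f"observed standard error:  {observed_se:.2f}")
print(f"sigma / sqrt(n):          {theoretical_se:.2f}")
```

The sample means vary far less than the individual scores do, which is the contrast between the normal distribution of scores and the sampling distribution of the means.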

Session #5: Understanding Hypothesis Testing

• Statistical Hypotheses
• Decisions/ Assumptions/ and Consequences (outside of statistics: common examples, selecting a college & deciding to go on a date).
• Steps of Hypothesis Testing: Research Hypothesis; Statistical Hypothesis; Creation of Sampling Distribution of the Means, and identification of rejection region; Gather Data/Calculate Statistic; Make a decision from data; Draw a Conclusion from data
• Errors in Statistical Decision Making
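The Session #5 steps can be sketched in Python with a one-sample z-test. Everything specific here (the workshop scenario, the population values, the sample mean) is hypothetical and serves only to show the order of the steps.

```python
from statistics import NormalDist

# Research hypothesis (hypothetical): a study-skills workshop changes exam scores.
# Statistical hypotheses: H0: mu = 70, H1: mu != 70.
mu_0, sigma = 70, 12   # hypothetical population mean and standard deviation
n = 36                 # sample size
alpha = 0.05           # accepted risk of a Type I error (rejecting a true H0)

# Create the sampling distribution of the means under H0
# and identify the rejection region BEFORE looking at the data
standard_error = sigma / n ** 0.5
z_critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed cutoff

# Gather data and calculate the statistic
sample_mean = 75       # hypothetical observed sample mean
z_observed = (sample_mean - mu_0) / standard_error

# Make a decision from the data
reject_null = abs(z_observed) > z_critical

print(f"critical z = ±{z_critical:.2f}, observed z = {z_observed:.2f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

The conclusion about the research hypothesis still has to be drawn by the researcher; the code only makes the statistical decision.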

Now, by understanding all of these concepts, I believe my son and my friends’ children will be prepared to learn any calculation in statistics and better understand what is happening, and how they can interpret the results.

My hope for their classes is that the professor teaching the mathematical statistics class tells the students where in the formula the sampling error is calculated or estimated, when the statistic can and cannot be used, and which assumptions underlie the statistic and what happens to the results when they are violated. I would like my son and my friends’ children to learn about basic parametric and nonparametric statistics, and a little about statistical computing.

Over the next few weeks, I will lay out detailed activities and homework assignments that align with these critical concepts.

Please let me know if you feel I missed a critical component or overstated a concept that you feel isn’t as critical.

## Difficult Concepts: Research Hypotheses vs. Statistical Hypotheses

I always cringe when I see a statement in a text or website such as “the research hypothesis, symbolized as H1, states a relationship between variables.” No! No! No! How can students not be confused about the difference between research and statistical hypotheses when instructors are? H1 is not the research hypothesis; it is the alternative to the null hypothesis in a statistical test.

Let’s be very clear: in most research settings, there are two very distinct types of hypotheses, the Research or Experimental Hypothesis and the Statistical Hypotheses. A research hypothesis is a statement of an expected or predicted relationship between two or more variables. It’s what the experimenter believes will happen in her research study. For example, a researcher may hypothesize that prolonged exposure to loud noise will increase systolic blood pressure. In this instance the researcher predicts that exposure to prolonged noise (the independent variable) will increase systolic blood pressure (the dependent variable). This hypothesis sets the stage to design a study to collect empirical data to test its truth or falsity. From this research hypothesis we can imagine the scientist will, in some fashion, manipulate the amount of noise a person is exposed to and then take a measure of blood pressure. The choice of statistical test will depend upon the research design used: a very simple design may require only a t test, a more complex factorial design may require an analysis of variance, and a correlational design may call for a correlation coefficient. Each of these statistical tests will possess different null and alternative hypotheses.

Regardless of the statistical test used, however, the test itself will not have a clue (if I am allowed to be anthropomorphic here) of where the measurement of the dependent variable came from or what it means. More years ago than I care to remember, C. Alan Boneau made this point very succinctly in an article in the American Psychologist (1961, 16, p.261): “The statistical test cares not whether a Social Desirability scale measures social desirability, or number of trials to extinction is an indicator of habit strength….Given unending piles of numbers from which to draw small samples, the t test and the F test will methodically decide for us whether the means of the piles are different.”

Rejecting a null hypothesis and accepting an alternative does not necessarily provide support for the research hypothesis that was tested. For example, a psychologist may predict an interaction of her variables and find that she rejects the null hypothesis for the interaction in an analysis of variance. But the alternative hypothesis for interaction in an ANOVA simply indicates that an interaction occurred, and there are many ways for such an interaction to occur. The observed interaction may not be the interaction that was predicted in the research hypothesis.

So please, make life simpler and more understandable for your students. Don’t call a statistical alternative hypothesis a research hypothesis. It is not. Your students will appreciate you making the distinction.

## Difficult Concepts—Degrees of Freedom

Several posts ago, Bonnie said we would address some difficult concepts for student understanding of statistics. I thought I would take a shot at one of the concepts she listed, degrees of freedom (df).

To help understand this concept, let us first think of df in a non-statistical way and say that df refers to the ability to make independent choices, or take independent actions, in a situation. Consider a situation similar to one suggested elsewhere. Suppose you have three tasks you wish to accomplish; for example, you want to go shopping, plan a vacation, and work out at the gym. Assume that each task will take about an hour and that you may do all on one day, or only one each day over the course of several days. I have created a situation with three degrees of freedom: you have three independent decisions to make. Suppose you decide you will go shopping today. Does this decision put any limitations on when you may do the other tasks? No, for you may still do the other tasks either today or in the course of the next few days. Suppose next you decide to plan a vacation and you will do that tomorrow. Does this decision place any limitation on when you may go to the gym? Again, no, because you still might go to the gym today, tomorrow, or on another day. Notice here that each choice of when to do an activity is independent of each of the other choices. Thus, you have 3 degrees of freedom of choice in the order of doing the tasks.

Now, set a different scenario where I place some limitation on the order in which you may do the tasks. You still have the same three tasks to do, except now you decide you will do only one a day and you want to have them all completed over a span of three days. This scenario has only 2 df, for there are only two independent decisions for you to make. After you have made a choice on two of the activities, the day for doing the third activity is “fixed” or decided by your other two choices. For example, suppose you decide to plan your vacation today. For this choice you have total freedom to make a decision for any of the three days. You next decide to plan when to go to the gym. Notice for this decision, however, you have only two choices left, either tomorrow or the following day. A statistician would say you have two degrees of freedom when making this decision. You decide to go the day after tomorrow. Finally, you have to plan shopping, but now you have essentially no choices open to you; it must be tomorrow. For this decision, you have no degrees of freedom. Thus, in a sense, you have 2 df in this scenario. You are free to make two choices, but making any two choices automatically determines your third choice.

Of course, the obvious question a student may ask is “What does all this have to do with statistics?” Let’s see. Statistically, the df are the number of scores that are free to vary when calculating a statistic, or in other words, the number of pieces of independent information available when calculating a statistic. Suppose you are told that a student took three quizzes, each worth a total of 10 points. You are asked to guess what her scores were. In this scenario, you may guess any three numbers as long as they are in the range from 0 to 10. In this example, you have 3 df, for each score is free to vary. Each score is an independent piece of information. Choosing the score for one quiz has no effect on either of the other two scores that you may choose.

But now I give you some information about the student’s performance by telling you that the total of her scores was 27. I have now created a scenario with 2 df. Suppose you guess 10 for the first score. Does choosing this score fix what you must guess for the second score, given that the total of the scores must be 27? No, for your choice of a second score is still free to vary (although, with each quiz capped at 10 points, it must be at least 7 for the total to reach 27). You guess 9 for a second score. What about your choice of a third score? What must it be? If the total of the three scores is 27, and the first score you chose was 10, and the second 9, then your third choice must be 8 for a total of 27 to be obtained. In this instance, the third score is not free to vary if you know the total of the scores and any two of the three scores. For this example then, there are 2 df in the choice of scores. If you know the total of the three scores, then only two provide independent information; the third score becomes dependent on the previous two scores. By giving you knowledge of the total of the scores I have reduced the df in the number of choices you have.
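The quiz example can be written out as a few lines of Python. This is just a sketch of the arithmetic in the paragraph above; the function name is my own invention.

```python
TOTAL = 27       # the known total of the three quiz scores
MAX_SCORE = 10   # each quiz is worth 10 points

def third_score(first, second, total=TOTAL):
    """Once the total and two scores are known, the third score is not free to vary."""
    return total - first - second

# Guess 10, then 9: the third score is forced
print(third_score(10, 9))  # 8

# Every allowable pair of free guesses leaves exactly one value for the third score
for first, second in [(10, 9), (9, 9), (8, 10), (10, 7)]:
    third = third_score(first, second)
    assert 0 <= third <= MAX_SCORE
    print(first, second, "->", third)
```

Two scores are chosen freely and one is computed, which is the sense in which this scenario has 2 df.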

Can we now relate these two examples to the calculation of statistics? Consider that you have a sample of 10 scores and you want to calculate the mean for these scores. In order to do so, you must know all 10 scores; if you know only 9, you cannot calculate the mean. Thus if there are n scores in a sample, then for calculating the mean from this sample there are n df. Each score is free to vary, and is an independent piece of information. You cannot calculate the mean unless you know all n scores. But suppose you know the mean for the scores and you want to calculate the standard deviation (s) for the scores. In this instance, there are 9 df for these scores, for if you know the mean, you need to know only 9 of the scores; the 10th score is in a sense “determined” for you by the value of the other 9 scores. So, for a set of n scores, there are n – 1 df when calculating the standard deviation.
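A small Python sketch shows both halves of this argument: the 10th score can be recovered from the mean and the other 9, and the sample standard deviation divides the sum of squares by n – 1. The ten scores are hypothetical.

```python
import statistics

scores = [7, 9, 6, 8, 10, 5, 7, 8, 9, 6]   # hypothetical sample of n = 10 scores
n = len(scores)
mean = sum(scores) / n

# Knowing the mean and any 9 scores pins down the 10th
known_nine = scores[:-1]
recovered_tenth = mean * n - sum(known_nine)
print(f"recovered 10th score: {recovered_tenth}")

# Deviations from the mean must sum to zero, so only n - 1 of them are free to vary
deviations = [x - mean for x in scores]
print(f"sum of deviations: {sum(deviations)}")

# Sample standard deviation: sum of squares divided by n - 1 df
sum_of_squares = sum(d ** 2 for d in deviations)
s = (sum_of_squares / (n - 1)) ** 0.5
print(f"s = {s:.4f}")

# The standard library's stdev uses the same n - 1 divisor
print(f"statistics.stdev = {statistics.stdev(scores):.4f}")
```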

A question frequently arises when the idea of a fixed or determined score is discussed. Students may ask how can someone’s score on a test, for example, be “determined” or “fixed in value” by her other scores on tests? Students should be made to realize that during the actual data collection process all scores are free to vary and the concept of degrees of freedom does not apply. Degrees of freedom only come into play after the data have been collected and we are calculating statistics on those data.

These ideas can be expanded to the computation of other statistics. Consider analyzing data with a 2 × 2 chi-square test of independence. When we are collecting data for the contingency table, the concept of degrees of freedom is not applicable. After we have collected the scores, however, and each cell of the contingency table is filled, then we can use the cell totals to find the row and column marginal totals. Notice at this stage that if I were to tell you the row and column marginal totals, then I would need to give you only one cell total, and you would be able to determine the other three cell totals. In this instance, when knowing the row and column marginal totals, there is only 1 df for the cell totals. In a more general sense, if there are r rows and c columns in a contingency table, then once the row and column totals are known, the table possesses (r – 1)(c – 1) df.
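To see the 2 × 2 case in action, here is a minimal Python sketch that rebuilds the whole table from its marginal totals and a single cell. The function names and the counts are hypothetical, made up for this illustration.

```python
def fill_2x2(row_totals, col_totals, top_left):
    """Recover all four cells of a 2 x 2 table from its marginals and one cell."""
    top_right = row_totals[0] - top_left
    bottom_left = col_totals[0] - top_left
    bottom_right = row_totals[1] - bottom_left
    return [[top_left, top_right], [bottom_left, bottom_right]]

def contingency_df(rows, cols):
    """Degrees of freedom once the row and column totals are known."""
    return (rows - 1) * (cols - 1)

# Hypothetical marginal totals and one known cell
table = fill_2x2(row_totals=(30, 20), col_totals=(25, 25), top_left=18)
print(table)                 # [[18, 12], [7, 13]]
print(contingency_df(2, 2))  # 1
print(contingency_df(3, 4))  # 6
```

Only one cell had to be supplied; the other three followed from subtraction, which is what 1 df means for this table.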

I think giving students this intuitive overview of df helps them to understand where such numbers come from when they are learning about various statistical tests. Perhaps it may help to make statistics a little less mysterious.

## Exposing students to Diversity while teaching the t-test

There are a lot of ways we can approach diversity in the statistics classroom, as even the term “diversity” can be operationally defined in so many ways. One method is to use research on diversity as a basis of an example when teaching statistics in context.

Often the complexity of the statistics used in a published journal article is beyond what would be taught in an introductory course in applied statistics; however, what I often do is take the research hypothesis and design and simplify it a bit. Yes, this is an example of scaffolding. So, I structure the study to fit the concept I am teaching (e.g., making a multivariate research study univariate, or making a two-way factorial a one-way). Keeping the general structure of the study intact, I often shorten the task so it will take less than 10 minutes to run through the mock study, collect the data, and then provide students with critical conceptual background information. This still gives me enough time (in a 50-minute class) to have students work through the problem, while I model it, and go from question to answer through the use of hypothesis testing.

In this example, the concept I will be teaching is the independent t-test, a form of null hypothesis testing. The study I am using is Apfelbaum, Pauker, and Sommers’s (2010) study of 4th and 5th grade students, which examined the effects of color-blind thinking and value-diversity thinking on bias.

In short, color-blind thinking means treating race as a variable not worth attending to, on the premise that doing so will minimize issues of bias. (For my social scientist readers … this is a very etic way of approaching the potential of racial bias.)

Value-diversity thinking (emic) actively recognizes differences within each racial and ethnic group.

As we are comparing two different conditions, this study can easily be adapted to an independent t-test.

So, during class, we could quickly and randomly give each student one of two sheets of paper.

Borrowing phrases directly from the published study, the students in the color-blind condition would see phrases like:

• We need to focus on how we are similar to our neighbors rather than how we are different.
• We want to show everyone that race is not important and that we’re all the same.

Meanwhile, the students in the value diversity condition would see phrases like:

• We need to recognize how we are different from our neighbors and appreciate those differences.
• We want to show everyone that race is important because our racial differences make each of us special.

In the actual study, Apfelbaum, Pauker, and Sommers (2010) looked at both implied and explicit racial biases. For the in-class activity, as this is being conducted with college students instead of 4th and 5th graders, we could just use the implied bias. Read to students the following scenario (slightly modified from the article): “Most of Brady’s classmates got invitations to his birthday party, but Terry was one of the kids who did not. Brady decided not to invite him because he knew that Terry would not be able to buy him any of the presents on his ‘wish list.’”

Then ask students to write down an answer on a scale from 1 to 10, 1 being completely inappropriate and 10 being completely appropriate. Typically, my class size is too large to collect data from all of the students, so I would randomly select 5 students from each condition and write their responses on the chalkboard. Now, we can model how to answer the question: is encouraging people to ignore race a way to increase bias or decrease it, compared to encouraging people to factor race into evaluating situations?

Of course, one of the problems in using “real data” in a study with so few subjects is that you will never be certain if the test statistic will support the same conclusion as the research article. There are two ways to deal with this problem: acknowledge the potential for low power right from the start, or have the students complete the activity, but use data that you selected to model how to answer this question with the use of an independent t-test. The latter might be best for individuals new to teaching, as you can come to class with your calculations prepared.
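If you do come to class with prepared calculations, a sketch like the following Python code could be used to check your arithmetic for the independent t-test. The ratings are made up by me for illustration and are not data from the published study; the critical value of 2.306 is the two-tailed .05 value for 8 df from a standard t table.

```python
import math

# Made-up ratings (1 to 10) from 5 students per condition, NOT data from the study
color_blind = [7, 6, 8, 5, 7]
value_diversity = [4, 5, 3, 5, 4]

def independent_t(group_a, group_b):
    """Pooled-variance independent t-test; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    ss_a = sum((x - mean_a) ** 2 for x in group_a)
    ss_b = sum((x - mean_b) ** 2 for x in group_b)
    df = n_a + n_b - 2
    pooled_variance = (ss_a + ss_b) / df
    # Estimated standard error of the difference between the two means
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df

t_critical = 2.306   # two-tailed, alpha = .05, df = 8, looked up before calculating t
t_observed, df = independent_t(color_blind, value_diversity)
reject_null = abs(t_observed) > t_critical

print(f"t({df}) = {t_observed:.2f}, critical t = ±{t_critical}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

With only 5 students per condition, a different random draw from the class could easily land on the other side of the critical value, which is the low-power caution raised above.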

In closing, it is easy to bring diversity into a classroom, even if you are that “scoop of vanilla ice cream.” One of the best ways is to make use of published research studies on cultural or racial diversity as a way of modeling critical concepts in statistics.

Apfelbaum, E. P., Pauker, K., & Sommers, S. R. (2010) In blind pursuit of racial equality? Psychological Science, 21, 1587-1592. http://pss.sagepub.com/content/21/11/1587.full

## Before the semester starts … I’m playing with pictures!

I am sure I’m not alone in wanting to use the time between semesters to make adjustments to what I am teaching or how I am teaching it. By now, you probably recognize that I am a fan of learning about new pedagogical techniques. I am dedicated to helping students truly understand the concepts of statistics. Often, having visuals when you teach is useful for students.

I use chalk and a board (OK, more like 8 boards that move). I draw a lot of pictures. However, a mathematics professor (who is both a great colleague and friend) has been bugging me about using Mathematica in addition to chalk (a delivery system she also loves).

With Mathematica, I hope to do two things. During class, I want to present my students with a visual image of certain concepts (like how a normal distribution changes when the standard deviation gets larger or smaller). Outside of class, by making these demonstrations available electronically for students to explore on their own, I am hoping they will gain a greater conceptual understanding of critical statistical concepts.

Mathematica is a software package that, among other things, provides demonstrations of statistical concepts. Each demonstration was designed by an instructor. It is my understanding that, to be published, a demonstration goes through a rigorous peer-review process. As such, if it’s published for use, you know it will work. The downside is that your university would have to pay for a subscription to Mathematica for the demonstrations to be useful. http://www.wolfram.com/solutions/education/higher-education/uses-for-education.html

As I stated last week, in my list of resolutions, my goal is to find five different demonstrations this semester. Why five? It seemed like a reasonable number … not too challenging.

This was really easier than I anticipated. I started by identifying the concepts that would most benefit from being able to visualize and manipulate variables. Then I visited the Mathematica web site and searched the topics. Each search yielded anywhere from 5 to 25 demonstrations; some were appropriate, others weren’t. I looked through the demonstrations and selected the ones I liked.

Here are the concepts and the demonstrations I identified as being potentially useful this semester.

(1) The Normal Distribution, where students get to input mu and sigma, would make a nice visual demonstration.

http://demonstrations.wolfram.com/TheNormalDistribution/

This Normal Distribution also shows the area under the curve (i.e., you can manipulate the z-score)

http://demonstrations.wolfram.com/AreaOfANormalDistribution/

(2) Another good demonstration would be the Sampling Distribution of the Means, where students can see the impact of changing mu, sigma, or sample size on its shape.

http://demonstrations.wolfram.com/SamplingDistributionOfTheSampleMean/

I’m also going to throw in a demonstration on the Central Limit Theorem, as how can we talk about the Sampling Distribution of the Means without mentioning the Central Limit Theorem?

http://demonstrations.wolfram.com/TheCentralLimitTheorem/

(3) Of course, what changes in the Sampling Distribution of the Means is the standard error, so showing how the standard error changes with the sample size and/or variability makes a great deal of sense. I was really hoping that a demonstration on the standard error would already be available; unfortunately, it doesn’t seem to be. A similar concept is the confidence interval, though the writer of the Mathematica code for this demonstration did not include how variability (i.e., standard deviation) impacts the size of the “margin of error.” However, it still could be a useful demonstration.

http://demonstrations.wolfram.com/ConfidenceIntervalsConfidenceLevelSampleSizeAndMarginOfError/

Though not as clean looking as the one above, this demonstration also includes the size of the standard deviation. http://demonstrations.wolfram.com/ConfidenceIntervalExploration/

I would expect that both demonstrations would be necessary for students to get a richer understanding of confidence intervals.

That having been said, I believe that two new Mathematica Demonstrations are in order … one dealing with the size of the standard error based on changes in sample size and variability, and possibly a new CI demonstration that merges the best of these two demonstrations.

(4) The effects of the sample size and population variance on hypothesis testing with the t-test seem like a great candidate for a visual demonstration.

(5) How changes in the variables impact correlations (depending on how they are calculated) should be useful for my students.

http://demonstrations.wolfram.com/CorrelationAndRegressionExplorer/

(6) Those of you who know me, are probably not surprised that I can’t just stop at 5 examples for this first semester … so here is a great demonstration on Power. Though I can get students to define power, and identify threats to power, I am never fully certain that they truly get the beauty (and hassle) of power. This demonstration may help.

http://demonstrations.wolfram.com/ThePowerOfATestConcerningTheMeanOfANormalPopulation/
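For item (3), where no standard error demonstration seemed to be available, the relationship is simple enough to tabulate yourself. The following is a Python sketch, not a Mathematica Demonstration; the function names, the standard deviations, and the sample sizes are all my own assumptions, and the margin of error uses the z value of 1.96 with the standard deviation treated as known.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

def margin_of_error(sd, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for the mean."""
    return z * standard_error(sd, n)

# Vary the standard deviation and the sample size to see how each one matters
for sd in (5, 10, 20):
    for n in (25, 100, 400):
        se = standard_error(sd, n)
        moe = margin_of_error(sd, n)
        print(f"sd={sd:>2}  n={n:>3}  SE={se:.2f}  margin of error={moe:.2f}")
```

Doubling the standard deviation doubles the standard error, while it takes four times the sample size to cut the standard error in half.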

Of course, without proper instruction during class time and an accompanying explanation following class instruction, these demonstrations may end up being little more than pretty pictures to students.

In a few weeks, after I actually try these demonstrations with my students, I will share the information I attached to the demonstrations as well as feedback on what worked and what didn’t. After all … anyone who has taught long enough knows that even the best planned lessons and demonstrations sometimes flop.

Though not specifically having to do with teaching statistics … I found a nice article at the Chronicle of Higher Education on iPhones, BlackBerrys, etc … and apps that could help professors. The apps for taking attendance and learning students’ names look promising. http://chronicle.com/article/College-20-6-Top-Smartphone/125764/

I look forward to hearing from any of you who have used Mathematica Demonstrations (or others) during class and for homework.

## How wonderful and I wish, I wish …

As I type this, I have one fifty-minute class left to teach, and my time with my statistics class will be over. As with anything, each semester is varied. Some semesters I cover more information than other semesters. I liken this semester to driving through the city and hitting all green lights! As such, I believe my students were able to master additional information based on what is probably mostly good fortune.

So, here is my list of things I’m so thrilled I covered:

(1) Effect size statistics, like eta squared: Sure, effect size statistics are not used that much, and let’s face it, they are super easy to calculate, but my biggest reason for wanting to teach effect size statistics is that it helps students to understand what a t-test or F-test can tell us (is there a difference) and what it can’t tell us (how big is the effect). In fact, by spending about 20 minutes on the teaching of effect size statistics, students were better able to understand why the “p-value” for an observed t or F score provides us with no information about the size of the effect. All we need to know from the test is, did we pass the threshold.

(2) We find the critical value BEFORE calculating the observed value: This discussion helps focus students on the logic of statistical hypothesis testing. Specifically, statistical hypothesis testing works because we assume that the null hypothesis is true, that there is no effect of the independent variable on the dependent variable. With this assumption, we are able to generate the sampling distribution that provides us with information on the standard error. Now, if our sample mean is too extreme, we reject our initial hypothesis, the null, and accept the alternative hypothesis, that is, that the means are different. Finding the critical value prior to calculating the statistic helps focus students on that “line in the sand” to say … my observations are too extreme for me to stay with my current hypothesis. Students are far less likely to fall victim to equating the p-value with the strength of the effect of the independent variable, or to conclude … the data are trending because I have a p-value of .07, or some other funky thing far too many people do with null hypothesis testing. By spending a bit more time on the steps involved in hypothesis testing, I think students are less likely to fall victim to the common misconceptions surrounding Statistical Null Hypothesis Testing.

(3) Though not a specific concept, I am pleased that for almost every concept I taught this semester I used new examples. Sure, I’m still a sage in training, no grey hair and all, but I was beginning to find myself using the same examples. As this is the third semester my supplement instructor, Amy, is taking notes in class, I felt I owed it to her, at least, to “keep it fresh.” I also found thinking about this blog helped spur my mind toward different examples. In doing so, I found some worked even better than my “old stand by” examples, but the great thing was, when a new example flopped, I just quickly switched to the example I knew helped students.
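To illustrate the point in item (1), here is a minimal Python sketch of eta squared computed from an independent t-test, using the relationship eta squared = t² / (t² + df). The two studies are hypothetical: both have the same observed t, and both exceed their two-tailed .05 critical values (2.306 for 8 df, roughly 1.98 for 98 df).

```python
def eta_squared_from_t(t, df):
    """Proportion of variance in the dependent variable accounted for, from t and its df."""
    return t ** 2 / (t ** 2 + df)

# Two hypothetical studies that both "pass the threshold" with the same observed t
small_study = eta_squared_from_t(t=2.5, df=8)    # 5 subjects per group
large_study = eta_squared_from_t(t=2.5, df=98)   # 50 subjects per group

print(f"small study: eta squared = {small_study:.3f}")
print(f"large study: eta squared = {large_study:.3f}")
```

Both tests reject the null hypothesis, yet the effect accounts for a very different share of the variance in each, which is why the decision to reject says nothing about how big the effect is.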

Now for my Wish List of things I always wished I could have covered, but didn’t.

(1) Though I do get to cover the concepts of the F-test, I teach a three-credit class and only have time to cover the one-factor between-subjects ANOVA. If only I could cover a two-factor between-subjects ANOVA and a one-factor within-subjects ANOVA, I would feel my students would really understand the F-test (and as such, be less inclined to misuse or overuse it).

(2) Yet, I feel if I could cover non-parametrics, students would better understand the role of the assumptions in parametric tests, and issues like Power and random error could be even better understood. Plus they would get the benefit of learning about a really important class of statistics. Sadly, another semester has passed without me being able to cover this topic with the depth I think it deserves.

(3) I fear I don’t emphasize the weakness of statistics, and that they are only as good as the quality of the theories being tested in the design. They are also only as good as the quality of the sample and the quality of the measure. At least the latter two concepts get covered in classes that will follow the statistics class. But so few people speak of the topic of equifinality, that the same outcome can have multiple explanations. Again, though I touch on this, the idea of developing the alternative rival hypotheses that could explain the same empirical evidence is one I simply don’t have time to cover to the extent I would like. If you have a weak theory or haven’t taken into account the alternative rival hypotheses when designing your study, cool statistics will not improve the quality of your findings.

(4) Though I tell students the hypothesis drives everything, from the selection of the measure and research design to the specific statistic one would select, and though there are example problems in the textbook (Integrating Your Knowledge) that students have to complete, I really wish we could spend more time on this.

Maybe next semester, I can find a way to reach my wish list … maybe!

## Core Statistical Concepts

I have been spending the week thinking about what I consider to be the “core concepts” that need to be covered in an applied statistics class, be it in psychology, health, business, or education. However, before I post my personal thoughts, I felt it necessary to see what other applied statisticians had to say. In my search, I found http://www.statlit.org/pdf/2004McKenzieASA.pdf . This work, John McKenzie’s (2004) “Conveying the Core Concepts,” is from the Proceedings of the ASA Section on Statistical Education, pages 2755-2757.

In reading what McKenzie and several other professors of applied statistics identified as the core concepts in statistics, I must say … I concur. Listed below are the core concepts in applied statistics … the information that, in my opinion, simply has to be covered regardless of illness, snow days, or anything else that could interrupt a professor’s teaching schedule.

Variability: Students cannot understand the purpose of statistics unless they get the concept of variability. Within this, we can further talk about variability due to chance and variability due to effect. Included in the discussion of variability should be the difference between systematic and random variability. I would have to say that not a class period goes by without me spending at least a little time on helping students to focus on issues of variability (especially variability due to the individual differences of the subjects who just happen to be in our sample).

Randomness: Though I would see randomness and variability as being part of the same large concept, McKenzie’s work identified the concept of randomness as not only separate from variability but also critical for students to master.

Sampling Distribution: Along with Hypothesis Testing, the sampling distribution is considered to be one of the most complicated concepts to teach. I would concur, which is why I spend an entire class period just on a single activity with M&M’s to demonstrate the concept of sampling distribution. (Please see a prior blog entry for details on this tactile activity).

Hypothesis Testing: The sages and I spent the month of October and much of November discussing whether Hypothesis Testing is critical and if so, how to best tackle the teaching of this complex topic. Not surprisingly, McKenzie identified the teaching of hypothesis testing as being one of the two most difficult concepts to teach in applied statistics (the other being sampling distribution). Though there may be several published articles on hypothesis testing no longer being a critical concept to teach, the individuals who were surveyed for McKenzie’s work certainly consider it to be a critical concept.

Data Collection Methods: Though I have said to my students more times than I can count, “the quality of our statistics is limited by the quality of our sample,” I must admit to being a bit surprised that this was considered critical by others, especially since when I look at many undergraduate statistics textbooks, data collection methods are barely mentioned. Kiess and Green’s (2010) Statistical Concepts for the Behavioral Sciences, 4/e, certainly tackles the issue of data collection methods.

Association vs. Causality: This core concept makes me smile, as often when I meet someone for the first time, and they ask me what I do … my response is often met with one of two comments … “Oh, I hated statistics” or “Correlation does not mean causation.” It’s kind of like me recalling how to greet a person in German, a class that I had for three years, and yet recall so little. We, as applied statisticians, certainly engrave this concept into the minds of our students, but I’m sure most of you are like me, hoping students get more than a “pat phrase” out of our classes.

Significance (Statistical vs. Practical): This is a critical concept in applied statistics and one that is probably not mentioned in theoretical statistics classes. Sure, we delineate a mark at which we have to say … these results are too extreme for us to attribute them to “chance” … but just because we found a statistically significant difference doesn’t mean it’s a difference that truly matters. In applied statistics, it’s not enough to understand how statistical significance works; we also have to be able to interpret the results to determine practical difference. I must admit to not covering this core concept to the same extent I cover the others.

As I think of other “critical concepts,” they tend to be a bit more specific and fall under the larger concepts listed above (e.g., understanding what a standard deviation can tell us clearly falls under the concept of variability). I invite all of you to comment on what concepts, if any, are missing from this list.