# Populations vs. Samples in Statistics

Hello All,

So, my first love is teaching applied statistics. It has been, and probably always will be, my favorite thing to do at work. However, for the past two years I have served as an interim administrator. So for my next few posts, I am discussing what administrators need to understand, since many of you may well be called upon to lead professional development workshops or to work one-on-one with an administrator.

If you look down the list I posted previously of what should be included in a training session for administrators, you will see that the first four posts were more broadly based. They dealt with issues technically outside of applied statistics that either provide the foundations for good decision making with data or help administrators understand why we need statistics to make good decisions.

1. Epistemology, Decision Making, and Statistics
2. Cognitive Biases: How Statistics can be used to get to the Truth
3. Detecting Data Integrity Issues
4. Data Management Protocol
5. Populations vs. Samples
6. Observational Errors: Measurement, Experimental, and Sampling
7. Quality Decisions are Limited by the Quality of Measures
8. Sampling and Quality Decisions
9. Statistics and Sampling Error
10. Parameters and Mathematical Modeling vs. Inferential Statistics (Introduction)
11. Mathematical Modeling, Parameters, and Assumptions
12. Statistical Decision Errors: Type I and Type II

Now we are up to #5, which covers two terms found at the beginning of every statistics book: Population vs. Sample.

A population is an entire group of something sharing a common characteristic or characteristics. For example, a memo I recently received listed the GPA for all students at our institution and compared it to the GPA for all student athletes at our institution. All students at an institution is one example of a population. All student athletes is another example of a different population. In each of these cases, the GPA is NOT a STATISTIC.

A statistic is a number that captures what is going on with a sample, that is, a subset of a population.

A parameter is a number that captures what is going on with a population, that is, the entire set of something or someone sharing a common characteristic, like being a student athlete at a particular institution.
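For readers who like to see the distinction in action, here is a minimal Python sketch. The GPAs are simulated, not taken from any real memo, and the population size (400) and sample size (40) are arbitrary assumptions. The parameter is computed from every member of the population; the statistic is computed from a random subset of it.

```python
import random
import statistics

# Hypothetical population: GPAs for every student athlete at one institution.
# These values are simulated for illustration only; they are not real data.
random.seed(1)
population = [round(min(4.0, max(0.0, random.gauss(3.0, 0.5))), 2) for _ in range(400)]

# Parameter: a number computed from EVERY member of the population.
parameter_mean = statistics.mean(population)

# Statistic: a number computed from a sample, i.e. a subset of the population.
sample = random.sample(population, 40)
statistic_mean = statistics.mean(sample)

print(f"Parameter (population mean GPA): {parameter_mean:.2f}")
print(f"Statistic (sample mean GPA):     {statistic_mean:.2f}")
```

Rerun it with a different seed and the statistic changes from sample to sample, while the parameter stays put, because it already includes everyone.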

Before I can explain why this matters, we will have to go through a few more posts for background information. Until then, let’s just say … we don’t use statistics to understand populations; we use parameters.

And we can’t treat a sample like a population. A sample is a subset, which means it is probably going to be less varied than the total population, and the people in the sample may very well differ from the overall population. Take, for example, a professor who holds an optional study session before an exam. Think about the students who would show up for such a session. How might they differ from the students in the class who didn’t show up?

• Maybe the students who didn’t show up have to work or take care of a family member, so they didn’t have the time.
• Maybe the students who showed up are really motivated to master the material, while the ones who didn’t show up are satisfied to just do well enough.
• Carol Dweck’s theories would predict that students with Incremental Views of Intelligence (the belief that with effort they can get smarter) are more likely to participate than those with Entity Views of Intelligence (the belief that we are either born smart or not).

There are any number of reasons why students in these two groups may differ, but if the goal is to estimate how well the entire class is prepared by using the sample of those who show up for the study session, your estimate is going to contain error, because the students who didn’t show up aren’t represented in it.
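The study-session problem can also be sketched as a small Python simulation. Everything here is hypothetical: the class size, the score distribution, and especially the attendance rule (students scoring 70 or above attend 80% of the time, everyone else 20%) are assumptions chosen only to make the pattern visible. Because attendance depends on preparation, the mean of the attendees overstates how well the whole class is prepared, while the mean computed over the entire class has no sampling error at all.

```python
import random
import statistics

# Hypothetical class of 200 students with simulated "preparation" scores (0-100).
# None of these numbers come from real data.
rng = random.Random(42)
class_scores = [min(100.0, max(0.0, rng.gauss(70, 12))) for _ in range(200)]

def attends(score, rng):
    # Hypothetical attendance rule (an assumption, not a finding):
    # better-prepared students are far more likely to show up.
    chance = 0.8 if score >= 70 else 0.2
    return rng.random() < chance

# The self-selected "sample": only the students who came to the session.
attendees = [s for s in class_scores if attends(s, rng)]

class_mean = statistics.mean(class_scores)   # parameter: everyone in the class
attendee_mean = statistics.mean(attendees)   # statistic: only those who showed up

print(f"Whole-class mean (parameter): {class_mean:.1f}")
print(f"Attendee mean (statistic):    {attendee_mean:.1f}")
print(f"Attendees: {len(attendees)} of {len(class_scores)}")
```

The gap between the two printed means is the kind of error the study-session example describes: it comes from who chose to be in the sample, not from any arithmetic mistake.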

Population parameters don’t have that kind of error, because everyone is in the group.

Understanding these two terms is requisite for understanding what follows. Certainly professors teaching applied statistics know and understand this, but administrators often do not. So if you are called upon to train them or to work with them on assessment or on using data to make decisions, make sure they understand this distinction.

Till next time ….

Bonnie