Category Archives: Applied Statistics

Great Resource for the Teaching of Applied Statistics

Hello All,

The Society for the Teaching of Psychology has an office dedicated to great, peer-reviewed teaching resources: the Office of Teaching Resources in Psychology.

Two such free resources are especially useful for those of us teaching applied statistics. The first is the online book, Teaching Statistics and Research Methods: Tips from TOPS.

The second is Statistical Literacy in Psychology: Resources, Activities, and Assessment Methods.

The website housing these two resources is filled with great ideas, all of which have been peer-reviewed. You can find teaching resources, including example syllabi, as well as articles on how to maximize your students’ learning. Even if you are teaching applied statistics in an area outside of psychology, I encourage you to make use of this valuable set of tools.

Happy Teaching!





Filed under Applied Statistics, Curriculum, Engaging students, Pedagogy, Preparing to Teach, Professional Development

Assuring Data Integrity

The quality of our data driven decisions is limited by the quality of our data. It really is that simple. If your data have errors, your decisions will be error prone as well. This is the 4th in a series of posts on what needs to be taught to administrators making data driven decisions. Sure, administrators often hire people to handle data entry and analysis, but how can you tell if you have the right staffing, especially during times of financial limitations? Before making decisions with data, administrators have to know how to look for signs of errors in their data. Even if an administrator has been formally trained in statistics or assessment, an examination of many popular books in these areas reveals that few authors discuss the importance of verifying data integrity, that is, the accuracy of your data.

Obviously, one of the best ways to assure data integrity is to follow an appropriate data management plan. This file from MIT on data management provides a great deal of useful information.

Some data management professionals believe that it’s OK for there to be data errors and even consider this a best practice: “Don’t waste money assuring all data are accurate.” Now, if you are working with an organization gathering data from tens of thousands of people, and there is no reason to believe that the errors present in the data are systematic (that is, always wrong in the same way), then I can see why people make such statements. The statistics will treat this small, random error like sampling error, minimizing its impact while still getting you to the information you really need to make the right decisions. However, most universities and smaller companies work with data sets far smaller than what would be needed to just let the errors be and let the statistics take care of them. With small data sets, even a few data errors can mask what is truly going on. Moreover, I have found that most data errors in higher education aren’t random, making any error a problem. Take, for example, enrollment data. I have examined a lot of enrollment data, and whenever there is an error, it is always in the same direction: underestimating the number of students enrolled in a particular program or belonging to a particular ethnic/racial group. When you are already working with smaller data sets, even a single error in a single observation could be enough to keep an administrator from making the best possible decision. And if the error is systematic, as most of the errors I have seen in data sets have been, then you are all but guaranteed to make a faulty decision with faulty data. Thus, identifying and fixing the errors in your data is an important responsibility for any administrator interested in making high quality data driven decisions.
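To make the random-versus-systematic distinction concrete, here is a minimal Python simulation. It is a sketch under stated assumptions, not real institutional data: each hypothetical record holds a value drawn around 50, random entry errors are zero-centered and normally distributed, and systematic error is modeled as a constant shift applied to every record. The function names are illustrative only.

```python
import random
import statistics


def observed_gap(n, error_sd, bias, seed):
    """Observed mean minus true mean for n hypothetical records.

    error_sd: spread of random, zero-centered data entry errors
    bias: constant shift added to every record (systematic error)
    """
    rng = random.Random(seed)
    true_values = [rng.gauss(50.0, 10.0) for _ in range(n)]
    observed = [v + rng.gauss(0.0, error_sd) + bias for v in true_values]
    return statistics.mean(observed) - statistics.mean(true_values)


def average_distortion(n, error_sd, bias, reps=100):
    """Average absolute distortion of the mean across repeated simulated data sets."""
    gaps = [observed_gap(n, error_sd, bias, seed) for seed in range(reps)]
    return statistics.mean(abs(g) for g in gaps)


if __name__ == "__main__":
    for n in (20, 2000):
        print(f"n={n:>5}  random error: {average_distortion(n, 5.0, 0.0):.3f}  "
              f"systematic error: {average_distortion(n, 0.0, -2.0):.3f}")
```

With the random error, the distortion of the mean shrinks as the number of records grows, which is the “let the statistics take care of it” argument. With the systematic error, the distortion stays at the full size of the shift no matter how many records you have.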

When training administrators, it often helps to use examples of data integrity errors from their own organization. Listed below are the most common types of data integrity errors that I have observed.

Types of Error:

  1. Miscoding of Individuals leaves them out of important counts.
    • Example: Enrollment or demographic counts can be miscoded. I recall getting a set of raw data in which some students were listed as being in the College Business, other students in CoB, and still others in CBM, yet each of these students was in the College of Business Management. Though a human can look at each of these labels and see they are the same, a computer can’t unless programmed to do so. It is far easier to specify in your data management plan that the College of Business Management is always coded as BUS. Then accurate counts become possible.
  2. Misplaced Decimals or Place Value Errors.
    • Example: Decimals, such as a position expressed as a proportion of a whole, always seem to add to challenges in data accuracy. At most universities we base payment for a position on a “whole” position. Thus, if the full-time position is 12 credits, someone teaching 6 credits would have a .5 position. Of course, sometimes positions become more complicated. For example, during one analysis a .05 position was treated as a .50 position, inflating this person’s appointment to 10 times its actual size because of a decimal error. In another example, while translating counts from one table to another, a person dropped a zero. The table said 12, but the actual number of majors was 120.
  3. Reversed Coding. Responses are coded in the opposite direction from the agreed-upon scheme.
    • Example: Let’s say we have a survey with the responses Strongly Agree, Agree, Disagree, or Strongly Disagree, where 4 is Strongly Agree and 1 is Strongly Disagree. I have seen people code them in reverse, with 1 as Strongly Agree and 4 as Strongly Disagree. Of course, this would be fine if such coding were done consistently and noted, but that’s not what I have typically experienced.
  4. Treating a True Score of Zero as a Non-Response.
    • Example: When I code data, I typically code females as 0 and males as 1. A non-response, which for this type of question actually holds some interesting information, does not get assigned a value. A graduate student once deleted all of the zeros in the table, every single one, and values that should have been zeros were then treated as if the person hadn’t responded.
  5. Unexpected Errors that only someone new to data management, or someone truly arrogant, could create.
    • Example: These are, after all, unexpected, and as such cannot be predicted or classified. Just know that inexperience and arrogance, or arrogance alone, make a bad mix in everything, including assuring data integrity. I’m not saying a more seasoned person isn’t apt to do something stupid. The difference is that, having been humbled by years of such embarrassing episodes, seasoned people not only know how to look for such errors, they triple check for them to avoid public humiliation!
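Several of these errors can be caught, or prevented outright, with simple coding rules written into the data management plan. The Python sketch below is illustrative only: the codebooks, the 12-credit full-time load, and the function names are hypothetical stand-ins drawn from the examples above, and missing responses are assumed to be stored as None rather than deleted.

```python
import math

# Hypothetical codebook: every known variant maps to one official code.
COLLEGE_CODES = {
    "college of business management": "BUS",
    "college of business": "BUS",
    "college business": "BUS",
    "cob": "BUS",
    "cbm": "BUS",
}

# One agreed-upon direction for the survey scale (4 = Strongly Agree).
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "agree": 3,
    "strongly agree": 4,
}

FULL_TIME_CREDITS = 12  # full-time load used in the example above


def code_college(raw_label):
    """Map a raw college label to its official code; unknown labels raise instead of guessing."""
    key = raw_label.strip().lower()
    if key not in COLLEGE_CODES:
        raise ValueError(f"Unrecognized college label: {raw_label!r}")
    return COLLEGE_CODES[key]


def code_likert(response_label):
    """Code a response from its text label so the scale cannot be entered in reverse."""
    key = response_label.strip().lower()
    if key not in LIKERT_CODES:
        raise ValueError(f"Unrecognized response: {response_label!r}")
    return LIKERT_CODES[key]


def fraction_matches(credits, recorded_fraction):
    """Recompute a position fraction from credits and compare it with the typed decimal."""
    return math.isclose(credits / FULL_TIME_CREDITS, recorded_fraction, abs_tol=1e-9)


def count_responses(values):
    """Count true zeros, ones, and non-responses (None) separately."""
    zeros = sum(1 for v in values if v == 0)
    ones = sum(1 for v in values if v == 1)
    missing = sum(1 for v in values if v is None)
    return zeros, ones, missing
```

For instance, `code_college(" CoB ")` returns `"BUS"`, and `fraction_matches(0.6, 0.50)` returns `False`, flagging a .05 position that was typed as .50. Raising an error on an unknown label is a deliberate choice: a new variant should be added to the codebook by a person, not silently guessed.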

It’s not enough simply to know about the errors; we have to know how to recognize them. Listed here are some ways to teach administrators how to assure the integrity of their data.

  • LOOK at the Data!
    • Always look at a chart or graph of the data you are about to base a decision on. Does it make sense? Is it what you expected? Is anything missing? Is anything extreme? Though you won’t be able to find all errors this way, you will be able to spot quite a few. For example, if there are no females in your graduating class, or if not a single student in the largest major on your campus graduated in 4 years over a 10-year span, you know you have a problem.
    • If at all possible, chart or graph multi-year trends. This will help you see mistakes clearly, like the mathematics department going from three years of 100-plus majors to a year of 12 majors, and once you have determined there is no problem with the data, it will help you see real trends more clearly.
  • Trust your Gut.
    • Even applied statisticians can’t walk around with all of the data easily accessible in their minds. However, we do tend to get a sense when something is off. Interestingly, research on infants and toddlers has demonstrated that from an early age, they are implicitly, that is, not consciously, picking up patterns in their observations. They are kind of like little statisticians. We never lose that implicit ability. However, because it is not consciously available, it often comes to us as a sense of … hmmm, that doesn’t feel right.
    • If you are looking at data and it doesn’t feel right, look at it more closely for errors. This actually happened to me when I was looking at means that all seemed just a bit off. That was when I found that someone had miscoded one of the responses. This coding error would have resulted in a decision that wasn’t supported by the data.
  • When critical, triangulate the data.
    • First, look at the raw data for discrepancies. If none are found, then triangulate the data. Triangulating data means finding data from three different sources (e.g., the department chair, enrollment services, and institutional research) and comparing them. If there is a difference, you have to find the source of the problem. Often you may be able to find two sources of data but not three. You can compare two sources of data as well; just make sure they are independent (e.g., enrollment services data vs. chair data).
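The multi-year check and the two-source comparison are easy to automate once the counts are in hand. This Python sketch is illustrative: the function names, the 50% change threshold, and the example counts are hypothetical, chosen to mirror the mathematics department example and the 12-versus-120 table error described in this post. A flagged year or program is only a prompt to go back to the raw data, not proof of an error.

```python
def flag_sharp_changes(counts_by_year, threshold=0.5):
    """Return (year, previous, current) wherever a count changed by more than
    `threshold` (as a fraction of the previous year's count)."""
    years = sorted(counts_by_year)
    flagged = []
    for prev_year, year in zip(years, years[1:]):
        before, after = counts_by_year[prev_year], counts_by_year[year]
        if before == 0:
            if after != 0:
                flagged.append((year, before, after))  # a jump from zero is always worth a look
            continue
        if abs(after - before) / before > threshold:
            flagged.append((year, before, after))
    return flagged


def compare_sources(source_a, source_b):
    """Return {key: (a_value, b_value)} for every key on which two independent sources disagree."""
    keys = sorted(set(source_a) | set(source_b))
    return {k: (source_a.get(k), source_b.get(k))
            for k in keys if source_a.get(k) != source_b.get(k)}


# Hypothetical counts for illustration only.
math_majors = {2019: 104, 2020: 110, 2021: 101, 2022: 12}
chair_counts = {"Biology": 210, "Mathematics": 120}
enrollment_services_counts = {"Biology": 210, "Mathematics": 12}

print(flag_sharp_changes(math_majors))
print(compare_sources(chair_counts, enrollment_services_counts))
```

Here the first call flags 2022 (101 majors down to 12), and the second reports that the two sources disagree on Mathematics (120 vs. 12). A key missing from one source shows up with `None` on that side, which is itself worth investigating.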

As any administrator knows, it is far easier to avoid errors than to have them, find them, and fix them. Here are some ways these problems can be minimized, or at least detected early enough in the process that they can be fixed before creating challenges for decision makers.

  • Most of these problems will be minimized if a well thought out and articulated Data Management Plan is crafted and implemented.
  • When errors are found, examine whether the problem lies in the implementation of the Data Management Plan or in the Plan itself.
  • Make revisions to the Data Management Plan as appropriate.
  • Within your Data Management Plan, schedule a periodic verification of the accuracy of the data. This should be done twice a year, and truthfully it shouldn’t take very long. Checking three data points in each of half a dozen to a dozen different types of files should do the trick.
  • Access to raw data should be limited to an extremely small number of people. When you start to code or otherwise prepare the data for analysis, that should NOT take place in the raw data. The data should be copied and then worked on in a separate file.
  • And please remember, high quality data analysis and data integrity require skilled and appropriately paid staff. Given that the consequences of having errors in your data are so high, this is not a place you want to scrimp.
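The twice-yearly spot check can be as simple as drawing three records at random from each file and verifying them by hand against the original source documents. A minimal Python sketch, assuming each file has already been loaded as a list of record IDs; the file names, IDs, and the `spot_check_sample` helper are hypothetical. The function only reads the records, so it can be run against a copy without touching the raw data.

```python
import random


def spot_check_sample(files, per_file=3, seed=None):
    """Pick up to `per_file` records at random from each file for manual verification.

    files: mapping of file name -> list of record IDs (or records)
    seed: set a seed if the same sample must be reproducible for an audit trail
    """
    rng = random.Random(seed)
    return {name: rng.sample(records, min(per_file, len(records)))
            for name, records in files.items()}


# Hypothetical files and record IDs for illustration only.
files = {
    "enrollment": [f"E{i:04d}" for i in range(1, 501)],
    "graduation": [f"G{i:04d}" for i in range(1, 201)],
    "faculty_load": ["F001", "F002"],
}

for name, picks in spot_check_sample(files, seed=2024).items():
    print(name, picks)
```

Drawing the records at random, rather than letting someone choose them, keeps the check from drifting toward records already known to be clean.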


Filed under Applied Statistics, Methods of Data Collection, Professional Development

Cognitive Biases and Decision Making

The goal of this blog is to talk about how best to teach students to use applied statistics successfully. During the last few weeks, and for the next few weeks, I plan to talk about a specific group of students: administrators interested in making data driven decisions. In a prior post, I reviewed the components that should be part of such a training session.

From experience, I have found that administrators are best able to learn how to make data driven decisions only when they first learn about epistemology (how we know what we know), which was covered in a previous post, and cognitive biases (innate tendencies that keep us from accurately assessing what is going on around us). Business Insider has summarized 56 organizational cognitive biases, and Psychology Today reviews how cognitive biases negatively impact businesses.

Though there are many cognitive biases, in an organizational setting I like to speak about these six.

  1. Confirmation Bias – we seek out information that matches what we already believe to be true, ignoring all information that contradicts our beliefs.
  2. Ingroup Bias – though quite complex, in short, we tend to prefer people we deem to be part of our group. We view them as more varied than people outside of our group. When they make mistakes, we tend to be more forgiving or understanding. We tend to exert more energy to help them and protect them from harm.
  3. Projection Bias – the assumption that what we think and feel is what others are thinking and feeling.
  4. Gambler’s Fallacy – the belief that the risk we are about to take is going to pay off, especially after a series of bad events, because our luck is bound to change.
  5. Status-Quo Bias – most people are simply more comfortable when things stay the same, even if they are less than ideal. Organizational change is not comfortable for most people.
  6. Bandwagon Effect – the tendency to follow the herd regardless of what the information might be saying. I have heard people use the term “sheeple” for those who do so.
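The Gambler’s Fallacy is easy to demonstrate with a simulation: after a run of bad outcomes, the next outcome of an independent event is no more likely to be good. This Python sketch assumes a fair coin, with heads standing in for the “good” outcome and a streak of three tails standing in for a run of bad luck; the function name and parameters are illustrative.

```python
import random


def heads_rate_after_tails_streak(n_flips=200_000, streak=3, seed=1):
    """Share of heads on flips that immediately follow `streak` tails in a row."""
    rng = random.Random(seed)
    flips = [rng.random() < 0.5 for _ in range(n_flips)]  # True = heads
    after_streak = [flips[i] for i in range(streak, n_flips)
                    if not any(flips[i - streak:i])]  # previous `streak` flips were all tails
    return sum(after_streak) / len(after_streak)


print(f"{heads_rate_after_tails_streak():.3f}")  # should be close to 0.5, not higher
```

The streak of tails carries no information about the next independent flip, so “our luck is bound to change” is not a reason to expect a better outcome.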

What each of these cognitive biases has in common is that it places us in a cognitive state where we ignore the data right in front of our nose, particularly if the data contradict our firmly held beliefs. I was once in an administrative meeting after a particularly challenging decision, one the faculty strongly opposed. And yet, an administrator remarked that 80% of the faculty were on board with this decision. Outside of the people in the room, I didn’t know of one person, let alone 80% of the faculty, who supported this decision, but a couple of cognitive biases were taking hold. The top administrators all felt this was a good idea, and through the bandwagon effect (among other pressures), so did the middle level administrators. Then, since they believed it to be a great decision on which they all agreed, they projected their thinking onto the vast majority of the faculty. A quick survey (formal or informal) would have helped them see what the faculty were actually thinking. That information could have been used to change the decision, soften it, or provide communication and justification as to why such a widely opposed decision had to be implemented.

Properly designed measures and appropriate sampling techniques can yield great data that can be used to help provide insight to administrators to aid them in moving an organization forward.

Certainly, if we stick with our cognitive biases, we’ll feel better about ourselves, but that won’t help an organization become the best it can be, as in the end, an administrator is only as good as the decisions he or she is making.

In training administrators on how best to use applied statistics, start by explaining how data can help them achieve higher quality decisions by bypassing epistemological and cognitive bias limitations.


Filed under Applied Statistics, Curriculum, Data Driven Decisions

Epistemology and Decision Making

In a recent post, I outlined the topics that should be covered in a series of training sessions for administrators making data driven decisions. I have found that many people are resistant to using data to make decisions, even though they throw around phrases like “data driven decisions,” “business analytics,” or “big data.” As such, we first have to help them understand how we know what we know, and how data fit into that body of knowledge.

Epistemology is the study of how we know what we know. Though there are many ways of classifying and characterizing types of knowledge, I find that there are four different ways that we know:

  1. Authority – We know what we know because someone tells us, and we believe it to be true. For example, everything I know about celebrating an event I learned because my mother and grandmothers told me so.
  2. Intuition – Our gut tells us what is true. For example, my gut tells me my dog loves me.
  3. Empiricism – We know through observations. This is why we collect data, to learn from it through empiricism.
  4. Rationalism – The use of logic, both inductive and deductive, will help us know the truth. This classic example characterizes rationalism well. If a tree falls in the woods and no one is there to hear it, does it make a sound? We use rationalism to know that it does make a sound.

High quality data driven decisions are, primarily, a dance between empiricism and rationalism. We make observations (enrollment in Sports Management is down 20% over the last 5 years), create a prediction or explanation of what we think is or will be going on (maybe students in this major are not getting jobs), collect data to test the prediction (students are struggling to find jobs), and then act based on the results (make improvements to Sports Management that will provide students with the skills needed for future success). And though to a scientist this may seem like a natural and obvious way to go about seeking information, it is not the standard protocol for seeking knowledge in many administrative ranks.

Decisions from intuition reign supreme in many administrative circles. And it is true that during my past two years in an administrative role, I made many gut level decisions. However, given all of our cognitive biases, which will be discussed in a future post, relying on intuition often leads us to less than optimal decisions. In a Harvard Business Review article on how good leaders make bad decisions, you can find examples of leaders who ignored data and relied upon their intuition to discern the optimal decision.

In administration, you can also see many “truths” proclaimed simply because a person in authority said they were true. I was in a meeting and asked about the justification for an expense. I was expecting some data to support the expense. Instead I was told, “The President says so.” That is authority in its purest form. And yet, organizations are chock full of people who permitted authority to push a group decision in the wrong direction. Consider the Space Shuttle Challenger disaster, where the organizational culture and its reliance upon authority driven “truths” cost the lives of 7 astronauts in January of 1986.

I have found that before I can help administrators learn data analysis techniques to answer important organizational questions, I have to first get them to think about what they know and how they know it. They need to recognize that by creating a prediction or explanation, collecting data to evaluate that prediction or explanation, and, most importantly, letting the data speak as to what is going on, we can lift the veil of our cognitive biases and get to the information we truly seek, the information that will lead us to great decisions.

Epistemology and cognitive biases go hand in hand in keeping us from using data accurately for organizational decisions. As a result, after sharing the ways of knowing with administrators, we must also outline the standard cognitive biases that keep us from seeing the truth. Common cognitive biases facing an organization will be discussed in a future post.

If you are interested in learning more about epistemology, these sites have detailed information.



Filed under Applied Statistics, Curriculum, Data Driven Decisions

Data Driven Decisions … doing it right

Hello All,

I know it has been two years since my last posting at Statistical Sage. For the past two years, I have been serving as an interim administrator. However, I am returning to faculty, and straight into a half-year sabbatical! I am also happy to be returning to posting here about teaching applied statistics.

As I was weighing the costs and benefits of leaving the classroom for two years, it never occurred to me that my time as an administrator would strengthen my resolve regarding the importance of the highest quality curriculum and teaching for applied statistics classes, but of all that occurred in the past two years, that is exactly what happened.

As you can imagine, administrators make countless decisions each and every week. Some of those decisions are fairly minor, while others have huge and lasting impacts on the university, college, faculty, students, and, in the case of state sponsored universities, the taxpayers. Often, though not nearly as often as I would like, data are at the center of those decisions, making it critical to have people in administration with at least a fundamental understanding of the use of statistics in decision making.

Now, before anyone gets upset and objects that we cannot quantify everything that is important in a classroom and university setting, I openly admit that is true.

Though not everything can be quantified and formally assessed, much can be. Below, I list the most critical information that needs to be covered in an applied statistics class designed for training college administrators.

Over the next few weeks, I will be covering each of these topics in more detail as to how to best deliver this material.

  1. Epistemology, Decision Making, and Statistics
  2. Cognitive Biases: How Statistics can be used to get to the Truth
  3. Detecting Data Integrity Issues
  4. Data Management Protocol
  5. Populations vs. Samples
  6. Observational Errors: Measurement, Experimental, and Sampling
  7. Quality Decisions are Limited by the Quality of Measures
  8. Sampling and Quality Decisions
  9. Statistics and Sampling Error
  10. Parameters and Mathematical Modeling vs. Inferential Statistics (Introduction)
  11. Mathematical Modeling, Parameters, and Assumptions
  12. Statistical Decision Errors: Type I and Type II

Though these topics will be directly targeted toward how to teach a university administrator how to be a great data driven decision maker, this information is equally useful to anyone in any position to make data driven decisions, and foundational for any class in applied statistics, regardless of the audience.

In the end, decisions based on data are only as good as the integrity and ability of the person making them. Though not every decision in academia should be a data driven decision, the quality of such decisions is limited by the quality of the data, which are in turn limited by the quality of the measure and the quality of the sample. Such decisions are also heavily constrained by whether the appropriate statistic is used appropriately. Over the next several weeks, I look forward to reviewing this information in more detail.


Filed under Applied Statistics, Data Driven Decisions, Professional Development