Category Archives: Professional Development

Great Resource for the Teaching of Applied Statistics

Hello All,

The Society for the Teaching of Psychology has an office dedicated to great, peer-reviewed resources for teaching called the Office of Teaching Resources in Psychology.

One (free) resource for those of us teaching applied statistics is the on-line book Teaching Statistics and Research Methods: Tips from TOPS.

Another such resource is Statistical Literacy in Psychology: Resources, Activities, and Assessment Methods.

The web site housing these two resources is filled with great ideas, all of which have been peer-reviewed. You can find teaching resources, including example syllabi, as well as articles on how to maximize your students’ learning. Even if you are teaching applied statistics in an area outside of psychology, I encourage you to make use of this valuable set of tools.

Happy Teaching!




Filed under Applied Statistics, Curriculum, Engaging students, Pedagogy, Preparing to Teach, Professional Development

Assuring Data Integrity

The quality of our data driven decisions is limited by the quality of our data. It really is that simple. If your data have errors, your decisions will be error prone as well. This is the 4th in a series of posts on what needs to be taught to administrators making data driven decisions. Sure, administrators often hire people to handle data entry and analysis, but how can you tell if you have the right staffing, especially during times of financial limitations? Before making decisions with data, administrators have to know how to look for signs of errors in their data. Even if an administrator has been formally trained in statistics or assessment, an examination of many popular books in these areas reveals that not many people are talking about the importance of verifying data integrity, that is, the accuracy of your data.

Obviously, one of the best ways to assure data integrity is to follow an appropriate data management plan. This file from MIT on Data Management provides a great deal of useful information.

There are some data management professionals who believe that it’s OK for there to be data errors and consider this among best practices … “Don’t waste money assuring all data are accurate.” Now, if you are working with an organization gathering data from tens of thousands of people, and there is no reason to believe that the errors present in the data will be systematic (that is, always wrong in the same way), then I can see why such people are making such statements. The statistics will treat this small and random error like sampling error, thus minimizing the issue of error in your data while getting you to the information you really need to know to make the right decisions.

However, most universities and smaller companies are working with data sets far smaller than what would be necessary to just let the errors be and let the statistics take care of it. Thus even very few data errors can truly mask what is going on. Moreover, I have found that most data errors in higher education aren’t random, making any error a problem. Take, for example, data on enrollment. I have examined a lot of enrollment data, and anytime there is an error it is always in the same direction: underestimating the number of students who are enrolled in a particular program or who are part of a particular ethnic/racial group.

When you are already working with smaller data sets, even a single error in a single observation could be enough to keep an administrator from making the best possible decision. And if the error is systematic, as most of the errors I have seen in data sets have been, then you are assured of making a faulty decision with faulty data. Thus, identifying and fixing the errors in your data is an important part of the job for any administrator interested in making high quality data driven decisions.
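
The difference between random and systematic error can be illustrated with a small simulation. The sketch below is only an illustration under made-up assumptions: every true value is 100, 5% of records carry an error of size 10, and the function name `mean_shift` is hypothetical. It is not drawn from any real enrollment data.

```python
import random
import statistics


def mean_shift(n, error_rate, systematic, seed=42):
    """Return how far the observed mean drifts from the true mean of 100.

    Assumed setup (illustrative only): each of the n records has a true
    value of 100. A fraction `error_rate` of records is recorded wrongly
    by 10 units: always too low if `systematic` is True, otherwise too
    high or too low with equal chance.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    observed = []
    for _ in range(n):
        value = 100.0
        if rng.random() < error_rate:
            value += -10.0 if systematic else rng.choice([-10.0, 10.0])
        observed.append(value)
    return statistics.mean(observed) - 100.0


if __name__ == "__main__":
    for n in (20, 20000):
        print(n, "records, random error shift:    ", mean_shift(n, 0.05, False))
        print(n, "records, systematic error shift:", mean_shift(n, 0.05, True))
```

With many records, the random-error shift should land close to zero, while the systematic shift should stay near -0.5 (5% of records times -10) no matter how many records are added. With only 20 records, either kind of error can move the mean noticeably, which is the small-data-set problem described above.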

When training administrators, it often helps if you can use examples of data integrity errors from their own organization. Listed below are the most common types of data integrity errors that I have observed.

Types of Error:

  1.  Miscoding of Individuals leaves them out of important counts.
    • Example: Enrollment or demographic counts can be miscoded. I recall getting a set of raw data and seeing that some students were listed as being in the College Business, others as CoB, and still others as CBM, yet each of these students was in the College of Business Management. And though a human can look at each of these ways of listing the College of Business Management and see they are the same, a computer can’t unless programmed to do so. However, it would be far easier to put into your data management plan that the College of Business Management is always coded as BUS. Then accurate counts become possible.
  2. Misplaced decimals or Place Value Errors.
    • Example: Proportions of a whole can create challenges. Decimals always seem to add to challenges in data accuracy. At most universities we base payment for a position on a “whole” position. Thus if the full time position is 12 credits, someone teaching 6 credits would have a .5 position. Of course, sometimes positions become more complicated. For example, during an analysis a .05 position was treated as a .50 position, so the position was counted at 10 times what it was supposed to be because of a decimal error. In another example, while translating counts from one table to another, a person dropped a zero from the number. The table said 12, but the actual number of majors was 120.
  3. Inverse Coding. Responses that have to be coded as numbers are coded in reverse.
    • Example: Let’s say we have a survey with the responses Strongly Agree, Agree, Disagree, or Strongly Disagree, where 4 is Strongly Agree and 1 is Strongly Disagree. I have seen people code them with 1 as Strongly Agree and 4 as Strongly Disagree. Of course, it would be fine if such coding were done consistently and noted, but that’s not what I typically have experienced.
  4. Treating a True Score of Zero as a Non-Response.
    • Example: When I code data, I typically code females as zero and males as 1. A non-response, which for this type of question actually holds some interesting information, does not get assigned a value. A graduate student once deleted all of the zeros in the table … every single one, and now values that should have been zeros were treated as if the person didn’t respond.
  5. Unexpected Errors that only someone new to data management or who is truly arrogant could create.
    • Example: They are, after all, unexpected and as such cannot be predicted or classified. Just know inexperience and arrogance or just arrogance are a bad mix in everything, including assuring data integrity. I’m not saying a more seasoned person isn’t apt to do something stupid. The difference is, given they are humbled enough through years of such embarrassing episodes, they not only know how to look for such errors, they triple check for such errors to avoid public humiliation!
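
Several of the error types above can be guarded against in code when the data are prepared. The following is a minimal sketch, not a complete cleaning routine. The alias table, the function names, and the assumption that a blank cell means a non-response are all hypothetical choices made for illustration; a real data management plan would define its own.

```python
# Hypothetical alias table: every known spelling maps to the single
# canonical code "BUS" suggested in the text above.
COLLEGE_ALIASES = {
    "college business": "BUS",
    "college of business management": "BUS",
    "cob": "BUS",
    "cbm": "BUS",
    "bus": "BUS",
}


def canonical_college(raw):
    """Map a raw college label to its canonical code.

    Unknown labels raise an error instead of being silently dropped,
    so miscoded individuals cannot quietly fall out of the counts.
    """
    key = " ".join(raw.strip().lower().split())  # normalize case and spacing
    if key not in COLLEGE_ALIASES:
        raise ValueError(f"Unrecognized college label: {raw!r}")
    return COLLEGE_ALIASES[key]


def parse_response(cell):
    """Keep a true zero distinct from a non-response.

    Assumption: a blank cell means no response (returned as None),
    while the text "0" is a real value of zero.
    """
    text = cell.strip()
    return None if text == "" else int(text)


def reverse_code(score, low=1, high=4):
    """Flip a scale score so that `high` becomes `low` and vice versa."""
    if not low <= score <= high:
        raise ValueError(f"Score {score} is outside the {low}-{high} scale")
    return low + high - score
```

For example, `canonical_college(" CoB ")` returns `"BUS"`, `parse_response("0")` returns `0` while `parse_response("")` returns `None`, and `reverse_code(4)` returns `1` on a four-point scale.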

It’s not enough simply to know about the errors; we have to know how to recognize them. Listed here are some ways to share with administrators for assuring the integrity of their data.

  • LOOK at the Data!
    • Always look at a chart or graph of the data you are about to base a decision upon. Does it make sense? Is it what you expected? Is anything missing? Is anything extreme? Though it is true you won’t be able to find all errors this way, you will be able to spot quite a few. For example, if there are no females in your graduating class, then you know you have a problem, or if not a single student in 10 years graduated from the largest major on your campus in 4 years, you know you have a problem.
    • If at all possible, chart or graph multi-year trends. This will help you to see mistakes clearly, like the mathematics department going from three years of 100 plus majors to a year of 12 majors. And once you have determined there is no problem with the data, you can see trends more clearly.
  • Trust your Gut.
    • Even applied statisticians can’t walk around with all of the data easily accessible in their minds. However, we also tend to get a sense when something is off. Interestingly, research on infants and toddlers has demonstrated that from an early age, they are implicitly (that is, not consciously) picking up patterns in their observations. They are kind of like little statisticians. We never lose that implicit ability. However, as it is not consciously available, it often comes to us as a sense of … hmmm, that doesn’t feel right.
    • If you are looking at data and it doesn’t feel right, look at it more closely for errors. I actually had this occur when I was looking at means that seemed just a bit off … all of them. That was when I found someone had miscoded one of the responses. This coding error would have resulted in a decision that wasn’t supported by the data.
  • When critical, triangulate the data.
    • First, look at the raw data for discrepancies. If none are found, then triangulate the data. Triangulating data literally means finding data from three different sources (e.g., the department chair, enrollment services, and institutional research) and comparing them. If there is a difference, you have to find the source of the problem. Often you may be able to find only two sources of data and not three. You can compare two sources as well; just make sure they are independent (e.g., enrollment services data vs. chair data).
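
Two of the checks above, spotting a sudden jump in a multi-year trend and triangulating counts across sources, can be partly automated. This is a rough sketch under stated assumptions: the function names, the threshold of a threefold change, and all of the counts are made up for illustration. A flag only tells you where to look; it does not tell you which number is wrong.

```python
def flag_sudden_changes(counts_by_year, max_ratio=3.0):
    """Return the years whose count differs from the previous year's
    count by more than a factor of `max_ratio` (an assumed threshold)."""
    flagged = []
    years = sorted(counts_by_year)
    for prev_year, year in zip(years, years[1:]):
        low, high = sorted((counts_by_year[prev_year], counts_by_year[year]))
        if high > max_ratio * low:
            flagged.append(year)
    return flagged


def compare_sources(sources):
    """Return the programs on which the sources disagree.

    `sources` maps a source name to a dict of program -> count. A program
    missing from a source shows up as None, which counts as a disagreement.
    """
    programs = set()
    for counts in sources.values():
        programs.update(counts)
    disagreements = {}
    for program in sorted(programs):
        reported = {name: counts.get(program) for name, counts in sources.items()}
        if len(set(reported.values())) > 1:
            disagreements[program] = reported
    return disagreements


if __name__ == "__main__":
    # Made-up counts echoing the mathematics example above.
    math_majors = {2011: 104, 2012: 110, 2013: 118, 2014: 12}
    print(flag_sudden_changes(math_majors))  # [2014]

    sources = {
        "chair": {"Math": 120, "Biology": 80},
        "enrollment services": {"Math": 12, "Biology": 80},
        "institutional research": {"Math": 120, "Biology": 80},
    }
    print(compare_sources(sources))
```

In the second example only Math is reported, because all three sources agree on Biology.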

As any administrator knows, it is far easier to not have errors than to have them, find them, and fix them. Here are some examples of how these problems can be minimized or at least detected early enough in the process that they can be fixed before creating challenges for decision makers.

  • Most of these problems will be minimized if a well thought out and articulated Data Management Plan is crafted and implemented.
  • When errors are found, examine to see if the problem is with the implementation of the Data Management Plan or with the Plan, itself.
  • Make revisions to the Data Management Plan as appropriate.
  • Within your data management plan, devise a periodic verification of the accuracy of the data. This should happen twice a year, and truthfully shouldn’t take very long. Three data point checks for half a dozen to a dozen different types of files should do the trick.
  • An extremely limited number of people should have access to raw data. When you start to code or otherwise prepare the data for analysis, that should NOT be taking place at the level of the raw data. It should be copied and then worked on in a separate file.
  • And please, high quality data analysis and data integrity require skilled and appropriately paid staffing. Given that the consequences of having errors in your data are so high, this is not a place you want to scrimp.
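
The periodic verification described above could be set up as a random draw of records to check by hand. The sketch below assumes one reading of the suggestion, namely three randomly chosen records per file. The function name, file names, and record IDs are hypothetical.

```python
import random


def pick_spot_checks(record_ids_by_file, checks_per_file=3, seed=None):
    """Choose which records to verify by hand in each file.

    `record_ids_by_file` maps a file name to a list of record IDs.
    Returns a dict of file name -> randomly chosen IDs, with at most
    `checks_per_file` IDs per file. Passing a seed makes the draw
    repeatable, so the same plan can be shown to a second reviewer.
    """
    rng = random.Random(seed)
    plan = {}
    for file_name in sorted(record_ids_by_file):
        record_ids = list(record_ids_by_file[file_name])
        how_many = min(checks_per_file, len(record_ids))
        plan[file_name] = rng.sample(record_ids, how_many)
    return plan


if __name__ == "__main__":
    # Hypothetical files and record IDs, for illustration only.
    files = {
        "enrollment.csv": list(range(1, 501)),
        "faculty_load.csv": list(range(1, 81)),
    }
    for name, ids in pick_spot_checks(files, seed=2024).items():
        print(name, "-> check records", ids)
```

Each chosen record would then be compared against its original source, working from a copy rather than from the raw data file itself.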


Filed under Applied Statistics, Methods of Data Collection, Professional Development

Data Driven Decisions … doing it right

Hello All,

I know there has been a delay of two years in the postings at Statistical Sage. For the past two years, I have been serving as an interim administrator. However, I am returning to faculty, straight back to a half-year sabbatical! I am also happy to be returning to posting here about teaching applied statistics.

As I was weighing the costs and benefits of leaving the classroom for two years, it never occurred to me that my time as an administrator would strengthen my conviction about the importance of the highest quality curriculum and teaching for applied statistics classes. But of all that occurred in the past two years, that is exactly what happened.

You see, administrators make countless decisions each and every week. Some of those decisions are fairly minor in nature, while others have huge and lasting impacts on the university, college, faculty, students, and in the case of state sponsored universities … the taxpayers. Often, though not nearly as often as I would like, data are at the center of those decisions, making it critical to have people in administration with at least some fundamental understanding of the use of statistics in decision making.

Now, before people start getting upset stating that we cannot quantify all that is important in a classroom and university setting, I openly admit that is true.

Though it is true that not everything can be quantified and formally assessed, much can be. I will review the most critical information that needs to be covered in an applied statistics class designed for training college administrators.

Over the next few weeks, I will be covering each of these topics in more detail as to how to best deliver this material.

  1. Epistemology, Decision Making, and Statistics
  2. Cognitive Biases; How Statistics can be used to get to the Truth
  3. Detecting Data Integrity Issues
  4. Data Management Protocol
  5. Populations vs. Samples
  6. Observational Errors: Measurement, Experimental, and Sampling
  7. Quality Decisions are Limited by the Quality of Measures
  8. Sampling and Quality Decisions
  9. Statistics and Sampling Error
  10. Parameters and Mathematical Modeling vs. Inferential Statistics (Introduction)
  11. Mathematical Modeling, Parameters, and Assumptions
  12. Statistical Decision Errors: Type I and Type II

Though these topics will be directly targeted toward how to teach a university administrator how to be a great data driven decision maker, this information is equally useful to anyone in any position to make data driven decisions, and foundational for any class in applied statistics, regardless of the audience.

In the end, decisions based on data are only as good as the integrity and the ability of the person making them. Though not every decision in academia should be a data driven decision, the quality of such decisions is limited by the quality of the data, which is limited by the quality of the measure and the quality of the sample. Such decisions are also extremely constrained by the appropriate use of the appropriate statistic. Over the next several weeks, I look forward to reviewing this information in more detail.


Filed under Applied Statistics, Data Driven Decisions, Professional Development