The Importance of Questions in Data Analysis

“The sexy job in the next ten years will be statisticians,” said Google’s Chief Economist back in 2009. The ability to understand data and pull out valuable insight will be increasingly in demand in business, government, and journalism, to name but a few fields.

And one of the most important first steps when analysing data is deciding which questions to ask.

Let’s take journalism as an example. In years gone by, a researcher would surround himself with the national and regional papers and scour them for hours, searching for one line – a line that begged more questions to be asked. He’d return to the newsroom, present the line to a journalist, and say “follow this up.”

A brainstorming session would follow. But it was different from what you and I might think of as a brainstorming session. It wasn’t sitting around staring at a blank flipchart or whiteboard. Instead, they arrived with the idea. The goal was for numerous people to fire off as many questions as they could think of in 10 minutes. Any more than 10 minutes, and they had probably started navel-gazing. This was a quick hit to explore as many angles as they could think of. They would then filter the questions down to the juiciest ones, guided by every journalist’s greatest asset – a nose for a story.

Let’s take a current example.  The European Court of Justice ruled in March 2011 that “taking the gender of the insured individual into account as a risk factor in insurance contracts constitutes discrimination”.  The ruling will come into effect in December 2012.  Insurers have had the time in between to adjust their pricing models.

This ruling poses a number of questions for insurers and the general public.  It also challenges the application of statistics in this context.

Faced with this news, what questions can your class generate?  What if they take on different roles?  What questions might the insurer ask?  How might they adjust their policies to account for the new ruling?  What about a journalist?  What questions might they ask in the public’s interest? What about the perspective of the judge in the ruling?  What might the opposing lawyers have argued?  If you take this ruling further, what implications could there be?  What challenges could you make to the ruling?

The average premium for women in the UK is £425 pa compared with £536 pa for men.  However, what challenges can be made to the use of averages in this instance?  Given the opportunity, how would you dive into the data to gain greater insight?
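One way to start that discussion is with a small sketch in Python. The premiums below are hypothetical figures invented purely for illustration (they are not real insurance data), chosen so that each group’s mean matches the averages quoted above. The helper name `summarise` is also just an assumption for this example. The point is only to show how a single large premium can pull the mean away from what a typical driver pays.

```python
from statistics import mean, median

# Hypothetical annual premiums in pounds, invented for illustration only.
# They are chosen so the group means match the quoted averages (425 and 536).
women = [250, 300, 350, 400, 450, 800]
men = [280, 320, 380, 420, 500, 1316]


def summarise(premiums):
    """Return the mean, median, and range of a list of premiums."""
    return {
        "mean": mean(premiums),
        "median": median(premiums),
        "range": (min(premiums), max(premiums)),
    }


women_summary = summarise(women)
men_summary = summarise(men)

# Gap between the two groups, measured two different ways.
mean_gap = men_summary["mean"] - women_summary["mean"]
median_gap = men_summary["median"] - women_summary["median"]

# How many of the men pay less than the women's mean premium?
men_below_women_mean = sum(1 for p in men if p < women_summary["mean"])

print("Women:", women_summary)
print("Men:", men_summary)
print(f"Gap in means: {mean_gap}, gap in medians: {median_gap}")
print(f"Men paying less than the women's mean: {men_below_women_mean} of {len(men)}")
```

With these made-up numbers the gap in means is £111 by construction, yet the gap in medians is only £25, and most of the men in the sample pay less than the women’s mean. Real premium data may look nothing like this, which is exactly the question worth asking.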

Some commentary and coverage of this ruling adds context and offers a jumping-off point for further questions and discussion:

“Currently millions of insurance policies take gender into account. The court ruled that practice as inappropriate since there are myriad other factors that could be considered. Gender, however, is typically easy to check and can point to sound statistical conclusions, the industry says.”  NY Times

Speaking of the case’s advocate general, Juliane Kokott, the Wall Street Journal writes:

“Life-insurance discrimination might be permissible under the law, she allows, if women live longer because they are women, if there is something innate and biological about the female sex that causes longevity.

“But, she argues, important causes of longevity are behavioral—eating habits, smoking and drinking, sports, work environments, drug use. That women have, on average, behaved differently than men doesn’t necessarily mean any one woman’s femaleness is the reason why.

“Differences in longevity ‘merely come to light statistically,’ Ms. Kokott writes, and sex is thus just shorthand for whatever is causing those differences. And, she says, ‘the use of a person’s sex as a kind of substitute criterion for other distinguishing features is incompatible with the equal treatment of men and women.’”

One suggestion is that insurers might encourage more people to sign up for black box insurance.

“Black box insurance – also known as ‘telematics’ or ‘pay as you go car insurance’ – aims to offer drivers a cheaper alternative by delivering driver-centred premiums based upon actual driving style rather than statistics.”

Similar in concept to the black boxes in aeroplanes (though presumably not indestructible), these devices track when and where you are driving and measure your speed, acceleration and braking. Instead of relying on statistics based on your demographics, they would give a more direct impression of how safe a driver you are. However, this doesn’t remove statistics completely. The roads you drive on and the time of night you drive affect how much you have to pay, and that adjustment is presumably based on the probability of having an accident.

The Guardian discusses other ways insurers might respond to the new ruling:

“It has been suggested some insurers may try to get round the rules by re-classifying the cars typically bought by young men into a higher insurance category, which would in turn push their premiums up. The ABI research paper mentioned an unnamed insurer which said women accounted for 70% of its Mini drivers, but only 30% of its BMW drivers. Alternatively, car insurers may start paying more attention to people’s occupations.”

As an example exercise, you could divide students into small discussion groups and assign them roles (e.g. one group could be journalists for the national press, another could be journalists for an insurance industry publication, and a third could be senior managers of an insurance company).

Give them each ten minutes to brainstorm within the group as many different questions as they can.  Then get them to filter down the questions to the most important, and discuss how they might go about answering these questions, and the potential implications of their findings.  Then get each group to report back to the larger group, and invite further questions from the class.

You could then review the whole session, looking at how asking more questions early on affects the way you approach statistical analysis, and at other contexts in which you could apply this approach.

All of this should hopefully stimulate engaging and lively debate based on a real-world example of applied statistics.
