Daily KOS poll data was fraudulent

This may not sound like a topic for a Lean Six Sigma blog.  The lesson for us is not the subject itself, but how the problem was recognized.

I read about this on politico.com, in “KOS: Poll Data was Fraudulent” by David Cananese, on 6/29.

The organization found a high probability that the polling agency it had hired was faking the data.  The problem was recognized through statistical methods we use in our projects.

1. There was not enough variation in the numbers (picture a control chart with all the data in zone C, the band within one sigma of the center line).

2. There were too many repeating patterns of even and odd numbers.  We would see that as granularity on a probability plot.

In other words, the poll data was not random.
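Both signals can be checked with a few lines of code.  What follows is only a rough sketch, not the method used in the article: the function names are made up for illustration, sigma is assumed to be known ahead of time (for example from the expected sampling error), and how the reported numbers are paired for the even/odd check is also an assumption.

```python
import math

def longest_zone_c_run(values, center, sigma):
    """Longest run of consecutive points within one sigma of the center line.

    A long run (the Western Electric rules flag 15 in a row) suggests
    too little variation. `sigma` is assumed known, not estimated here.
    """
    longest = current = 0
    for v in values:
        if abs(v - center) < sigma:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest

def parity_match_p_value(pairs):
    """Count pairs where both numbers are even or both are odd.

    Returns (matches, p_value). The p-value is an exact two-sided
    binomial test, assuming a match is a 50/50 event for random data.
    """
    n = len(pairs)
    matches = sum(1 for a, b in pairs if a % 2 == b % 2)
    distance = abs(matches - n / 2)
    # Probability of a count at least this far from n/2 in either direction.
    tail = sum(math.comb(n, i) for i in range(n + 1)
               if abs(i - n / 2) >= distance)
    return matches, tail / 2 ** n
```

A long zone C run or a very small p-value does not prove anything by itself, but either one is a good reason to take a closer look at where the data came from.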

I find it very difficult to make up a data set that appears random, say for a class problem worksheet, without using software to generate the numbers.  Try it yourself.  Make up a set of 25 numbers that passes a test for a normal distribution.  You will be surprised how hard it is.
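If you want to try the experiment without a statistics package, here is a minimal sketch using only the Python standard library.  It uses the Jarque-Bera test, chosen here only because it is easy to code by hand; it is a large-sample approximation, so for 25 points a Shapiro-Wilk or Anderson-Darling test from your usual software is the better judge.  The function names, the mean of 50 and the sigma of 3 are all made up for illustration.

```python
import math
import random

def jarque_bera(data):
    """Return (statistic, p_value) for the Jarque-Bera normality test.

    The p-value uses the chi-square distribution with 2 degrees of
    freedom, whose upper tail is exactly exp(-x / 2). This is an
    asymptotic result, so treat it as a rough guide for small samples.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    if m2 == 0:
        raise ValueError("data has no variation")
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    statistic = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    return statistic, math.exp(-statistic / 2)

def looks_normal(data, alpha=0.05):
    """True if the test does not reject normality at level alpha."""
    _, p_value = jarque_bera(data)
    return p_value >= alpha

# 25 numbers generated by software (illustrative mean and sigma).
rng = random.Random(0)
software_made = [rng.gauss(50, 3) for _ in range(25)]
print(jarque_bera(software_made))
```

Replace software_made with the 25 numbers you wrote down by hand and compare the results.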

