Naked Statistics: Stripping the Dread from the Data

  Charles Wheelan

April 8th 2020


Big data has become such a buzzword that we often forget data is useless without the statistical methods that make meaning of it. It is fascinating how two parts of the same process have received such different levels of attention. Obviously, statistical ideas and methodologies are complex and don’t make good headlines, but the extent to which they have been pushed aside is perplexing.

"Data is merely the raw material of knowledge. Statistics is the most powerful tool we have for using information to some meaningful end. Statistical analysis is the detective work that crafts the raw data into some meaningful conclusion "

More importantly, given how much our lives are influenced by data and how much it is likely to shape our future, it would make sense for schools to include statistics as a core subject. I cannot speak for other countries, but unfortunately in Bhutan there is no trace of statistics in the middle and high school math curriculum.

Although I am no expert in statistics, I was mildly surprised by how basic and introductory this book seemed. Reading it made me realize how thankful I am to my college statistics professor, Michael Kahn. I took Accelerated Statistics and Methods of Data Analysis, which I not only enjoyed but which also changed what I want to study in graduate school. This book is perfect for those who have a basic idea of statistics, or for those who took introductory statistics and want to build a strong base of statistical knowledge.

"Statistics is a high-caliber weapon: helpful when used correctly and potentially disastrous in the wrong hands. This book will not make you a statistical expert; it will teach you enough care and respect for the field that you don't do the statistical equivalent of blowing someone’s head off."

One of the most recurring ideas in the book is that statistics is just a tool, one that can be used for helpful as well as nefarious purposes. Sometimes people cause harm with statistics not knowingly but because they lack a proper understanding of the ideas behind it. In the book 'Weapons of Math Destruction', the author writes about how New York City tried to use data and artificial intelligence to deter crime. The idea was to statistically predict the places where crimes would occur. However, because of the way the system was designed (without full awareness of the underlying assumptions), it targeted minority communities, and over time the effect compounded, causing more harm. On the other side, statistical methods have improved lives. For instance, in the book 'The Undoing Project', researchers found that software designed using statistical methods was able to detect cancers from X-ray images more accurately than doctors.

With the advent of computers and numerous statistical software packages, it has become easy for anyone to perform a regression analysis. However, it is crucial that people know the underlying assumptions that need to be fulfilled before churning out a regression equation and implementing policies centered around it. And this is where I believe a lot of us fail.

"Regression analysis is the hydrogen bomb of the statistics arsenal. Every person with a personal computer and a large data set can be a researcher in his or her own home or cubicle. What could possibly go wrong? All kinds of things. Regression analysis provides precise answers to complicated questions. These answers may or may not be accurate. In the wrong hands, regression analysis will yield results that are misleading or just plain wrong."

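To make the point about assumptions concrete, here is a minimal sketch of my own (it is not from the book). The data is entirely made up: y is completely determined by x, but through a curve (y = x squared) rather than a straight line. The helper `fit_line` is a hypothetical name for a plain least-squares fit with one predictor.

```python
# Made-up data for illustration: y depends on x perfectly, but not linearly.
xs = [float(x) for x in range(-5, 6)]
ys = [x ** 2 for x in xs]


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(xs, ys)
print(f"fitted line: y = {slope:.2f} * x + {intercept:.2f}")
```

The fitted slope comes out as zero, which read naively says that x has no effect on y, even though y is entirely determined by x. The answer is precise and still wrong, because the assumption of a linear relationship was never checked.
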
The government of Bhutan recently started an initiative called Economic Road Map for the 21st Century, which will lay out Bhutan's economic plan for the decade. On its website and social media accounts, there is a poll to gauge Bhutanese views on our economy. This is a good initiative, at least in terms of collecting data, since that is rare in Bhutan. However, I hope the government doesn’t let these numbers influence its policies in any way, because they are inherently flawed.

The problem lies in the method of data collection:

  1. The Internet is relatively new in Bhutan compared to other countries. Additionally, the majority of the population is not educated enough to navigate a website to cast their vote on the well-being of the Bhutanese economy. Therefore, the people who do vote in no way represent the whole population; the sample is vastly different from the actual population. You may say that the voters are Bhutanese at the end of the day and should resemble the general population. However, the voters are the ones who have access to the internet and are educated enough not only to navigate websites and social media apps but also to understand what the word economy means, since there is no Dzongkha version of the poll.

  2. When I first became aware of the website, I was excited and wanted to test how robust it was, so I tried to vote multiple times with different answers. Surprisingly, it accepted every vote. I am not entirely certain, but from what I could see each vote was counted every time.

The issue with this data collection is that, in the end, the people carrying out this poll will most likely conclude that x percent of the Bhutanese population feels a certain way (doing well, not well or can’t say) about the economy. This is not true, because the sample is not a random sample of the whole Bhutanese population. Given the way the data is being collected, the correct conclusion would be that x percent of Bhutanese who have internet access and are educated enough to know what economy means feel that the Bhutanese economy is doing well, not well or can’t say.
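The gap between the two conclusions can be sketched with a small simulation. Every number in it is an assumption I invented for illustration (the share of people online, and how opinions differ between the two groups); none of it is real Bhutanese data. It only shows the mechanism: if people who can reach the poll feel differently from those who cannot, the poll result drifts away from the true population figure.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# All three rates are made-up assumptions, not real statistics.
P_ONLINE = 0.30           # share of people able to answer the online poll
P_WELL_IF_ONLINE = 0.60   # share of online people who say "doing well"
P_WELL_IF_OFFLINE = 0.30  # share of offline people who say "doing well"

# Each person is a pair: (is_online, says_economy_is_doing_well).
population = []
for _ in range(100_000):
    online = rng.random() < P_ONLINE
    p_well = P_WELL_IF_ONLINE if online else P_WELL_IF_OFFLINE
    population.append((online, rng.random() < p_well))


def share_saying_well(people):
    """Fraction of the given people who say the economy is doing well."""
    return sum(1 for _, says_well in people if says_well) / len(people)


# The true figure uses everyone; the poll only ever hears from people online.
true_share = share_saying_well(population)
poll_share = share_saying_well([p for p in population if p[0]])

print(f"whole population: {true_share:.1%} say 'doing well'")
print(f"online poll only: {poll_share:.1%} say 'doing well'")
```

Under these invented rates the online poll overstates the share of people who think the economy is doing well, and collecting more online votes would not fix it. Repeat voting, which the poll also seemed to allow, would distort the figure further.
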



Michael Kahn: one of the best professors any student could have.