Losing Faith in Statistics

So I might as well couch this in a personal context. I've always been skeptical of the use of statistics and inductive inference, but over the past year or so, my skepticism has turned into constant eye-rolling at the predictive use of statistics in any real context.

The personal side is me starting my PhD. You come to realize that there's a tendency to sugarcoat everything "scientific" in front of a popular audience, but once you're part of a PhD program, everyone expects you to be a true believer, so they're pretty honest (if they're savvy to the reality themselves) about how piecemeal and thin statistical knowledge is. Granted, I'm a generative linguist, so I've always questioned the usefulness of statistical analyses of linguistic corpora, etc. I've also taken every study I've heard or read about with a grain of salt. But once you get into the belly of the beast and realize you still have to swallow a lot of salt, you start wondering if there's anything underneath all of the onion peels. Usually there is, and it's just baseless assumptions.

If you're a commoner, you put a lot of faith in studies that are "genome-wide statistical analyses" or "well-replicated" or even something basic like "published in a peer-reviewed journal." But when you tear each of these apart, you realize how tenuous all of them are, just as you realize the quasi-pseudo-scientific nature of the actual academic process.

Bad statistics ends up being worst of all because statistics, in many ways, exploits a glitch in human suggestibility. Statistical tools sound maximally smart and maximally impressive, while poor use of them is all too easy, considering the wide range of things you can run superficial statistics over. In most cases, statistics ends up being a kind of sleight of hand to "deal" with the unknown in a way that looks comprehensive, while it's really a cobweb of defense against complex machinery we know nothing about, whether this is the machinery of the brain, society, the economy, whatever. Predictive statistics work only for little unknowns whose architecture can safely be ignored.

Alongside all of this in my own life, there's been a similar statistical upheaval in the socio-political world. A couple of years ago we could talk about it in Taleb's pithy two-word phrase: the "Black Swan." Now we can talk about it in one, more concrete word: Trump. I used to have a great deal of faith in the viability of statistics to predict some sociological events within some range. The realm of politics seemed ideal for this because an election is a highly formalized and overt machine with pseudo-predictable processes in nominations, campaigns, etc. Within the past couple of elections, we've seen the rise of statistics "wonks" like Nate Silver who rely on smart-looking formulas and past events to predict outcomes with only some error. In politics-as-usual, like last election, Silver seemed to have a fair degree of accuracy, but then again, who else could get at least 49 of the 50 states' results right? Pretty much anyone, heuristically (I got all 50 right).

We've also seen the rise of prediction markets. Last election there was Intrade, which famously predicted just about all of the 2010 congressional seats. Prediction markets pretty much invariably seem to get it right, but this is a sort of cheating. Markets will eventually converge on the right answer, after many ups and downs, because all that has to happen is for the event to actually happen before the market closes. Prediction markets, in a lot of situations, spend most of their time being wrong, wronger than most off-the-cuff guesses. The only advantage of a market is that it requires you to put your money where your mouth is.

But then there's Trump. A genuine black swan who has shat on all conventional wisdom. Silver, back last summer, noticed Trump's extreme popularity and gave him, based on past events, about a 2% chance of winning the nomination. Just a week or so ago, he revised his prediction to 50% (with Rubio at about 30-40%). Silver and all the other statisticians have to ignore Trump's actual persuasive abilities and rhetorical strength and treat him as if he's merely another right-wing populist and outsider candidate, so that they can compare him to precursors and pigeonhole him into the statistics.

Any kind of Bayesian algorithm can only see part of one step ahead of where we are now with the assumption that pretty much everything important will always be the same. This kind of method is naturally and unavoidably blind to genuine novelty.
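
To make that concrete, here is a minimal sketch of the kind of base-rate reasoning I mean, written as a Beta-Binomial update in Python. To be clear about what's assumed: the function name, the reference class, and every count in it are hypothetical numbers I made up for illustration (chosen only so the starting estimate lands near 2%). This is not Silver's model or anyone's real data.

```python
def posterior_mean(prior_wins, prior_losses, new_wins=0, new_losses=0):
    """Mean of a Beta(prior_wins + new_wins, prior_losses + new_losses) posterior.

    The only inputs are counts of past outcomes in a reference class.
    Nothing about the candidate himself (rhetoric, fanbase, media
    immunity) is an input, so the model cannot react to it.
    """
    a = prior_wins + new_wins
    b = prior_losses + new_losses
    return a / (a + b)


# HYPOTHETICAL reference class: 1 win and 49 losses among past "outsiders".
base_rate = posterior_mean(1, 49)

# Two very different candidates filed under the same reference class
# get exactly the same number, because their traits never enter the model.
estimate_for_forgettable_outsider = base_rate
estimate_for_novel_outsider = base_rate

# The estimate only moves after an outcome has already been observed,
# i.e. after the surprise has happened.
after_one_upset = posterior_mean(1, 49, new_wins=1)

print(base_rate)
print(after_one_upset)
```

The design choice doing the damage is the reference class itself: once a candidate is binned with his supposed precursors, the update rule is perfectly sound and still has no way to see what makes him different.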

The only people who succeeded at seeing Trump's rise were those who focused on his potentiality, not his relationship to past events. Scott Adams, of Dilbert fame, gets all the credit here. Adams noticed Trump's immunity from media blackmail, his stable and decided fanbase, and, most importantly, his impressive persuasive and antifragile rhetorical abilities. This is a recipe for victory, and a recipe that makes Trump's cooking a whole lot different from that of the other "outsiders" who have preceded him in running (Forbes, Perot, etc.). Quantitative analysis will never explain Trump's persuasive abilities, but anyone could've looked at the situation and seen the inevitable. Adams predicts a Trump victory in the nomination and a landslide against Clinton. I'm leaning in that direction, but I can see a particularly Machiavellian Republican National Convention succeeding in undermining his nomination in favor of Rubio. Barring that, given Trump's persuasive abilities, the presidency seems his to lose.

One of the takeaways from Black Swan phenomena is that we can predict unimportant events that are part of the predictable patterns of life, but we're powerless to predict precisely the things that matter. In fact, if we could predict and systematize Black Swans, they wouldn't be particularly important. Positive and predictive statistics should be eternally relegated to the realm of normalcy.

Statistics still sound smart to the masses, but when we use them, we're not gaining access to some privileged realm of deeper understanding. We're using number sorcery to give us guesses about what we don't know. Statistics aren't even a tool for dispelling ignorance; they only provide some pitiful guide as we wander through it.