Myth making in machine learning – Ruth Stalker-Firth

If you torture the data enough, it will confess to anything.
– Darrell Huff, How to Lie With Statistics (1954).

Depending on who you talk to: God is in the details or the Devil is in the details. When God is there, small details can lead to big rewards. When it’s the devil, there’s some catch which could lead to the job being more difficult than imagined.

For companies nowadays, the details is where it is at with their data scientists and machine learning departments, because it is a tantalising prospect for any business to take all the data that it stores and find something in those details which could create a new profit stream.

It also seems to be something of an urban myth – storytelling at its best – which many companies are happy to buy into as they invest millions into big data structures and machine learning. One person’s raw data is another person’s goldmine, or so the story goes. In the past whoever held the information held the power and whilst it seems we are making great advances technologically and otherwise, in truth, we are making very little progress. One example of this is Google’s censorship policy in China.

Before big data sets, we treasured artefacts and storytelling to record history and predict the future. However, it has for the most part focused on war and survival of the fittest in patriarchal power structures crushing those beneath them. Just take a look around any museum.

We are conditioned by society. We are amongst other things, gender socialised, and culture is created by nurture not nature. We don’t have raw experiences, we perceive our current experiences using our past history and we do the same thing with our raw data.

The irony is that the data is theoretically open to everyone, but it is, yet again, only a small subset of people who wield the power to tell us what it means. Are statisticians and data scientists the new cultural gatekeepers in the 21st century’s equivalent to the industrial revolution – our so called data driven revolution?

We are collecting data at an outstanding rate. However, call your linear regression what you will: long short-term memory, or whatever the latest buzz word within the buzz of the deep learning subset of neural nets (although AI the superset was so named in 1956) these techniques are statistically based and the algorithms already have the story that they are going to tell even if you train it from now until next Christmas. They are fitting new data to old stories and, they will make the data fit, so how can we find out anything new?

Algorithms throw out the outliers to make sense of the data they have. They are rarely looking to discover brand new patterns or story because unless it fits with what us humans already know and feel to be true it will be dismissed as rubbish, or called overfitting, i.e., it listened to the noise in the data which it should have thrown out. We have to trust the solutions before we use them but how can we if the solution came from a black box style application, and we don’t know how it arrived at that solution? Especially if it doesn’t resemble what we already know.

In storytelling we embrace the outliers – those mavericks make up the hero’s quest. But not in our data. In data we yearn for conformity.

There is much talk about deep learning, but it is not learning how we humans learn, it is just emulating human activities – not modelling consciousness – using statistics. We don’t know how consciousness works, or even what it is, so how can we model it? Each time we go back to the fundamental age old philosophical questions of what is it to be human and we only find this in stories, we can’t find it in the data, because ultimately, we don’t know what we are looking for.

It is worth remembering that behind each data point is a story in itself. However, there are so many stories that the data sets don’t include because it is not collected in the first place. Caroline Criado-Perez’s Invisible Women documents all the ways in which women are not represented in the data used to design our societal infrastructure – 50% of the data is missing and no one seems to care because that’s the way things have always been done. Women used to be possessions.

And, throughout history anyone with a different story to tell about how the world worked was not treated well, like Gallileo. And even if they did save their country but as people themselves, they didn’t fit with societal norms, they were not treated well either e.g., Joan of Arc, Alan Turing. And if they wanted to change the norm, they were neither listened to nor treated until society slowly realised that they were right and suppression is wrong: Rosa Parks, the Suffragettes, Gandhi, Nelson Mandela.

When it comes down to it, we are not good at new ideas, or new ways of thinking, and as technology is an extension of us, why would technology be any good at modelling new ideas? A human has chosen the training data, and coded the algorithm, and even if the algorithm did discover new and pertinent things, how could we recognise it as useful?

We know from history that masses of data can make new discoveries, both chemotherapy and dialysis were discovered when treating dying people during wars. There was nothing to lose, we just wanted to make people feel better, but the recovery rates were proof that something good was happening.

Nowadays we have access to so much data and we have so much technological power at our fingertips, but still, progress isn’t really happening at the rate it could be. And in terms of medical science, it’s just not that simple, life is uncertain and there are no guarantees which is what makes medicine so difficult. We can treat all people the same with all the latest treatments but it doesn’t mean that they will or won’t recover. We cannot predict their outcome. No one can. Statistics can only tell you what has happened in the past with the people on whom data has been collected.

But what is it we are after? In business it is the next big thing, the next new way to sell more stuff. Why is that? So we can make people feel better – usually the people doing the selling so that they can get rich. In health and social sciences we are looking for predictive models. And why is that? To make people feel better. To find new solutions.

We have a hankering for order and for a reduction in uncertainty and manage our age old fears. We don’t want to die. We don’t want to live with this level of uncertainty and chaos. We don’t want to live with this existential loneliness, we want it all to matter, to have some meaning, which brings me back to our needs which instead of quoting Maslow as I normally do (even though it is not a fair representation of women) I will just say instead that we just want to feel like we matter, and we want to feel better.

So perhaps we should start there in our search for deep learning. Instead of handing it over to a machine to nip and tuck the data into an unsatisfactory story we’ve heard before because it’s familiar and how things are done, why not start with a feeling? Feelings don’t tell stories, they change our state, let’s change it into a better state.

Perhaps stories are just data with a soul…
Brené Brown, The power of vulnerability

Which begs the question: What is a soul? How do we model that in a computer? And, why even bother?

How about we try and make everyone feel better instead? What data would we collect to that end? And what could we learn about ourselves in the process? Let’s stop telling the same old stories whilst collecting even more data to prove that they are true because I want you to trust me when I say that I have a very bad feeling about that.

Related Posts