Our love affair with big data

Twenty years ago this summer, I was writing up my PhD thesis, about how to interpret and visualise big data from bridges, except we didn’t call it that back then. We just called it large data sets, which really isn’t as sexy by half. Telling someone that you work in big data (or should that be BIG DATA?) feels, to me, like telling everyone how important it is, Harry Enfield style: loadsadata, like I’m the one with the big swinging data.

Funny, because when data mining was all the rage, no one said it in the same way that they say BIG DATA. It sounded more like the ploughing-that-furrow approach that doing a PhD requires, with all of us underground, mining away on applications like Tesco’s Clubcard, coming up with new ways to entice people to buy more things. Now we are putting AI and BIG DATA together, well, that has to be more meaningful and more important, right? We must be at least explaining the mysteries of the universe.

Or, are we?

Err no. We cannot discover brand new information using the information we have. We can make links and combine the information we capture, and eventually rethink what we are capturing with our new insights. By linking data together we see where we should monitor new things, rather like the time John Snow plotted outbreaks of cholera on a map and saw that they were near the same water pumps and then, a long time after that, everyone realised that it wasn’t the fog killing people. But it cannot give you brand spanking new, off the chain, never-heard-of-before theories if you are still matching your new data to old stories.

My PhD used models, i.e., different algorithms, which interpreted data in various ways to get the best fit to the data. Simply put, one of my users would be watching the measurements come in and then, depending on what the data looked like, would choose the most likely explanation. By changing various constraints the user could explore a small set of other but similar potential explanations. Nonetheless, we were mapping likely explanations onto the data before us. Unless the data was completely strange looking, in which case it was either ditched, or the odd outlier was deleted and then the set was interpreted to fit. Tailored, nipped and tucked.
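For the curious, that loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not the thesis code: the two candidate explanations (`constant` and `linear`), the `penalty` per parameter and the cut-off `k` are all made up for illustration. It shows the two moves described above: pick whichever known explanation fits best, and delete the odd reading that spoils the fit.

```python
from statistics import mean, median

def fit_constant(xs, ys):
    """Explanation 1 (hypothetical): nothing is changing."""
    c = mean(ys)
    return lambda x: c

def fit_linear(xs, ys):
    """Explanation 2 (hypothetical): a steady drift, by least squares."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)  # assumes the xs are not all equal
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

# Each candidate: (fitting function, number of free parameters).
CANDIDATES = {"constant": (fit_constant, 1), "linear": (fit_linear, 2)}

def mse(model, xs, ys):
    """Mean squared error between the model and the readings."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def best_explanation(xs, ys, penalty=0.05):
    """Name of the candidate with the lowest error plus a small
    charge per parameter, so the fancier story has to earn its keep."""
    scores = {
        name: mse(fit(xs, ys), xs, ys) + penalty * n_params
        for name, (fit, n_params) in CANDIDATES.items()
    }
    return min(scores, key=scores.get)

def trim_outliers(xs, ys, fit, k=3.0):
    """Drop readings whose residual is more than k times the median residual."""
    model = fit(xs, ys)
    residuals = [abs(model(x) - y) for x, y in zip(xs, ys)]
    cutoff = k * median(residuals) + 1e-9
    kept = [(x, y) for x, y, r in zip(xs, ys, residuals) if r <= cutoff]
    return [x for x, _ in kept], [y for _, y in kept]
```

Notice that nothing in there can come up with a third explanation. It can only choose between the stories it was handed, and anything that fits neither gets trimmed.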

And, that is how it works. I’ve said before that we are always trying to understand how the world works and our place in it. I’ve talked about how we reason and how we choose our stories. Nothing really changes.

It can be summed up perfectly by the time I was doing English Literature ‘A’ level. We were doing Beckett’s Waiting for Godot and I remember saying something that was just not what you say about Beckett. You have to say stuff about existentialist angst and the futility of it all and trot out the accepted interpretations of the play to pass the exam. Basically, you have to be a parrot. I think the teacher must have gotten frustrated and said that if someone says something different they are either a genius or an idiot and either way, it’s just unacceptable. I can’t help but think that is the way we run society. It takes ages to convince people of a new way of doing things unless someone gets rich quick and then everyone wants to know and have a go.

When I lectured AI I would start with the idea of readily transferable knowledge. I would point out the idea that it takes years to raise and educate a human until they are fully functional in the workplace, 16 years old at least and then years more to get the experience they need until they are wise and expert. Imagine if you could replicate that in an instant. Then, I would pause and the students would say: Ahhhhh.

Older and wiser now I am not so sure that instant knowledge is a good thing. It depends on who coded it up.

I listen to my girls in awe as they view the world so interestingly and refreshingly and I know that they are human and are able to reason universally unlike a machine. Sometimes they say the most inspiring things. Things I’ve forgotten I used to think naturally which bring me back to myself and how I used to be, before I was told how things are supposed to be.

They also sometimes talk a lot of nonsense as they don’t have the necessary information, or wisdom, or constraints. And, often when I explain that something can’t be done and they ask why not, alas, I am like my Eng Lit teacher saying: That’s just not a thing. It takes a few days before I will come back with a new answer of, yes, it could be a thing, we could do this. And, I can’t help but think that’s how we feel about big data.

We think that with enough data and enough tech we will discover something new and inspiring or fabulous and it will fulfil our yearnings and longings and bring us back to ourselves, the selves we already know, but have forgotten because we are conditioned by society.

I said to my husband that big data is our attempt to use technology to come back to ourselves and he said that he would definitely run that past the Machine Learning Department at work. But it’s true even there: all that searching is to find a new USP or a competitive edge to make us stand out, to feel special, to win, but within reason, as it has to fit within how things already work. And, here’s the rub: any bit of AI has to have specific instructions to know what it is looking for in its big data. It won’t just randomly discover some brand new thing. Remember genius or idiot? It doesn’t have a universal reasoning system like my girls. It is just not possible in a computer. And, won’t be for a long time, if ever.

So perhaps big data is like having an affair.

Esther Perel’s book on infidelity says that most people don’t have an affair to have sex. They have an affair to rediscover themselves. They are seen through new eyes and appreciated once more in what feels like new and interesting ways. When we get that yearning to feel that we matter, that we are important, that we mean something, that we are sexy and alive, we often look for the answers outside ourselves, and people who have affairs do so by looking to other people. Other people who look at us with love, like we are hot and the centre of the whole universe. Like the sun which everything revolves around.

However, unless someone is very self-unaware and even then, in the throes of a most exciting affair, most people wouldn’t discover something previously completely unheard of about themselves. What happens is that something the person takes for granted, or indeed may be taken for granted at home, may have a spotlight shone on it by their new love and then they appreciate themselves better and feel all loved up. They feel love. The love they’ve always had within them. The love within their love.

So, the question is: Is big data the spouse or your new love? Will you take it on home afterwards? Or, will you keep collecting data? And, most importantly, is it telling your whole story? The right story of you. The answer depends on who coded it. If it is a white middle class male, and/or a nerd, then unless you are one of those, you may be searching forever, ‘cos that is not where your story lies. Your story may not be in the model selection. It may not even have been collected in the data. You might not fit at all. You could have been deleted as an outlier.

My PhD data had to follow the laws of physics about how bridges stay up, so that is a big constraint. So there wasn’t a great deal of quantum physics mystical room for interpretation and lovin’ and collective unconsciousness. But, even without strict physical constraints, a lot of the time we hash out the same old stories because they fit in the same old world, same old society, same old ways of how we operate even though it is not based on the laws of physics. It’s based on economics.

Take web design. When we all first came online everyone was super creative because no one had any expectations of how a website should look. Nowadays, all sites look the same as they are designed by grid, yawn, ‘cos we are managing expectations and herding people through our site to do the things we want them to do. We are desperate not to lose them. Desperate for their money.

And then we have intuition. AI can’t tell if your data or indeed your interpretation is useful, only a human can intuit that, unless you get AI checking your AI but then you still need a human somewhere to tell all the AI what to do.

Take my blog. According to Jetpack, I was getting on average six visitors a day, and yet I am often offered money to link to other sites. So if I don’t get visitors, why would people want to give me money to buy my links? Peculiar.

In the how to make a shed load of money online circles you have to build community and work really hard and share data of a specific size and shape and content. I didn’t do any of that, because I just want to write nice things which interest me. I am not selling anything, including my links. I guess this site is my big data, about me, and I do the interpreting and the love affairs and the linking to me, me, me.

Eventually, I deleted Jetpack because it was making my site grind really slowly and I couldn’t write my blogs without it timing out. So, following Jetpack’s lead as they themselves use Google Analytics, I installed that instead, but I really couldn’t be bothered with the work it would take to configure it. Why make the UI so hard to understand and use? It assumes so much prior knowledge, and not about how web stats should work, but about how its own system works, without making that obvious in the design. So, a couple of months ago I went back to the raw data on my cPanel to really see. Guess what? As you can see from above, I get 1.2 million hits and nearly 300,000 visitors a year. I am just amazed and thrilled and I feel love. I figured no one cared what I had to say, because that is a story I’ve long believed and been told, and I didn’t bother writing a new story or even checking the data. Whoa!
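If you fancy checking your own raw data, here is a minimal sketch of the counting involved. It makes some assumptions that may not match your host: that the server writes the common Apache-style access log, that every request line counts as a hit, and that each distinct IP address per day counts as one visitor. Stats packages may well define visitors differently, and the name `summarise` is just mine.

```python
import re
from collections import defaultdict

# Captures the client IP and the date from a line such as:
# 203.0.113.5 - - [10/Oct/2019:13:55:36 +0000] "GET /post HTTP/1.1" 200 2326
LINE = re.compile(r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4})[^\]]*\] "')

def summarise(lines):
    """Return (hits, visitors) for an iterable of access-log lines.

    hits     = every request that parses as a log line
    visitors = distinct IP addresses, counted afresh each day (an assumption)
    """
    hits = 0
    ips_by_day = defaultdict(set)
    for line in lines:
        match = LINE.match(line)
        if match is None:
            continue  # skip anything that is not a recognisable log line
        ip, day = match.groups()
        hits += 1
        ips_by_day[day].add(ip)
    visitors = sum(len(ips) for ips in ips_by_day.values())
    return hits, visitors
```

Counted this way, hits will always outnumber visitors, because one person reading one page also pulls in the stylesheet, the images and so on, and each of those is a hit.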

Afterwards I wondered, who are these 300,000 people reading my site? My husband reckons nerdy people like him. And, then he added: Like you. I sat there for ages going I’m not a nerd. I am a woman, which honestly is a whole blog series right there. My husband said that I am sexist. He could be right. I’ll have to check the data first.

That said, if I have a lot of hits and not much community then that sounds right, doesn’t it? Nerds like reading not talking, and that’s what I do too. I rarely comment on someone’s site. I guess I am a nerd, but I don’t feel that I fit the nerd model. The one I was sold all those years ago. And, as much as we all like to resist labels that’s how we interpret the world – our social identities, how we categorise others, how we try to understand patterns. Sometimes, and this is sooo telling, when people can’t put me in a box, they will ask me what my husband does (rude!). And, moreover that is how AI interprets its big data. Oh yes! It cannot work any other way.

The best bit about looking at my data with fresh eyes was the key search terms. When Google did the personalisation thing (you know, serving up what you’ve already seen, or very similar pages based on what you’ve seen), it also did away with sharing search terms, but other search engines still pass them on. Obviously Google hasn’t stopped collecting them, it has just stopped sharing them, as my guess is that it wants to keep them for itself, as information is power. But as The Stonekeeper says in Smallfoot: What are you going to do with that power?

Here are my top three search terms non-Googlified:

  1. Sex positions to lose weight
  2. Full human parts of a man in dialog
  3. Semiotics and storytelling
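
A list like this can be pulled out of the referrer addresses in the raw logs. Here is a rough sketch, assuming the search engine leaves the query in a `q` parameter of the referring URL; the parameter name and the function `search_terms` are illustrative assumptions, not a description of what any stats package actually does.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

def search_terms(referrers, param="q"):
    """Count the search phrases found in a list of referrer URLs.

    Referrers with no query under `param` (which is what a search
    engine that withholds its terms looks like) are simply skipped.
    """
    counts = Counter()
    for ref in referrers:
        query = parse_qs(urlparse(ref).query)
        for term in query.get(param, []):
            counts[term.strip().lower()] += 1
    return counts
```

Calling `search_terms(refs).most_common(3)` would then give a top three, with the caveat that it only ever sees the searches someone was willing to share.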

I hope these people found the things they were looking for, and I hope they read my take on semiotics and storytelling. However, say I was googling sex positions to lose weight, would I leave a comment? No! I mean I don’t leave a comment when I am googling recipes or anything. Imagine: hi, yes that reverse cowgirl position really did it for me. I lost four lbs in a week.

And, it is right up there with our top motivation of being seen and heard. Often we are told by society that we are too fat, too slutty, too loud, too experienced, too inexperienced, too, too, too. So, again we must bend ourselves out of shape to get into the shape we think we will get loved for. We must fit the story, the model, the expectations.

So, who wouldn’t want big data to come along and tell us what is fabulous and the best thing about us? Alas, it cannot and will not do that until we learn to do that for ourselves. We need to light our own fires.

When I think of data, I love the story Sims creator Will Wright told on Masterclass. I have lost all my notes and the class notes, so there are no links. Gah. Anyway, he said that he used data someone had collected in the 1960s to understand and map out daily timetables for people’s Sims. People had the power to build whole worlds, but very few wanted to do that; most just wanted to replicate the world they live in. Imagine: you have the power to create worlds and you just want to make what you already know and see. You want to make a world in which you go to work and hate your job, just like your real world. You don’t want to make something new.

Technology is an extension of us, it reflects our concerns and motivations, and until we have new ones, and new stories, technology can only propagate what already exists, just on a much bigger scale. Is that what we want?

Personally speaking, I want technology to reflect all the love we can possibly feel and then some. At the very least, that’s where I would start.