In the 1970s, Marvin Minsky, father of frames, and some say neural nets, told a press conference that 50 years on, computers would read and understand Shakespeare.
Today, computers can indeed read Shakespeare but understanding, not really, not so much, even though they have looked at Shakespeare in a few ways:
- Computers are proving which bits Shakespeare didn’t write, apparently John Fletcher wrote some parts of Henry VIII. I’ve always loved this conversation about who wrote what, especially the Christopher Marlowe and Shakespeare conspiracy theories. Was Marlowe really Shakespeare? Etc.
- Machine learning can categorise whether a Shakespeare play is comedy or tragedy based on the structure of how the characters interact. In a comedy simply put, characters come together a lot. In a tragedy, they don’t – and ain’t that the truth in real life?
- Anyone can generate their own Shakespearean play with machine learning.
No. 3 seems mind blowing, but to be honest, and I love me some Shakespeare, the results truly makes no sense. However, it is hard to see that at first, because, Shakespearean English is like another language. I have attended some brilliant performances from Shakespeare School over the last couple of years, watching my children on stage, but for the first time, I realised, it is only the context and the acting which, for me, gave the words their meaning, rather like when you watch a film on TV in a language you don’t quite understand, but often the story is universal. It has emotional resonance.
I learnt Macbeth’s first soliloquy in sixth form: Is this a dagger which I see before me? It is when Macbeth contemplates his wife’s horrifying idea of killing Duncan the king. I can still recite it. It is meaningful because I studied it in depth and ruminated on what Macbeth must have been feeling, filled with ambition, and excited but horrified, whilst feeling the this isn’t going to end well feels.
However, machine learning cannot understand what Macbeth is saying, it hasn’t semantically soaked up the words and felt the emotional horror of contemplating murder in the name of ambition. All it has done is read the words and categorised them, and then written more words using probability to infer statistically what is the most likely next word as it was constructing each sentence, rather like predictive text does. It’s good and works to a certain extent, but none of us think that our predictive text is thinking and understanding. It feels like almost guessing.
We can see this more easily when looking at Harry Potter. The text is much simpler than Shakespeare so when a computer reads all the books and writes a new one, which is what the cool people at Botnik did, it’s easier to see that the novel Harry Potter and the Portrait of what Looked Like a Large Pile of Ash is interesting for sure, but doesn’t make a great deal of sense.
“Leathery sheets of rain lashed at Harry’s ghost as he walked across the grounds towards the castle. Ron was standing there and doing a kind of frenzied tap dance. He saw Harry and immediately began to eat Hermione’s family.”
“Harry tore his eyes from his head and threw them into the forest.”
Very dramatic – I love the leathery sheets of rain – but it doesn’t mean anything, well it does in a way, but it hasn’t been designed in the way a human would design a story, even unknowingly, and it doesn’t have the semantic layers which give text meaning. We need to encode each piece of data and link it to other pieces of data in order to enrich it and make it more meaningful. However, to make this a standard is very difficult, but the WWW consortium is working very hard on semantics in order to make data more useful, to create a web of data, especially when all our devices go online, not that I think it is a good idea, my boiler does not need to be online.
And this, my friends, is where we are with machine learning. The singularity, the moment when computers surpass human intelligence, is not coming anytime soon, I promise you. Currently, it is a big jumble of machines, data sets, and mathematics. We have lots of data but very little insight, and very little wisdom. And, that is what we are looking for. We are looking to light the fire, we are looking for wisdom.
The prospect of thinking machines has excited me since I first began studying artificial intelligence, or in my case l’ intelligence artificielle and heard that a guy from Stanford, one Doug Lenat, wrote a LISP program and had it discovering mathematical things. It started simply with 1+1 as a rule and went on to discover Goldbach’s conjecture, which asserts that every even counting number greater than two is equal to the sum of two prime numbers.
The way the story was told to me, was that Lenat would come in every morning and see what the computer had been learning over night. I was captivated. So, imagine my excitement the day I was in the EPFL main library researching my own PhD and I stumbled across Lenat’s thesis in the library. I read the whole thing on microfiche there and then. Enthralled I rushed back to the lab to look him up on the WWW – imagine that, I had to wait until I got to a computer – to see that after his PhD, he had gone off to create a universal reasoning machine: Cyc.
Lenat recently wrapped up the Cyc project after 35 years. It is an amazing accomplishment. It contain thousands of heuristics or rules of thumb which us humans have already learnt by three years old, and which computers need to have in order to emulate reason. This is because computers must reason in a closed-world, which means that if a fact or idea is not modelled explicitly in a computer, it doesn’t exist. There is so much knowledge we take for granted even before we begin to reason.
Interestingly, enough when asked about it, Marvin Minsky said that Cyc had had promise but had ultimately failed. Minsky said that we should be stereotyping problems and getting computers to recognise the stereotype or basically the generic pattern of a problem in order to apply a stereotypical solution. I disagree, archetypes potentially, maybe, with some instantiation, stereotypes no.
In this talk about Cyc, Lenat outlines how it uses both inductive (learns from data) and deductive (has heuristics or rules) learning. Lenat presents some interesting projects, especially problems where data is hard to find. Imagine that? Data is hard to find!
Someone said to me the other day that a neuroscientist told them that we have all the data we will ever need. I have thought about this and hope the neuroscientist meant: We have so much data we could never process it all because to say we have all the data we need is just wrong. A lot of the data we produce is biased, inaccurate and useless. So, why are we keeping it and still using it? Just read Invisible Women to see what I am talking about. Bin that data! Moreover as Lenat says, there are many difficult problems which don’t have good data with which to reason.
Cyc uses a universal approach to reasoning which is what we need robots to do in order to make them seem human which is what the Vicarious project funded by amongst others, Elon Musk (who on the cover of Wired magazine reminded me of Sheldon Cooper). I looked at the website which said that Vicarious is trying to discover the friction of intelligence, without using massive data sets, which is understandable because as I have said before, what we are really looking to do is how to encapsulate human experience, which is difficult to measure let alone to encapsulate because to each person, experience is different, and a lot goes on in our subconscious.
Usually, artificial intelligence learning methods take opposite approaches either the deductive rule-based, if x then do y, using lots of heuristics (like Cyc has programmed up) or an inductive approach, the look at something long enough and then find the pattern in it, a sort of, I’ve seen this 100 times now that if x, y follows and that is how machine learning (ML) works.
ML uses an empirical approach of induction. After all, that is how we learn as humans, we look for patterns – we look in the stars and the sky for astrology and astronomy, we look at patterns in nature when we are designing things and patterns in our towns, especially people’s behaviour especially online nowadays on social media.
Broadly speaking, ML takes lots of data, looks each data point and either decides yes or no, rather like the little nand and nor gates in a computer, and in fact replicates what the neurons in our brains do too. And, this is how we make sense in stories: day/night, good/bad as we are looking for transformation. Poor to rich is a success story, rich to poor is a tragedy. Neuroscience has proven that technology really is an extension of us which is so satisfying because it is, ultimately, logical.
In my last blog, I looked at how to get up and running as a data scientist using python and pulling data from Twitter, and in another blog, another time, I may look in detail at the various ML methods, under the two main categories of supervised and unsupervised, as well as deep learning, which uses rewards or reinforcement, that is a human steps in to,say yes this categorisation is correct or no, it is not, because ultimately, a computer cannot do it alone.
I don’t believe a computer can find something brand spanking new, off the chain, never discovered, seen or heard of before, without a human-being helping which is why I believe in human-computer interaction. I have said it so many times in the human-computer interaction series, in our love affair with big data, and all over this blog but honestly, I could be wrong. Machine learning is such an exciting field which is what I thought over 20 years ago and lucky me, I still think that today.