The hype is well underway, and apparently so is the funding: we are looking at a ChatGPT boom. For the foreseeable future, it may be enough to say "and it will use ChatGPT" to get funding for your pet project. There's already a plethora of books on Amazon about how to make money with ChatGPT.
This phenomenon of people touting a technology to solve all our problems and make us rich isn't new. It happens over and over. The first time I saw it was at the AI in Design conference in 1996 at Stanford University. The answer to any problem posed during any talk seemed to be: put it on the Internet. Naive and optimistic, I thought that life and problem-solving should be more nuanced, which was why we were applying AI to design in the first place, and opening it up to more input would only add complexity and confusion.
Older and perhaps wiser, or not, I now think that "put it on the Internet" is not a bad way to go. We are often augmented by other people's input, as people sometimes have the answer to the problem we want to solve, and a quick Google shows us how to find them. Even without other people, sometimes a different point of view, looking at data in a different way, can get us to a new solution. It is what serendipitous design, AI search spaces, and creativity in design are built on.
And now we have ChatGPT, which is made up entirely of everyone else's input, albeit constructing text in a non-human way. It is the epitome of the wisdom of the crowd, that is to say, the collective opinion of everyone rather than that of one expert, which is how many websites such as Wikipedia, Reddit, and Quora are built. The BBC uses the whole web as a content management system, following the method of cooperation without coordination.
And, ChatGPT has read it all.
What is ChatGPT?
ChatGPT is a large language model, built on a giant neural net developed by OpenAI, which generates text that reads very much like human writing by analysing webpages and online digitised books. What it writes is not always factually correct, but it does read as if a human has written it. The developers call the factually incorrect bits hallucinations, and admit it can be socially biased and hostile (like Boris Johnson's all-night 'meetings' full of booze and people during lockdown).
It is the first of its kind; no other type of AI has been able to do this before, mainly because of the limits of computational power and access to that much training data.
Broadly speaking, there are two ways of doing AI: qualitatively (logic) versus quantitatively (statistics), and ChatGPT falls into the second camp:
1) Logic: Knowledge-based systems (qualitative reasoning) use rules and logic which have been painstakingly programmed to represent, say, a doctor's or surgeon's knowledge and experience, which comes from a lifetime of learning. They work in a closed world, that is to say, if it is not in the system, it doesn't exist. Rather like those annoying people we all know who only believe it when they see it.
2) Statistics: Machine learning (quantitative pattern matching), in particular neural networks, is based on a proposed biological model of the brain. Neural networks strengthen the synapses (links) between neurons (nodes) by adding to a weight each time they see a similar link between two points in a data set, and they have many layers which data sets flow through, normally up to about 150, to keep finding links. This observational approach is rather like how our ancestors watched the alignment of the stars and the turning of the seasons over and over until they found a recurring pattern. The problem with it is that if we stare long enough we will find patterns everywhere, like finding Jesus in your toast or faces in your popcorn.
Neither AI approach is very good at writing text, because of the natural ambiguity of language. Humans learn how to speak using the rules of syntax and grammar that turn a sentence into sense, plus semantic layers which give text meaning, as humans interpret the structure and words with cultural references, storytelling abilities, and considerations of time and place.
For example, when someone is talking about not bringing something on holiday because it wouldn't fit in the suitcase, either sentence works: "It was too big, it wouldn't fit," or "It was too small, it wouldn't fit." Both are understandable, though ambiguous, but because of context, constraints, and prior knowledge about size, humans understand that either the something was too big to fit in the suitcase, or the suitcase was too small to fit the something in, which is why the something isn't currently in the room. These complexities are beyond all AI, including ChatGPT.
However, ChatGPT gives the impression that it does understand, because it produces sentences which look like the real thing and which we humans interpret as meaningful. And, as human beings, our desire for meaning, especially in human-to-human connection, is so great that we often project our own intelligence and wonderfulness into the space between us and the other, so much so that we just don't see when that other is phoning it in. Instead we give them all the credit for making us feel the way we do. And, believe you me, ChatGPT is seriously phoning it in. It is code. It cannot do anything else.
ChatGPT doesn't do complex semiotic or semantic reasoning; instead it does pattern matching, but on an epic scale. To give you an idea of that scale: ChatGPT has 175 billion parameters, its links (synapses). Chuan Li, at Lambda Labs, notes that a typical human brain has over 100 trillion synapses, only about a thousand times more than the GPT-3 175B model, and GPT keeps increasing in size with more and more training data, so one day it may hit the 100 trillion mark.
By dint of its sheer size, after a few hundred billion words of text have been trained through its nested neural networks, with some randomness to represent creativity and some tweaks to get the desired results, ChatGPT is greater than the sum of its parts. It gets results. Although, already at this point, no one can say exactly how it works, because it is so big. Yes, they have created a monster.
ChatGPT is just trying to write expected text, rather like a sophisticated version of predictive text, which puts me in mind of those autofill games on social media which celebrate the unexpected, such as writing your own epitaph:
Here lies [your name], [preferred pronoun] was ...[let your predictive text finish the rest].
Here lies Ruth, she was a bit better than the other one.
Here lies Ruth, she was right.
Try it out!
Predictive text basically uses context-sensitive probability. It looks at what you have typed before and at the words which should come next, often based on a probabilistic search such as k-nearest neighbours, an algorithm designed in 1951. And it works well enough for our phones, as we are always there supervising.
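A toy version of that idea, a bigram model that predicts the next word from counts of what has followed it before, might look like this (the corpus here is made up for illustration; real phone keyboards use far more context and cleverer search):

```python
from collections import Counter, defaultdict

# A made-up typing history standing in for everything you've typed before.
corpus = "here lies ruth she was right here lies ruth she was kind".split()

# Count which word follows which: a bigram model.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict(prev_word):
    """Return the word most often seen after prev_word, or None."""
    counts = follows[prev_word]
    return counts.most_common(1)[0][0] if counts else None

print(predict("lies"))  # ruth
```

The most probable word wins every time, which is exactly the flatness problem the next paragraph describes.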
Now, most of us would think that the highest-value (most probable) word should always be chosen, for expected results, but apparently ChatGPT's developers found that it makes the text flat, that is to say not very creative, so instead they have coded it to make a random choice among the words which probabilistically could come next, using a temperature parameter of 0.8. Why 0.8, you may ask? Well, based on trial and error, they decided that 0.8 gave the best results. (And this in itself is an example of how deep learning gets tuned: a human looks at what a neural net is doing, unsupervised, and supervises it for a bit.)
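A minimal sketch of temperature sampling: the vocabulary and probabilities here are invented, and only the mechanism (dividing log-probabilities by a temperature before sampling) reflects the idea described above.

```python
import math
import random

def sample_with_temperature(probs, temperature=0.8):
    """Rescale a next-word distribution by temperature, then sample.

    Temperature near 0 approaches always picking the most probable word;
    temperature 1 leaves the distribution unchanged; higher flattens it.
    """
    words = list(probs.keys())
    scaled = [math.log(p) / temperature for p in probs.values()]
    total = sum(math.exp(s) for s in scaled)
    weights = [math.exp(s) / total for s in scaled]
    return random.choices(words, weights=weights, k=1)[0]

# Invented next-word distribution after "Here lies Ruth, she was ..."
next_word = {"right": 0.5, "kind": 0.3, "better": 0.2}
print(sample_with_temperature(next_word, temperature=0.8))
```

At 0.8 the likelier words still dominate, but "kind" and "better" get a real look-in, which is where the apparent creativity comes from.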
At this point, I want to know exactly what text, books, and specific sites it used. Any Beowulf? Shakespeare? Even though ChatGPT comes from OpenAI, the company is a bit secretive (I can't find the exact Wired article to link to right now, but the gist was that it used to be quite open and now isn't, due to competitive advantage). I hope it wasn't trained on anything Boris Johnson wrote, or the Daily Mail.
How ChatGPT is found to work
Stephen Wolfram, in his in-depth explanation, says that ChatGPT represents each word of a sentence as a series of tokens, a token being a linguistic unit. A token is not necessarily a whole word; it might be a chunk of a word, such as a prefix or a suffix, because that makes it easier to model compound words, non-English words, and made-up words. Each token is stored as an array of numbers. Then, to create the next word, ChatGPT takes all of the text it has so far and generates an embedding vector, an array of numbers, to represent it.
Then it takes the last part of this array and generates from it an array of about 50,000 values that turn into probabilities for different possible next tokens. Then it uses what is known as a transformer, which focuses its 'attention' on an individual bit of the array which represents the different possible next tokens, and Wolfram says:
It’s all pretty complicated—and reminiscent of typical large hard-to-understand engineering systems, or, for that matter, biological systems.
This reminds me of trying to understand a bit of code whose author, instead of documenting it properly, has just written #tricky stuff after hacking away until it finally worked, and who now has no idea how it works and cannot write any comments in the code to enlighten anyone else.
So, what happens next is that ChatGPT chops this attention array up into little bits called attention heads, and then, using probability, it calculates the next word by looking across all the text, paying attention to all the words in case the next word needs to refer to other words, say a verb referring to a noun a few words earlier in the sentence. Wolfram says:
And, yes, we don’t know any particular reason why it’s a good idea to split up the embedding vector, or what the different parts of it “mean”; this is just one of those things that’s been “found to work”…
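To make the splitting-into-heads idea concrete, here is a bare-bones sketch: chop an embedding vector into equal chunks, one per head, and within each head work out how much the current token should attend to each earlier one. All the numbers and dimensions are invented, and real transformers apply learned projection matrices at every step.

```python
import math

def split_heads(vector, n_heads):
    """Chop an embedding vector into equal chunks, one per attention head."""
    size = len(vector) // n_heads
    return [vector[i * size:(i + 1) * size] for i in range(n_heads)]

def attention_weights(query, keys):
    """How much should the current token attend to each earlier token?
    Score by dot product, then softmax so the weights sum to 1."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# An invented 8-number embedding for the current token, split into 2 heads.
current = [0.1, 0.4, -0.2, 0.3, 0.5, -0.1, 0.2, 0.0]
heads = split_heads(current, n_heads=2)

# Invented embeddings for two earlier tokens, split the same way.
earlier = [[0.2, 0.1, 0.0, 0.3, -0.4, 0.2, 0.1, 0.5],
           [-0.1, 0.3, 0.2, 0.1, 0.2, 0.2, -0.3, 0.1]]
for h, head in enumerate(heads):
    keys = [split_heads(tok, 2)[h] for tok in earlier]
    print(f"head {h} attention weights:", attention_weights(head, keys))
```

Each head ends up with its own weighting over the earlier words, which is how one head can track, say, verb-noun agreement while another tracks something else entirely; why splitting helps is, as Wolfram says, just "found to work".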
Then this new embedding vector is passed through another neural net (or layer) as each part of the process is represented by a separate neural net, though at this specific attention transformer point in the process, Wolfram says:
It’s hard to get a handle on what this layer is doing.
Then it is passed through more neural nets/layers/attention blocks, a total of 12 times for GPT-2 and 96 times for GPT-3, until finally the last embedding vector is used to produce a list of probabilities for which token should come next.
There is no feedback; it is just these vectors being pushed through layers upon layers of neural nets upon neural nets, and then, as it gets ready to generate a new token or word, it reads (i.e., takes as input) everything that has come before, including the stuff it has written itself.
Imagine writing an essay like that: every time you want to write a new word, you reread the whole thing. It is labour-intensive, and not human-like at all, as we are cognitive misers. ChatGPT sounds like Sisyphus.
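That Sisyphean loop can be sketched in a few lines. Here predict_next is a dummy that cycles through a canned phrase; in GPT it would be the full pass through every layer described above, but the shape of the loop, all previous tokens in, one new token out, is the same.

```python
def predict_next(all_text_so_far):
    """Stand-in for the entire model. In GPT this is the full pass through
    every layer; here it just cycles a canned phrase, for illustration."""
    canned = ["here", "lies", "ruth", "she", "was", "right"]
    return canned[len(all_text_so_far) % len(canned)]

def generate(prompt_tokens, n_tokens):
    tokens = list(prompt_tokens)
    for _ in range(n_tokens):
        # Every single iteration takes ALL previous tokens as input,
        # including the ones the model wrote itself.
        tokens.append(predict_next(tokens))
    return tokens

print(" ".join(generate([], 6)))  # here lies ruth she was right
```

Note that nothing is remembered between iterations except the text itself: the whole "state" of the conversation is just the ever-growing token list being reread from the top.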
TL;DR chit chat chitty chitty chit ChatGPT
In a nutshell, no one knows how ChatGPT works, not even the people who created it. It is immense. But work it does. And, contrary to the hype, I don't think it will change the face of humanity, though I have heard it will write all those boring emails no one wants to write. Then again, perhaps no one should be writing those emails at all: churning them out faster is not progress, and not the reason AI was developed.
But ChatGPT is well named, for all the chit chat chitty chitty chit chat it has created around it.
I laughed out loud at Grady Booch's response, and he would know; he has been coding forever. I don't blame him, as we've been hearing the blethering hyperbole since 1958, when the very first neural network, the Perceptron, was hailed by The New York Times as:
The embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.
It didn’t really work. It just looked like it did.
We’ve come a long way since then, but we still have a long way to go.
ChatGPT is currently telling us more about ourselves than it is about AI. But then that is the subject for another blog.