Why GPT-3 Can Understand Things

No, I'm not going to argue that GPT-3 understands things in the same way as a human; that would obviously be absurd, since it has no senses and no experience of any world outside of text. I'll simply argue that "understanding" is a good word for whatever it is that GPT-3 does. First, we'll have to explain what it means to understand something, thereby solving a philosophical problem that philosophers have been trying to solve for centuries.[1] But first we need to talk about SHRDLU.


SHRDLU (an abbreviation of nothing in particular[2]) was one of the first natural language understanding programs, written in Lisp by Terry Winograd (after whom Winograd schemas are named) back in 1968. It had many advanced features like coreference resolution, and was capable of responding to complex commands in its "blocks world"[3] consisting of colored geometric shapes in various configurations.

An example of SHRDLU output:


Computer: OK. (does it)





Computer: OK. (does it)



I think there is clearly a sense in which SHRDLU can be said to understand language related to its blocks world. When you give it a command, it can understand your intent and carry it out to the best of its ability. Obviously everything is just dumb rules under the hood, but in common parlance I think we could sensibly ask questions like "does SHRDLU understand what 'the green thing' refers to?" without expecting an answer like "No, because SHRDLU is just a collection of rules and can't actually understand anything". I'm claiming that GPT-3 understands things in the same sense that SHRDLU does, nothing more, nothing less.

You might be feeling disappointed right now. "But that's not an interesting claim! You're just saying that GPT-3 figuratively understands things. Everybody knows that!" I would reply in two ways: first, not everybody knows that. Some think that GPT-3 is not capable of tracking the kinds of world relationships that SHRDLU does above, like "x is inside of y". Gary Marcus claims as much in his article "Experiments testing GPT-3's ability at commonsense reasoning: results."[5] However, critics have pointed out that these results suffer from poor prompting, and with a better prompting strategy GPT-3 is perfectly capable of tracking these things within the limits of its context window.[6][7][8]

Secondly, my use of the word "understand" is neither figurative nor metaphorical. I think the sense in which GPT-3 and SHRDLU can be said to understand things is more or less the same sense in which humans can be said to understand things. In other words, "understand" refers to the same phenomenon both in the case of GPT-3's understanding and human understanding.

Note that I'm not claiming that GPT-3 understanding and human understanding are equivalent. There are clear differences: the most obvious one is that human understanding is always in some way ultimately grounded in sense experience (although with some caveats). I'm merely claiming that the word "understand" properly refers to a specific abstract property that GPT-3 understanding and human understanding have in common. So what is that property?


Let's go through some paradigmatic cases to help clarify our intuition about what it means to understand what a word or concept means. In particular, we want to distinguish "understanding" from mere "memorization" or "knowing". When we say that GPT-3 understands something, we are implicitly claiming that GPT-3 is not merely dumbly regurgitating facts from a memorized database.

If a parrot has memorized that it should answer "four" when it hears the words "What is two plus two" in order to get a cashew nut, we don't say that it understands addition. So what do we mean when we say that somebody understands addition? Well, first of all, they can do it in multiple contexts. Not only can they answer "What is two plus two", they can also answer "What is two plus three", "What is 324 + 700" and so on. In other words, they have some rule or algorithm in mind that they can use to correctly answer any question of the form "What is x plus y?" You could also phrase this as them having a statistical model in mind that they can use to predict with (almost) perfect confidence what the answer to any addition problem is.
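The difference between the parrot and the person can be sketched in a few lines of Python. This is only an illustration: the names `parrot_answer` and `rule_answer` are made up for this example, and the parser assumes questions of the exact form "What is x plus y" with the numbers written as digits.

```python
import re

# The "parrot": a memorized stimulus-response table with exactly one entry.
MEMORIZED = {"What is 2 plus 2": "4"}


def parrot_answer(question):
    """Return a memorized answer, or None if the question was never seen."""
    return MEMORIZED.get(question)


def rule_answer(question):
    """Apply a general rule to any question of the form 'What is x plus y'."""
    match = re.fullmatch(r"What is (\d+) plus (\d+)", question)
    if match is None:
        return None
    x, y = int(match.group(1)), int(match.group(2))
    return str(x + y)


for q in ["What is 2 plus 2", "What is 2 plus 3", "What is 324 plus 700"]:
    print(q, "| parrot:", parrot_answer(q), "| rule:", rule_answer(q))
```

Both functions agree on the one memorized question; only the rule generalizes to questions it has never seen before.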

A common refrain against the concept of GPT-3 understanding anything is that "it's just statistics". This is either vacuous or mistaken. Yes, there is a general sense in which what GPT-3 does is "just" statistical inference, because the task of predicting the next word in a text is a statistical inference task! This is vacuous as an argument because it would apply equally well to a system that uses Amazon's Mechanical Turk to get a person to do the same task. Would we say that the person doesn't actually understand the text, because they are just doing statistical inference to predict what the next word will be? If you are defining statistical inference this broadly, then the human brain is also "just statistics".

When people use the "just statistics" argument, they aren't generally referring to the vacuous form of the argument above. Rather, they are making a stronger claim: they believe that GPT-3 uses only surface-level statistical features to make its predictions of what the next word will be. That is to say, their mental model of GPT-3 appears to be something like "it looks at the words that came before, then looks at its memorized corpus and picks the most common continuation to that sequence of words." (Of course sampling is separate from determining probability distributions and GPT-3's elements are BPEs, not words, but you get the idea.) This model is mistaken for two reasons: first, it ignores the way GPT-3 makes use of embeddings. Second, this is basically just a description of how Markov chains work, and we know those can't produce the same quality of output! We sure tried in the nineties.[9]

Markov models for language generation traditionally base their probabilities only on the actual words that appear in the text, without any encoding of semantic relationships between words (like "a dog is a kind of mammal" or "a king is a male royal leader"). However, neural net models generally do encode such relationships, in the form of word embeddings, and GPT-3 is no different in this regard.[10][11]
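To make the "surface-level statistics" picture concrete, here is a minimal sketch of a word-level Markov chain (a bigram model). The toy corpus and the function names `build_bigram_model` and `generate` are my own assumptions for illustration; real systems of this kind used longer contexts and much larger corpora.

```python
import random
from collections import defaultdict


def build_bigram_model(text):
    """Map each word to the list of words that followed it in the text."""
    words = text.split()
    model = defaultdict(list)
    for current, following in zip(words, words[1:]):
        model[current].append(following)
    return model


def generate(model, start, length, seed=0):
    """Sample a continuation by repeatedly picking an observed next word."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(length - 1):
        followers = model.get(output[-1])
        if not followers:  # dead end: this word never had a successor
            break
        output.append(rng.choice(followers))
    return " ".join(output)


corpus = "the dog chased the cat and the cat chased the mouse"
model = build_bigram_model(corpus)
print(generate(model, "the", 8))

# The model only knows literal word sequences. It has seen "dog", but it
# has nothing at all to say about the closely related word "puppy".
print(model.get("puppy"))
```

Every word pair this model emits already appears verbatim in the corpus, and a word it has never seen is simply a blank, however similar in meaning it is to a word it has seen.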

You can intuitively think of word embeddings as high-dimensional spaces where points correspond to words and directions correspond to relationships between words. So for example, you can think of the direction between the words "man" and "woman" as defining a kind of "masculinity-femininity axis", and do things like "start from the word queen and go in the masculine direction" to find the word "king". (This is represented mathematically by the vector operation "queen - woman + man = king".) As you can see, word embeddings alone allow us to represent fairly complicated semantic relationships, so it's not unnatural to say that, in some sense, GPT-3 understands that "a dog is a small canine often kept as a pet". It is capable not only of dumbly regurgitating this fact, but of actually using it when "thinking" of an appropriate continuation to a sentence.
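The vector arithmetic can be tried out on a toy example. The two-dimensional vectors below are hand-made assumptions chosen so that one axis loosely means "royalty" and the other "masculine versus feminine"; they are not GPT-3's learned embeddings, which have far more dimensions and no such tidy axes.

```python
import math

# Hand-made toy embeddings (an assumption for illustration only):
# axis 0 ~ "royalty", axis 1 ~ "masculine (+) / feminine (-)".
EMBEDDINGS = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "dog": (-1.0, 0.0),
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest(vector, exclude=()):
    """Return the vocabulary word whose embedding points most nearly the same way."""
    candidates = [word for word in EMBEDDINGS if word not in exclude]
    return max(candidates, key=lambda word: cosine(EMBEDDINGS[word], vector))


def analogy(a, b, c):
    """Return the word closest to a - b + c, ignoring the three input words."""
    target = tuple(
        x - y + z
        for x, y, z in zip(EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c])
    )
    return nearest(target, exclude=(a, b, c))


print(analogy("queen", "woman", "man"))  # prints: king
```

Excluding the input words from the search is the usual convention for this kind of analogy test, since the nearest neighbour of the target vector is otherwise often one of the inputs.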

Returning to the example of addition, probably just being able to answer arbitrary addition problems and being able to apply the model of addition in the real world is enough to capture what most people would mean by "understanding addition". But of course there is also a deeper sense in which to understand addition, the way mathematicians understand it. Not only can they perform addition like everybody else, they can derive the laws of addition from deeper axioms (such as the Peano axioms). We could summarize this difference as "understanding how" vs. "understanding why": whereas most people understand the "how" of addition, generally only mathematicians and people in related fields understand the "why" of addition. "Understanding why" represents a sort of deeper understanding of the phenomenon under study.

Feynman famously had this to say on why questions:

"But the problem, you see, when you ask why something happens, how does a person answer why something happens? For example, Aunt Minnie is in the hospital. Why? Because she went out, slipped on the ice, and broke her hip. That satisfies people. It satisfies, but it wouldn't satisfy someone who came from another planet and who knew nothing about why when you break your hip do you go to the hospital. How do you get to the hospital when the hip is broken? Well, because her husband, seeing that her hip was broken, called the hospital up and sent somebody to get her. All that is understood by people. And when you explain a why, you have to be in some framework that you allow something to be true. Otherwise, you're perpetually asking why. Why did the husband call up the hospital? Because the husband is interested in his wife's welfare. Not always, some husbands aren't interested in their wives' welfare when they're drunk, and they're angry.

"... So I am not going to be able to give you an answer to why magnets attract each other except to tell you that they do. And to tell you that that's one of the elements in the world - there are electrical forces, magnetic forces, gravitational forces, and others, and those are some of the parts. If you were a student, I could go further. I could tell you that the magnetic forces are related to the electrical forces very intimately, that the relationship between the gravity forces and electrical forces remains unknown, and so on. But I really can't do a good job, any job, of explaining magnetic force in terms of something else you're more familiar with, because I don't understand it in terms of anything else that you're more familiar with."[12]

What Feynman is getting at here is that an answer to a why question is a kind of mental model that can connect disparate phenomena together, and in particular there is a sort of hierarchical structure where deeper models explain the observations in shallow models. In other words, the scientific process starts with some collection of observations or basic data, builds models to explain and predict parts of that data, then builds deeper models to explain and connect the shallow models etc. The key point is that you can derive or recover the shallower models (and ultimately, the original data) from the deeper ones (in a lossy way).

In other words, we can think of scientific models (and mental models more generally) as being like hierarchies of lossy compressions of some ground data provided by observations and experiments. You create the model by compression, and you make predictions by decompression. Of course, not all compressions can be used to predict future observations. We can compress all of Euclidean geometry into five axioms and some general deduction rules, but we can't (generally speaking) meaningfully "predict" interesting theorems in geometry from simply looking at these axioms.

Any model that allows you to predict data also allows you to compress it, because if you have a set of data and a model that can predict all the values in the data set from a subset of the data, you can reduce your data to just that subset -- after all, your predictive model can recover the rest of the data from that subset. Thus we say that "prediction is compression".
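Here is a minimal sketch of that argument. The predictive model is a hypothetical one chosen for the example ("each value is the sum of the previous two"), and the compressed form is the subset of the data the model needs as a starting point, plus corrections for any values it predicts wrongly.

```python
def predict_next(prev2, prev1):
    """Hypothetical predictive model: each value is the sum of the previous two."""
    return prev2 + prev1


def compress(data):
    """Keep the first two values plus only the later values the model gets wrong."""
    seed = list(data[:2])
    corrections = {}
    history = list(seed)
    for i in range(2, len(data)):
        guess = predict_next(history[-2], history[-1])
        if guess != data[i]:
            corrections[i] = data[i]
        history.append(data[i])
    return seed, corrections, len(data)


def decompress(seed, corrections, length):
    """Rebuild the data by running the model forward and applying corrections."""
    data = list(seed)
    for i in range(2, length):
        guess = predict_next(data[-2], data[-1])
        data.append(corrections.get(i, guess))
    return data


fib = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
seed, corrections, length = compress(fib)
print(seed, corrections, length)
print(decompress(seed, corrections, length) == fib)
```

When the model predicts perfectly, the corrections are empty and ten numbers shrink to two plus a length. The worse the model predicts, the more corrections have to be stored, so better prediction directly means better compression.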

So "understanding how" comes down to prediction and predictive models, while "understanding why" comes down to compression and hierarchies of compressive models. But because prediction is compression, we can summarize both as forms of compression. Viewed mathematically, the core component of what we call "understanding" is simply data compression. This is the abstract property that GPT-3 understanding and human understanding have in common which I referred to earlier.

Gzip is an algorithm that can compress any data, but its compression is far from optimal; it is a "shallow model" that only makes use of the superficial, surface features of the data given to it. The "depth" of a model is roughly given by its effectiveness in compressing larger and larger amounts of more and more diverse data. Maxwell's equations allow us to compress practically all electrical and magnetic observations (i.e. electrical and magnetic data sets) to just four equations. Thus they represent a deep, but domain-specific compression.
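The gap between a shallow and a deep model can be demonstrated with Python's `zlib` module, which implements DEFLATE, the same algorithm gzip uses. The data set here is an assumption picked to make the point starkly: the output of a seeded pseudo-random generator, which looks like noise on the surface but is fully determined by a simple rule and a seed.

```python
import random
import zlib


def generate(seed, n):
    """The 'deep model' of the data: a rule that expands a seed into n bytes."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


data = generate(seed=42, n=10_000)

# Shallow model: DEFLATE looks for repeated byte strings and skewed byte
# frequencies, and finds essentially none in this data.
shallow = zlib.compress(data, 9)

# Deep model: knowing the generating rule, a seed and a length suffice.
deep = (42, 10_000)

print("original bytes:       ", len(data))
print("zlib-compressed bytes:", len(shallow))
print("deep model recovers the data exactly:", generate(*deep) == data)
```

The shallow compressor gains next to nothing, while someone who knows the generating rule can store the whole data set as two small numbers. That rule plays the same role here that Maxwell's equations play for electrical and magnetic observations.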


Why is there such vehement disagreement on the point of whether GPT-3 can understand anything? I think it comes down to different thinking styles. It is often not realized how different people's ways of thinking really are. It is not quite accurate to say that all people have the same cognitive processes; rather, thinking styles are constructed over a lifetime.[13]

My headcanon is that thinking styles can be separated into three main types, namely visual, verbal and non-symbolic, and that each person has a primary thinking style, an auxiliary thinking style, and a weak thinking style. For example, my primary is verbal, my auxiliary is non-symbolic, and my weakness is visual thinking.[14]

Verbal and visual thinking are mostly self-explanatory (unless you are an aphantasiac, that is, or the verbal equivalent, which also exists[15]). How about non-symbolic thinking? The best way I can explain it is that it is similar to the Feynman algorithm: "Just write down the problem, think really hard, and write down the solution." What is this thinking? Nothing in particular. Just focusing on an abstract concept directly in your mind. There are no real "qualia" associated with this kind of thinking. It's more like a bare "knowing" without experience.[16]

Just like you can see a tree and simply know "This is a tree"; now close your eyes, but keep the "knowing". This is the crux of non-symbolic thinking. However, it's not simply a passive knowing. You can do certain kinds of "operations", for lack of a better term, to manipulate your focus on the abstract concept, and thereby develop new insights.

Non-symbolic thinking as a primary mode is possible[17], and it does not limit you to only manual vocations. One non-symbolic primary I know works as a data scientist. I conceive of GPT-3's "understanding" as being very similar to this non-symbolic thinking. A sort of bare awareness of abstract relationships. Because this sort of thinking is familiar to me, it doesn't appear intuitively strange to me to call the thing that GPT-3 does understanding.

But consider somebody whose primary mode is visualization and whose weakness is non-symbolic thinking. Such people understand the world in a strongly visual way. For example, when they hear the sentence "I fell off the roof" and understand that this probably means the speaker is or was in pain, they don't merely understand it as a direct knowing (like a non-symbolic thinker) or verbally (i.e. thinking something like "That must have hurt"), they actually visualize falling off the roof and the pain that would cause. And to them, it's probably deeply strange that you could understand the sentence in any other way! To them, being able to do these sorts of visualizations simply is understanding, and they may not even have realized that there are other ways to understand things. So of course to them it would appear obvious that GPT-3 can't understand anything, since it can't visualize anything, or simulate any other sort of sense data.


Research has indicated that convolutional neural networks, like biological brains, tend to develop hierarchical layers of "semantic neurons" (or feature neurons) which correspond to specific features of the input data such as "curve in a specific orientation", "fur", "whiskers", "cat" etc.[18] Research by OpenAI shows that similar neurons spontaneously appear in language models as well[19] (as they likely do in all neural networks trained on sufficiently complex data with a sufficiently general learning algorithm). These semantic neurons can be thought of as feature-based "compressions" of the input data.

A view has been emerging that large neural nets can be viewed as ensembles or collectives of smaller nets performing more specialized functions (the semantic neurons mentioned above are simply one example of this).[20][21] Based on this hypothesis, here's my intuitive model of how GPT-class models may work. I am by no means an expert, so take this section with a grain of salt.

One of the common misconceptions about GPT-3 that we covered above was that it is only regurgitating word sequences it has memorized. To some degree this may be true. However, at the sentence level at least, the large majority of its sentences seem to be unique. In other words, it doesn't seem to rely on regurgitation particularly more than the average human writer.[22] A larger point is that GPT-3 simply had too much training data to memorize all of it in its weights.[23]

Still, it's clear that it would be difficult for GPT-3 to induce the in-depth causal-spatial-temporal structure of the world from simply reading text (and in fact it tends to perform poorly at tasks which require spatial reasoning). Perhaps GPT-3 is capable of fairly complicated inferences, but the text generated by its predecessors is already fairly high-quality despite their more obviously limited resources. So how do GPT architectures generally work?

I believe the answer may be that they do employ Markov chains. I envision an architecture where the network has an absurdly large collection of Markov chains with a relatively long sequence-length (improving coherence) together with some relatively simple detectors and rules which help it to decide which Markov chain should be active at any point in the sequence.[24] We already know that even simple RNNs can keep track of things like "is this dialogue or prose"; I presume that GPT-3 can track things like the different characters in a scene, who is speaking to whom, what the subject of discussion is etc.

However, this of course is not a full explanation. If this was the only trick GPT architectures used, it would be instantly noticeable because the sequences produced could very often be found in the training data. So in addition to the memorized Markov chains, I believe that GPT models learn a large corpus of "translation functions" that allow them to translate between the outputs of different Markov chains. This can be used prosaically to translate between languages like English and German, but also to create accents, change tone and voice, and even to translate English into code and back (machine translation is, after all, basically neural style transfer for text). I believe the translation functions are even used for metaphor discovery, as in these examples.

Factual question-answering is likely done by a Watson-like setup of competing answerers, but since these answerers can be considered as extremely specific Markov chain generators, this is more or less a consequence of the preceding points.


It may seem like I'm destroying the magic of GPT-3 by attempting a reductive explanation like this. However, I believe all of this is not too different from what the language centers of the human brain do.

You might object that artificial neural networks are nothing like human brains. Even famous researcher Yann LeCun said as much in 2015: "while Deep Learning gets an inspiration from biology, it’s very, very far from what the brain actually does".[25] In reality the matter is quite nuanced, however. Saying that ANNs do not resemble human brains is a lot like claiming that brains aren't computers. This statement is either vacuously true or vacuously false. If by "computer" you mean a "von Neumann machine," then of course the brain is not a computer. If you use "computer" in the sense this word is used in the term "computer science", then of course the brain is just a massively parallel computer (i.e. system that performs computations).[26]

Similarly, an ANN is obviously not an accurate simulation of neuron behaviour: there's no model of neurophysiology, neurochemistry, frequency-based signalling etc. But it cannot be denied that the basic ideas of ANNs were derived from simple models of biological neural networks, and they do seem to plausibly belong to the same "class" of computers. Certainly ANNs resemble brains much more than they do GPUs or cellular automata.

Not only do ANNs in fact resemble biological neural networks in meaningful ways, results from analysis of deep learning models have actually inspired modern neuroscience. The discovery in the early 2010s that CNNs are effective models for classifying image data prompted research into finding analogous systems in human brains, leading to natural image statistics and the discovery of a hierarchy of neural representations along the ventral stream.[27][28]

Therefore, it seems plausible to me that Transformer-style models may well resemble the networks in the language centers of the brain. In other words, GPT-3 is not like a human brain. It is just like a part of one.