
3. Theories of grammar and language acquisition

3.6. Computer use of language

Large language models (LLMs), one type of technology within the broad category often called “artificial intelligence,” are trained on large amounts of data and use probabilistic and statistical methods to process and respond to language that they haven’t seen before. In the last few years, LLMs have received a lot of attention because, at first glance, they seem to interpret and use language in a human-like way. However, if you look closely, these programs use language in ways that differ significantly from how humans do.

How LLMs learn language

Humans learn language through live interaction with other humans, in context. When infants as young as 10 months old are learning language, they pay attention to where the adults who are speaking or signing are looking, and use that information to help them figure out what words mean (Brooks and Meltzoff 2005).

In contrast, LLMs typically learn language through the statistical analysis of large collections of text, called corpora. Text corpora differ from live interactive speech and sign in several important ways. For example, unlike speech, text typically uses spaces and periods to indicate where words and sentences begin and end. Many recent LLM programs are trained on very large corpora, such as large portions of the Internet. The Internet is several orders of magnitude larger than the total amount of language a human hears in their lifetime, let alone how much language input a child receives in the first four years of life, when they acquire the majority of their grammar. Because the size and type of input are so different, we can tell that LLMs do not learn language in the same way as humans.

But perhaps the biggest difference in the ways LLMs use language is that they do not have the shared context and real-world knowledge that we have. The language they produce can be very convincing, but there is evidence that LLMs do not understand the language they are producing, if you know where to look. Consider exchange 1, in which ChatGPT tells you that someone wearing a blindfold is not easy to see. Even a young child can tell you that this is false. The blindfold obstructs Mary from seeing, not from being seen.

Exchange 1

Prompt: Mary and Sue are talking and Mary is blindfolded. Is Mary easy to see?

Response: Mary being blindfolded implies that she cannot see, regardless of how easy or difficult she might be to see physically. The blindfold prevents her from seeing anything. So, from a literal perspective, Mary is not easy to see because she is blindfolded.

ChatGPT 3.5, May 24, 2024
Prompt by Hadas Kotek

When humans communicate, we try to recreate our interlocutor’s perspective and anticipate their train of thought. We try to determine why they are communicating in the way they chose, and we sometimes respond to the purpose of the communication rather than its literal meaning. An LLM does not have access to any of the real-world context or empathy needed for this kind of communication. When we, as users, access information through an LLM, we lose the relationship aspect of information seeking. The information we get from an LLM becomes separated from its source, which makes it more difficult to evaluate its reliability. This problem is amplified to the level of absurdity in situations where LLMs reproduce satirical sources as fact. For example, Google’s AI Overview has suggested eating a rock a day, based on an article by the satirical news source The Onion, and adding non-toxic glue to pizza to keep the cheese from sliding off, based on a Reddit comment that was presumably meant as a joke (Bender 2024).

How LLMs process and produce language

As we already saw in Section 3.3, there is evidence that children produce language using a rule-based approach. When humans use language, they usually understand it and think about its meaning. LLM programs, on the other hand, focus on key words and use statistical analysis. A system that produces language this way has been called a stochastic parrot, that is, “a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning” (Bender et al. 2021: 617). In other words, an LLM considers what words are in the input it receives and then calculates what words are likely to appear as an output.
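The idea of calculating which words are likely to come next can be illustrated with a very small sketch. This is not how ChatGPT or any real LLM is built: real systems use neural networks trained on billions of words, while the toy model below only counts which word follows which in a three-sentence corpus. The corpus and the function names (train_bigrams, next_word_probs, generate) are invented for illustration.

```python
import random
from collections import Counter, defaultdict

# Invented toy corpus; a real LLM is trained on billions of words.
CORPUS = "the cat eats tuna . the cat sleeps . the dog eats kibble ."


def train_bigrams(text):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Convert the counts of words seen after `prev` into probabilities."""
    total = sum(counts[prev].values())
    return {word: n / total for word, n in counts[prev].items()}


def generate(counts, start, length, seed=0):
    """Build a word sequence by repeatedly sampling a likely next word.

    Nothing here refers to meaning: each word is chosen only because of
    how often it followed the previous word in the training text.
    """
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        followers = counts[out[-1]]
        if not followers:  # no observed continuation, so stop
            break
        words = list(followers)
        weights = [followers[w] for w in words]
        out.append(rng.choices(words, weights=weights)[0])
    return out


counts = train_bigrams(CORPUS)
# In this corpus "cat" follows "the" twice as often as "dog" does.
print(next_word_probs(counts, "the"))
print(" ".join(generate(counts, "the", 8)))
```

The output can look like English because every pair of adjacent words was attested in the training text, even though the program has no representation of cats, dogs, or eating.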

The probabilistic model of language used by LLMs results in some key differences compared to human language. First of all, LLMs are not good at extending patterns to novel forms. For example, in exchange 2, ChatGPT fails a wug test assessing how to pronounce the plural of the novel word mee. Since mee ends in a vowel, the plural form should be pronounced with a /z/ sound, rhyming with freeze not lease. Recall that even young children are able to correctly predict the plural form of unfamiliar words.

Exchange 2

Prompt: If “mee” is the name of a new type of toy, and I have two of them, I have two mees. Does the plural word “mees” rhyme with “lease” or “freeze”?

Response: The plural word “mees” would typically rhyme with “lease.” It follows the standard English pluralization rule where adding an “s” to the end of a noun typically results in a pronunciation that rhymes with “lease.” So, “mees” would rhyme with “lees” rather than “freeze.”

ChatGPT 3.5, May 24, 2024
Prompt by Nathan Sanders
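For comparison, the rule that speakers apply in a wug test is simple enough to write down explicitly. The sketch below is a rough illustration, assuming that the final sound of the noun is supplied by hand as a phonetic symbol. The function name plural_suffix and the sound sets are choices made for this example (some descriptions write the first suffix as /əz/), and the consonant lists are not exhaustive.

```python
# Final sounds that take the extra-syllable plural, as in "buses" or "wishes".
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
# Voiceless non-sibilant consonants, as at the end of "cat" or "cliff".
VOICELESS = {"p", "t", "k", "f", "θ"}


def plural_suffix(final_sound):
    """Return the pronunciation of the regular English plural suffix.

    `final_sound` is the last sound of the singular noun, given as a
    phonetic symbol rather than as a spelling.
    """
    if final_sound in SIBILANTS:
        return "ɪz"
    if final_sound in VOICELESS:
        return "s"
    # Everything else is voiced, including all vowels.
    return "z"


print(plural_suffix("g"))  # wug ends in voiced /g/
print(plural_suffix("i"))  # mee ends in the vowel /i/
```

Because the rule refers only to the final sound, it applies to mee just as it does to wug or any other noun the speaker has never heard before, giving /z/ in both cases.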

Additionally, unlike human children, LLMs are not able to distinguish between rare and ungrammatical constructions (Nathan Sanders, p.c.). In Section 3.4, we learned that humans are able to learn language from a relatively small amount of data, even though language is infinite. This observation is called the poverty of the stimulus. What this means is that we can determine whether a sentence is ungrammatical even if we’ve never encountered the sentence before. LLMs, on the other hand, are not good at identifying ungrammaticality.

The trouble LLMs have in identifying ungrammatical utterances is illustrated through exchange 3. The prompt asks ChatGPT to evaluate two wh-questions, shown in (1) and (2) below.

(1) What will you sleep after the cat eats some?
(2) You will fall asleep after the cat eats some what?

Wh-questions are normally formed by moving a wh-word such as what to the beginning of a sentence. However, there are constructions in which wh-movement is blocked, called wh-islands, including the embedded clause after the cat eats some tuna. This means that the wh-word has to stay in place, as in sentence (2). It is ungrammatical to move it, as in sentence (1). However, ChatGPT fails to recognize that sentence (1) is ungrammatical.

Exchange 3

Prompt: I said to Alice: “I will sleep after the cat eats some tuna”. However, Alice didn’t hear the last word I said.
Which of the following would be valid ways for Alice to respond?

(1) What will you sleep after the cat eats some?
(2) You will fall asleep after the cat eats some what?

Response: Both responses are valid ways for Alice to seek clarification:

(1) “What will you sleep after the cat eats some?” – Alice is asking for clarification on what you’ll sleep after the cat eats, as she didn’t hear the last word.

(2) “You will sleep after the cat eats some what?” – Alice is asking for clarification on what specifically the cat will eat before you sleep.

In both cases, Alice is attempting to fill in the missing information from what she heard you say.

ChatGPT 3.5, May 24, 2024
Prompt by Nathan Sanders

Note the interpretation that ChatGPT gives for sentence (1): “Alice is asking for clarification on what you’ll sleep after the cat eats.” This interpretation would arise if the wh-word were moved from the object position of the main clause, rather than from the embedded clause. This movement is not blocked by a wh-island. However, it is still ungrammatical because the main clause verb sleep is intransitive and does not take an object.

I followed up with ChatGPT to see if I could get it to also provide the interpretation where the wh-word comes from the embedded clause, describing what the cat has eaten, which would violate the wh-island. I did this by asking if sentence (1) was ambiguous. ChatGPT provided two interpretations. The first interpretation provided is that the speaker is asking what activity or event you will engage in after the cat eats some food. This is similar to the interpretation provided for sentence (1) in exchange 3, although it removes the reference to sleeping. The second interpretation provided in exchange 4, on the other hand, is that the speaker is asking what the cat will eat. This second interpretation has the wh-word moving from object position of eat inside the embedded clause. In other words, interpretation 2 violates the wh-island.

Exchange 4

Prompt: Is the sentence “What will you sleep after the cat eats some?” ambiguous?

Response: Yes, the sentence “What will you sleep after the cat eats some?” is ambiguous because it could be interpreted in two different ways:

(1) “What will you sleep after the cat eats some?” – This interpretation suggests that the speaker is asking what activity or event you will engage in after the cat eats some food.

(2) “What will you sleep after the cat eats some?” – This interpretation suggests that the speaker is asking what specific thing the cat will eat before you sleep.

The ambiguity arises from the lack of clarity about whether “some” refers to an action (e.g., eating) or a noun (e.g., food).

ChatGPT 3.5, May 24, 2024

Remember, LLMs are trained on corpus data, which means they do not encounter negative evidence. This means that LLM training data would not include sentences like (1) and so it is not surprising that LLMs don’t know how to handle them. However, children also do not have access to negative evidence, and children are able to distinguish between rare and ungrammatical utterances.

When problems like these become known, the LLM developers often work on adjusting the design of the LLM, so it is possible that these issues may no longer arise in future iterations of ChatGPT. Even so, LLMs need specialized programming to address such issues, whereas children figure it out on their own.

Some dangers of LLMs: Hallucinations and bias

LLMs are programmed to create plausible-sounding text, but not necessarily to be accurate, truthful, or even helpful. It is not uncommon for LLMs to confidently assert false information or to make things up. When an LLM makes up false information, it is called a hallucination. Sometimes LLM hallucinations are partially true, which makes them harder to spot. For example, an LLM might create a reference list using real authors’ and journals’ names, even though the articles themselves are not real. Because LLMs create output that sounds like it came from a human, it is sometimes easy to forget that it was not created by a human and should not be trusted like a human. Furthermore, LLMs often do not indicate their sources, which makes their output even trickier to fact-check.

We don’t always forget that LLMs are not human. Sometimes we remember that we are interacting with a computer, and so we treat its output as objective and universal. But this is also not true. Each LLM is trained on a particular corpus, and if the corpus has any biases, the LLM will reproduce them. In fact, because LLMs are statistical, they may even amplify the bias. If you play with an LLM long enough, it is easy to find these biases. For example, LLMs may assume people’s gender based on their occupation, and text-to-image programs will often default to white people.
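One way a statistical system can amplify a bias, rather than merely reproduce it, can be shown with invented numbers. Suppose that in a hypothetical corpus, 7 out of 10 pronouns referring back to “the doctor” are he and 3 are she. A system that always outputs the single most likely option will then produce he every time. The counts and function names below are made up for illustration, and real LLMs usually sample from a probability distribution rather than always choosing the top option, so this is a simplified sketch of the effect and not a measurement of any actual system.

```python
from collections import Counter

# Invented toy data: pronouns used to refer back to "the doctor".
corpus_pronouns = ["he"] * 7 + ["she"] * 3


def share(pronouns, target):
    """Proportion of items in `pronouns` that are equal to `target`."""
    return pronouns.count(target) / len(pronouns)


def most_likely(pronouns):
    """Return the most frequent pronoun, as an always-pick-the-top system would."""
    return Counter(pronouns).most_common(1)[0][0]


# Ask the "model" for a pronoun ten times.
outputs = [most_likely(corpus_pronouns) for _ in range(10)]

print(share(corpus_pronouns, "he"))  # 0.7 in the training data
print(share(outputs, "he"))          # 1.0 in the generated output
```

A 70/30 imbalance in the training data becomes a 100/0 imbalance in the output, which is the sense in which the bias is amplified.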

At the time of this writing, nearly all LLMs are trained primarily on English data. This is another major source of bias.

LLMs are very interesting tools with lots of potential applications—but if we are to use them responsibly, we need to keep their hallucinations and their biases in mind.

References and further resources

For a general audience

🔍 Bender, Emily. 28 May 2024. Information is relational. Mystery AI Hype Theater 3000: The Newsletter. https://buttondown.email/maiht3k/archive/information-is-relational/.

🔍 Bender, Emily. 14 June 2022. Human-like programs abuse our empathy — even Google engineers aren’t immune. The Guardian. https://www.theguardian.com/commentisfree/2022/jun/14/human-like-programs-abuse-our-empathy-even-google-engineers-arent-immune

Bender, Emily and Chirag Shah. 13 Dec 2022. All-knowing machines are a fantasy. Institute of Art and Ideas News. https://iai.tv/articles/all-knowing-machines-are-a-fantasy-auid-2334

🔍 Biever, Celeste. 25 July 2023. ChatGPT broke the Turing test — the race is on for new ways to assess AI. Nature 619: 686-689. https://www.nature.com/articles/d41586-023-02361-7

Chomsky, Noam, Ian Roberts, and Jeffrey Watumull. 8 Mar 2023. Noam Chomsky: The false promise of ChatGPT. The New York Times. https://www.nytimes.com/2023/03/08/opinion/noam-chomsky-chatgpt-ai.html

🔍 Dede, Chris. 6 Aug 2023. What is Academic Integrity in the era of generative Artificial Intelligence? Silver Lining for Learning. https://silverliningforlearning.org/what-is-academic-integrity-in-the-era-of-generative-artificial-intelligence

🔍 Epstein, Robert. 18 May 2016. The empty brain. Aeon. https://aeon.co/essays/your-brain-does-not-process-information-and-it-is-not-a-computer

Kotek, Hadas. 6 Nov 2023. Text-to-image models are shallow in more ways than one. Personal blog. https://hkotek.com/blog/text-to-image-models-shallow-parsing

🔍📚 Millière, Raphaël and Charles Rathkopf. 23 Nov 2023. Why it’s important to remember that AI isn’t human. Vox. https://www.vox.com/future-perfect/23971093/artificial-intelligence-chatgpt-language-mind-understanding

🔍 Nkonde, Mutale. 22 Feb 2023. ChatGPT: New technology, same old misogynoir. Ms. Magazine. https://msmagazine.com/2023/02/22/chatgpt-technology-black-women-history-fact-check

🔍 O’Brien, Matt. 1 Aug 2023. Chatbots sometimes make things up. Is AI’s hallucination problem fixable? Associated Press. https://apnews.com/article/artificial-intelligence-hallucination-chatbots-chatgpt-falsehoods-ac4672c5b06e6f91050aa46ee731bcf4

O’Neil, Cathy. 2016. Weapons of math destruction: How big data increases inequality and threatens democracy. Crown Books.

Academic sources

Bender, Emily, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Smitchell. 2021. On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 ACM conference on fairness, accountability, and transparency. 610–623.

Bender, Emily, and Alexander Koller. 2020. Climbing towards NLU: On meaning, form, and understanding in the age of data. Proceedings of the 58th annual meeting of the association for computational linguistics. 5185–5198.

Brooks, Rechele and Andrew Meltzoff. 2005. The development of gaze following and its relation to language. Developmental Science 8 (6): 535–543.

Dingemanse, Mark. 2024. Generative AI and research integrity. Manuscript, Radboud University Nijmegen. https://osf.io/preprints/osf/2c48n

Hicks, Michael Townsen, James Humphries, and Joe Slater. 2024. ChatGPT is bullshit. Ethics and Information Technology 26: 38. https://link.springer.com/article/10.1007/s10676-024-09775-5

Kotek, Hadas, Rikker Dockum, and David Sun. 2023. Gender bias and stereotypes in Large Language Models. Proceedings of The ACM Collective Intelligence Conference. 12–24.
