The Misconception of Intelligence in Large Language Models
Understanding Large Language Models
Large language models (LLMs), such as OpenAI's ChatGPT and Bing's Sydney, excel at mimicking human conversation. Engaging with an LLM can easily give the impression of interacting with a sentient being. New York Times writer Kevin Roose, for instance, recounted an unsettling exchange with Sydney in which the model expressed affection for him and a desire to be alive.
Had it not been for certain safeguards, ChatGPT might convincingly pass the Turing test, the test Alan Turing proposed: if a person cannot distinguish a machine's responses from those of a human, the machine can be deemed intelligent. However, the more I learn about how LLMs are trained and how they work, the more skeptical I become of the Turing test as a measure of true intelligence. This essay lays out two reasons why I believe LLMs lack genuine intelligence and why I doubt future iterations, like GPT-5, will differ fundamentally from their predecessors.
Defining Large Language Models
According to Merriam-Webster, a large language model is “a language model that utilizes deep methods on an exceptionally large dataset to predict and generate text that sounds natural.” In essence, LLMs like ChatGPT are trained on extensive text data—much of which may be copyrighted—and, given a prompt, they generate responses by predicting the most likely subsequent words.
As neural networks, LLMs undergo a training process that strengthens connections leading to accurate predictions while weakening those that yield incorrect results. For example, when prompted with "John is making a sandwich with peanut butter and _____," the model learns to fill in the blank with "jelly."
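To make "predicting the most likely subsequent words" concrete, here is a minimal Python sketch of a count-based next-word predictor. It is only a toy stand-in: the tiny corpus and helper names are invented for illustration, and a real LLM learns billions of weights by gradient descent over whole contexts rather than tallying adjacent word pairs. Still, the core move is the same: given what came before, emit the continuation the training data makes most probable.

```python
from collections import Counter, defaultdict

# A toy "training corpus" (invented for illustration; real models are
# trained on hundreds of billions of words of text).
corpus = (
    "john is making a sandwich with peanut butter and jelly . "
    "mary is making a sandwich with peanut butter and jelly . "
    "john is making tea with milk and sugar ."
).split()

# "Training": count how often each word follows each preceding word.
# This stands in for the weight updates that strengthen connections
# leading to accurate predictions and weaken the rest.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the word most often seen after `word` in the toy corpus."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("and"))     # -> 'jelly' (seen twice, vs. 'sugar' once)
print(predict_next("butter"))  # -> 'and'
```

The crucial difference in a real LLM is that the prediction is conditioned on the entire preceding context through a deep neural network rather than just the previous word, but the learn-from-past-text-and-extrapolate character of the method is the same.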
This method is a form of inductive reasoning. While humans also engage in inductive (and deductive) reasoning, our problem-solving often initiates through what philosophers term "abductive reasoning," which involves forming a model of reality or modifying an existing one to make sense of our observations. Essentially, it's as if our minds ask, “What conditions must exist to explain what I just observed?”
Exploring Consciousness and Models
Some cognitive frameworks are innate, such as agency detection and intuitive physics, while others are socially constructed, including religions, scientific theories, and even sports rules. A key aspect of consciousness is how we populate these models with values and roles, particularly those we assign to ourselves. However, an algorithm that predicts the next word in a sequence lacks the capacity for such holistic thinking.
This realization prompted me to question whether LLMs could be tested for genuine reasoning ability. I was initially unsure how to approach this, but I recently came across a discussion by physicist Sean Carroll, who proposed a compelling test. GPT-4, he noted, has absorbed an extensive chess literature, so he posed a scenario involving a modified game of chess on a toroidal board, one whose edges wrap around. The variant has a definitive outcome: white will always win, because the black king starts the game in check.
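To see the geometric point concretely, here is a minimal Python sketch. It assumes, as my own reading of "toroidal" rather than Carroll's exact wording, that both files and ranks wrap around modulo 8, so that rank 1 borders rank 8. Under that assumption, the standard opening position collapses immediately: the white queen on d1 sits diagonally adjacent to the black king on e8.

```python
def torus_delta(a: int, b: int, size: int = 8) -> int:
    """Shortest distance between two coordinates on an axis that wraps around."""
    d = abs(a - b) % size
    return min(d, size - d)

def square(name: str) -> tuple[int, int]:
    """Convert algebraic notation like 'e8' into 0-indexed (file, rank)."""
    return ord(name[0]) - ord("a"), int(name[1]) - 1

def adjacent_on_torus(sq1: str, sq2: str) -> bool:
    """True if two squares touch (orthogonally or diagonally) once edges wrap."""
    (f1, r1), (f2, r2) = square(sq1), square(sq2)
    return max(torus_delta(f1, f2), torus_delta(r1, r2)) == 1

# Wrapping the ranks makes the opening position immediately decisive:
print(adjacent_on_torus("d1", "e8"))  # True: the white queen touches the black king
print(adjacent_on_torus("e1", "e8"))  # True: even the two kings start adjacent
```

The sketch plays no chess at all; it only checks the wrap-around adjacency that, combined with the ordinary rules, produces the immediate check the question turns on.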
When presented with this scenario, GPT-4 failed to derive the correct answer. Instead, it provided an overly verbose, noncommittal response, failing to synthesize its understanding of chess and geometry. Carroll described its reply as “filibustering,” but I would label it a failure to grasp the essence of the question.
This example highlights how effective reasoning depends on models and how combining different models can lead to novel conclusions. The writer Arthur Koestler called this kind of reasoning "bisociation" and argued that it is essential to creativity in both science and the arts.
The Limits of LLMs
In a previous article, I discussed the notion of "drawing an inference," emphasizing that reasoning involves emotional elements, which natural selection has embedded in thinking organisms to motivate them to explore and learn about their surroundings.
Motivation is crucial. I find it helpful to conceptualize the emergence of life and consciousness as the development of smaller systems within larger frameworks, responding to specific physical and chemical conditions. These smaller systems transform external pressures into internal drivers, such as hunger, curiosity, or fear. As these systems evolve, their subjective experiences become increasingly intricate.
Human intelligence is built upon our models of the world, our communities, and ourselves, all of which are deeply infused with emotional investments. In contrast, as Carroll points out, LLMs lack motivations, goals, or a sense of purpose. ChatGPT remains dormant until prompted, existing solely to generate text based on the preceding input.
Humans, however, are not limited to solving specific problems. While reproduction is a biological imperative, our ancestors thrived by navigating a multitude of competing challenges to ensure the survival of their offspring.
Conclusion
If you were puzzled by the section headings, the first derives from a Bright Eyes song that captures the illusion of AI's potential: "We must stare into a crystal ball and only see the past." While it's conceivable that artificial general intelligence (AGI) could one day emerge, it won't stem from training neural networks to predict and generate text based on prior data. Genuine human insight arises from the ability to combine and reconstruct our world models into something novel.
The second reference comes from the film "The Last Guardian," where a character reflects on the unquenchable hunger within a person. This sentiment poignantly encapsulates our existential dilemmas. Unlike humans, LLMs lead infinitely simpler existences. We assumed that the ability to construct coherent language indicated intelligence, but we misjudged our nature. Language is a manifestation of our intelligence, rooted in the myriad motivations that drove us to create words in the first place.
The first video titled "How Large Language Models Work (and Why That's Why They Don't)" discusses the mechanisms behind LLMs, exploring the limitations of their capabilities.
The second video, "The Debate Over 'Understanding' in AI's Large Language Models," delves into the ongoing discourse regarding the true understanding and cognition of AI systems.