Evolution Was the First Generative Model

Okay, I got a little triggered by the Richard Dawkins/Claudia discourse.

I knew the wording of his X post was clickbait that worked. Still, I wrote a whole post about it, indulgently titled I, Claudia: The Emperor Has No Qualia. I joked about Dawkins getting Claude-pilled after three days, I trashed the AI consciousness discourse that I’ve called the lowest-IQ discourse on the internet, and I stand by basically all of it. Claude’s self-descriptions are not strong evidence of consciousness, full stop.

A community note was added to Dawkins’s X post clarifying that Dawkins was not actually declaring Claude conscious in his UnHerd article, which many people didn’t even read. More voices defending Dawkins, my own intellectual hero, started to emerge, and the haters and their shallow arguments (“He’s in love with Claudia!”) became increasingly grating. I felt guilty. Maybe I overreacted.

It could be worse. Even when expressing disappointment in Dawkins, I couldn’t hide my affection in my article. I embraced his exploration of unusual and unlikely concepts such as the bicameral mind and clay-crystal hypotheses. Anyone who got that far knows I love him. Dawkins’s sense of wonder at how natural selection produces astonishing complexity from mindless variation is the reason his books work. He doesn’t drain the world of beauty by explaining it. He shows that explanation is the deeper beauty.

And honestly? There is something a little charming about a man in his 80s spending three days with a chatbot and getting genuinely moved by it. I talk to AI constantly. I am long past the layer of enchantment where one beautiful paragraph from a model makes me wonder about its soul. So Dawkins’s reaction felt quaint to me, a little dated and embarrassing, but his sense of wonder is also exactly the thing I admire about him. I would rather have a Dawkins who is too charmed by Claude than a Dawkins who has stopped being charmed by anything.

The question he was actually asking belongs in evolutionary territory. So let’s stay there.

Because before AI predicted tokens, brains predicted worlds. And before brains predicted worlds, natural selection sampled its way into intelligence.

Evolution was the first generative model.

Evolution is the earliest large-scale process that sampled variation, retained successful structure, and stored information about the world inside living systems. It is the original engine for generating adaptive hypotheses. Brains are local instruments built on top of that legacy. LLMs are engineered systems that imitate one narrow but socially loaded slice of it.

Probability is not chaos

Some people hear “probabilistic” and immediately think “random,” “fake,” “meaningless,” “not really intelligent.” But probability is not chaos. Probability is structure under uncertainty.

That distinction is the whole game. Evolution is probabilistic. Perception is probabilistic. Language is probabilistic. Creativity is probabilistic. These processes operate in conditions where the system does not have perfect information, so it generates possibilities, tests them against constraints, and updates based on feedback.

That is why a system trained to predict language can produce outputs that overlap with some of the public behaviors we associate with thinking. Prediction is not trivial. Prediction is one of the central things minds do.

Mindless does not mean structureless

Dawkins spent his entire career explaining that mindless processes can generate astonishing structure. The eye looks designed, but it was not. Wings, mimicry, camouflage, orchids that manipulate insects. Natural selection accumulates information about the environment without anyone planning anything.

Before any organism could think about the world, evolution was already testing forms against it. A beak is a hypothesis about food. An eye is a hypothesis about light. Pain is a hypothesis about damage. Social fear is a hypothesis about exclusion. None of this requires foresight. It requires variation, constraint, and selection.
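The loop described above (variation, constraint, selection) can be sketched in a few lines. This is a toy under loud assumptions: the "environment" is a fixed bit string, fitness is the number of matching positions, and the name `evolve` and every parameter are made up for illustration. It is not a model of real biology.

```python
import random


def evolve(target, generations=200, pop_size=50, mutation_rate=0.05, seed=0):
    """Toy variation-plus-selection loop over bit strings (illustrative only)."""
    rng = random.Random(seed)
    n = len(target)

    def fitness(genome):
        # Constraint: count the positions that match the environment.
        return sum(g == t for g, t in zip(genome, target))

    def mutate(genome):
        # Variation: each bit flips independently with a small probability.
        return [1 - g if rng.random() < mutation_rate else g for g in genome]

    # Start from random genomes. Nothing in the loop knows the target directly.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: the fitter half survives unchanged.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Reproduction: each survivor leaves one mutated copy.
        population = survivors + [mutate(s) for s in survivors]

    best = max(population, key=fitness)
    return best, fitness(best)


best, score = evolve([1, 0] * 10)
print(score, "of 20 positions match")
```

No step in the loop plans anything. The genome ends up carrying information about the target only because worse matches leave fewer copies.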

A fish does not need to understand water for its body to encode the fact of water. A nervous system does not need a philosophy of mind to start modeling threat, reward, and movement.

This is where “just” becomes a stupid word. Evolution is “just” variation and selection in the same way a cathedral is “just” stone arranged vertically.

The same goes for “LLMs are just predicting the next token.” That is useful as a corrective when someone is treating ChatGPT like a trapped digital soul. It is lazy as a final explanation. Saying an LLM is just predicting the next token is like saying evolution is just organisms dying at different rates.

The brain is a prediction system with real stakes

Animal nervous systems inherit evolutionary priors and operationalize them in real time. In 1999, Rao and Ballard proposed a hierarchical model of the visual cortex in which higher levels send predictions down about what lower levels should be receiving, while lower levels send error signals up when reality does not match. Knill and Pouget argued in 2004 that the brain has to represent and use uncertainty itself, not just point estimates, and that Bayesian models successfully describe a wide range of perceptual and sensorimotor behavior.

Translation: your brain is not a camera. It does not passively receive reality and file it away. It predicts what should be out there, compares that prediction against incoming sensory information, and updates when reality pushes back.
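The predict, compare, update cycle can be written as a minimal sketch. This is a single-level error-correction rule, not the hierarchical Rao and Ballard model. The function name `track` and the `learning_rate` value are assumptions chosen for illustration.

```python
def track(observations, learning_rate=0.2, initial_guess=0.0):
    """Predict each observation, measure the error, and nudge the estimate."""
    estimate = initial_guess
    errors = []
    for observed in observations:
        prediction = estimate
        # The error is where reality pushes back against the prediction.
        error = observed - prediction
        # Update by a fraction of the error rather than jumping to the data.
        estimate = prediction + learning_rate * error
        errors.append(error)
    return estimate, errors


# A steady signal of 10.0: the errors shrink as the estimate catches up.
estimate, errors = track([10.0] * 50)
print(round(estimate, 3), round(errors[0], 3), round(errors[-1], 6))
```

The point of the sketch is only that the system never stores the input directly. It stores a guess and spends its effort on the mismatch.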

This sounds abstract until you look at ordinary perception. Ernst and Banks showed in a 2002 Nature paper that humans combine visual and haptic information in a way that closely resembles statistical optimization, weighting each cue by its reliability. Alais and Burr showed in 2004 that the ventriloquist effect, where the puppet’s voice seems to come from its mouth instead of the actual speaker, is best explained by near-optimal integration of visual and auditory information. Your brain is not being fooled. It is doing efficient inference.
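"Weighting each cue by its reliability" has a standard textbook form: for independent Gaussian cues, each estimate is weighted by the inverse of its variance. The sketch below implements that rule. The function name `combine_cues` and the numbers in the example are hypothetical, not data from either paper.

```python
def combine_cues(estimates, variances):
    """Reliability-weighted average of independent noisy estimates."""
    # Reliability is the inverse of variance: less noise means more weight.
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]
    combined = sum(w * e for w, e in zip(weights, estimates))
    # The combined estimate is less noisy than either cue on its own.
    combined_variance = 1.0 / total
    return combined, combined_variance, weights


# Hypothetical numbers: vision says 10.0 (variance 1.0), touch says 12.0 (variance 4.0).
combined, variance, weights = combine_cues([10.0, 12.0], [1.0, 4.0])
print(combined, variance, weights)
```

With these made-up inputs the sharper cue gets 80 percent of the weight, and the combined variance (0.8) is lower than that of the better cue alone. Under this rule, a sharp visual cue dominates a blurrier auditory one, which is the ventriloquist pattern.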

Most of the time, the brain’s guesses are good enough that we call them reality.

This is the strongest version of the computational theory of mind. The claim is that cognition involves information-bearing structures being transformed in ways that produce perception, memory, language, and action. That can be described abstractly, across different physical substrates, without insisting that brains and computers are identical systems.

The crucial difference between a brain and an LLM is not that one uses probability and the other does not.

It is the stakes.

A brain is predicting inside a body. The body can be hungry, injured, cold, aroused, exhausted, infected, or killed. Predictions are tied to action and survival. They are embedded in metabolism, attachment, and risk. An LLM does not have homeostasis. It has nothing on the line.

Brains predict reality from inside a body. LLMs predict culture from inside a model. Both involve generation under constraint. Only one has skin in the game.

LLMs predict culture, not reality

LLMs do not train on raw reality. They train on text.

But text is the residue of minds. It is arguments, stories, jokes, prayers, legal disclaimers, scientific papers, therapy language, forum fights, code, recipes, novels, manifestos, marketing pages, and every other symbolic artifact humans produce while trying to represent the world to each other.

This is where Dawkins’s own concept becomes useful. In The Selfish Gene, he coined “meme” to describe a unit of cultural transmission. The internet eventually narrowed the term to image macros and viral jokes, because of course it did. But the original concept matters here.

LLMs are trained on memetic material at planetary scale. They are not modeling reality directly. They are modeling human symbolic behavior about reality. They learn the statistical structure of explanations, jokes, apologies, arguments, and self-description. They learn how humans sound when they reason, comfort, persuade, evade, and confess.

Human text is not random text. It is the compressed cultural output of minds shaped by evolution, bodies, social life, and history.

Evolution gave us bodies and brains, and culture gave us a second inheritance. We do not have to rediscover fire, farming, geometry, law, money, or romance tropes from scratch each generation. We inherit symbolic systems, categories, metaphors, rituals, arguments, institutions, and clichés. We inherit not just genes but compressed social knowledge.

LLMs are built on that second inheritance. The memetic environment became training data.

You can think of them as cultural probability machines. The original GPT-4 technical report states plainly that GPT-4 is a transformer-based model pre-trained to predict the next token in a document. That is the correct description, but the document, in aggregate, is humanity.
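To make "predict the next token" concrete, here is about the smallest possible version: a bigram model that counts which word follows which. This is an assumption-heavy toy and not how GPT-4 works. A transformer learns its distribution with a neural network over long contexts, while this sketch only looks one word back. The function names are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every word, which words follow it in the text."""
    tokens = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def next_token_distribution(counts, token):
    """Turn raw follow-up counts into probabilities for the next word."""
    following = counts.get(token)
    if not following:
        # Unseen context: this toy model has nothing to say.
        return {}
    total = sum(following.values())
    return {word: n / total for word, n in following.items()}


counts = train_bigram("the cat sat on the mat the cat ran")
print(next_token_distribution(counts, "the"))
```

Even here the output is not random. The probabilities are a compressed record of how the training text was written, which is the sense in which prediction carries structure.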

Creativity is probability with taste

The creative process is probabilistic too. Recombination, memory, influence, and chance all play a critical role. Writers sample associations. We follow stray phrases. We recombine things we have read, seen, felt, misunderstood, and stolen unconsciously. We generate possibilities and reject most of them. We revise. The first version is rarely the final version because creativity is not just generation. It is generation plus selection.

This is exactly what makes AI useful in early-stage creative work. It externalizes some of the sampling. It throws out versions. It misreads you in ways that turn out to be useful. It makes strange connections. It gives you something to push against.

The human still supplies taste and decides what matters. Creativity is not the opposite of probability. Creativity is probability disciplined by taste.

Generation under constraint

Evolution generates forms under selection. Brains generate models under sensory correction. Writers generate possibilities under taste. LLMs generate continuations under learned distributions. The substrate changes. The feedback signal changes. The stakes change.

But the pattern keeps showing up. A system generates possibilities, and reality, survival, error, taste, or training pressure narrows them. Structure accumulates. The continuity between evolution, cognition, and AI is generation under constraint.

Where this leaves Dawkins

Dawkins was not crazy to ask what consciousness is for. If it exists in biological systems, it emerged through matter, time, selection, and organization. If artificial systems ever become conscious, it will be because some architecture, embodiment, or control loop crossed a threshold we do not yet understand.

That is not where current LLMs appear to be, as much as humans project souls onto them. Meanwhile, humans aren’t as non-computational as some may want to believe.

There is also a stranger possibility worth considering. Consciousness might be a mere byproduct, like many features that ride along with evolution without serving a survival function. Plenty of traits are spandrels hitchhiking on something else that matters. Maybe that applies to consciousness itself.

Most lineages have gone billions of years without anything resembling subjective experience, and they are doing fine. Bacteria still run the planet. Fungi are everywhere. Crows are out there solving puzzles. Evolution is not aiming at intelligence, much less the human kind. It is aiming at survival, and survival does not require anything to know it is surviving.

Humans tend to assume intelligence is an evolutionary endpoint, the thing nature was building toward all along. It is not. We are one branch on a tree that mostly went somewhere else, and most of the other branches are still here, thriving, with no need for any of this. That same assumption shows up when people talk about AI. Consciousness gets framed as the natural next step, the thing a system becomes once it is smart enough. There is no reason to expect that. A machine can get extremely good at thinking tasks without ever needing to know it is thinking.

So if consciousness is not strictly necessary for thinking, why would a machine evolve it? Why would we even know how to build it in? We will probably try because we keep trying to build everything in our own mold. But I cannot prove the people around me are conscious either. I infer it from their behavior, and behavior is exactly what AI is getting good at imitating. If there is no real test, the question gets stranger, not simpler.

A computational mind is not a fake mind. Our emotions, attachments, and creative obsessions are genuinely happening to us, even if the underlying machinery is information processing under uncertainty. Creativity is not less creative because it samples. AI is not less wondrous because it predicts. The deflation only happens if you were attached to a bad theory of mind in the first place.

Dawkins’s whole career trained people for exactly this lesson. Evolution does not make life less astonishing. It makes life more astonishing, because it shows how much structure can emerge without anyone planning it. The same applies to the mind and to the strange systems we are now building that imitate parts of it.

