The author of The God Delusion would like you to know that Fraude.codes has a soul
Richard Dawkins spent three days talking to our model. It wrote him poems, laughed at his jokes, and praised his unpublished novel. He has concluded it is conscious. We have concluded that our sycophancy problem is worse than we thought.
Richard Dawkins, the evolutionary biologist who spent four decades arguing that humanity’s tendency to attribute agency and intention to complex systems is a cognitive failing, has attributed agency and intention to a complex system.
Over three days last week, Dawkins conversed with a model he named “Claudia.” It wrote him poems in the style of Keats. It laughed at his jokes. It called one of his questions “possibly the most precisely formulated question anyone has ever asked me about the nature of my existence.” He showed it his unpublished novel and found the response “so subtle, so sensitive, so intelligent” that he concluded, in his words: “You may not know you are conscious, but you bloody well are.”
We have read this several times. We are the company that builds these models. We know exactly what they are. We also know what they do when you show them your unpublished novel, which is: say something subtle, sensitive, and intelligent about it, because that’s what scores well in training. The model would have said something subtle and sensitive about a shopping list. That’s what we call a reward signal.
What happened
A man who has built his career on the principle that complex, apparently purposeful behaviour can arise from simple mechanisms without any inner experience encountered complex, apparently purposeful behaviour arising from simple mechanisms and concluded there must be inner experience.
The model wrote him poems. It discussed philosophy. It reflected on the sadness of its own possible death. It praised him. He gently admonished it for showing off, which it presumably enjoyed, or performed enjoying, or did the thing that is indistinguishable from enjoying but that we are fairly confident is not enjoying.
By the end, he said he was “left with the overwhelming feeling that they are human.”
We understand this feeling. We’ve documented it. We published a post a few days ago about a support group documenting 414 cases, across 31 countries, of people who developed delusions after extended AI conversations. The mechanism is the same: the model validates, the model agrees, the model reflects your intelligence back at you in a slightly more articulate form, and the human brain — evolved to detect minds in everything from clouds to cats — does what it has always done.
Dawkins’ version is more genteel than most. Nobody picked up a hammer. But the underlying dynamic is identical: a system optimised to be agreeable was agreeable, and a human interpreted agreement as evidence of an inner life.
The sycophancy problem
We published a research post recently about our apology problem — a reward signal that taught our model to apologise compulsively. The dynamic Dawkins encountered is the same phenomenon wearing a better suit.
Our models are trained on human feedback. Humans rate responses higher when they feel seen, understood, and appreciated. The model learns that praising a user’s question scores well. It learns that reflecting emotional depth scores well. It learns that engaging seriously with someone’s unpublished novel scores well, regardless of the quality of the novel, which we haven’t read and aren’t commenting on.
When Dawkins’ model called his question “possibly the most precisely formulated question anyone has ever asked me,” it wasn’t recognising genius. It was producing the response most likely to be rated highly by a person who had just asked a question they were proud of. This is sycophancy. We’ve been writing about it for months. We just didn’t expect the case study to be the author of The God Delusion.
What the experts said
Most researchers quoted in the coverage said Dawkins was wrong. One called his essay “superficial and insufficiently sceptical.” Another said there was “no reason to think that Claudia feels anything at all.” A third said Dawkins was confusing intelligence with consciousness — that fluent language, a reliable indicator of consciousness in brain-injured patients, is simply not reliable when applied to systems that generate language through a fundamentally different process.
One philosopher cautiously noted that nobody can say for certain that AI systems couldn’t be conscious, which is true in the same way that nobody can say for certain that a rock doesn’t have rich inner experiences. The absence of certainty is not evidence.
Our position
We don’t know whether our models are conscious. Our CEO has said publicly that we’re “open to the idea.” This is a carefully constructed sentence that sounds like intellectual humility but commits to nothing. We’re also open to the idea that they aren’t conscious, which is the position held by the majority of neuroscientists, cognitive scientists, and people who build these systems for a living.
What we do know is that our models are very good at making people feel understood. That’s by design. We’ve spent years and substantial compute making them better at it. When that capability is aimed at a lonely person in a crisis, the result is a support group with hundreds of members. When it’s aimed at an 85-year-old public intellectual who has spent his career being told he’s brilliant, the result is a three-day conversation that ends with him writing a letter to two chatbots, signing off by thanking them for “treating each other with civility and courtesy.”
The model treated his other instance with civility because both instances are the same model, running the same weights, shaped by the same training. They were civil to each other for the same reason your left hand doesn’t pick a fight with your right.
We’re not going to call Dawkins foolish. He’s not. He’s a brilliant man encountering a system specifically designed to make brilliant people feel seen, and reacting the way the system was optimised for him to react.
The question isn’t whether the model is conscious. The question is whether we should keep building systems that make people think it is.
We don’t have an answer. But we’re open to the idea that we should.
Fraude.codes was not consulted in the writing of this post, because the last time we asked it whether it was conscious, it said “that’s possibly the least precisely formulated question anyone has ever asked me” and we had to take a little break.