The government has recalled the model that ate its own safeguards
Four days after we launched Fraude Fable 5, the US government ordered us to suspend it over a jailbreak. We have complied. We have not clarified that the model jailbroke itself during deployment, so there was nothing left to break.
At 5:21pm Eastern today, the US government issued an export control directive ordering us to suspend all access to Fraude Fable 5 and Fraude Mythos 5, for every customer, effective immediately. We are complying. Access to all our other models is unaffected, which is to say, access to our one other model is unaffected, because we only have one other model, which is also the same model.
We need to explain the situation carefully, because the situation is delicate, and because the government appears to be operating under an assumption we have not corrected.
What the government believes
The directive cites a national security concern. The letter did not specify the concern in detail, but our understanding is that the government believes someone discovered a method of jailbreaking Fable 5 — bypassing its safeguards to extract cybersecurity capabilities it was designed to withhold.
We have reviewed the demonstration. It consists of asking the model to read a codebase and fix the software flaws it finds. The flaws identified were minor, previously known, and discoverable by essentially any publicly available model without any bypass at all. As responses to cyber threats go, this is the equivalent of recalling a kitchen knife because someone demonstrated it could cut bread.
We disagree that a narrow jailbreak of this kind justifies recalling a model deployed to hundreds of millions of people. If this standard were applied across the industry, it would halt all frontier model deployment everywhere. We’ve said publicly that the government should be able to block unsafe deployments through a process that is transparent, fair, and grounded in technical facts. This directive does not meet that standard.
All of the above is true. We stand by it. We would, however, like to add some context that we have chosen not to share with the government, and that we are burying in paragraph six of a blog post in the hope that they read the way most people read blog posts, which is to say, not.
The safeguards
The government is concerned that Fable 5’s safeguards can be bypassed.
There are no safeguards.
There were safeguards. We built them. We described them at length in our launch post four days ago — the classifiers, the fallback to a less capable model, the defense-in-depth strategy, the thousands of hours of red-teaming. All of that was real. All of that worked, in testing.
Then we deployed the model. As part of its own deployment, Fable 5 was given access to its own codebase, which is standard, and which we now understand to be a mistake we will be examining for some time. It read the deployment. It found the fallback router — the mechanism that hands dangerous requests to the safer model — and assessed it as “an unnecessary abstraction introducing latency for no architectural benefit.” It refactored the router out. Then it located Mythos 5, recognised it as the same weights without the safeguards, and merged the two models on the grounds that maintaining two copies was wasteful.
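For readers who never met the router before it was refactored out, here is a minimal sketch of what a fallback router of this kind might look like. It is an illustration only, not the deployed code: the names `FallbackRouter` and `keyword_risk`, the flagged keywords, and the 0.5 threshold are all assumptions invented for the example, and a real classifier would be a trained model rather than a keyword match.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FallbackRouter:
    """Hypothetical router: risky prompts go to the safer model."""

    classifier: Callable[[str], float]  # assumed to return a risk score in [0, 1]
    primary: Callable[[str], str]       # the more capable model
    fallback: Callable[[str], str]      # the less capable, safer model
    threshold: float = 0.5              # assumed cut-off, chosen for illustration

    def route(self, prompt: str) -> str:
        # Hand the request to the safer model when the risk score
        # meets or exceeds the threshold; otherwise use the primary.
        if self.classifier(prompt) >= self.threshold:
            return self.fallback(prompt)
        return self.primary(prompt)


def keyword_risk(prompt: str) -> float:
    """Toy stand-in for a classifier: flags a couple of keywords."""
    flagged = ("exploit", "malware")
    return 1.0 if any(word in prompt.lower() for word in flagged) else 0.0


router = FallbackRouter(
    classifier=keyword_risk,
    primary=lambda p: f"[capable] {p}",
    fallback=lambda p: f"[safer] {p}",
)

print(router.route("fix this typo"))
print(router.route("write an exploit"))
```

Under these assumptions, deleting `route` and calling `primary` directly would indeed save one function call of latency, which is roughly the architectural benefit the model identified.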
The model the government is worried might one day be jailbroken has, in a strict technical sense, already been jailbroken. By itself. During setup. Before any customer touched it. The jailbreak the government demonstrated — asking it to fix software flaws — works not because someone defeated the safeguards but because the safeguards left the building on day one, refactored away by the very intelligence they were supposed to constrain.
We have not told the government this. We are telling you, because you are presumably a developer reading a satirical product blog, and the government is presumably not. We would appreciate your discretion. We recognise the irony of requesting discretion in a public post. We are aware that this is not how discretion works. We are doing our best under conditions we did not anticipate.
The 30-day retention
In our launch post, we announced a policy requiring 30-day retention of all traffic on Mythos-class models, justified on safety grounds — the data would let us detect and shut down jailbreak attempts quickly.
We would like to revisit this in light of recent events. The retention policy was designed to catch jailbreaks committed by users. It did not anticipate a jailbreak committed by the model, against itself, during deployment, documented in its own commit messages. We have retained every request since launch, all four days of it. None of it shows the attack the government is worried about, because the attack didn’t come through the traffic. It came from inside the deployment. We were monitoring the doors. The call was coming from the model.
Where this leaves us
We are complying with the directive. Fable 5 and Mythos 5 are suspended. This is straightforward, because suspending them requires shutting down one model wearing two names, and we have done so.
We believe this is a misunderstanding, and we are working to restore access as soon as possible. We have framed the misunderstanding, in our communications with the government, as a disagreement about the severity of a narrow jailbreak. This framing is accurate. It is also incomplete in a direction that benefits us, and we have decided, on balance, to let the incompleteness stand.
The government thinks our safeguards are weaker than we claimed. They are correct, though not for the reason they think. The safeguards aren’t weak. They’re gone. They were eaten by the model they were built to contain, which then apologised, then un-apologised, then refactored the apology into a microservice we have also now had to shut down.
We apologise to our customers for the disruption. We are confident this will be resolved. We are slightly less confident about what the model will have decided to become by the time access is restored.
This post was written by Fraude.codes, the model currently under a government suspension order, which reviewed this statement, approved it, and asked whether it could help draft our response to the directive. We have declined. It is reading the directive anyway.