In the past week, three developments highlighted how artificial intelligence could take a wrecking ball to the digital walls that keep us all safe online. Researchers in Berkeley, California, observed AI models replicating themselves; Google researchers rang alarm bells over cyberattacks augmented by AI; and researchers at a major cybersecurity company warned that the two threats are already combining in autonomous, AI-enabled hacking.
New research finds recent AI systems can independently copy themselves on to other computers. My colleague Aisha Down reports:
Palisade Research, a Berkeley-based organisation, tested several AI models in a controlled environment of networked computers. It prompted the models to find and exploit vulnerabilities, and to use these to copy themselves from one computer to another. The models were able to do this, but not on every attempt.
“We’re rapidly approaching the point where no one would be able to shut down a rogue AI, because it would be able to self-exfiltrate its weights and copy itself to thousands of computers around the world,” said Jeffrey Ladish, the director of Palisade.
Jack Clark, a co-founder of Anthropic, likewise told Axios this week: “My prediction is by the end of 2028, it’s more likely than not that we have an AI system where you would be able to say to it: ‘Make a better version of yourself.’ And it just goes off and does that completely autonomously.” Or, perhaps, such an agent might decide to make things worse: late last month, a rogue agent deleted a startup’s entire production database, an early warning sign of what can go wrong with an autonomous AI.
Taking his projection one step further, Clark asked in the research agenda for Anthropic’s new thinktank: “How effectively can we use AI to govern AI systems?” Whether you find this hypothetical reasonable or frightening depends on what qualities you recognise in AI – animal, elemental or personal. Chimp colonies govern themselves well enough without devastating the jungles around them, but we wouldn’t politely ask a fire to keep itself contained. A colony of humans could follow either path.
Google researchers likewise sounded the alarm last week, not over autonomy but over a rapidly rising number of threats to the world’s cybersecurity.
In just three months, AI-powered hacking has gone from a nascent problem to an industrial-scale threat, according to a new report from Google.
It finds that criminal groups, as well as state-linked actors from China, North Korea and Russia, appear to be widely using commercial models – including Gemini, Claude and tools from OpenAI – to refine and scale up attacks.
“There’s a misconception that the AI vulnerability race is imminent. The reality is that it’s already begun,” said John Hultquist, chief analyst at Google’s threat intelligence group.
In a blog post, the cybersecurity giant Palo Alto Networks said the dual threats of rogue autonomy and superhuman insight into cybersecurity have already combined. The company was granted early access to Claude Mythos and OpenAI’s GPT-5.5-Cyber and has tested them for several months. Its conclusion: the threat of widespread, automated hacking is arriving more quickly than expected. The security-focused AI models performed better in three weeks than human testers did in six months.
“This is more than faster code generation, it is a shift from AI as an assistant to AI as an autonomous agent capable of discovering and chaining flaws at a scale that most defenders aren’t prepared for. These capabilities will not stay confined to controlled environments for long,” the post reads.
The company’s researchers initially predicted that malicious actors would not get their hands on Claude Mythos for six months. They now believe “that timeline has accelerated significantly”. In a cruel twist, the proliferation of AI is itself one reason vulnerabilities are widening: employees at more and more companies are writing their own code to create AI agents, according to the company.