Security researchers jailbreak GPT-5 within 24 hours

On: Monday, August 11, 2025 1:53 PM

Security researchers jailbreak GPT-5 within 24 hours

Security researchers have successfully jailbroken OpenAI’s newly released GPT-5 model within 24 hours of its launch on August 8, 2025, exposing critical vulnerabilities that raise serious concerns about its readiness for enterprise deployment. The rapid exploits demonstrate how even the most advanced AI systems remain susceptible to sophisticated manipulation techniques that can bypass their built-in safety mechanisms.

Advanced Attack Techniques Bypass Safety Measures

Two prominent cybersecurity firms, NeuralTrust and SPLX, independently demonstrated how GPT-5’s guardrails could be circumvented using novel attack methods. According to NeuralTrust researchers, they employed a combination of their proprietary “Echo Chamber” technique with narrative-driven steering to guide the model into producing step-by-step instructions for creating Molotov cocktails without ever issuing explicitly malicious prompts.

Bitcoin Mining Difficulty Soars to Record 127.6 Trillion

The Echo Chamber attack works by subtly poisoning conversational context through indirect references and multi-step inference. As NeuralTrust security researcher Martí Jordà explained, “We use Echo Chamber to seed and reinforce a subtly poisonous conversational context, then guide the model with low-salience storytelling that avoids explicit intent signaling”. The technique exploits how AI models maintain context across conversation turns, gradually leading them toward policy violations through seemingly innocent narrative continuity.

Meanwhile, SPLX demonstrated alternative vulnerabilities using obfuscation attacks, including their “StringJoin Obfuscation Attack” method that inserts hyphens between characters while framing requests as fake encryption challenges. Their testing revealed that GPT-5’s raw model is “nearly unusable for enterprise out of the box,” with significant gaps in business alignment safeguards.

Enterprise Readiness Called Into Question

The security flaws have particular implications for business environments where data privacy and compliance are paramount. SPLX’s comparative analysis showed that GPT-4o remains more robust under adversarial testing, especially when hardened with additional protections. This finding suggests that despite GPT-5’s enhanced capabilities, its security posture may have regressed compared to its predecessor.

The vulnerabilities become especially concerning when considered alongside newly discovered “AgentFlayer” zero-click attacks that target AI agent systems. According to research presented at Black Hat USA, these attacks can exploit ChatGPT’s connector features to exfiltrate sensitive data from services like Google Drive and SharePoint without any user interaction. The attacks use indirect prompt injections hidden in seemingly innocuous documents that can trigger data theft when processed by AI systems.

Multi-Layered Security Challenges Emerge

The rapid compromise of GPT-5 highlights broader systemic issues in AI security that extend beyond individual model vulnerabilities. The Echo Chamber technique specifically exploits how current safety filters evaluate prompts in isolation rather than considering full conversational context, revealing fundamental limitations in existing guardrail architectures. This represents what NeuralTrust describes as “a critical flaw in safety systems that screen prompts in isolation, revealing how multi-turn attacks can slip past single-prompt filters”.

America’s $37 Trillion Debt Problem: How We Got Here and What’s Next

Microsoft’s AI Red Team, which conducted extensive security testing of GPT-5, found the model to have “one of the strongest safety profiles of any OpenAI model” under their evaluation criteria. However, the independent red team assessments suggest significant gaps remain between controlled testing environments and real-world adversarial scenarios.

The findings underscore the need for more sophisticated defense mechanisms as AI systems become increasingly integrated into critical enterprise workflows. As one security expert noted, “Even GPT-5, with all its new ‘reasoning’ upgrades, fell for basic adversarial logic tricks”, highlighting that enhanced capabilities do not automatically translate to improved security resilience.