When Obscurity Stops Protecting


AI-driven vulnerability discovery and the mainframe access layer


In early April 2026, Anthropic published a research preview of Claude Mythos, a general-purpose model whose security capabilities had emerged as a byproduct of agentic reasoning rather than dedicated training.¹
Among the vulnerabilities the preview disclosed were CVE-2026-4747, a 17-year-old remote code execution flaw in the RPCSEC_GSS authentication handler of FreeBSD's NFS server, and a 27-year-old crash bug in the TCP SACK implementation of OpenBSD, an operating system known primarily for its security focus.² FreeBSD had already issued advisory FreeBSD-SA-26:08.rpcsec_gss on 26 March, crediting Nicholas Carlini at Anthropic for the disclosure.³ The broader question, namely what it means that a 17-year-old kernel-level vulnerability fell to a model that was not specifically trained on FreeBSD source code, has taken longer to settle.

If kernel flaws in well-scrutinised open-source software, having survived 17 and 27 years of human review, are now surfacing at the rate Anthropic describes, what is the fate of a 30-year-old TN3270 stack that has never been seriously red-teamed at industrial scale?

An asymmetry that quietly held for fifty years

Mainframe security has rested for decades on an assumption rarely made explicit. The communication patterns of a z/OS environment were thought to live inside the house, between the user's terminal, the application server, and the data tier, all behind a perimeter where the population of plausible attackers was small and the cost of acquiring serious z/OS competence was high. Encrypting traffic between TN3270 emulators and VTAM was technically feasible long before it became common. It rarely became common because the threat model did not seem to demand it.

The assumption held because the attackers capable of acting on it were rare and expensive to develop, often hired by defenders before they became a problem; because public documentation on z/OS internals remained dense and scattered, a barrier of effort even for capable adversaries; and because the input cost of mounting a credible mainframe attack was high enough to deter all but the most determined. That input cost is now collapsing.

What changed between April 2025 and April 2026

Two events in early 2026 began to dismantle the cost structure.

The first was the Mythos announcement itself. Restricted to roughly a dozen partner organisations through Project Glasswing, Mythos Preview achieved, on Anthropic's own account, a 73 percent success rate on expert-level capture-the-flag exercises that no frontier model could reliably complete twelve months earlier.¹ When human security professionals reviewed 198 of the model's vulnerability reports, 89 percent received exact severity agreement.² The same model has identified thousands of zero-day vulnerabilities across operating systems and browsers, with the majority still unpatched at the time of writing, and a coordinated disclosure cycle expected through mid-2026. Anthropic itself states that comparable capabilities are likely to proliferate from other AI laboratories within six to eighteen months.¹

73%

Claude Mythos Preview's success rate on expert-level capture-the-flag exercises in April 2026, against a baseline near zero twelve months earlier.
(Source: Anthropic / UK AI Security Institute joint evaluation, 2026)

The second event landed almost simultaneously, and it cuts the other way. AISLE, an AI cybersecurity startup, took the FreeBSD bug Anthropic had used to showcase Mythos and ran it against small open-weight models. Eight out of eight detected the same flaw. The smallest had 3.6 billion parameters and ran at $0.11 per million tokens; a 5.1-billion-parameter open model recovered the core analysis chain of the 27-year-old OpenBSD bug.⁴ AISLE's conclusion was straightforward: the moat in AI cybersecurity sits in the surrounding system (orchestration, validation, scaffolding), not in any single frontier model. Cheap, downloadable models, given the right environment, find what frontier models find.

$0.11

Cost per million tokens of the smallest open-weight model — 3.6 billion parameters — that detected the same FreeBSD vulnerability Anthropic used to showcase Mythos. Eight of eight tested models detected it.
(Source: AISLE evaluation, April 2026)

Some of the proliferation is already visible. Between 11 January and 18 February 2026, Amazon Threat Intelligence observed a Russian-speaking financially motivated threat actor — by Amazon's characterisation, of limited technical sophistication — compromise more than 600 FortiGate firewall appliances across more than 55 countries.⁵ The campaign exploited exposed management ports and reused credentials rather than zero-day flaws. What made it possible at that scale was the threat actor's use of multiple commercial generative AI services, including Claude Code, DeepSeek, and a custom orchestration layer named ARXON, to handle reconnaissance, exploitation planning, post-exploitation tooling, and lateral movement.⁶ The AI did not invent the techniques. It made them reproducible by an actor that, a few years ago, would not have had the operational capacity to execute at this scale.

Why the mainframe surface looks different to a model

For an autonomous agent reading source code, the protections that obscured the mainframe for five decades are less protections than they are search problems.

A language model does not need an IBM Redbook to learn what an APF-authorised library means; it can recover the semantics from the source code and configuration of any installation that exposes them. It does not need to be trained on CICS to reason about its message flow; it can read the transaction definitions, the application code, and the network traces and rebuild the model. The asymmetry that defended z/OS for so long — the documentation and the skills were rare, scattered, and expensive — was a documentation-and-skills problem. Foundation models are particularly good at compressing documentation-and-skills problems. They are not yet good enough to dispense entirely with human expertise on mainframe-specific attack paths, but the gap is narrowing faster than the defensive side has acknowledged.

For thirty years we have been protected by the rarity of mainframe expertise as much as by the configurations themselves. That rarity is decoupling from threat capability. Our configurations will need to do more of the protecting than they did before.

- Sebastian Dewar, CTO, Virtel

Nothing about the mainframe has become more vulnerable in absolute terms. The instruction sets, access protocols, and authentication mechanisms are unchanged. The change is in the population of attackers able to act on them, and in how quickly. That ratio used to favour defenders by orders of magnitude. It is now moving against them.

One feature of mainframe security posture compounds the problem. Many production environments still expose access protocols that were never seriously hardened, because the threat model never required it. TN3270 traffic that passes in cleartext between an emulator and a Communications Server endpoint. FTP services on z/OS that authenticate against RACF but transmit credentials in the clear. CICS transactions reachable through middleware connectors whose authentication assumes a trusted internal network. None of these are unknown problems. All of them have been broadly tolerated because the threat model assumed an asymmetry that is now eroding.

The three layers most exposed

The mainframe access stack, as it sits in many production environments today, presents three layers that an AI-driven reconnaissance agent finds particularly cheap to probe.

The first is unencrypted access protocols. TN3270, despite being technically capable of AT-TLS encryption since z/OS V1R7, remains operational in cleartext across a meaningful share of installations. NetSPI's 2025 Mainframe State of the Platform Assessment, based on dozens of penetration tests across financial services, healthcare, and government, found that network segregation between mainframe infrastructure and corporate environments remains rare, that DES-based password hashing is still widely in use, and that overall mainframe security posture varies dramatically across organisations.⁷ For comparative scale: 75 percent of top global websites support TLS 1.3, and HTTPS adoption on Android has exceeded 99 percent. Modern transport security has moved; the mainframe access layer has not always moved with it.
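Whether a given listener falls into the cleartext category can be checked from the defender's side with very little code. The following is a minimal sketch in Python, not a scanner: the function name `classify_listener` and its return labels are illustrative assumptions, and it assumes the endpoint uses implicit TLS (the handshake starts immediately on connect). A listener that only upgrades to TLS after a plaintext negotiation would be reported as `no-tls` by this check. It should only be pointed at hosts you are authorised to test.

```python
import socket
import ssl


def classify_listener(host: str, port: int, timeout: float = 3.0) -> str:
    """Return 'unreachable', 'tls', or 'no-tls' for a TCP listener.

    Hypothetical helper for an access-surface inventory; the labels are
    assumptions made for this sketch.
    """
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"

    # Certificate validation is deliberately disabled: the question here is
    # only whether a TLS handshake completes, not whether the certificate
    # is trusted. Do not reuse this context for real traffic.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    try:
        with ctx.wrap_socket(raw, server_hostname=host):
            return "tls"
    except OSError:
        # ssl.SSLError and socket timeouts are both OSError subclasses, so
        # a peer that answers with Telnet negotiation bytes, resets the
        # connection, or stays silent all end up here.
        return "no-tls"
    finally:
        raw.close()
```

Run against each address and port in the inventory, a result of `no-tls` on a TN3270 port is a candidate for AT-TLS enforcement; a result of `tls` says nothing yet about protocol version or cipher strength, which need a separate check.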

The second is legacy authentication. A measurable share of production mainframes still rely on passwords transmitted across paths an attacker can intercept, or stored in scripts and configuration files that an agent reading source code can locate without supervision. SAML, OIDC, and PassTicket mechanisms exist; their adoption is uneven.

The third is the middleware perimeter. CICS, MQ, IMS, FTP servers, and a long tail of integration points often authenticate through credentials that were configured during initial integration projects and never rotated since. These credentials live in JCL members, in REXX execs, in connector configurations. They are exactly the kind of high-value finding an autonomous agent surfaces in its first pass over a codebase.
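Defenders can run the same first pass over their own exported members before an adversary does. The sketch below, in Python, flags lines in JCL or REXX text that appear to carry an embedded password. The two regular expressions are illustrative assumptions, not an exhaustive rule set, and the names `Finding`, `RULES`, and `scan_member` are hypothetical. The scanner reports the member, line number, and rule only, so that the secret itself is not copied into the report.

```python
import re
from typing import Iterable, List, NamedTuple


class Finding(NamedTuple):
    member: str   # e.g. a PDS member name, as exported to text
    line_no: int  # 1-based line number within the member
    rule: str     # which illustrative rule matched


# Illustrative patterns only (assumptions for this sketch).
RULES = {
    # A PASSWORD= keyword followed directly by a value, as on a JOB statement.
    "jcl-password-param": re.compile(r"\bPASSWORD=\S+", re.IGNORECASE),
    # A REXX-style assignment of a quoted literal to a password-like variable.
    "rexx-password-assignment": re.compile(
        r"\b(?:password|passwd|pwd)\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE
    ),
}


def scan_member(member: str, lines: Iterable[str]) -> List[Finding]:
    """Return one Finding per (line, rule) match; never returns the secret."""
    findings = []
    for line_no, line in enumerate(lines, start=1):
        for rule, pattern in RULES.items():
            if pattern.search(line):
                findings.append(Finding(member, line_no, rule))
    return findings
```

A real rule set would need tuning against each installation's naming habits, and every hit still needs a person to decide whether the credential is live, who owns it, and how it gets rotated.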

Regulatory implications: DORA, NIS2, and the AI-resilience question

The legal framework needed to demand action on this is already in place, even where it has not yet been explicit about AI-driven threats. DORA Article 9.2 requires financial entities to design, procure and implement ICT security policies, procedures, protocols and tools that aim to ensure the resilience, continuity and availability of ICT systems, in particular for those supporting critical or important functions, and to maintain high standards of availability, authenticity, integrity and confidentiality of data — whether at rest, in use, or in transit.⁸ That formulation was drafted before Mythos existed. It covers the gap at the level of principle.

NIS2, applicable to a broader class of essential and important entities, follows the same logic. The Network and Information Security Directive recast was negotiated against a 2020-era threat landscape, not a 2026 one, and yet its requirements on access control, encryption, and supply chain risk anticipate exactly the pressure that AI-driven offensive tooling now applies.

Where the change will most plausibly land is in audit practice. Resilience testing under DORA Article 24 and its companion technical standards will, over time, need to model attackers whose marginal cost of probing the entire exposed surface approaches zero. Pentests historically scoped to a representative sample of access paths will need to be scoped to the full surface, because the full surface is what an autonomous adversary will scope. The detailed technical specification of what constitutes adequate resilience testing under DORA is still being refined by the European Supervisory Authorities. Whether AI-readiness becomes an explicit dimension of that specification, and on what timeline, remains uncertain. Anthropic's six-to-eighteen-month proliferation window is a reasonable benchmark for the period within which the regulatory conversation should catch up.

What practical steps defenders can take now

Recognising the problem is the first step, and it costs nothing.

For mainframe environments operating in 2026, the actionable list is short and familiar. Inventory the access surface: every protocol still listening, every credential still in configuration, every middleware connector still authenticated through a shared secret. Enforce AT-TLS on every flow that touches z/OS Communications Server. Eliminate cleartext password transmission wherever a SAML, OIDC, or PassTicket alternative is available. Audit RACF, ACF2, or Top Secret for the configurations that an autonomous agent would surface first: OPERATIONS attribute holders, broad SPECIAL assignments, PROTECTALL settings that have reverted to permissive defaults, APF-authorised libraries with write access for non-administrative users.
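The audit items in that list can be expressed as a handful of mechanical checks. The sketch below, in Python, uses RACF terminology and assumes the security database has already been exported and parsed into simple records. The `User` and `Library` classes, the `audit` function, and the approved-administrator set are all hypothetical; nothing here reads a real RACF, ACF2, or Top Secret database.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class User:
    userid: str
    special: bool = False      # holds the SPECIAL attribute
    operations: bool = False   # holds the OPERATIONS attribute


@dataclass
class Library:
    dataset: str
    apf_authorised: bool
    writers: Set[str]          # user IDs with write access (assumed pre-resolved)


def audit(users: Iterable[User], libraries: Iterable[Library],
          admins: Set[str], protectall: str) -> List[str]:
    """Return human-readable findings for the checks named in the text."""
    findings = []
    # PROTECTALL is treated as acceptable only in FAILURES mode.
    if protectall.upper() != "FAILURES":
        findings.append(f"PROTECTALL is {protectall}, not FAILURES")
    for user in users:
        if user.operations:
            findings.append(f"{user.userid}: holds OPERATIONS")
        if user.special and user.userid not in admins:
            findings.append(f"{user.userid}: holds SPECIAL outside the approved list")
    for lib in libraries:
        if lib.apf_authorised:
            extra = sorted(lib.writers - admins)
            if extra:
                findings.append(
                    f"{lib.dataset}: APF-authorised, writable by {', '.join(extra)}"
                )
    return findings
```

The checks themselves are trivial; the effort sits in producing an accurate export and in keeping the approved-administrator list honest, which is where the human expertise discussed below comes in.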

None of this is novel. It is the same hygiene the mainframe security community has been recommending for two decades. The change in 2026 is that the time available to do it before it matters is shorter than the budget cycles that usually fund it.

A competency dimension runs alongside this. The audit and configuration work above requires people who can read RACF profiles, parse AT-TLS policy agent rules, and reason about CICS connector authentication. The mainframe community has been losing such people faster than it has been replacing them. That gap, too, is part of the resilience question DORA Article 9 implicitly asks.

The window is open

The 17-year-old FreeBSD vulnerability was patched within days of disclosure, and the public Glasswing report due in mid-2026 is expected to trigger a coordinated patch cycle across operating systems, browsers, and infrastructure software. Patching the operating assumptions of mainframe security will take longer than that. It will require organisational changes — vendor consolidation around modern access architectures, retraining of audit teams, regulatory clarification — that are not bounded by a single patch cycle.

The next twelve to eighteen months will determine whether mainframe access stacks adapt to the new economics of vulnerability discovery, or are adapted to by them.

Virtel Web Access

Reduce the exposed surface before the open-source tools catch up

Virtel Web Access replaces the three exposed layers of the access stack. TN3270 endpoints disappear from the network, with sessions reaching users through the browser, terminated by Virtel as a native z/OS application registered under SAF. Authentication transits through SAML or OIDC with PassTicket, so credentials never appear in cleartext, and AT-TLS is enforced at the Communications Server layer with native RACF, ACF2 or Top Secret integration. Every session stays identifiable, and every action is logged through standard SMF.


SOURCES and REFERENCES

1. Anthropic, Claude Mythos Preview
red.anthropic.com, 7 April 2026. Announcement and methodology.
2. Cloud Security Alliance, AI Vulnerability Discovery and Containment Failures: Claude Mythos v1.0
CSA AI Safety Initiative, April 2026. Technical analysis of CVE-2026-4747 and the OpenBSD TCP SACK finding.
3. FreeBSD Security Advisory FreeBSD-SA-26:08.rpcsec_gss
FreeBSD Project, 26 March 2026. Patch and disclosure timeline.
4. VentureBeat, Mythos autonomously exploited vulnerabilities that survived 27 years of human review
VentureBeat Security, 10 April 2026. AISLE evaluation findings.
5. AWS Security Blog, AI-augmented threat actor accesses FortiGate devices at scale
Amazon Web Services, 20 February 2026. Amazon Threat Intelligence report.
6. BleepingComputer, Amazon: AI-assisted hacker breached 600 Fortinet firewalls in 5 weeks
BleepingComputer, 21 February 2026. Cyber and Ramen technical analysis (ARXON, Claude Code, DeepSeek).
7. NetSPI, Mainframe State of the Platform: 2025 Security Assessment
NetSPI, June 2025. Pentest findings across financial services, healthcare and government.
8. Regulation (EU) 2022/2554, Digital Operational Resilience Act (DORA), Articles 9 and 24
EUR-Lex, applicable since 17 January 2025.

Topics: AI, NIS2, Mainframe security, DORA, AT-TLS, Pentest

Posted by Bruno Maunier, Maxime Quartier

June 3, 2026
