Fewer false positives promised

OpenAI launches Codex Security for vulnerability detection


Developed under the codename “Aardvark,” Codex Security aims to replace traditional SAST tools and already earned 14 CVE assignments during its beta.

OpenAI has introduced Codex Security, an application security agent designed to autonomously identify, validate, and remediate vulnerabilities in codebases. The company is now rolling out the tool as a research preview for ChatGPT Pro, Enterprise, Business, and Edu customers, free of charge for the first month.


Context over noise

The problem Codex Security sets out to solve is familiar to every security team: too many alerts, too few of them actionable. “Most AI security tools simply flag low-impact findings and false positives, forcing security teams to spend significant time on triage,” OpenAI writes. At the same time, AI-powered development tools are accelerating code production, turning security reviews into a growing bottleneck.

Codex Security takes a different approach. The agent begins by constructing a project-specific threat model that captures what a system does, which components it trusts, and where it is most exposed. The model is editable, allowing teams to enrich it with their own context. Discovered vulnerabilities are then actively tested in a sandbox. If an exploit can be reproduced, the finding is marked as validated. Finally, the agent proposes a patch tailored to the surrounding codebase, designed to minimize regressions.
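The pipeline described above can be sketched in a few lines of Python. This is a hypothetical illustration of the general pattern (an editable threat model gating findings, sandbox reproduction before reporting, and a patch attached to each validated finding), not OpenAI's actual implementation; the names `ThreatModel`, `Finding`, and `triage` are invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set

@dataclass
class Finding:
    component: str       # where the issue was found
    description: str
    severity: str
    validated: bool = False
    patch: Optional[str] = None

@dataclass
class ThreatModel:
    # Editable project context: teams can mark components as trusted
    # so findings there are suppressed rather than reported.
    trusted_components: Set[str] = field(default_factory=set)

def triage(
    findings: List[Finding],
    model: ThreatModel,
    reproduce_exploit: Callable[[Finding], bool],
    propose_patch: Callable[[Finding], str],
) -> List[Finding]:
    """Report only findings whose exploit reproduces in a sandbox,
    each with a proposed patch attached."""
    validated = []
    for f in findings:
        if f.component in model.trusted_components:
            continue  # filtered out by the team-edited threat model
        if reproduce_exploit(f):  # sandbox validation step
            f.validated = True
            f.patch = propose_patch(f)
            validated.append(f)
    return validated
```

The key design point the article highlights is the middle step: a finding only reaches the report if its exploit actually reproduces, which is what drives down false positives compared with pattern-matching SAST tools.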

Beta results

The private beta began last year with a small group of customers. According to OpenAI, results improved noticeably over time: scans on the same repositories reduced noise by 84% in one case, the rate of overestimated severity dropped by more than 90%, and false positives fell by more than 50%.

In the final 30 days of the beta, the agent scanned over 1.2 million commits, identifying 792 critical and 10,561 high-severity vulnerabilities. Critical flaws appeared in fewer than 0.1% of all commits. Early internal deployments surfaced an SSRF vulnerability and a critical cross-tenant authentication flaw, both patched by OpenAI’s security team within hours.

Netgear was among the early testers. Chandan Nandakumaraiah, Head of Product Security at Netgear and a CVE board member, was impressed: findings were “remarkably clear and comprehensive,” and the experience often felt like “a seasoned product security researcher working directly alongside us.”

Just weeks ago, Anthropic introduced Claude Code Security, a similar tool that scans codebases for vulnerabilities and suggests patches. The market for AI-powered vulnerability detection is gaining serious momentum.

14 CVEs from open-source audits

OpenAI also turned Codex Security loose on widely used open-source projects, including OpenSSH, GnuTLS, PHP, Chromium, GOGS, and libssh. The exercise yielded 14 CVEs, covering heap buffer overflows in GnuTLS, a two-factor authentication bypass in GOGS, and a stack buffer overflow in gpg-agent.

Conversations with project maintainers revealed a consistent theme: the problem isn’t a shortage of vulnerability reports, it’s an excess of low-quality ones. OpenAI says this insight shaped its deliberate focus on quality over quantity.

To support the open-source community, OpenAI is launching “Codex for OSS,” a program offering qualified maintainers free access to ChatGPT Pro and Plus accounts as well as Codex Security. Projects like vLLM are already using the tool as part of their regular review workflow. Interested maintainers can apply through the OpenAI platform.

Lars Becker
Deputy Editor-in-Chief
IT Verlag GmbH
