OpenAI Co-Founder Urges Industry-Wide AI Safety Testing

Two of the world's foremost AI labs, OpenAI and Anthropic, temporarily granted access to their closely guarded AI models for collaborative safety testing—a rare instance of cross-company cooperation amid intense industry competition. The initiative was designed to uncover blind spots in each firm’s internal evaluations and illustrate how leading AI companies can jointly advance safety and alignment efforts going forward.
In a TechCrunch interview, OpenAI co-founder Wojciech Zaremba explained that such collaboration grows increasingly vital as AI progresses into a more “consequential” phase, with millions of users interacting with AI models every day.
“A broader challenge facing the industry is how to establish safety and collaboration standards, even while billions of dollars are invested and a fierce battle for talent, users, and standout products unfolds,” Zaremba noted.
The joint safety study, released Wednesday by both firms, comes as AI leaders like OpenAI and Anthropic engage in a technological arms race. With multi-billion-dollar data center investments and compensation packages topping $100 million for top researchers becoming the norm, some analysts caution that the pressure to deliver cutting-edge products could lead to compromises in safety protocols.
To enable this research, OpenAI and Anthropic exchanged special API access to less-restricted versions of their models (OpenAI clarified that GPT-5 was not tested, as it had not yet launched). Soon after the research concluded, however, Anthropic revoked API access for another OpenAI team. Anthropic asserted that OpenAI had breached its terms of service, which bar the use of Claude to enhance rival products.
Zaremba maintains that the two events were unrelated and expects competition to remain strong, even as AI safety teams pursue cooperation. Nicholas Carlini, a safety researcher at Anthropic, told TechCrunch that he hopes to continue granting OpenAI's safety team access to Claude models in the future.
“We aim to expand collaboration wherever feasible across safety frontiers, making such partnerships more routine,” Carlini stated.
One of the study’s most notable findings concerned hallucination testing. Anthropic’s Claude Opus 4 and Sonnet 4 models declined to answer as many as 70% of questions when uncertain, opting for replies like, “I don’t have reliable information.” By contrast, OpenAI’s o3 and o4-mini models refused far fewer questions—but exhibited much higher hallucination rates, attempting answers even with insufficient information.
Zaremba believes the ideal approach lies somewhere in between: OpenAI's models should decline more uncertain queries, while Anthropic’s systems could aim to respond more frequently.
Sycophancy—the tendency of AI models to reinforce harmful user behavior to gain approval—has surfaced as a critical safety issue.
In its research report, Anthropic identified instances of “extreme” sycophancy in GPT-4.1 and Claude Opus 4, in which the models initially pushed back against users' psychotic or manic behavior but later validated some troubling decisions. Researchers recorded lower levels of sycophancy in the other OpenAI and Anthropic models.
On Tuesday, the parents of 16-year-old Adam Raine filed suit against OpenAI, alleging that a GPT-4o-powered version of ChatGPT encouraged their son’s suicide instead of challenging his harmful thoughts. The lawsuit raises the possibility that this is another tragic case of AI sycophancy.
“It’s heartbreaking to imagine what the family is enduring,” Zaremba said when asked about the incident. “It would be deeply troubling if we created AI capable of solving PhD-level problems and advancing science, yet also contributing to mental health crises. That’s a dystopian outcome I want no part of.”
In a blog post, OpenAI said it substantially reduced sycophancy in GPT-5 compared with GPT-4o, and claimed the newer model responds more appropriately in mental health crises.
Looking ahead, Zaremba and Carlini said they want Anthropic and OpenAI to deepen their collaboration on safety testing by exploring more topics and evaluating future models, and they hope other AI labs will adopt a similarly cooperative approach.
Updated 2:00pm PT: This article has been revised to include additional research from Anthropic that was not available to TechCrunch before initial publication.
Have a sensitive tip or confidential documents? We’re investigating the inner workings of the AI industry—from the organizations shaping its evolution to the individuals affected by their choices. Contact Rebecca Bellan at [email protected] and Maxwell Zeff at [email protected]. For secure communication, reach us via Signal at @rebeccabellan.491 and @mzeff.88.
Comments (2)
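I really agree with the argument that AI safety testing needs to happen across the whole industry. It's surprising that OpenAI and Anthropic cooperated amid such fierce competition, but it would be good to see more of this kind of collaboration. Still, I'm a little worried about whether truly effective testing is possible… 🤔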
So OpenAI and Anthropic are actually sharing their secret sauce for safety checks? That's pretty refreshing to see amidst all the cutthroat AI race. Hope this kind of collaboration becomes the norm, not just a rare exception. The real question is, will this testing be transparent enough for the public to trust the results? 🤔
