OpenAI Debuts GPT-5.4 Pro and Thinking Models with Million-Context Window
OpenAI has officially announced the release of its latest foundation model, GPT-5.4, which it describes as its most capable and efficient professional-grade model to date. According to AIbase, the series follows a differentiated launch strategy: alongside the standard version, OpenAI introduced GPT-5.4 Thinking, a reasoning model specialized in complex logic, and GPT-5.4 Pro, built for high-performance tasks.

On the technology front, the API version of GPT-5.4 delivers a major upgrade: a context window of up to 1 million tokens, the largest ever offered by OpenAI. The model also achieves notable gains in token efficiency, solving comparable problems with fewer tokens.
In safety and accuracy, the new model reduces the per-statement error rate by 33% compared to GPT-5.2, and cuts overall response errors by 18%. To mitigate potential "chain-of-thought deception" risks in reasoning models, OpenAI has introduced a new safety evaluation system. Tests indicate that GPT-5.4 Thinking offers greater transparency, making it difficult for the model to conceal or fabricate its reasoning steps.
In benchmark evaluations, GPT-5.4 delivered strong results, setting new records on computer-use tests such as OSWorld-Verified and WebArena Verified, and scoring 83% on the GDPval knowledge task.
Mercor CEO Brendan Foody noted that the model also leads the APEX-Agents benchmarks in professional domains like finance and law, particularly excelling at generating financial models, legal analysis, and other long-form deliverables. With the new "tool search" system, the model becomes more efficient when invoking external tools, dramatically reducing token overhead in large-scale tool integration scenarios.
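To make the token-overhead point concrete, here is a minimal sketch of the general idea behind searching a tool catalog before a request, so that only relevant tool definitions are placed in the prompt. It is an illustration under stated assumptions, not OpenAI's implementation: the names Tool, search_tools, prompt_size, and CATALOG are hypothetical, the keyword matching is deliberately naive, and word counts stand in for real token counts.

```python
from dataclasses import dataclass

# Hypothetical stop-word list so that filler words do not count as matches.
STOP_WORDS = {"a", "an", "the", "for", "to", "of"}


@dataclass
class Tool:
    """Hypothetical tool definition: a name plus a short description."""
    name: str
    description: str


def search_tools(query: str, tools: list[Tool], limit: int = 2) -> list[Tool]:
    """Rank tools by how many query words appear in their name or description."""
    words = set(query.lower().split()) - STOP_WORDS
    scored = []
    for tool in tools:
        text = set(f"{tool.name} {tool.description}".lower().replace("_", " ").split())
        score = len(words & text)
        if score > 0:
            scored.append((score, tool))
    # Highest score first; the sort is stable, so ties keep catalog order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:limit]]


def prompt_size(tools: list[Tool]) -> int:
    """Crude stand-in for a token count: whitespace-separated words."""
    return sum(len(f"{t.name} {t.description}".split()) for t in tools)


# Assumed example catalog; a real integration could hold many more tools.
CATALOG = [
    Tool("get_weather", "Return the current weather for a city"),
    Tool("build_dcf_model", "Generate a discounted cash flow financial model"),
    Tool("search_case_law", "Find legal cases relevant to a question"),
    Tool("send_email", "Send an email to a recipient"),
]

selected = search_tools("build a financial model for the acquisition", CATALOG)
print([t.name for t in selected])
print(prompt_size(selected), "words instead of", prompt_size(CATALOG))
```

In this toy setup only the matching definition is sent along with the request, so the prompt carries a fraction of the catalog. The saving would grow with the number of tools integrated, which is the scenario the article describes.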












