Google Unveils New Chip to Slash Major Hidden AI Cost

At the Google Cloud Next 25 event, Google unveiled Ironwood, the latest generation of its Tensor Processing Unit (TPU). The chip marks a significant shift in emphasis for Google: it is designed primarily for inference rather than training. TPUs have largely been associated with training neural networks, work done mainly by AI specialists and data scientists. With Ironwood, Google is instead targeting the real-time prediction needs of millions, if not billions, of users.
Ironwood TPU
The Ironwood launch comes at a pivotal moment for the AI industry, as businesses move from experimental projects to putting AI models into practical use. Models with stronger reasoning capabilities, such as Google's Gemini, have sharply increased the compute needed at inference time, and with it the cost. As Google put it in its description of Ironwood: "reasoning and multi-step inference is shifting the incremental demand for compute -- and therefore cost -- from training to inference time (test-time scaling)." Ironwood is Google's response: a chip built for performance and efficiency in the increasingly costly domain of inference.
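A back-of-the-envelope sketch shows why reasoning pushes cost toward inference. It assumes the common rule of thumb that a dense transformer spends roughly 2 × (parameter count) floating-point operations per generated token. The helper name, the 70-billion-parameter model size, and both token counts are hypothetical illustrations. None of them are Google figures or describe Gemini.

```python
def inference_flops(params: float, generated_tokens: int) -> float:
    """Rough forward-pass compute for a dense decoder-only model (hypothetical helper).

    Uses the ~2 * params FLOPs-per-token rule of thumb and ignores attention
    over the context, prompt processing, and batching effects.
    """
    return 2.0 * params * generated_tokens


PARAMS = 70e9  # hypothetical model size, not a Gemini figure

direct = inference_flops(PARAMS, 200)         # short direct answer (hypothetical)
reasoning = inference_flops(PARAMS, 5_000)    # answer preceded by multi-step reasoning (hypothetical)

print(f"direct answer:  {direct:.2e} FLOPs")
print(f"with reasoning: {reasoning:.2e} FLOPs")
print(f"ratio:          {reasoning / direct:.0f}x")
```

Under this approximation compute grows linearly with generated tokens. A model that writes out 25 times as many tokens while reasoning therefore needs roughly 25 times the inference compute per query, before any hardware or batching effects are considered.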
An Inference Chip
Google has been building TPUs for more than a decade, and six generations preceded Ironwood. Training chips are produced in relatively low volumes. Inference chips, by contrast, serve a much broader audience that needs predictions from trained models every day, which makes inference a high-volume market. Google's sixth-generation TPU, Trillium, was positioned for both training and inference. Ironwood's primary focus on inference is a notable departure from that dual-purpose approach.
Necessary Investment
The shift could also signal a change in Google's reliance on outside chipmakers such as Intel, AMD, and Nvidia. According to KeyBanc Capital Markets, these vendors have historically accounted for 99% of the processors used in Google's cloud computing operations. By investing in its own TPUs, Google may be aiming to reduce that dependency and contain the escalating cost of AI infrastructure. Some stock analysts see substantial value in the effort. Gil Luria of DA Davidson estimated that if Google had sold TPUs directly to Nvidia's customers, it could have generated up to $24 billion in revenue last year.
Ironwood vs. Trillium
Google used the event to show how Ironwood improves on Trillium. Ironwood delivers twice Trillium's performance per watt, at 29.3 trillion floating-point operations per second per watt. Each chip carries 192GB of high-bandwidth memory (HBM), six times as much as Trillium, and 7.2 terabytes per second of memory bandwidth, 4.5 times higher. These enhancements are meant to minimize data movement and latency. As Google stated, "Ironwood is designed to minimize data movement and latency on chip while carrying out massive tensor manipulations."
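The quoted ratios also let a reader back out the Trillium figures they imply. The short sketch below does only that arithmetic from the article's numbers. The derived Trillium values and the "full HBM sweep" time are illustrative calculations, not figures Google published. The sweep time is how long it would take to read all 192GB once at peak bandwidth, computed in decimal units.

```python
# Ironwood per-chip figures as stated in the article.
IRONWOOD_HBM_GB = 192.0
IRONWOOD_HBM_TB_PER_S = 7.2  # terabytes per second

# Ratios quoted relative to Trillium.
HBM_CAPACITY_RATIO = 6.0
HBM_BANDWIDTH_RATIO = 4.5


def implied_trillium(ironwood_value: float, ratio: float) -> float:
    """Back out the Trillium value implied by an Ironwood value and a quoted ratio."""
    return ironwood_value / ratio


trillium_hbm_gb = implied_trillium(IRONWOOD_HBM_GB, HBM_CAPACITY_RATIO)
trillium_hbm_tb_per_s = implied_trillium(IRONWOOD_HBM_TB_PER_S, HBM_BANDWIDTH_RATIO)

# Illustrative only: time to read the entire HBM once at peak bandwidth
# (decimal units, 1 TB = 1000 GB; ignores all real-world overheads).
full_sweep_ms = IRONWOOD_HBM_GB / (IRONWOOD_HBM_TB_PER_S * 1000.0) * 1000.0

print(f"Implied Trillium HBM capacity:  {trillium_hbm_gb:.0f} GB")
print(f"Implied Trillium HBM bandwidth: {trillium_hbm_tb_per_s:.1f} TB/s")
print(f"Ironwood full HBM sweep:        {full_sweep_ms:.1f} ms")
```

Running it prints an implied Trillium capacity of 32 GB and bandwidth of 1.6 TB/s. It also shows that Ironwood could, in principle, read its full 192GB of HBM in about 26.7 ms.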
Scaling AI Infrastructure
These memory and bandwidth gains are central to Google's strategy for scaling its AI infrastructure. Scaling means grouping many chips so they work on a problem in parallel, which raises both performance and utilization. Utilization matters for economic reasons: the more time costly chips spend doing useful work, the less of that investment is wasted. Google previously highlighted Trillium's ability to scale to hundreds of thousands of chips. It made a similar claim for Ironwood, saying it can compose "hundreds of thousands of Ironwood chips to rapidly advance the frontiers of GenAI computation."
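The economic point about utilization can be made concrete with simple arithmetic. If a chip costs the same per hour whether it is busy or idle, the effective cost of each hour of useful work is the hourly cost divided by utilization. The sketch below assumes a flat, hypothetical hourly cost of $10 and a hypothetical helper function. It is not Google pricing and ignores power, networking, and other real costs.

```python
def cost_per_useful_chip_hour(hourly_cost: float, utilization: float) -> float:
    """Effective cost of one hour of useful work on a chip (hypothetical helper).

    hourly_cost: what the chip costs per hour, busy or idle (hypothetical).
    utilization: fraction of each hour spent on useful work, in (0, 1].
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_cost / utilization


HOURLY_COST = 10.0  # hypothetical flat hourly cost, not real pricing

for util in (0.4, 0.6, 0.9):
    cost = cost_per_useful_chip_hour(HOURLY_COST, util)
    print(f"utilization {util:.0%}: ${cost:.2f} per useful chip-hour")
```

Raising utilization from 40% to 90% cuts the effective cost per useful chip-hour from $25.00 to about $11.11. This is why keeping large pools of chips busy matters as much as raw chip speed.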
Alongside the hardware announcement, Google introduced Pathways on Cloud, software that distributes AI computing tasks across different machines. Pathways was previously used only internally at Google and is now publicly available, extending the capabilities of Google's AI infrastructure.