Google Unveils Ironwood TPU, Previews Split Eighth-Gen Architecture at TSMC 2nm

Google’s seventh-generation Tensor Processing Unit, Ironwood, marks a significant milestone in the company’s push to drive innovation in artificial intelligence. Generally available to cloud customers since Tuesday’s announcement at Google Cloud Next in Las Vegas, Ironwood is positioned as the first Google TPU built for the age of inference, a strategic shift in the company’s approach to AI.

What is Inference and Why Does it Matter?

At its core, inference is the process of running a trained AI model to generate predictions or decisions from new input data. It is a critical component of many AI applications, including virtual assistants, image recognition, and language translation. The industry’s center of gravity is moving from training to inference, and Google’s latest chip is designed to meet the growing demand for efficient inference workloads.
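Mechanically, inference is just a forward pass through frozen, already-trained weights: no gradients, no updates. A minimal sketch in plain Python (the tiny two-weight “model” here is hypothetical, purely for illustration):

```python
import math

# Pretend these weights were produced earlier by a training run;
# at inference time they are frozen. Values are illustrative only.
TRAINED_W = [0.8, -0.3]
TRAINED_B = 0.1

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def infer(features):
    """Forward pass only: compute a prediction, never touch the weights."""
    logit = sum(w * f for w, f in zip(TRAINED_W, features)) + TRAINED_B
    return sigmoid(logit)

print(infer([1.0, 2.0]))  # a probability-like score in (0, 1)
```

Production LLM inference is this same pattern scaled up: billions of frozen parameters, with the matrix multiplies of each forward pass dominating the work.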

Meet Ironwood: Google’s Seventh-Generation TPU

Ironwood delivers an impressive 4.6 petaFLOPS of peak FP8 compute per chip, roughly four times the performance of its predecessor Trillium. With 192 gigabytes of HBM3e memory and 7.37 terabytes per second of memory bandwidth, Ironwood is well-suited to handle the demands of large language model inference, mixture-of-experts architectures, diffusion models, and reinforcement learning.

One of the key advantages of Ironwood is its ability to hold larger model shards in memory, reducing the need to distribute a single model across multiple chips. This not only improves performance but also reduces the energy consumption and costs associated with running inference workloads.
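A rough way to see why 192 GB matters: a model’s weight footprint is its parameter count times bytes per parameter, so at FP8 (one byte per parameter) even a very large model plus working memory can fit on a single chip. The sketch below uses an illustrative 1.3x overhead factor for KV cache and activations; that factor and the model sizes are back-of-the-envelope assumptions, not Google figures.

```python
import math

# Back-of-the-envelope: how many 192 GB Ironwood chips are needed just
# to hold a model's weights? The 1.3x overhead factor (KV cache,
# activations) is an illustrative assumption, not a published number.
HBM_PER_CHIP_GB = 192
OVERHEAD = 1.3

def chips_needed(params_billions, bytes_per_param):
    weights_gb = params_billions * bytes_per_param  # 1e9 params ~ 1 GB per byte
    total_gb = weights_gb * OVERHEAD
    return math.ceil(total_gb / HBM_PER_CHIP_GB)

for params, bpp, label in [(70, 1, "70B @ FP8"),
                           (70, 2, "70B @ BF16"),
                           (405, 1, "405B @ FP8")]:
    print(f"{label}: {chips_needed(params, bpp)} chip(s)")
```

Under these assumptions a 70B-parameter model fits on one chip even at BF16, which is the point of the larger-shards argument: fewer chips per model means less cross-chip communication per token.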

Superpod Architecture: A Game-Changer for Cluster Scale

Ironwood’s superpod architecture is a major differentiator for Google’s TPU platform. By linking 9,216 chips into a unified system, a superpod delivers 42.5 exaFLOPS of compute, more than 24 times the capacity of El Capitan, the world’s most powerful supercomputer (a comparison that sets Ironwood’s low-precision FP8 throughput against El Capitan’s FP64 benchmark figure). This positions Ironwood as a direct competitor to Nvidia’s Blackwell B200 and highlights Google’s advantage at cluster scale.
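The headline number follows directly from the per-chip figure: 9,216 chips × 4.6 petaFLOPS ≈ 42.4 exaFLOPS, matching the quoted 42.5 once rounding of the per-chip value is accounted for. A quick arithmetic check:

```python
# Sanity-check the superpod figure from the per-chip number.
chips = 9216
per_chip_pflops = 4.6  # peak FP8, petaFLOPS per chip

pod_exaflops = chips * per_chip_pflops / 1000.0  # 1 exaFLOPS = 1000 petaFLOPS
print(f"{pod_exaflops:.1f} exaFLOPS")  # prints "42.4 exaFLOPS"
```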

Google’s superpod architecture is designed to provide exceptional energy efficiency, with roughly twice the performance per watt of Trillium and 2.8 times that of Nvidia’s H100. This is a critical factor in the economics of running inference workloads, as the cost of energy consumption can quickly add up.
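Why performance per watt dominates the economics: for a fixed amount of compute, the energy bill scales as 1/(perf per watt), so doubling perf/watt halves energy cost for the same job. The sketch below makes that concrete; the electricity price and baseline FLOPs-per-joule are assumed values for illustration, not published specs.

```python
# Illustrative: energy cost for a fixed amount of compute scales as
# 1 / (performance per watt). All numbers below are assumptions.
USD_PER_KWH = 0.10  # assumed electricity price

def energy_cost(flops_needed, flops_per_joule):
    joules = flops_needed / flops_per_joule
    kwh = joules / 3.6e6  # 1 kWh = 3.6e6 joules
    return kwh * USD_PER_KWH

job = 1e21  # FLOPs for some hypothetical inference workload
trillium_like = energy_cost(job, 1.0e12)  # assumed 1 TFLOP/J baseline
ironwood_like = energy_cost(job, 2.0e12)  # ~2x perf/watt, per the article

print(ironwood_like / trillium_like)  # → 0.5: same compute, half the energy bill
```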

The Eighth-Generation TPU: A New Era in AI Hardware

Alongside Ironwood’s general availability, Google previewed its eighth-generation TPU architecture, marking a significant shift in its approach to AI hardware. For the first time, Google is splitting the line into two distinct chips, one designed for training and one for inference.

TPU 8t: The Training Accelerator

TPU 8t, codenamed Sunfish, is a training accelerator designed with Broadcom. It features two compute dies, one I/O chiplet, and eight stacks of 12-high HBM3e, an upgrade from Ironwood’s eight-high stacks that provides improved performance and capacity.
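The capacity implication of 12-high stacks can be inferred from Ironwood’s published numbers: 192 GB across eight 8-high stacks works out to 3 GB per DRAM die (a 24 Gb die), so the same dies stacked 12-high across eight stacks would give 288 GB per chip. This is an extrapolation from the stated geometry, not an announced spec:

```python
# Infer TPU 8t's HBM capacity from stack geometry. This extrapolates
# from Ironwood's published 192 GB; the 288 GB result is an inference,
# not a figure Google has announced.
ironwood_total_gb = 192
stacks = 8
ironwood_high = 8  # dies per stack on Ironwood
tpu8t_high = 12    # dies per stack on TPU 8t, per the article

gb_per_die = ironwood_total_gb / stacks / ironwood_high  # 3 GB (24 Gb die)
tpu8t_total_gb = gb_per_die * tpu8t_high * stacks
print(f"{tpu8t_total_gb:.0f} GB")  # prints "288 GB"
```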

TPU 8t is built for large-scale AI training, with a focus on performance and efficiency for frontier models. With its chiplet-based design and high-bandwidth memory, it is positioned as the training workhorse of the new line.

TPU 8i: The Inference Accelerator

TPU 8i, codenamed Zebrafish, is an inference accelerator designed with MediaTek. It is optimized for the workloads that dominate production AI, including large language model inference, mixture-of-experts architectures, diffusion models, and reinforcement learning.

TPU 8i features a 256-by-256 matrix multiply unit array, performing 65,536 multiply-accumulate operations per cycle, optimized for the dense linear algebra that accounts for most of the compute in transformer inference. With its high-performance architecture and low power consumption, TPU 8i is well suited to large-scale inference workloads.
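The 65,536 figure is simply the array dimensions multiplied out: a 256 × 256 systolic array performs one multiply-accumulate per cell per cycle, and each MAC counts as two FLOPs (a multiply and an add). The sketch below back-solves what that implies about per-chip throughput; the 1 GHz clock is an assumed value for illustration, since the article gives neither clock speed nor MXU count.

```python
# A 256x256 systolic array: one MAC per cell per cycle, 2 FLOPs per MAC.
macs_per_cycle = 256 * 256
assert macs_per_cycle == 65536

flops_per_cycle = macs_per_cycle * 2  # 131,072 FLOPs per cycle

# Hypothetical: throughput of one such array at an assumed 1 GHz clock.
# Neither the clock nor the number of MXUs per chip is in the article.
clock_ghz = 1.0
one_mxu_tflops = flops_per_cycle * clock_ghz * 1e9 / 1e12
print(f"{one_mxu_tflops:.0f} TFLOPS per MXU at {clock_ghz} GHz")
# Reaching multi-petaFLOPS peaks from ~131 TFLOPS per array implies many
# MXUs per chip and/or a higher clock — again an inference, not a spec.
```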

What’s Next for Google’s TPU Platform?

With Ironwood generally available and the eighth-generation split architecture previewed, Google’s TPU platform now spans efficient large-scale inference and frontier-scale training, and is poised to play a major role in AI infrastructure in the years to come.

As Google continues to push the boundaries of AI hardware, further advances in the TPU line are likely, making the platform one of the more interesting areas to watch in the months and years ahead.

