IPU-POD16 opens up a new world of machine intelligence innovation. Ideal for exploration and experimentation, IPU-POD16 offers unprecedented performance and flexibility for developing concepts and pilots, with training and inference consolidated in one affordable system.
Available from our network of channel partners, IPU-POD16 comprises four IPU-M2000s connected to a choice of host server, delivering 4 petaFLOPS of AI compute in a compact 5U footprint.
IPU-POD16 is your easy-to-use starting point for building better, more innovative AI solutions with IPUs.
Opening up entirely new opportunities for future innovation with better performance today
Designed for world-class performance for both training and inference in one affordable system
With unparalleled flexibility and built to scale, IPU-POD16 is your core building block for AI at supercomputing scale
4 petaFLOPS of AI compute: best for traditional computer vision, best for natural language, best for future innovation
IPU-POD16 provides everything your AI team needs to get up and running quickly: a compact, affordable system with world-class performance, optimised software and AI experts on hand. Plug in, power up and start exploring IPUs. With our rich selection of resources and examples, and first-class customer engineering support, you will have models ported in days and optimised in weeks.
Get world-class results, whether you want to explore innovative models and new possibilities or need faster time to train, higher throughput or better performance per TCO dollar.
MLPerf v1.0 Training Results | MLPerf ID: 1.0-1025, 1.0-1098
Designed from the ground up for AI, the IPU features a fine-grained architecture with massive parallelism. Each GC200 Mk2 IPU has close to 1,500 independent cores and 900MB of ultra-fast, local In-Processor-Memory. IPU-POD16 features 16 GC200 IPUs, offering 4 petaFLOPS of AI compute with unprecedented flexibility of operation for AI development.
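The headline figures on this page fit together, and a quick back-of-envelope calculation shows how. The sketch below assumes four IPUs per IPU-M2000 (16 IPUs across 4 IPU-M2000s), 1 petaFLOP of AI compute per IPU-M2000 and 900MB of In-Processor-Memory per IPU; it also assumes an IPU-POD64 contains 16 IPU-M2000s, inferred from 16 petaFLOPS at 1 petaFLOP each. These are peak figures, not measured throughput, and `pod_summary` is a hypothetical helper for illustration only.

```python
# Back-of-envelope arithmetic for IPU-POD configurations.
# Assumed inputs, taken from figures quoted on this page (peak, not measured):
PFLOPS_PER_M2000 = 1.0   # "1 petaFLOP of AI compute" per IPU-M2000
IPUS_PER_M2000 = 4       # 16 GC200 IPUs spread over 4 IPU-M2000s
SRAM_MB_PER_IPU = 900    # In-Processor-Memory per GC200 IPU


def pod_summary(num_m2000: int) -> dict:
    """Hypothetical helper: peak compute and aggregate on-chip memory for num_m2000 IPU-M2000s."""
    num_ipus = num_m2000 * IPUS_PER_M2000
    total_pflops = num_m2000 * PFLOPS_PER_M2000
    return {
        "ipus": num_ipus,
        "peak_pflops": total_pflops,
        "tflops_per_ipu": total_pflops * 1000 / num_ipus,
        "in_processor_memory_mb": num_ipus * SRAM_MB_PER_IPU,
    }


if __name__ == "__main__":
    # IPU-POD16 = 4 x IPU-M2000; IPU-POD64 assumed to be 16 x IPU-M2000.
    for name, count in (("IPU-POD16", 4), ("IPU-POD64", 16)):
        s = pod_summary(count)
        print(f"{name}: {s['ipus']} IPUs, {s['peak_pflops']:.0f} PFLOPS peak, "
              f"{s['tflops_per_ipu']:.0f} TFLOPS/IPU, "
              f"{s['in_processor_memory_mb'] / 1000:.1f} GB In-Processor-Memory")
```

For IPU-POD16 this works out to 250 TFLOPS per IPU and 14.4 GB of In-Processor-Memory across the system; the per-IPU figure stays the same as the system scales, while aggregate compute and on-chip memory grow linearly with the number of IPU-M2000s.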
IPU-Fabric is our innovative, ultra-efficient interconnect fabric, providing all-to-all IPU communication at any scale from four IPUs to thousands. In the IPU-POD16 it delivers 2.8Tbps of low-latency bandwidth between the IPU-M2000s, plus Gateway Links offering 2 x 100GbE bidirectional communication between IPU-M2000s. Host IO between the compute server and the IPU-POD16 runs over 100GbE RoCEv2 on each IPU-M2000.
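To put these link speeds in perspective, the sketch below estimates the ideal time to move a given amount of data over each link as bits divided by bits per second. It assumes the full nominal rate is usable (no protocol overhead, latency or contention), treats the two 100GbE Gateway Links as 200Gbps in aggregate, and uses a 900MB payload (one IPU's In-Processor-Memory) purely as an illustrative size; real transfers will be slower.

```python
# Ideal (lower-bound) transfer-time estimates for the link rates quoted above.
# Assumption: the full nominal rate is usable, with no overhead, latency or contention.
LINK_RATES_BPS = {
    "IPU-Fabric (2.8Tbps)": 2.8e12,
    "Gateway Links (2 x 100GbE, assumed aggregate)": 2 * 100e9,
    "Host IO (100GbE RoCEv2)": 100e9,
}


def ideal_transfer_seconds(num_bytes: float, rate_bps: float) -> float:
    """Lower bound on the time to move num_bytes over a link of rate_bps bits per second."""
    return num_bytes * 8 / rate_bps


if __name__ == "__main__":
    payload_bytes = 900e6  # illustrative: one IPU's 900MB of In-Processor-Memory
    for name, rate in LINK_RATES_BPS.items():
        ms = ideal_transfer_seconds(payload_bytes, rate) * 1e3
        print(f"{name}: {ms:.2f} ms")
```

Under these assumptions, moving 900MB takes about 2.6 ms over the 2.8Tbps IPU-Fabric versus about 72 ms over a single 100GbE host link, which is why keeping data close to the IPUs matters.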
Memory operations play a fundamental role in AI applications: the more efficiently they are performed, the more likely your application is to perform optimally. IPU systems use Exchange-Memory, a highly efficient memory system in which 900MB of ultra-fast In-Processor SRAM per IPU works in sync with Streaming Memory DRAM on the IPU-M2000. This means an unprecedented amount of data can be held on chip in ultra-fast memory during computation, while data for the next operation is already in transit onto the IPU from off-chip memory.
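The overlap described here, computing on data already held in fast memory while the next batch is still in transit, follows the general double-buffering (prefetch) pattern. The sketch below illustrates only that general pattern in plain Python, using a single background thread as the "transfer engine". `load_batch`, `compute` and `run_pipelined` are hypothetical stand-ins; this is not how Poplar or Exchange-Memory is actually programmed.

```python
from concurrent.futures import ThreadPoolExecutor


def load_batch(i: int) -> list:
    """Hypothetical stand-in for fetching batch i from off-chip (streaming) memory."""
    return list(range(i * 4, i * 4 + 4))


def compute(batch: list) -> int:
    """Hypothetical stand-in for computation on data already held in fast memory."""
    return sum(x * x for x in batch)


def run_pipelined(num_batches: int) -> list:
    """Double buffering: start loading batch i+1 before computing on batch i."""
    results = []
    # One worker thread plays the role of the transfer engine, so loads run in order.
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(load_batch, 0)  # begin the first transfer
        for i in range(num_batches):
            current = pending.result()  # wait until batch i has arrived
            if i + 1 < num_batches:
                pending = loader.submit(load_batch, i + 1)  # prefetch the next batch...
            results.append(compute(current))  # ...while computing on the current one
    return results


if __name__ == "__main__":
    print(run_pipelined(3))
```

`run_pipelined(3)` returns `[14, 126, 366]`, the same result as loading and computing each batch in turn; the difference is that each load overlaps the previous compute step instead of stalling it.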
Our Poplar SDK has been co-designed from scratch with the IPU. At a high level, Poplar is fully integrated with standard machine learning frameworks, so developers can port existing models easily and get up and running out of the box with new applications in a familiar environment.
Below these frameworks sits Poplar. For developers who want full control to exploit maximum performance from the IPU, Poplar enables direct IPU programming in C++.
Poplar also implements compiled-in communications, which ensure reliable, deterministic communication and memory operations during execution.
With 16 petaFLOPS of AI compute for both training and inference workloads, the IPU-POD64 is designed for AI at scale.
Our core building block for AI infrastructure. The IPU-M2000 packs 1 petaFLOP of AI compute in a slim 1U blade.
A secure IPU cloud service to add state-of-the-art AI compute on demand, with no on-premises infrastructure deployment required.
Connect with our experts to assess your AI infrastructure requirements and solution fit.