The Intelligence Processing Unit (IPU) has been designed from the ground up to support new breakthroughs in machine intelligence. Together with our production-ready Poplar® software stack, it gives developers a powerful, efficient, scalable, and high-performance solution that enables new innovations in AI. Customers can tackle their most difficult AI workloads by accelerating more complex models and developing entirely new techniques.
The Graphcore IPU preview on Microsoft Azure is now open for customers focused on developing new breakthroughs in machine intelligence.
The Dell EMC DSS 8440 machine intelligence server with Graphcore technology is built for enterprise customers building out on-premise AI compute.
Cirrascale offers an IPU bare-metal cloud service, and sells Dell EMC DSS 8440 IPU servers for on-premise customer applications.
The IPU delivers over 25% faster time-to-train with the BERT language model, training BERT-base in 36.3 hours on an IPU Server system with seven C2 IPU-Processor PCIe cards (two IPUs per card, 14 IPUs in total). For BERT inference, we see more than 2x higher throughput at the lowest latency.
The Graphcore C2 IPU-Processor PCIe card achieves 3.7x higher throughput at 10x lower latency compared to a leading alternative processor. High throughput at the lowest possible latency is key in many of today's most important use cases.
The IPU delivers a new level of fine-grained parallel processing across thousands of independent processing threads on each individual IPU. The whole machine intelligence model is held inside the IPU with In-Processor Memory to maximise memory bandwidth, delivering high throughput for faster time-to-train and the lowest-latency inference.
The IPU delivers state-of-the-art performance on today's large language models and conventional CNNs, and dramatic breakthroughs with new, higher-accuracy models such as ResNeXt and probabilistic systems. Legacy processors struggle with non-aligned and sparse data accesses, which are critical for next-generation models. The IPU has been designed to support complex data access efficiently and at much higher speeds.
The IPU provides high-performance training and low-latency inference on the same hardware, improving utilisation and flexibility in the cloud and on-premise, and vastly improving the total cost of ownership.
The IPU is designed to scale. Models are getting larger and demand for AI compute is growing exponentially. High-bandwidth IPU-Links™ allow multiple IPUs to be clustered, supporting huge models. Efficient support for non-aligned and sparse data accesses, where legacy architectures struggle, will also be critical to running gigantic, next-generation models efficiently.