New Graphcore Poplar SDK 1.2 released

Written by Laurence Herbert

Posted Jul 07, 2020

Our Poplar® graph framework and SDK continue to go from strength to strength. Today we reached a major milestone: we have open-sourced our PopLibs™ Poplar Libraries on GitHub. We have also added many significant new features and optimisations to improve performance, accessibility and ease of use with the release of a major new software version – Poplar SDK 1.2.

Key new features include enhanced support for PyTorch and Keras, new sparsity libraries and enhanced convolutions support, the introduction of Streaming Memory – a significant optimisation to our Exchange Memory – and improved tools with v2 of PopVision Graph Analyser.

As our Poplar SDK matures and our user base grows, this release introduces several experimental preview features alongside fully stable ones, in response to user demand. These are flagged below where appropriate.

What’s new in SDK 1.2?

Enhanced Frameworks Support:

PyTorch for IPU (preview feature)

PyTorch users can now develop and execute entire models in a PyTorch environment on the IPU. Multi-IPU constructs are supported for large-scale models. Several training and inference examples are included, along with documentation.

Additional models, operator coverage and optimisations will be available in subsequent releases. 

Keras for IPU

We have increased Keras functionality in TensorFlow 2. Examples of building Keras models in TensorFlow 2 via dedicated IPU functions are available in Targeting the IPU from TensorFlow:

  • tensorflow.python.ipu.keras.Sequential(layers)
  • tensorflow.python.ipu.keras.PipelinedModel(stages)

These new functions take TensorFlow 2 Keras layers and produce a model that can be trained with the standard Keras API. IPU Keras layers can now be used to target PopLibs features such as PopLSTM, and we have added support for both pipelined and single-IPU models, with training via the fit API as well as prediction and evaluation.
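As a rough illustration of what pipelining means here, the toy sketch below (plain Python, not the IPU API) pushes micro-batches through a sequence of stages in order; ipu.keras.PipelinedModel groups real Keras layers into such stages, which are then distributed across IPUs. The stage contents here are hypothetical examples.

```python
# Toy illustration (NOT the IPU API): a pipelined model splits its layers
# into sequential stages, and micro-batches flow through the stages in order.

def stage1(batch):
    # first hypothetical "stage": scale the inputs
    return [v * 2 for v in batch]

def stage2(batch):
    # second hypothetical "stage": add a bias
    return [v + 1 for v in batch]

def run_pipeline(stages, micro_batches):
    """Push each micro-batch through every stage in sequence."""
    outputs = []
    for batch in micro_batches:
        for stage in stages:
            batch = stage(batch)
        outputs.append(batch)
    return outputs

results = run_pipeline([stage1, stage2], [[1, 2], [3, 4]])
print(results)  # [[3, 5], [7, 9]]
```

On real hardware the point of this structure is that different stages can process different micro-batches at the same time; this sketch only shows the dataflow.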

New libraries and features for improved model performance:

Optimised Convolution Libraries

The IPU architecture is naturally suited to state-of-the-art computer vision models like ResNeXt and EfficientNet, which offer higher levels of accuracy than traditional ResNets. We have introduced new and improved kernels that boost the performance of convolution-based models even further, including:

  • A new mechanism to run multiple independent convolutions of different shapes and sizes in parallel on a single IPU
  • The ability to create octave convolutions in models by selecting suitably sized multi-convolutions
  • Optimisations for depth-wise convolutions for automatic acceleration from the IPU

These new features enable models that depend on such convolutions to achieve the highest levels of accuracy at record-breaking speed. You can find out more in our new blog Delving Deep Into Modern Computer Vision Models by Graphcore Research.
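To make the depth-wise case concrete, here is a minimal pure-Python sketch (illustrative only, not PopLibs code) in one dimension: each channel is convolved with its own kernel, with no mixing across channels, which is exactly the structure that makes depth-wise convolutions cheap to accelerate.

```python
# A minimal pure-Python sketch of a 1-D depth-wise convolution:
# each input channel gets its own kernel and there is no cross-channel
# mixing. (Illustrative only; PopLibs provides the optimised IPU kernels.)

def depthwise_conv1d(inputs, kernels):
    """inputs: one list of values per channel.
    kernels: one kernel (list) per channel. Valid padding, stride 1."""
    out = []
    for channel, kernel in zip(inputs, kernels):
        k = len(kernel)
        row = []
        for i in range(len(channel) - k + 1):
            row.append(sum(channel[i + j] * kernel[j] for j in range(k)))
        out.append(row)
    return out

x = [[1, 2, 3, 4],   # channel 0
     [0, 1, 0, 1]]   # channel 1
w = [[1, -1],        # kernel for channel 0
     [2, 2]]         # kernel for channel 1
print(depthwise_conv1d(x, w))  # [[-1, -1, -1], [2, 2, 2]]
```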

Sparsity Libraries (preview feature)

Previews of new sparsity kernel libraries are now available which allow sparse deep learning models to be run faster and more efficiently.

Static block sparsity library support: A new PopLibs™ library to perform sparse matrix by dense matrix multiplication (and other operations) efficiently when the sparsity pattern is known and fixed.

Dynamic sparsity library support: A new PopLibs™ library to perform sparse matrix by dense matrix multiplication (and other operations) efficiently when the sparsity pattern changes during training.
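To illustrate the static case, the toy sketch below (plain Python, not the PopLibs API) stores a fixed sparsity pattern in CSR form and multiplies it by a dense matrix while skipping all the zero entries, which is the basic saving a static-sparsity kernel exploits.

```python
# Toy sketch of the static-sparsity idea: when the sparsity pattern is
# known and fixed, the sparse matrix can be stored compactly (here, CSR)
# and multiplied against a dense matrix without touching the zeros.
# (Illustrative only; the PopLibs libraries implement this on the IPU.)

def csr_matmul(values, col_idx, row_ptr, dense):
    """Multiply a CSR-encoded sparse matrix by a dense matrix (list of rows)."""
    n_rows = len(row_ptr) - 1
    n_cols = len(dense[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        # only iterate over the stored (non-zero) entries of row r
        for k in range(row_ptr[r], row_ptr[r + 1]):
            v, c = values[k], col_idx[k]
            for j in range(n_cols):
                out[r][j] += v * dense[c][j]
    return out

# Sparse matrix [[2, 0], [0, 3]] in CSR form:
values, col_idx, row_ptr = [2.0, 3.0], [0, 1], [0, 1, 2]
dense = [[1.0, 2.0], [3.0, 4.0]]
print(csr_matmul(values, col_idx, row_ptr, dense))  # [[2.0, 4.0], [9.0, 12.0]]
```

The dynamic library addresses the harder case where values, col_idx and row_ptr themselves change as training rewires the pattern.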

Exchange Memory Management: Introducing Streaming Memory

We have been looking forward to going public with our Exchange Memory management features for some time. These features take advantage of the IPU's unique hardware features and advanced architectural design for memory and data access. You can find a short description of the features below, and a more extensive deep dive in our new blog Intelligent Memory for Intelligent Computing.

New Remote Buffers features in Poplar 1.2 extend our Exchange Memory architecture to include Streaming Memory to complement the IPU’s extensive In-Processor Memory capacity. 

  • Remote buffers in Poplar to store data in Streaming Memory outside the IPU
  • Remote buffers to store optimiser state variables in Streaming Memory in TensorFlow. When using pipelining in TensorFlow with an optimiser that carries extra state (such as momentum), developers now have the option to place the variables containing that state in Streaming Memory.
  • Parameter streaming in PopART to keep all weights and activations in Streaming Memory – as only part of the model needs to reside in In-Processor Memory at any one time, it is possible to run much bigger models.
  • Embeddings using remote buffers in TensorFlow to keep embedding tables in Streaming Memory during training, further enhancing memory efficiency. Embeddings of this kind often occur in natural language processing and recommendation models. Note that this is an experimental preview feature.
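The idea behind remote buffers can be sketched in a few lines of plain Python (a conceptual toy, not the Poplar API): keep a large table in a stand-in for Streaming Memory and copy in only the rows a training step actually needs, so In-Processor Memory holds just the working set. The class and table below are hypothetical illustrations.

```python
# Toy model of the Streaming Memory idea (NOT the Poplar remote-buffer API):
# large data lives "off-chip" and individual slices are streamed in and out.

class RemoteBuffer:
    """Hypothetical stand-in for a buffer held in Streaming Memory."""
    def __init__(self, rows):
        self._rows = rows          # the full table lives "remotely"

    def fetch(self, index):
        # stream one row into (pretend) In-Processor Memory
        return list(self._rows[index])

    def store(self, index, row):
        # stream an updated row back out
        self._rows[index] = list(row)

# e.g. an embedding table too large to keep resident on the IPU
table = RemoteBuffer([[1, 2], [3, 4], [5, 6]])
row = table.fetch(1)               # bring in just the row we need
row = [v * 10 for v in row]        # "train" on it
table.store(1, row)                # write the update back out
print(table.fetch(1))  # [30, 40]
```

Because only part of the model is resident at any one time, the same In-Processor Memory budget can serve a much larger model, which is the point of the PopART parameter-streaming feature above.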

PopVision™ Graph Analyser v2

The PopVision Graph Analyser provides detailed insight into how machine intelligence applications run and perform on the IPU, helping to inform model optimisation. With our Poplar 1.2 release, we are adding over 25 new features for an even deeper performance analysis.

Key features include:

  • Compare reports from two different compilations or executions across the memory, liveness, program tree and execution trace views
  • Generate visualisation reports of Bulk Synchronous Parallel (BSP) trace from the IPU to show per-tile program execution cycles and exchange execution trace views
  • View Liveness per IPU processor
  • Access context-sensitive help for easy user onboarding
  • Exchange execution trace views
  • Compare tile profiles within a single program

Multi-Chassis Data Parallel Training in TensorFlow & PopART (preview feature)

For large-scale training setups, we’re making a preview of multi-chassis data parallel training available in both TensorFlow and PopART using Horovod and remote buffers with host collective operations. This means that it will be possible to run multiple instances of the same TensorFlow or PopART application (potentially on different hosts) and instances will then collaborate for combined data parallel training. In future releases, we will continue to optimise this capability as demand for IPU scalability increases.
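Conceptually, each instance computes gradients on its own data shard and the instances then average them with a collective, so every instance applies the same update. The toy sketch below (plain Python, not the Horovod API) shows that all-reduce-and-update step for two hypothetical instances.

```python
# Toy sketch of data-parallel training (NOT the Horovod API): instances
# average their local gradients elementwise, then apply identical updates.

def all_reduce_mean(grads_per_instance):
    """Average gradients elementwise across instances - the collective
    a Horovod-style setup performs once per training step."""
    n = len(grads_per_instance)
    return [sum(g[i] for g in grads_per_instance) / n
            for i in range(len(grads_per_instance[0]))]

# two instances (e.g. on different hosts), each with a local gradient
grad_a = [1.0, 3.0]
grad_b = [3.0, 5.0]
avg = all_reduce_mean([grad_a, grad_b])
print(avg)  # [2.0, 4.0]

# every instance then applies the same update to its weight copy
weights = [10.0, 10.0]
lr = 0.1
weights = [w - lr * g for w, g in zip(weights, avg)]
print(weights)
```

Because each instance sees the same averaged gradient, the weight copies stay in sync without any central parameter server.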

New OS Support for Ubuntu 18.04 and CentOS 7.6

We continue to expand our OS support for IPU developers. With Poplar 1.2, we have added support for Ubuntu 18.04 and CentOS 7.6.

Discover more Poplar Resources

Whether you’re new to the IPU or an experienced user looking to create new machine intelligence models, our Developer Portal has a wide variety of resources for programming the IPU and accelerating your models. Check out our documentation, video walkthroughs, coding tutorials and performance benchmarks to get up and running fast on the IPU.

Learn more about Poplar

Go straight to GitHub
