
RESEARCH PAPERS

Graphcore Research: GroupBERT - Enhanced Transformer Architecture with Efficient Grouped Structures

Ivan Chelombiev, Daniel Justus, Douglas Orr, Anastasia Dietrich, Frithjof Gressmann, Alexandros Koliousis, Carlo Luschi

Attention-based language models have become a critical component of state-of-the-art NLP systems. However, these models have significant computational requirements, due to long training times, dense operations and large parameter counts.

In this paper, Graphcore Research demonstrates a set of modifications to the structure of a Transformer layer that produce a more efficient architecture. Applied to language representation learning, this architecture outperforms BERT models of different scales, improving efficiency in terms of both floating-point operations (FLOPs) and time-to-train.

Oxford-Man Institute & University of Oxford: Multi-Horizon Forecasting for Limit Order Books: Novel Deep Learning Approaches and Hardware Acceleration using Intelligent Processing Units

Zihao Zhang, Stefan Zohren

Researchers at the Oxford-Man Institute of Quantitative Finance have used Graphcore’s Intelligence Processing Unit (IPU) to dramatically accelerate the training of advanced price prediction models, using techniques which are typically plagued by computational bottlenecks when run on other types of processor.

The IPU’s designed-for-AI architecture allowed the Oxford-Man Institute (OMI) team to reduce training times for their multi-horizon forecasting models to the point where they could deliver a significant commercial advantage by more accurately estimating market price movements. Such models can be used to develop alpha for fast trading and in market-making strategies.

Graphcore Research: Proxy-Normalizing Activations to Match Batch Normalization while Removing Batch Dependence

Antoine Labatie, Dominic Masters, Zach Eaton-Rosen, Carlo Luschi

We investigate the reasons for the performance degradation incurred with batch-independent normalization. We find that the prototypical techniques of layer normalization and instance normalization both induce the appearance of failure modes in the neural network's pre-activations: (i) layer normalization induces a collapse towards channel-wise constant functions; (ii) instance normalization induces a lack of variability in instance statistics, symptomatic of an alteration of the expressivity.

To alleviate failure mode (i) without aggravating failure mode (ii), we introduce the technique "Proxy Normalization" that normalizes post-activations using a proxy distribution. When combined with layer normalization or group normalization, this batch-independent normalization emulates batch normalization's behavior and consistently matches or exceeds its performance.
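
As a rough illustration, assuming a Gaussian proxy, the NumPy sketch below applies the idea after layer normalization: the real post-activation φ(γ·x̂ + β) is re-centred and re-scaled per channel with the mean and variance that φ(γ·Y + β) would have for a standard normal proxy variable Y. The proxy moments are estimated here by Monte Carlo sampling, and the ReLU activation, sample count and function names are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def layer_norm(x, eps=1e-5):
    # x has shape (batch, channels); each sample is normalized over its own channels
    mu = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def proxy_norm_relu(x_hat, gamma, beta, n_proxy=100_000, eps=1e-5, seed=0):
    """Hypothetical proxy-normalized ReLU (Gaussian proxy, Monte Carlo moments).

    x_hat: normalized pre-activations, shape (batch, channels)
    gamma, beta: per-channel affine parameters, shape (channels,)
    """
    rng = np.random.default_rng(seed)
    # Proxy variable Y ~ N(0, 1), broadcast against every channel
    y = rng.standard_normal((n_proxy, 1))
    proxy_act = relu(gamma * y + beta)  # shape (n_proxy, channels)
    proxy_mean = proxy_act.mean(axis=0)
    proxy_var = proxy_act.var(axis=0)
    # The statistics come from the proxy, never from other samples in the batch
    return (relu(gamma * x_hat + beta) - proxy_mean) / np.sqrt(proxy_var + eps)


rng = np.random.default_rng(3)
x = rng.normal(size=(8, 4))
gamma = np.array([1.0, 0.5, 2.0, 1.5])
beta = np.array([0.0, 0.2, -0.5, 1.0])
out = proxy_norm_relu(layer_norm(x), gamma, beta)
```

Because the proxy statistics depend only on γ and β, each sample is normalized independently of the rest of the batch.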

Graphcore Research: Making EfficientNet More Efficient: Exploring Batch-Independent Normalization, Group Convolutions and Reduced Resolution Training

Dominic Masters, Antoine Labatie, Zach Eaton-Rosen,

In a new paper, Graphcore Research examines three methods for optimising the performance of EfficientNet, a state-of-the-art computer vision model, on Intelligence Processing Units (IPUs). These approaches are: (i) generalising depthwise convolutions to group convolutions; (ii) adding proxy-normalized activations to match batch normalization performance with batch-independent statistics; (iii) reducing compute by lowering the training resolution and inexpensively fine-tuning at higher resolution.
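
To see how approach (i) changes the compute profile, it helps to count weights. In a k×k convolution with G groups, each output channel connects to only C_in/G input channels, so a depthwise convolution (G = C_in) and a dense convolution (G = 1) are the two extremes. The channel count and the group size of 16 below are illustrative assumptions, not EfficientNet's actual layer configuration.

```python
def conv_params(c_in, c_out, k, groups):
    """Weight count of a k x k 2D convolution split into `groups` groups (no bias)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by the number of groups")
    # Each output channel sees c_in / groups input channels through a k x k kernel
    return c_out * (c_in // groups) * k * k


c, k = 192, 3
depthwise = conv_params(c, c, k, groups=c)       # one channel per group
grouped = conv_params(c, c, k, groups=c // 16)   # 16 channels per group
dense = conv_params(c, c, k, groups=1)           # every input feeds every output
print(depthwise, grouped, dense)  # 1728 27648 331776
```

Multiply-accumulates per output pixel scale in the same way as the weight count.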

By combining all three techniques, IPUs delivered speedups of up to 7x in training and more than 3.6x in inference.

University of Bristol: Using the Graphcore IPU for traditional HPC applications

Thorben Louw, Simon McIntosh-Smith

The increase in ML workloads means that AI accelerators are expected to become common in supercomputers, evoking considerable interest in the scientific HPC community about how these devices might also be exploited for traditional HPC workloads.

In this paper, we report our early results using Graphcore's IPU for stencil computations on structured grid problems, which underpin solvers for differential equations in domains such as computational fluid dynamics. We demonstrate that the IPU and its low-level programming framework, Poplar, expose sufficient programmability to express these HPC problems, and achieve performance comparable to that of modern GPUs.
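
As a rough illustration of this class of kernel (not the authors' Poplar code), a Jacobi iteration for the 2D Laplace equation replaces each interior grid value with the average of its four neighbours, a 5-point stencil. The grid size, boundary values and iteration count below are assumptions, and this single-array NumPy version says nothing about how the grid would be partitioned across IPU tiles.

```python
import numpy as np


def jacobi_step(u):
    """One Jacobi sweep of the 5-point stencil; boundary values stay fixed."""
    new = u.copy()
    new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:])
    return new


def solve_laplace(u, iters=5000):
    for _ in range(iters):
        u = jacobi_step(u)
    return u


# Example: a 32 x 32 plate whose top edge is held at 1 and other edges at 0
grid = np.zeros((32, 32))
grid[0, :] = 1.0
solution = solve_laplace(grid)
```

Every interior point depends only on its immediate neighbours, so each sweep is fully data-parallel; only values along partition boundaries need to be exchanged between sweeps.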

Graphcore Research & UMass Amherst: Accelerating Simulation-based Inference with Emerging AI Hardware

Sourabh Kulkarni, Alexander Tsyplikhin, Mario Michael Krell, Csaba Andras Moritz

In this work, we explore hardware-accelerated simulation-based inference over probabilistic models, combining a massively parallelized approximate Bayesian computation (ABC) inference algorithm with cutting-edge AI chips that are uniquely suited to this purpose. As a proof of concept, we demonstrate inference over a probabilistic epidemiology model used to predict the spread of COVID-19. Two hardware acceleration platforms are compared: the Tesla V100 GPU and the Graphcore Mk1 IPU. Our results show that while both platforms outperform multi-core CPUs, the Mk1 IPUs are 7.5x faster than the Tesla V100 GPUs for this workload.
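
ABC replaces an intractable likelihood with simulation, which is why it maps well onto massively parallel hardware. In the rejection variant sketched below, every candidate parameter is drawn from the prior and simulated independently, and only those whose simulated summary statistic lands close to the observed one are kept as posterior samples. The toy model (the rate of a Poisson process, summarized by its sample mean), the prior, the tolerance and the function names are illustrative assumptions; this is not the COVID-19 model or the exact algorithm variant used in the paper.

```python
import numpy as np


def abc_rejection(observed, n_candidates=100_000, tol=0.1, seed=0):
    """Rejection ABC for the rate of a Poisson model (toy example)."""
    rng = np.random.default_rng(seed)
    obs_stat = observed.mean()  # summary statistic of the observed data
    # Draw every candidate rate from a Uniform(0, 10) prior at once
    rates = rng.uniform(0.0, 10.0, size=n_candidates)
    # Simulate one synthetic data set per candidate; rows are independent,
    # which is what makes the method embarrassingly parallel
    sims = rng.poisson(rates[:, None], size=(n_candidates, observed.size))
    distances = np.abs(sims.mean(axis=1) - obs_stat)
    # Keep only candidates whose simulated summary lies within the tolerance
    return rates[distances < tol]


data = np.random.default_rng(42).poisson(3.0, size=30)
posterior = abc_rejection(data)
print(f"{posterior.size} accepted, posterior mean {posterior.mean():.2f}")
```

Most of the work is in the simulation step, and accuracy improves by shrinking the tolerance, which in turn demands more candidates; that trade-off is what makes raw simulation throughput matter.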

Google Research, UC Berkeley & Graphcore Research: Parallel Training of Deep Networks with Local Updates

Michael Laskin, Luke Metz, Seth Nabarro, Mark Saroufim, Badreddine Noune, Carlo Luschi, Jascha Sohl-Dickstein, Pieter Abbeel

In this paper, we investigate how to continue scaling compute efficiently beyond the point of diminishing returns for large batches through local parallelism, a framework which parallelizes training of individual layers in deep networks by replacing global backpropagation with truncated layer-wise backpropagation. Local parallelism enables fully asynchronous layer-wise parallelism with a low memory footprint, and requires little communication overhead compared with model parallelism. We show results in both vision and language domains across a diverse set of architectures, and find that local parallelism is particularly effective in the high-compute regime.
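
The core idea can be sketched with a two-block network in which each block has its own auxiliary linear head and local loss. Block 2 consumes block 1's output as a fixed input, so no gradient flows from block 2 back into block 1, and the blocks could in principle be updated in parallel. This NumPy sketch fits a toy regression target; the tanh blocks, squared-error losses, plain gradient descent, learning rate and names are illustrative assumptions rather than the architectures or training setup studied in the paper.

```python
import numpy as np


def local_block_step(a, y, W, V, lr):
    """Run one block forward, then update it using only its own local loss.

    `a` is the block input and is treated as a constant: no gradient is sent
    back to whatever produced it, which is what truncates backpropagation.
    W and V are updated in place.
    """
    n = a.shape[0]
    h = np.tanh(a @ W)                # block output, fed to the next block
    p = h @ V                         # auxiliary linear prediction head
    loss = 0.5 * np.mean((p - y) ** 2)
    dp = (p - y) / n                  # d(loss)/dp
    dV = h.T @ dp
    dz = (dp @ V.T) * (1.0 - h**2)    # backpropagate through tanh, inside this block only
    dW = a.T @ dz
    W -= lr * dW
    V -= lr * dV
    return h, loss


rng = np.random.default_rng(0)
X = rng.standard_normal((256, 4))
y = np.sin(X.sum(axis=1, keepdims=True))
W1, V1 = 0.5 * rng.standard_normal((4, 32)), np.zeros((32, 1))
W2, V2 = 0.2 * rng.standard_normal((32, 32)), np.zeros((32, 1))

history = []
for _ in range(3000):
    h1, loss1 = local_block_step(X, y, W1, V1, lr=0.05)
    # Block 2 only reads block 1's output; its update never touches W1 or V1
    h2, loss2 = local_block_step(h1, y, W2, V2, lr=0.05)
    history.append((loss1, loss2))
```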

Graphcore Research: Improving Neural Network Training in Low Dimensional Random Bases

Frithjof Gressmann, Zach Eaton-Rosen, Carlo Luschi

Graphcore Research is exploring novel ways to train neural networks that could allow us to scale to substantially larger models in future.

In this paper, we revisit a simple approach to reduce the effective network dimensionality using random projections. We leverage the hardware-accelerated random number generation of the IPU to train in randomly selected directions of the weight space. Applying smaller independent random projections to different parts of the network and re-drawing them at every step significantly improves the obtained accuracy.
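
A minimal sketch of the basic mechanism, assuming plain gradient descent on a least-squares problem that stands in for a neural network: at every step a fresh D×d Gaussian matrix P is drawn, and the weights move only along P Pᵀ g, i.e. within d random directions. The dimensions, learning rate and the 1/√D scaling of P are assumptions; for simplicity the sketch forms the full gradient g and then projects it, and it omits the separate projections for different parts of the network.

```python
import numpy as np


def train_random_subspace(A, b, d=10, lr=0.25, steps=3000, redraw=True, seed=0):
    """Minimize 0.5 * mean((A @ theta - b)**2), moving only within a
    d-dimensional random subspace of the D-dimensional weight space."""
    rng = np.random.default_rng(seed)
    n, D = A.shape
    theta = np.zeros(D)
    # Entries scaled by 1/sqrt(D) so that each random direction has roughly unit norm
    P = rng.normal(0.0, 1.0 / np.sqrt(D), size=(D, d))
    for _ in range(steps):
        if redraw:
            P = rng.normal(0.0, 1.0 / np.sqrt(D), size=(D, d))  # fresh directions
        g = A.T @ (A @ theta - b) / n   # full gradient, shape (D,)
        theta -= lr * (P @ (P.T @ g))   # project onto span(P), then step
    return theta, 0.5 * np.mean((A @ theta - b) ** 2)


rng = np.random.default_rng(1)
A = rng.standard_normal((200, 100))
b = A @ rng.standard_normal(100)
_, loss_redraw = train_random_subspace(A, b, redraw=True)
_, loss_fixed = train_random_subspace(A, b, redraw=False)
print(f"redrawn every step: {loss_redraw:.4f}, fixed subspace: {loss_fixed:.4f}")
```

With a fixed P the weights can never leave a 10-dimensional slice of the 100-dimensional space, whereas re-drawing P lets successive steps cover the whole space over time.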

Graphcore Research & Ford: A Follow-The-Leader Strategy using Hierarchical Deep Neural Networks with Grouped Convolutions

José Solomon, François Charette

A follow-the-leader strategy can be implemented using a hierarchical Deep Neural Network (DNN) end-to-end driving model to match the direction and speed of a target pedestrian. Using a classifier DNN, pedestrian movements can be tracked to determine if the pedestrian is in the camera sensor’s field of view. The autonomous vehicle’s steering and throttle can then be adjusted by a regression DNN. These DNNs also incorporate grouped convolutions to boost model performance.

In this paper, Graphcore Research and Ford Motor Company leverage the fine-grain compute capabilities of the Graphcore IPU to minimise time-to-train for these Hierarchical Deep Neural Networks.

University of Bristol: Studying the potential of Graphcore IPUs for applications in Particle Physics

Lakshan Ram Madhan Mohan, Alexander Marshall, Samuel Maddrell-Mander, Daniel O'Hanlon, Konstantinos Petridis, Jonas Rademacker, Victoria Rege, Alexander Titterton

This paper presents the first study of Graphcore's Intelligence Processing Unit (IPU) in the context of particle physics applications. 

Comparisons are made for neural-network-based event simulation, multiple-scattering correction, and flavour tagging, implemented on IPUs, GPUs and CPUs, using a variety of neural network architectures and hyperparameters. Additionally, a Kálmán filter for track reconstruction is implemented with promising results.
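
Track reconstruction with a Kálmán filter processes detector hits one layer at a time, alternating a prediction to the next layer with an update from the measured hit. The sketch below, which is not the paper's implementation, fits a straight-line track through equally spaced planes that measure position only; the state is (position, slope), and the noise level, plane spacing and zero process noise (no multiple scattering) are assumptions.

```python
import numpy as np


def kalman_track_fit(hits, dz=1.0, sigma=0.1, q=0.0):
    """Kalman-filter a straight-line track through planes spaced `dz` apart.

    hits: measured position on each plane, in order
    Returns the filtered state [position, slope] at the last plane and its covariance.
    """
    F = np.array([[1.0, dz], [0.0, 1.0]])  # propagate: position += slope * dz
    H = np.array([[1.0, 0.0]])             # each plane measures position only
    R = np.array([[sigma**2]])             # measurement noise covariance
    Q = q * np.eye(2)                      # process noise (e.g. scattering); zero by default
    x = np.array([hits[0], 0.0])           # start at the first hit, slope unknown
    C = np.diag([sigma**2, 1e3])           # vague prior on the slope
    for m in hits[1:]:
        # Predict the state at the next plane
        x = F @ x
        C = F @ C @ F.T + Q
        # Update with the hit measured on that plane
        S = H @ C @ H.T + R
        K = C @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.array([m]) - H @ x)
        C = (np.eye(2) - K @ H) @ C
    return x, C


planes = np.arange(20.0)
rng = np.random.default_rng(7)
hits = 0.5 + 0.2 * planes + rng.normal(0.0, 0.1, size=planes.size)
state, cov = kalman_track_fit(hits, dz=1.0, sigma=0.1)
print(f"position at last plane: {state[0]:.3f}, slope: {state[1]:.3f}")
```

With zero process noise and a vague slope prior, the filtered state at the last plane agrees with an ordinary least-squares line fit to the same hits; a non-zero `q` would let the track bend, as multiple scattering does.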

Imperial College London: Bundle Adjustment on a Graph Processor

Joseph Ortiz, Mark Pupilli, Stefan Leutenegger, Andrew J. Davison

This paper shows for the first time that the classical computer vision problem of bundle adjustment (BA) can be solved extremely fast on a graph processor such as Graphcore's Intelligence Processing Unit (IPU) using Gaussian Belief Propagation.

Gaussian Belief Propagation is an effective algorithmic framework for spatial AI problems where estimates are needed in real time with new measurements constantly being fed into the algorithm.
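
To give a flavour of Gaussian Belief Propagation, the sketch below solves a much simpler problem than bundle adjustment: scalar variables x_i on a chain, each with a noisy absolute measurement, linked by factors that prefer x_{i+1} - x_i = d_i. All messages are Gaussians in information form (precision λ and information η = λμ), and each factor-to-variable message is obtained by marginalizing a 2×2 Gaussian. Because a chain is a tree, the marginal means are exact once information has crossed the chain. The problem, the synchronous message schedule and all names are illustrative assumptions; real bundle adjustment has nonlinear factors over camera poses and 3D points.

```python
import numpy as np


def gbp_chain(z, d, sigma=1.0, s=0.1, iters=None):
    """Gaussian belief propagation on a chain x_0 - x_1 - ... - x_{n-1}.

    z: noisy measurements of each x_i (noise std `sigma`)
    d: preferred differences x_{i+1} - x_i (std `s`), length n - 1
    Returns the marginal means. Messages are stored in information form (eta, lam).
    """
    z, d = np.asarray(z, dtype=float), np.asarray(d, dtype=float)
    n = z.size
    iters = n if iters is None else iters  # information crosses the chain in n - 1 sweeps
    lam_u = np.full(n, 1.0 / sigma**2)     # unary (measurement) factors
    eta_u = z / sigma**2
    a = 1.0 / s**2  # pairwise factor: Lambda = a*[[1, -1], [-1, 1]], eta = a*d_k*[-1, 1]
    eta_r, lam_r = np.zeros(n - 1), np.zeros(n - 1)  # factor k -> variable k + 1
    eta_l, lam_l = np.zeros(n - 1), np.zeros(n - 1)  # factor k -> variable k
    for _ in range(iters):
        # Variable-to-factor messages: unary factor plus the message from the other neighbour
        e_left = eta_u[:-1] + np.concatenate(([0.0], eta_r[:-1]))
        l_left = lam_u[:-1] + np.concatenate(([0.0], lam_r[:-1]))
        e_right = eta_u[1:] + np.concatenate((eta_l[1:], [0.0]))
        l_right = lam_u[1:] + np.concatenate((lam_l[1:], [0.0]))
        # Factor-to-variable messages: absorb the incoming message, then
        # marginalize out the other variable with a Schur complement
        new_lam_r = a - a**2 / (a + l_left)
        new_eta_r = a * d + a * (e_left - a * d) / (a + l_left)
        new_lam_l = a - a**2 / (a + l_right)
        new_eta_l = -a * d + a * (e_right + a * d) / (a + l_right)
        eta_r, lam_r, eta_l, lam_l = new_eta_r, new_lam_r, new_eta_l, new_lam_l
    # Beliefs: unary factor times the messages from both neighbouring factors
    eta_b, lam_b = eta_u.copy(), lam_u.copy()
    eta_b[1:] += eta_r
    lam_b[1:] += lam_r
    eta_b[:-1] += eta_l
    lam_b[:-1] += lam_l
    return eta_b / lam_b


rng = np.random.default_rng(0)
truth = np.cumsum(rng.normal(0.0, 1.0, size=20))
z = truth + rng.normal(0.0, 1.0, size=20)           # noisy absolute measurements
d = np.diff(truth) + rng.normal(0.0, 0.1, size=19)  # precise relative measurements
means = gbp_chain(z, d)
```

Each message update touches only one factor and its two variables, so all updates in a sweep can run in parallel; the result can be checked against a direct solve of the full linear system.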

Qwant: Graphcore C2 Card performance for image-based deep learning application: A Report

Ilyes Kacher, Maxime Portaz, Hicham Randrianarivo, Sylvain Peyronnet

Graphcore's processor architecture has been designed to achieve state-of-the-art performance on current machine intelligence models for both training and inference.

In this paper, we report on a benchmark in which we have evaluated the performance of IPU processors on deep neural networks for inference. We focus on deep vision models such as ResNeXt. We report the observed latency, throughput and energy efficiency.

Citadel: Dissecting the Graphcore IPU Architecture via Microbenchmarking

Zhe Jia, Blake Tillman, Marco Maggioni, Daniele Paolo Scarpazza

This report focuses on the architecture and performance of the Intelligence Processing Unit (IPU), a novel, massively parallel platform introduced by Graphcore and aimed at Artificial Intelligence/Machine Learning (AI/ML) workloads.

The study dissects the IPU’s performance behavior using microbenchmarks that were crafted for the purpose.

Graphcore Research: Revisiting Small Batch Training for Deep Neural Networks

Dominic Masters, Carlo Luschi

The team at Graphcore Research addresses mini-batch stochastic gradient optimization of modern deep network architectures.

In this paper, we review common assumptions on learning rate scaling and training duration, as a basis for an experimental comparison of test performance for different mini-batch sizes. Our experiments show that small batch sizes produce the best results.
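
The learning-rate scaling assumption can be made concrete. With gradients averaged over a mini-batch of m examples, the linear scaling rule keeps a base rate η̃ fixed and uses η = m·η̃. One such step then matches m consecutive batch-size-1 steps only to first order, because the small-batch steps evaluate each gradient at more recent weights. The least-squares problem, rates and function names below are illustrative assumptions.

```python
import numpy as np


def grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y)**2), averaged over the examples given."""
    return X.T @ (X @ w - y) / len(y)


def large_batch_step(w, X, y, base_lr):
    # One step on all m examples with the linearly scaled rate m * base_lr
    m = len(y)
    return w - m * base_lr * grad(w, X, y)


def sequential_small_steps(w, X, y, base_lr):
    # m steps of batch size 1 at rate base_lr, each using the freshest weights
    for i in range(len(y)):
        w = w - base_lr * grad(w, X[i:i + 1], y[i:i + 1])
    return w


rng = np.random.default_rng(0)
X = rng.standard_normal((16, 5))
y = rng.standard_normal(16)
w0 = np.zeros(5)
for base_lr in (1e-2, 1e-3):
    gap = np.linalg.norm(large_batch_step(w0, X, y, base_lr)
                         - sequential_small_steps(w0, X, y, base_lr))
    print(f"base_lr={base_lr:g}: gap={gap:.2e}")
```

The remaining gap, which shrinks roughly with the square of the base rate, comes entirely from the small-batch updates using more recent weights.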
