The IPU has many unique architectural features that enable significant performance gains for both training and inference. Here we provide our latest training and inference performance benchmark results for our second-generation MK2 IPU platforms: the IPU-M2000, IPU-POD16, and IPU-POD64. Benchmarks were generated using our examples on the Graphcore GitHub page.
Last updated on Tuesday, April 20, 2021
The results are obtained on the IPU-M2000, IPU-POD16, and IPU-POD64 systems with 4, 16, and 64 MK2 IPUs respectively. The host server is a dual-socket AMD EPYC 7742 running Ubuntu 18.04.
Training a machine learning model involves running the algorithm over an input dataset (training data) until the model converges, meaning that it has learned to produce the desired output to a specified accuracy. Throughput in this context is defined as the number of input data points (sequences, images, or rows) processed by the model per second. Throughput is often used as a measure of hardware performance because it directly determines the time taken to train the model to a specified accuracy: the higher the throughput, the shorter the training time.
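As a rough illustration of the definition above, throughput can be measured by timing a fixed number of training steps and dividing the samples processed by the elapsed time. This is a generic sketch, not the benchmarking harness used for these results; `run_step` is a hypothetical stand-in for one training step in whatever framework is in use.

```python
import time

def measure_throughput(run_step, num_batches, batch_size):
    """Return training throughput in samples/second.

    `run_step` is a hypothetical stand-in for executing one
    training step on a batch of `batch_size` samples.
    """
    start = time.perf_counter()
    for _ in range(num_batches):
        run_step()
    elapsed = time.perf_counter() - start
    # samples processed per second of wall-clock time
    return (num_batches * batch_size) / elapsed

# Example with a dummy step that just sleeps for ~1 ms:
throughput = measure_throughput(lambda: time.sleep(0.001),
                                num_batches=10, batch_size=32)
```

In practice the first few steps are usually discarded as warm-up (compilation, caching) so that the steady-state rate is reported.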
The results provided below detail the obtained throughput values for each of the referenced models in the specified configuration. All configurations running on real data are verified for convergence.
[*1] Preliminary results – pending convergence verification
Training: Time to Result
Model inference in this context refers to running a trained model on input data to produce output predictions. Inference performance in production setups is typically measured on two metrics: throughput (as defined previously) and latency, which is defined as the time taken to execute a single inference.
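To make the latency metric concrete, a simple way to characterise it is to time each inference call individually and report summary statistics such as the mean and a high percentile. This is an illustrative sketch, not the measurement methodology behind the published numbers; `infer` is a hypothetical stand-in for a single inference call.

```python
import time
import statistics

def measure_latency(infer, num_requests=100):
    """Time each inference call and return latency stats in milliseconds.

    `infer` is a hypothetical stand-in for one inference request.
    """
    latencies = []
    for _ in range(num_requests):
        start = time.perf_counter()
        infer()
        latencies.append((time.perf_counter() - start) * 1000.0)
    ordered = sorted(latencies)
    # simple p99: the value below which ~99% of requests complete
    p99 = ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]
    return {"mean_ms": statistics.mean(latencies), "p99_ms": p99}

# Example with a dummy model that takes ~1 ms per request:
stats = measure_latency(lambda: time.sleep(0.001), num_requests=20)
```

Tail percentiles (p99 rather than the mean) are usually what matters in production, since they bound the worst-case response time most users see.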
[*2] Latency results are provided for synthetic (on-IPU) data
Precision Terminology: X.Y is defined as follows: X is the precision for storing the activations and gradients, and Y is the precision for storing the weights. When training in 16.16, other variables (such as norms or momentum) may still be stored in FP32, and stochastic rounding is used.
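A minimal sketch of what the 16.16 scheme described above can look like in practice, assuming an SGD-with-momentum update: activations, gradients, and weights are held in FP16, while the optimizer state (here, momentum) is accumulated in FP32 so that small updates are not lost. This is an illustrative NumPy example, not Graphcore's implementation, and it omits stochastic rounding, which the IPU performs in hardware.

```python
import numpy as np

def sgd_momentum_step(weights_fp16, momentum_fp32, grad_fp16,
                      lr=0.01, beta=0.9):
    """One hypothetical 16.16-style update step.

    Weights ("Y") and gradients ("X") are FP16; the momentum
    buffer is kept in FP32 to preserve small accumulated updates.
    """
    # accumulate momentum in FP32
    momentum_fp32 = beta * momentum_fp32 + grad_fp16.astype(np.float32)
    # apply the update in FP32, then cast the weights back to FP16
    weights_fp16 = (weights_fp16.astype(np.float32)
                    - lr * momentum_fp32).astype(np.float16)
    return weights_fp16, momentum_fp32

rng = np.random.default_rng(0)
weights = rng.standard_normal(8).astype(np.float16)   # FP16 weights
momentum = np.zeros(8, dtype=np.float32)              # FP32 optimizer state
grad = rng.standard_normal(8).astype(np.float16)      # FP16 gradient
weights, momentum = sgd_momentum_step(weights, momentum, grad)
```

Keeping the momentum buffer in FP32 avoids the case where `lr * momentum` underflows to zero when added to an FP16 value, which is one of the main reasons mixed-precision schemes retain some FP32 state.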