Here we provide results from our recent MLPerf Training v1.0 submission, along with results from our own benchmarking across a wider range of models for both training and inference.
MLPerf v1.0 Training Performance
For our first submission to MLPerf Training v1.0, we chose the popular application benchmark categories of Image Classification (ResNet-50) and Natural Language Processing (BERT).
There are two divisions for submissions. The Closed division requires submitters to use exactly the same model and optimizer implementation, which includes the defined hyperparameter state and training epochs. The Open division fosters innovation by allowing different model implementations that are better tuned to different processor capabilities, while ensuring that the same model accuracy and quality is reached as in the Closed division.
MLPerf v1.0 Training Results | MLPerf ID: 1.0-1098, 1.0-1099
MLPerf v1.0 Training Results | MLPerf ID: 1.0-1026, 1.0-1028
The MLPerf name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved.
Unauthorized use strictly prohibited. See www.mlperf.org for more information.
Training a machine learning model involves running the algorithm over an input dataset (training data) until the model converges, meaning that it has learned to produce the desired output to a specified accuracy. Throughput in this context is defined as the number of input data points (sequences, images, or rows) processed by the model per second. Throughput is often used as a measure of hardware performance because it is directly related to the time the model takes to train to a specified accuracy.
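As a rough illustration of how throughput relates to training time, the following minimal Python sketch computes throughput from a sample count and elapsed time, then estimates the wall-clock time for a fixed number of passes over a dataset. The function names and the numbers in the usage note are hypothetical and are not taken from our benchmark code or results; the estimate also assumes throughput stays constant and ignores start-up and evaluation overhead.

```python
def throughput(samples_processed: int, elapsed_seconds: float) -> float:
    """Return samples (images, sequences, or rows) processed per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return samples_processed / elapsed_seconds


def estimated_time_to_train(dataset_size: int, epochs: float,
                            samples_per_second: float) -> float:
    """Estimate seconds needed to process `epochs` passes over the dataset.

    Assumption: throughput is constant for the whole run, and the number
    of epochs needed to converge is already known.
    """
    if samples_per_second <= 0:
        raise ValueError("samples_per_second must be positive")
    return dataset_size * epochs / samples_per_second
```

For example, with arbitrary illustrative numbers, `throughput(1000, 4.0)` gives 250 samples per second, and `estimated_time_to_train(1000, 2, 250.0)` gives 8 seconds. Doubling the throughput halves the estimate, which is why throughput is a useful proxy for time to train.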
The results provided below detail the obtained throughput values for each of the referenced models in the specified configuration. All configurations running on real data are verified for convergence.
Training: Time to Result
Model inference in this context refers to running a model on input data to infer output. Inference performance in production setups is typically measured by two metrics: throughput (as defined previously) and latency, which is defined as the time taken to execute an inference.
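The two inference metrics above can be measured together with a simple timing loop. The sketch below is a generic illustration in plain Python and is not the harness used to produce our results: `measure_inference` and its arguments are hypothetical names, and `infer` stands in for any callable that runs one batch through a model.

```python
import statistics
import time


def measure_inference(infer, batches, batch_size):
    """Time `infer` on each batch and report throughput and latency.

    Assumptions: every batch holds `batch_size` samples, and latency is
    taken as the wall-clock time of a single `infer` call.
    """
    latencies = []
    start = time.perf_counter()
    for batch in batches:
        t0 = time.perf_counter()
        infer(batch)
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start
    if not latencies:
        raise ValueError("at least one batch is required")
    return {
        # Samples processed per second over the whole run.
        "throughput": len(latencies) * batch_size / total,
        # Per-call latency statistics, in seconds.
        "mean_latency_s": statistics.mean(latencies),
        "max_latency_s": max(latencies),
    }
```

Larger batch sizes usually raise throughput but also raise the latency of each call, so the two metrics are best reported together for the same configuration.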
Precision Terminology: X.Y is defined as follows: X is the precision for storing the activations and gradients, and Y is the precision for storing the weights. When training in 16.16, we may still use FP32 for other variables (such as norms or momentum) and include stochastic rounding.
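To make the X.Y notation concrete, here is a small Python sketch that splits a precision label into its two parts, together with a toy version of stochastic rounding. Both are illustrative assumptions rather than our implementation: `Precision`, `parse_precision`, and `stochastic_round` are hypothetical names, and the rounding works on a uniform grid of spacing `step`, whereas real floating-point formats have non-uniform spacing and the rounding is performed in hardware.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Precision:
    activation_bits: int  # X: precision of activations and gradients
    weight_bits: int      # Y: precision of weights


def parse_precision(label: str) -> Precision:
    """Split a label such as "16.16" into its X and Y bit widths."""
    x, y = label.split(".")
    return Precision(activation_bits=int(x), weight_bits=int(y))


def stochastic_round(value: float, step: float, rng: random.Random) -> float:
    """Round `value` to a multiple of `step`, choosing the upper neighbour
    with probability equal to the fractional distance from the lower one.

    The rounding is unbiased on average, so small updates are not
    systematically lost as they would be with round-to-nearest.
    """
    scaled = value / step
    lower = math.floor(scaled)
    frac = scaled - lower
    return (lower + (1 if rng.random() < frac else 0)) * step
```

For instance, `parse_precision("16.16")` yields 16-bit activations and gradients with 16-bit weights. With `step=0.25`, `stochastic_round(0.3, 0.25, rng)` returns either 0.25 or 0.5, and the average over many calls approaches 0.3.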
Benchmarks were generated using our examples on the Graphcore GitHub.
This page was last updated on Wednesday, June 30, 2021