Poplar and Poplibs API Reference
Using the libraries
The Poplar library contains functions for creating and executing graphs, transferring data to and from the IPU, and managing the hardware. The Poplibs library contains higher-level mathematical and machine-learning functions.
For more information, see the Poplar and Poplibs User Guide.
Adding Poplibs code to a graph
Before using any of the Poplibs libraries, you need to add the device code for the library to your program graph. Each library includes a function to do this, and your program must call this function for each of the libraries it uses. For example, if your program uses popops and poprand, then you will need to include the following in your program:
#include <popops/codelets.hpp>
#include <poprand/codelets.hpp>
... // create the graph object
popops::addCodelets(graph);
poprand::addCodelets(graph);
Here, graph is the object containing your graph program.
Setting Options
Several functions have options to modify their behaviour. These are specified, using the poplar::OptionFlags class, as a series of option-value pairs, represented as strings.
There are three general classes of options:

Those that support debug, trace and profiling

Target-related options (memory use, for example)

Options to control optimisation
In addition, in Poplibs, there are options for fine-tuning the behaviour of specific functions such as convolution or matrix multiply.
Environment variables
Some options can also be specified using environment variables. These override the values set in the program. The currently supported environment variables for setting options are:

POPLAR_ENGINE_OPTIONS (see poplar::Engine)

POPLAR_SIMULATOR_OPTIONS (see poplar::Device)
Option values
The option values are typically either numeric or one of a list of enumerated values. The numeric values can be either an integer or a decimal value. Integer values can be prefixed with “0x” to indicate a hexadecimal value. The allowed range of values is documented, where relevant.
The options and allowed values are documented as shown in the examples below.
startTileMultiplier
Integer [=0]
availableMemoryProportion
Decimal between 0 and 1 [=0.6]
partialsType
(half, float) [=float]
This indicates the type of the numeric value (integer or decimal), or a list of the allowed values. The default value is shown in square brackets.
Some options have more complex values, such as a commaseparated list of integers or JSON structures. These are described in each case.
For example, the poplar::Engine::printProfileSummary function has an opt parameter to control the information displayed:
// Generate profile summary without execution steps but including
// variable storage
engine.printProfileSummary(std::cout,
);
An invalid_option exception may be thrown if the value of the option is not recognised or is out of range.
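The integer format described in this section (decimal, or hexadecimal with a “0x” prefix) can be illustrated with a small host-side sketch. The helper name parseIntegerOption is hypothetical and is not part of Poplar; how Poplar itself parses option strings is not shown in this reference, so treat this only as one plausible reading of the rule.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of Poplar: interprets an integer option value
// given as a string, accepting an optional "0x" prefix for hexadecimal.
long long parseIntegerOption(const std::string &value) {
  const bool isHex = value.size() > 2 && value[0] == '0' &&
                     (value[1] == 'x' || value[1] == 'X');
  std::size_t consumed = 0;
  // With base 16, std::stoll accepts and skips the leading "0x".
  const long long result = std::stoll(value, &consumed, isHex ? 16 : 10);
  // Reject values with trailing characters, such as "12abc".
  if (consumed != value.size())
    throw std::invalid_argument("unrecognised option value: " + value);
  return result;
}
```

Rejecting malformed input with an exception loosely mirrors the documented behaviour, where an unrecognised value may cause invalid_option to be thrown.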
Unsupported options
There are a number of options in the C++ header files that are not documented here. These are unsupported. They are either deprecated (and so may disappear in future) or are intended purely for internal use by Graphcore, typically to support testing.
Linking
When compiling your program, you will need to add the libraries used to the linker command line. For example:
$ g++ -std=c++11 myprogram.cpp -lpoplar -lpopops -lpoprand
Poplar API reference
Utility classes
poplar/ArrayRef.hpp
poplar/Interval.hpp
namespace poplar

Functions
template<class T> struct GenericInterval
#include <Interval.hpp>
This class represents an interval that is closed at its lower bound and open at its upper bound.
It is almost always used with T = std::size_t, for which there is a convenient Interval typedef.
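The closed-lower, open-upper convention can be sketched with a small stand-in type. HalfOpenInterval and IndexInterval are hypothetical names used only for illustration; the members of the real GenericInterval are not listed in this reference.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for illustration; not the Poplar GenericInterval class.
template <class T>
struct HalfOpenInterval {
  T lower; // included in the interval
  T upper; // excluded from the interval

  HalfOpenInterval(T lo, T hi) : lower(lo), upper(hi) {
    assert(lo <= hi && "lower bound must not exceed upper bound");
  }

  // Number of elements covered: upper - lower, with no off-by-one adjustment.
  T size() const { return upper - lower; }

  // True if value lies in [lower, upper).
  bool contains(T value) const { return lower <= value && value < upper; }
};

// Analogous to the convenience typedef for T = std::size_t.
using IndexInterval = HalfOpenInterval<std::size_t>;
```

With this convention, adjacent intervals such as [0, 4) and [4, 8) share a boundary value without overlapping, which is why half-open ranges are convenient for splitting index ranges.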
poplar/OptionFlags.hpp
namespace poplar

Functions
void readJSON(StringRef string, OptionFlags &flags)
Read options from a string in JSON format.
Parameters

string: The string to parse.
flags: The OptionFlags to update.

Exceptions

parse_error: if the input cannot be parsed.

void readJSON(std::istream &stream, OptionFlags &flags)
Read options from a stream in JSON format.
Parameters

stream: The input stream to read from.
flags: The OptionFlags to update.

Exceptions

parse_error: if the input cannot be parsed.

class OptionFlags
#include <OptionFlags.hpp>
A set of option/value string flags to be used in various APIs.
Public Types
using initializer_list = std::initializer_list<OptionFlag>
Public Functions
OptionFlags()
Construct a set of option flags.
The default constructor creates an empty set of flags.
~OptionFlags()
OptionFlags(const OptionFlags &other)
OptionFlags(OptionFlags &&other)
OptionFlags &operator=(const OptionFlags &other)
OptionFlags &operator=(OptionFlags &&other)
OptionFlags(initializer_list &&list)
Construct a set of option flags from an initializer list of string pairs.
Flags are set in the order they appear in the constructor.
Setting a flag more than once will result in the previous value for that option being overwritten.
Parameters

list: A list of option/value string pairs to set in the flags.
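The ordering and overwrite rules described above can be sketched with a stand-in container. FlagSet and its get method are hypothetical and exist only for this illustration; the internal representation of poplar::OptionFlags is not described in this reference.

```cpp
#include <initializer_list>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for illustration; the real poplar::OptionFlags may be
// implemented differently.
class FlagSet {
public:
  using Pair = std::pair<std::string, std::string>;

  FlagSet(std::initializer_list<Pair> list) { set(list); }

  // Apply pairs in list order, so a later pair overwrites an earlier one.
  void set(std::initializer_list<Pair> list) {
    for (const auto &p : list)
      flags_[p.first] = p.second;
  }

  // Set a single option, replacing any previous value.
  void set(const std::string &option, const std::string &value) {
    flags_[option] = value;
  }

  // Returns the stored value, or an empty string if the option is unset.
  std::string get(const std::string &option) const {
    auto it = flags_.find(option);
    return it == flags_.end() ? std::string() : it->second;
  }

private:
  std::map<std::string, std::string> flags_;
};
```

For example, constructing a FlagSet from the pairs partialsType=half followed by partialsType=float leaves partialsType set to float, because the later pair wins.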

void set(initializer_list &&list)
Set option flags from an initializer list of string pairs.
Flags are set in the order they appear in the list.
Setting a flag more than once in the list will result in the earlier value for that option being overwritten. If the option was already set in these flags, then the previous value will also be overwritten.
Parameters

list: A list of option/value string pairs to set in the flags.

void set(StringRef option, StringRef value)
Set a single option to a value.
If the option was already set in these flags then the previous value will be overwritten.
Parameters

option: The option to set in the flags.
value: The value to set the option to in the flags.

void clear()
Remove all set flags.
class iterator : public std::iterator<std::forward_iterator_tag, OptionFlag>
Public Functions
~iterator()
const OptionFlag &operator*() const
const OptionFlag *operator->() const
Friends
friend poplar::OptionFlags
poplar/ReplicatedStreamMode.hpp
namespace poplar
poplar/SerializationFormat.hpp
namespace poplar
poplar/StringRef.hpp
namespace poplar

Functions
poplar/SyncType.hpp
namespace poplar

Enums
enum SyncType
An enumeration used to state what type of synchronisation a Sync program represents.
Values:
INTERNAL
Each tile waits until all the other tiles in the same IPU reach the Sync program before continuing.
EXTERNAL
Each tile waits until all the other tiles in all IPUs in the device reach the Sync program before continuing.
poplar/TypeTraits.hpp
namespace poplar

struct TypeTraits
#include <TypeTraits.hpp>
A structure to provide information about arithmetic (integer and floating point) types.
Public Functions
bool isSimpleType() const
template<> TypeTraits make()
template<> constexpr bool isSimpleType()
Public Static Functions
template<typename T> TypeTraits make()
template<typename T> constexpr bool isSimpleType()
Return true if T is a basic numeric type, that is, if std::is_integral<> or std::is_floating_point<> is true, or if it is IeeeHalf.
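The rule stated for isSimpleType can be sketched with the standard type traits it names. HalfPlaceholder and isSimpleTypeSketch are hypothetical names; the real IeeeHalf type is defined by Poplar and is not reproduced here, so this is only an approximation of the documented condition.

```cpp
#include <type_traits>

// Hypothetical placeholder for a half-precision host type; the real IeeeHalf
// type is defined by Poplar and is not reproduced here.
struct HalfPlaceholder {};

// Sketch of the documented rule: a type qualifies if it is integral, floating
// point, or the half-precision type.
template <typename T>
constexpr bool isSimpleTypeSketch() {
  return std::is_integral<T>::value || std::is_floating_point<T>::value ||
         std::is_same<T, HalfPlaceholder>::value;
}
```

A compile-time predicate of this kind is what allows the std::enable_if constraints on functions such as addConstant to accept only simple numeric types.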
poplar/CSRFunctions.hpp
Functions to configure the floating-point behaviour of the tiles by programming the Control and Status Registers (CSR).
namespace poplar

Functions
void setFloatingPointBehaviour(poplar::Graph &graph, poplar::program::Sequence &prog, const FloatingPointBehaviour &behaviour, const std::string &debugPrefix = "")
Set the floating point behaviour of a tile.
Configures the floating point behaviour of a tile, affecting the treatment of exceptions and selecting stochastic rounding according to the passed behaviour structure.
Parameters

graph: The Poplar graph.
prog: The program to be extended.
behaviour: A structure of type FloatingPointBehaviour.
debugPrefix: The prefix prepended to debugging info.

void setStochasticRounding(poplar::Graph &graph, poplar::program::Sequence &prog, bool behaviour, const std::string &debugPrefix = "")
Set stochastic rounding on or off for the selected tile.
Configures the stochastic rounding operation of a tile according to the passed behaviour parameter.
Parameters

graph: The Poplar graph.
prog: The program to be extended.
behaviour: Select stochastic rounding: true or false.
debugPrefix: The prefix prepended to debugging info.

struct FloatingPointBehaviour
#include <CSRFunctions.hpp>
Structure to specify floating point behaviour.
Parameters

inv: If true, a floating-point invalid operation (defined by IEEE 754) will cause an exception. The invalid operations are:

Addition or subtraction where the operands are + or - infinity (inf) and the operation results in the subtraction of two infs; for example: (-inf)+(+inf) or (+inf)-(+inf).

Divisions: (+/-0)/(+/-0) and (+/-inf)/(+/-inf).

Multiplications: (+/-0)*(+/-inf) and (+/-inf)*(+/-0).

Remainder: x REM y where y=0 or x=(+/-inf).

Real operations with complex results such as the square root or logarithm of a negative number.

Operations with Not-a-Number as at least one operand.

Comparisons where one of the operands is Not-a-Number.
See also nanoo below.

div: If true, a floating point divide by zero operation will cause an exception.
oflo: If true, a floating point overflow will cause an exception.
esr: Enable stochastic rounding.
nanoo: Enable Not-a-Number on overflow mode. When enabled, half precision calculations that have overflowed will produce a Not-a-Number result, rather than saturating to the half precision max/min value, and the invalid operation (inv) flag will be set.
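Several of the invalid operations listed for inv can be reproduced on the host with standard C++ doubles, where they quietly yield Not-a-Number rather than raising an exception. This is a host-side illustration of the IEEE 754 cases only; it does not run on the IPU and uses no Poplar calls, and the names InvalidOpResults and invalidOperationsOnHost are hypothetical.

```cpp
#include <cmath>
#include <limits>

// Results of four IEEE 754 invalid operations evaluated on the host CPU.
struct InvalidOpResults {
  double infMinusInf;  // (+inf) - (+inf)
  double zeroOverZero; // 0 / 0
  double zeroTimesInf; // 0 * (+inf)
  double sqrtNegative; // square root of a negative number
};

InvalidOpResults invalidOperationsOnHost() {
  // volatile keeps the compiler from folding these at compile time.
  volatile double inf = std::numeric_limits<double>::infinity();
  volatile double zero = 0.0;
  volatile double minusOne = -1.0;

  InvalidOpResults r;
  r.infMinusInf = inf - inf;
  r.zeroOverZero = zero / zero;
  r.zeroTimesInf = zero * inf;
  r.sqrtNegative = std::sqrt(minusOne);
  return r;
}
```

On a platform with IEEE 754 arithmetic, each field tests true with std::isnan. Setting inv to true in FloatingPointBehaviour is documented to turn the corresponding tile operations into exceptions instead.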

Exceptions
poplar/exceptions.hpp
namespace poplar

struct control_program_error : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown when the construction of a graph program is invalid.
struct file_load_error : public poplar::poplar_error
struct graph_connection_error : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown during construction of an Engine object if there is an error in the structure of the graph, for example, if there are no edges to a vertex input or if there are multiple edges to a vertex input.
struct graph_cycle_error : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown during the construction of an Engine object if there are any cycles in the graph that are not broken by recurrent edges.
struct graph_memory_allocation_error : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown when a memory allocation fails.
Public Functions
graph_memory_allocation_error(const char *s)
Public Members
ProfileValue graphProfile
struct graph_object_creation_error : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown in the construction of a GraphProgEnv object if there was an error in the creation of the graph program object file.
struct graph_object_load_error : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown in the construction of a GraphProgEnv object if there was an error in loading the graph program object file.
struct graph_program_compilation_error : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown in the construction of a GraphProgEnv object if there are any compilation errors in the graph program.
struct graph_replication_error : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown when an invalid operation is carried out on a replicated graph.
struct index_error : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown if the index of a subscript is out of the bounds of the field it is accessing or if an index of a tensor is invalid.
struct invalid_machine_model : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown when an invalid model of the IPU (for performance model profiling) has been specified.
struct invalid_option : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown when an unrecognised or invalid option is passed to a Poplar API.
struct invalid_tile_mapping : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown when the tile mapping passed to the UserTilePartitioner is invalid.
struct link_error : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown when the linking stage for codelets fails.
output is the output from the linker command.
Public Functions
link_error(const char *s, const char *out = "")
struct memory_elem_constraints_error : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown when an invalid memory element constraint has been provided in a codelet.
struct missing_cycle_estimate : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown when an Engine is constructed with profiling enabled but a vertex does not have a getCycleEstimate method specified.
struct no_environment : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown, in the construction of a GraphProgEnv object, in mixed-mode compilation, if there is no graph-programming environment available, in particular if the program has not been compiled with the ‘popc’ command-line tool.
struct no_size_specified : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown if the size of a field is not specified in a Graph object when an EngineBuilder object is constructed.
struct overflow_error : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown when an arithmetic overflow occurs within Poplar.
struct parse_error : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown when an input file or string cannot be parsed.
struct poplar_error : public runtime_error
Subclassed by poplar::control_program_error, poplar::file_load_error, poplar::graph_connection_error, poplar::graph_cycle_error, poplar::graph_memory_allocation_error, poplar::graph_object_creation_error, poplar::graph_object_load_error, poplar::graph_program_compilation_error, poplar::graph_replication_error, poplar::index_error, poplar::invalid_machine_model, poplar::invalid_option, poplar::invalid_tile_mapping, poplar::link_error, poplar::memory_elem_constraints_error, poplar::missing_cycle_estimate, poplar::no_environment, poplar::no_size_specified, poplar::overflow_error, poplar::parse_error, poplar::profiling_disabled, poplar::runtime_error, poplar::stream_connection_error, poplar::stream_memory_allocation_error, poplar::symbol_error, poplar::tensor_creation_error, poplar::tensor_io_state_error, poplar::type_error, poplar::unknown_field, poplar::unknown_vertex_type
struct profiling_disabled : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown if profiling information is requested from an Engine but that Engine has not been constructed with profiling enabled.
Public Functions
profiling_disabled()
struct runtime_error : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown when the interaction with the device via Graphcore device access fails.
struct stream_connection_error : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown when an invalid attempt is made to connect a data stream.
struct stream_memory_allocation_error : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown when allocation of stream buffers fails.
struct symbol_error : public poplar::poplar_error
struct tensor_creation_error : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown in the construction of a tensor if invalid arguments are provided to the tensor creation function or method.
struct tensor_io_state_error : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown when an attempt is made to mark a tensor as an input or output, but the argument references a view of a tensor, rather than a whole tensor.
struct type_error : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown when there is an error related to the field types of vertices, for example, when the source of an edge contains an input, when the types of the input and source fields of an edge do not match, or when a field cannot be subscripted.
struct unknown_field : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown when a field name is specified that does not exist in the graph-programming environment.
struct unknown_vertex_type : public poplar::poplar_error
#include <exceptions.hpp>
This exception is thrown when a vertex type name is specified that does not exist in the graph programming environment.
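Because every exception in this section derives from poplar_error, a caller can handle specific failures first and fall back to the base class. The sketch below uses stand-in types in a hypothetical sketch namespace so that it compiles without the Poplar headers; the constructors and the helper describeFailure are assumptions made for illustration, not the signatures in exceptions.hpp.

```cpp
#include <stdexcept>
#include <string>

namespace sketch {
// Hypothetical stand-ins mirroring the documented hierarchy; the constructors
// here are assumptions, not the real Poplar declarations.
struct poplar_error : std::runtime_error {
  explicit poplar_error(const std::string &msg) : std::runtime_error(msg) {}
};
struct invalid_option : poplar_error {
  explicit invalid_option(const std::string &msg) : poplar_error(msg) {}
};
struct parse_error : poplar_error {
  explicit parse_error(const std::string &msg) : poplar_error(msg) {}
};
} // namespace sketch

// Runs an action and reports which handler caught the failure, if any.
std::string describeFailure(void (*action)()) {
  try {
    action();
  } catch (const sketch::invalid_option &e) {
    // Most specific handler first.
    return std::string("invalid option: ") + e.what();
  } catch (const sketch::poplar_error &e) {
    // Any other exception derived from the common base.
    return std::string("poplar error: ") + e.what();
  }
  return "ok";
}
```

Ordering the handlers from most to least specific matters: if the base-class handler came first, it would also catch invalid_option.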
Graph classes
poplar/CodeletFileType.hpp
namespace poplar

Enums
enum CodeletFileType
Values:
PreprocessedAsmSource
A graph assembly language source file.
AsmSource
A graph assembly language file with preprocessor macros.
CSource
A graph C source file.
CppSource
A graph C++ source file.
Object
A graph program object file.
Auto
Auto detect based on file name.
Functions
CodeletFileType getCodeletFileType(const char *path)
poplar/CycleEstimateFunc.hpp
namespace poplar

Typedefs
using CycleEstimateFunc = std::function<std::uint64_t(const VertexIntrospector &v, const Target &target)>
Functions of this type can be used as cycle estimator callbacks for new vertex types.
poplar/DataStream.hpp
namespace poplar

class DataStream
#include <DataStream.hpp>
An object representing a stream for communicating between the host and the device.
A stream is a unidirectional communication from the host to the device, or from the device to the host.
The maximum buffer size for each stream is 128 MBytes.
Public Functions
DataStream()
DataStream(const DataStream&)
DataStream(DataStream&&)
~DataStream()
DataStream &operator=(const DataStream&)
DataStream &operator=(DataStream&&)
unsigned replicationFactor() const
ReplicatedStreamMode replicatedMode() const
DataStreamType type() const
const core::DataStreamRef &getImpl() const
class RemoteBuffer
#include <DataStream.hpp>
A remote buffer is a region of remote (meaning not on the IPU) memory that is used as a cache.
It is implemented as two DataStreams: one to write to the remote memory, the other to read the data back to the IPU.
Public Functions
RemoteBuffer(DataStream &&ipuToHost, DataStream &&hostToIpu, size_t repeats = 1, bool rearrangeOnHost = false)
DataStream getIpuToHostStream() const
DataStream getHostToIpuStream() const
size_t getRepeats() const
bool isRearrangeOnHost() const
poplar/DataStreamType.hpp
namespace poplar

Enums
enum DataStreamType
An enumeration to represent the different types of DataStream or stream components of a RemoteBuffer.
Values:
HostToDeviceFIFO
A DataStream from host to device.
DeviceToHostFIFO
A DataStream from device to host.
HostToDeviceBuffer
A stream from host to device in a remote buffer.
DeviceToHostBuffer
A stream from device to host in a remote buffer.
Functions
bool isDeviceToHost(DataStreamType type)
bool isHostToDevice(DataStreamType type)
bool isRemoteBuffer(DataStreamType type)
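The three predicates above are not described further in this reference. One plausible reading, inferred only from the enumerator descriptions, is sketched below with a stand-in enum; StreamKind and the functions with the Sketch suffix are hypothetical and may not match the behaviour of the Poplar functions.

```cpp
// Hypothetical stand-in that mirrors the four documented enumerators; it is
// not the Poplar DataStreamType definition.
enum class StreamKind {
  HostToDeviceFIFO,
  DeviceToHostFIFO,
  HostToDeviceBuffer,
  DeviceToHostBuffer
};

// True for either kind of stream that carries data from device to host.
bool isDeviceToHostSketch(StreamKind kind) {
  return kind == StreamKind::DeviceToHostFIFO ||
         kind == StreamKind::DeviceToHostBuffer;
}

// True for either kind of stream that carries data from host to device.
bool isHostToDeviceSketch(StreamKind kind) {
  return kind == StreamKind::HostToDeviceFIFO ||
         kind == StreamKind::HostToDeviceBuffer;
}

// True for the two kinds that are components of a remote buffer.
bool isRemoteBufferSketch(StreamKind kind) {
  return kind == StreamKind::HostToDeviceBuffer ||
         kind == StreamKind::DeviceToHostBuffer;
}
```

Under this reading, direction and remote-buffer membership are independent properties, so each enumerator satisfies exactly one of the two direction predicates.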
poplar/Graph.hpp
namespace poplar

class Graph
#include <Graph.hpp>
This class represents a graph program to be executed on the IPU.
Public Functions
Graph(const Target &target, unsigned sharedStructureTilesPerIPU = 0, replication_factor r = replication_factor(1))
Construct a graph object.
This constructor creates a Graph object using the given graph programming environment.
Parameters

target: The target the graph is being constructed to work with.
sharedStructureTilesPerIPU: The number of tiles to reserve to hold shared code and read-only data structures.
r: Number of times the graph is to be replicated (default is no replication).

Graph(const Device &device, unsigned sharedStructureTilesPerIPU = 0, replication_factor r = replication_factor(1))
Construct a graph object.
This constructor creates a Graph object using the given graph programming environment.
Parameters

device: The device the graph is being constructed to work with.
sharedStructureTilesPerIPU: The number of tiles to reserve to hold shared code and read-only data structures.
r: Number of times the graph is to be replicated (default is no replication).

~Graph()
void addCodelets(StringRef src, CodeletFileType type = CodeletFileType::Auto, StringRef compileFlags = "")
Add a codelet to the graph.
A codelet is either a C, C++, or assembly source file, or a .gp object file. If a source file is given, it is compiled for the graph’s target and then loaded into the graph. If it is an object file, then it is loaded into the graph directly.
Symbols that codelets use are not resolved until the engine is built, so codelets can use symbols from each other by calling addCodelets() for each source or object file (or passing a list of files as a vector).
Parameters

src: The path to a source or object file containing codelets.
type: Specify the type of the codelet (source or precompiled). If Auto is used, the type is determined from the filename extension.
compileFlags: Additional flags to pass to the compiler if using source code. For example, -g to generate debug info.

void addCodelets(StringRef src, CodeletFileType type, StringRef compileFlags, std::ostream &compileOutput)
Add a codelet to the graph and write error messages from the compilation process to the given output stream.
By default they are printed to cerr.

void addCodelets(ArrayRef<std::string> xs, StringRef compileFlags = "")
Add a set of codelets to the graph.
These codelets can depend on each other; for example, symbols defined in one can be used by any other. The order is not important.
VertexRef addVertex(ComputeSet cs, StringRef vertexType)
Add a vertex to the graph.
Parameters

cs: The compute set to add the vertex to.
vertexType: The name of the type of the vertex. This must be a declared vertex type in the graph programming environment used to create the graph builder.

VertexRef addVertex(ComputeSet cs, StringRef vertexType, ArrayRef<ConnectionDesc> connections)
Add a vertex to the graph and connect graph elements to some of its fields.
This variant of addVertex allows you to pass in a list of connection descriptions to connect graph elements to fields of the newly created vertex. The connection descriptions can be initialized with:

{ string, Tensor } - connect a tensor to a field.

{ string, FieldRef, bool } - connect a vertex field to a field.

{ string, T v } - connect a constant value to an input field.
For example, the following:
addVertex(cs, "MyVertex", );
will create a vertex and connect a tensor to its x field and the vertex field v["z"] to its y field.
Parameters

cs: The compute set to add the vertex to.
vertexType: The name of the type of the vertex. This must be a declared vertex type in the graph programming environment used to create the graph builder.
connections: A list of connection descriptions.

VertexRef addExternalExchangeVertex(ComputeSet cs, StringRef vertexType, unsigned incomingDownCount, bool usesEastEdge, bool sendsXReq)
Add an external exchange vertex to the graph.
A compute set can contain at most one external exchange vertex per tile. External exchange vertices cannot be mixed with non-external exchange vertices in the same compute set. Before an external exchange vertex is called, the INCOMING_DCOUNT and INCOMING_MUX registers are set and all tiles containing external exchange vertices are synchronized.
Parameters

cs: The compute set to add the vertex to.
vertexType: The name of the type of the vertex. This must be a declared vertex type in the graph programming environment used to create the graph builder.
incomingDownCount: The value to set the INCOMING_DCOUNT register to.
usesEastEdge: Whether the vertex uses an east edge exchange block. The INCOMING_MUX register is set to point to either the east edge or west edge depending on this argument.
sendsXReq: Whether this vertex is responsible for sending the XREQ packet. There must be at most one tile per exchange block context that sends the XREQ and the tile must be the same in every compute set containing external exchange vertices.
Tensor addVariable(const Type &type, ArrayRef<std::size_t> shape, StringRef name = "")
Add a variable to the graph.
If using this function with a target with multiple tiles, then the variable will initially have no tile mapping, under the expectation that the tile mapping will be set later with Graph::setTileMapping. If the target of the graph has only one tile, then the tensor will be automatically mapped to that tile.
Return

A Tensor referring to the variable in the graph.
Parameters

type: The type of the elements of the variable.
shape: The shape of the variable.
name: An optional name to identify the variable for debugging/profiling purposes.

Tensor addVariable(const Type &type, ArrayRef<std::size_t> shape, VariableMappingMethod mappingMethod, StringRef name = "")
Add a variable to the graph.
Return

A Tensor referring to the variable in the graph.
Parameters

type: The type of the elements of the variable.
shape: The shape of the variable.
mappingMethod: The method to use to initially map the variable to tiles.
name: An optional name to identify the variable for debugging/profiling purposes.
template<typename T> Tensor addConstant(const Type &type, ArrayRef<std::size_t> shape, ArrayRef<T> values, StringRef name = "<const>")
Add a constant to the graph.
A constant tensor is a tensor with every element initialized.
Parameters

type: The type of the elements of the constant.
shape: The shape of the constant.
values: Vector of values to initialize tensor elements to.
name: An optional name to identify the variable for debugging/profiling purposes.

template<typename T> Tensor addConstant(const Type &type, ArrayRef<std::size_t> shape, T val, StringRef name = "<const>", typename std::enable_if<TypeTraits::isSimpleType<T>()>::type * = nullptr)
Add a constant to the graph.
A constant tensor is a tensor with every element initialized to the same value. It cannot be connected to a vertex output.
Parameters

type: The type of the elements of the constant.
shape: The shape of the constant.
val: The value to initialize tensor elements to.
name: An optional name to identify the variable for debugging/profiling purposes.

template<typename T> Tensor addConstant(const Type &type, ArrayRef<std::size_t> shape, const T *val, StringRef name = "<const>", typename std::enable_if<TypeTraits::isSimpleType<T>()>::type * = nullptr)
Add a constant to the graph with multiple cell values.
A constant tensor is a tensor with every element initialized. It cannot be connected to a vertex output.
Parameters

type: The type of the elements of the constant.
shape: The shape of the constant.
val: Pointer to the values to initialize tensor elements to.
name: An optional name to identify the variable for debugging/profiling purposes.

Tensor addConstant(const Type &type, ArrayRef<std::size_t> shape, const void *val, const TypeTraits &traits, bool broadcast, StringRef name = "<const>")

Tensor addConstantHalf(const Type &type, ArrayRef<std::size_t> shape, uint16_t val, StringRef name = "<const>")
Add a constant to the graph, where the host data is type IEEE half.
A constant tensor is a tensor with every element initialized to the same value. It cannot be connected to a vertex output.
Parameters

type: The type of the elements of the constant.
shape: The shape of the constant.
val: The value to initialize tensor elements to.

Tensor addConstantHalf(const Type &type, ArrayRef<std::size_t> shape, const uint16_t *val, StringRef name = "<const>")
Add a constant to the graph with multiple cell values, where the host data is type IEEE half.
A constant tensor is a tensor with every element initialized. It cannot be connected to a vertex output.
Parameters

type: The type of the elements of the constant.
shape: The shape of the constant.
val: Pointer to the values to initialize tensor elements to.
Tensor clone(const Type &type, const Tensor &t, StringRef name = "", TensorCloneMethod method = TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)
Add a tensor to the graph that has the same size and tile mapping as Tensor t.
Parameters

type: The element type of the new tensor.
t: The tensor to be cloned.
name: A debug name to give to any new tensors allocated in the graph during the clone. If this is empty then the debug names will be derived from existing tensor debug names.
method: The method to use for the cloning (decides whether to preserve ordering/aliasing in the new tensor).

Tensor clone(const Tensor &t, StringRef name = "", TensorCloneMethod method = TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)
Add a tensor to the graph that has the same size and tile mapping as Tensor t.
Parameters

t: The tensor to be cloned.
name: A debug name to give to any new tensors allocated in the graph during the clone. If this is empty then the debug names will be derived from existing tensor debug names.
method: The method to use for the cloning (decides whether to preserve ordering/aliasing in the new tensor).

void connect(FieldRef field, const Tensor &tensor)
Connect a tensor to a vertex field.
This function connects a tensor with a vertex field. The requirements depend on the kind of field:

If the vertex field is a scalar input/output, then a simple edge is added, and the tensor must be of zero dimension (in other words, a scalar).

If the vertex field is an input/output of a vector, then a vector edge is added, and the tensor must be of dimension 1.

If the vertex field is a vector of inputs or outputs, then the size of the field is set to the correct size and edges are added for every element of the tensor, which must be of dimension 1.

If the vertex field is a vector of input or output vectors, then the tensor must be 2-dimensional. In this case, the size of the vector field is set to the size of the first dimension and vector edges are added for every sub-vector of the two-dimensional tensor.
Parameters

field: Reference to the vertex field to connect.
tensor: The tensor.

template<typename T> void connect(FieldRef field, T v, typename std::enable_if<TypeTraits::isSimpleType<T>()>::type * = nullptr)
Connect a constant value to an input field.
This method creates a single-element tensor containing a specified value and connects that tensor element to an input field.
Parameters

field: The field to connect to.
v: The value to connect.

void connect(FieldRef field, ArrayRef<Tensor> tensors)
Connect a vector of tensors to a vertex field.
This function connects a vector of tensors with a vertex field. The field must be a vector of inputs or outputs. The field will be sized to the provided vector and each tensor in the vector will be connected to the corresponding element of the field.
Parameters

field: Reference to the vertex field to connect.
tensors: The vector of tensors.

 void
setCycleEstimate
(const VertexRef &v, std::uint64_t cycles)¶ 
Set the cycle estimate for a vertex.
 Parameters


v
: The vertex to set the estimate for. 
cycles
: The number of cycles that this vertex will use when run.

 std::uint64_t
getCycleEstimate
(const VertexRef &v) const¶ 
Get the cycle estimate for the specified vertex.
 Return

The number of cycles used when this vertex is run.
 Parameters


v
: The vertex to get the estimate for.

 Exceptions


missing_cycle_estimate
: if the cycle estimate is not available (for example, because the graph hasn’t been executed yet).

 void
registerCycleEstimator
(StringRef vertexTypeName, CycleEstimateFunc f)¶ 
 Parameters


vertexTypeName
: Type of vertex to register the estimator for. 
f
: Callback function that will compute a cycles estimate for all vertices of this type.

 unsigned
getNumVertices
(void) const¶ 
Get the number of vertices currently in the graph.
 Return

The number of vertices currently in the graph.
 ComputeSet
addComputeSet
(StringRef name = "")¶ 
Create a compute set within the graph.
 Return

The reference to the compute set.
 Parameters


name
: An optional identifier for the compute set that may be used during profiling/debugging.

 void
setFieldSize
(FieldRef field, std::size_t size)¶ 
Set the size of a vector field.
 Parameters


field
: The reference to the field. 
size
: The size of the field.

 std::size_t
getFieldSize
(FieldRef field) const¶ 
Get the size of a vector field.
 Return

The size of the field.
 Parameters


field
: The reference to the field.

 std::size_t
getMaxFieldDim
(StringRef vertexName, StringRef fieldName, unsigned dimIndex) const¶ 
Find the maximum size for a dimension of a field.
 Parameters


vertexName
: The type of vertex 
fieldName
: The field 
dimIndex
: The index of the dimension

 Exceptions


index_error
: If there is no such dimension 
poplar_error
: If the field is not indexable

 double
getMaxVertexFieldValue
(StringRef vertexName, StringRef fieldName) const¶ 
Find the maximum value that can be represented by an element of a field.
 Parameters


vertexName
: The type of vertex 
fieldName
: The field

 template<typename
T
>
void setInitialValue
(FieldRef field, T val, typename std::enable_if<TypeTraits::isSimpleType<T>()>::type * = nullptr)¶ 
Set the initial value of a field.
 Parameters


field
: The reference to the field. 
val
: The value to set the field to when the graph engine is created.

 template<typename
T
>
void setInitCallback
(FieldRef field, LateInitCallback<T> callback, typename std::enable_if<std::is_arithmetic<T>::value>::type * = nullptr)¶ 
Set the init callback for a field; the callback function will be called after graph construction and must return the init value of the field.
This can be called instead of setInitialValue(), or both can be called for the same field, to ensure that the field has an (at least partially) valid starting value, for instance if it needs to be retrieved at an early stage of graph compilation, before storage allocation (for example, during cycle estimation).
Note that you must explicitly provide the template parameter T when using this function, for example: setInitCallback<uint16_t>(vertex["size"], sizeCallback), because the compiler cannot deduce the correct type from the callback parameter.
 Parameters


field
: The reference to the field. 
callback
: The callback that will return the value for the field. 
<unnamed>
: This exists only to allow the ‘is_arithmetic<T>’ check on the type T to be inserted.

 void
setInitialValueHalf
(FieldRef field, uint16_t val)¶ 
Set the initial value of a field of type IEEE half.
 Parameters


field
: The reference to the field. 
val
: The value to set the field to when the graph engine is created.

 template<typename
T
>
void setInitialValue
(FieldRef field, ArrayRef<T> val)¶ 
Set initial values of a vector field.
 Parameters


field
: The reference to the vector field. 
val
: A vector value to set the field to when the graph engine is created.

 void
setInitialValueHalf
(FieldRef field, ArrayRef<uint16_t> val)¶ 
Set initial values of a vector field of type IEEE half.
 Parameters


field
: The reference to the vector field. 
val
: A vector value to set the field to when the graph engine is created.

 template<typename
T
>
void setInitialValue
(const Tensor &t, T val, typename std::enable_if<TypeTraits::isSimpleType<T>()>::type * = nullptr)¶ 
Set the initial value of a tensor element.
 Parameters


t
: The tensor representing the value to set. 
val
: The value to set the field to when the graph engine is created. A buffer of values can be provided to set the elements of a non-scalar tensor.

 void
setInitialValueHalf
(const Tensor &t, uint16_t val)¶ 
Set the initial value of a tensor element of type IEEE half.
 Parameters


t
: The tensor representing the value to set. 
val
: The value to set the field to when the graph engine is created. A buffer of values can be provided to set the elements of a non-scalar tensor.

 void
createHostWrite
(StringRef handle, const Tensor &t, bool rearrangeOnHost = false)¶ 
Mark a Tensor as being available as the destination of host to device copies.
This is a convenience function that creates a host-to-device FIFO, and a Copy program that copies data from the FIFO to the tensor. When you call Engine::writeTensor() it copies the input data to the FIFO and then executes the Copy program on the device.
 Parameters


handle
: A name to be associated with this host copy. 
t
: The tensor to be marked as an input. 
rearrangeOnHost
: Save IPU memory at the cost of exchange speed by rearranging the data on the host before sending it to the IPU, rather than doing an internal exchange. Note that due to alignment and size requirements of host exchange packets this may still require part of the transfer to be received to a temporary variable and copied to its destination.

 void
createHostRead
(StringRef handle, const Tensor &t, bool rearrangeOnHost = false)¶ 
Mark a Tensor as being available as the source of device to host copies.
This is a convenience function that creates a device-to-host FIFO, and a Copy program that copies data to the FIFO from the tensor. When you call Engine::readTensor() it executes the Copy program on the device and then outputs the data from the FIFO.
 Parameters


handle
: A name to be associated with this host copy. 
t
: The tensor to be marked as an output. 
rearrangeOnHost
: Save IPU memory at the cost of exchange speed by sending data in any order and rearranging it on the host, rather than doing an internal exchange before sending it.

 DataStream
addHostToDeviceFIFO
(StringRef handle, const Type &elementType, std::size_t numElements, ReplicatedStreamMode replicatedMode = ReplicatedStreamMode::REPLICATE)¶ 
Add a data stream to the graph for copying data from the host to the device.
 Parameters


handle
: A name to be associated with this stream 
elementType
: The type of data in the stream 
numElements
: The number of elements to be transferred from the stream by a Copy program. 
replicatedMode
: How the stream is replicated if this is a replicated graph.

 DataStream
addDeviceToHostFIFO
(StringRef handle, const Type &elementType, std::size_t numElements)¶ 
Add a data stream to the graph for copying data from the device to the host.
 Parameters


handle
: A name to be associated with this stream 
elementType
: The type of data in the stream 
numElements
: The number of elements to be transferred to the stream by a Copy program.

 RemoteBuffer
addRemoteBuffer
(StringRef handle, const Type &elementType, std::size_t numElements, std::size_t repeats = 1, bool rearrangeOnHost = false)¶ 
Add a remote buffer to the graph.
A remote buffer is memory outside the IPU that the IPU can read and write. A read returns the last written value.
 Parameters


handle
: A name to be associated with this remote buffer 
elementType
: The type of data in the remote buffer 
numElements
: The number of elements to be transferred to the remote buffer by a Copy program. 
repeats
: The number of tensor multiples to use 
rearrangeOnHost
: Switch to tell the remote buffer to rearrange on the host instead of on the IPU

 void
outputVertexGraph
(std::ostream &outputStream, ArrayRef<program::Program> progs = {}) const¶ 
Output to a stream the vertex graph in dot file format.
 Parameters


outputStream
: The C++ stream to output the dot file onto.

 void
outputComputeGraph
(std::ostream &outputStream, ArrayRef<program::Program> progs = {}) const¶ 
Output to a stream the compute graph in dot file format.
 Parameters


outputStream
: The C++ stream to output the dot file onto.

 void
setTileMapping
(VertexRef v, unsigned tileNum)¶ 
Map a vertex to a specific tile on the device.
 Parameters


v
: Reference to the vertex to map 
tileNum
: The tile number to map the vertex to.

 void
setTileMapping
(const Tensor &t, unsigned tileNum)¶ 
Map a tensor slice to a specific tile on the device.
 Parameters


t
: The tensor or tensor slice to map. 
tileNum
: The tile number to map to.

 TileToTensorMapping
getTileMapping
(const Tensor &t, bool requireComplete = true) const¶ 
Inspect the tile mapping of a tensor.
 Return

The mapping from tiles to a vector of intervals mapped to the tile (implemented as a vector indexed by the tile number). The lower and upper bounds of each interval are element numbers in the flattened tensor.
 Parameters


t
: The tensor to inspect 
requireComplete
: If t is not fully mapped and requireComplete is true then an invalid_tile_mapping exception will be thrown.

 TileToTensorMapping
getTileMapping
(const Tensor &t, bool *isComplete) const¶ 
Inspect the tile mapping of a tensor.
 Return

The mapping from tiles to a vector of intervals mapped to the tile (implemented as a vector indexed by the tile number). The lower and upper bounds of each interval are element numbers in the flattened tensor.
 Parameters


t
: The tensor to inspect 
isComplete
: If non-null, updated to indicate whether the mapping is complete.

 void
setTileMapping
(const Tensor &t, const TileToTensorMapping &mapping)¶ 
Set the tile mapping of a tensor based on an explicit map from tiles to tensor intervals.
 Parameters


t
: The tensor to map 
mapping
: The mapping from tiles to a vector of intervals to be placed on that tile (implemented as a vector indexed by the tile number). The lower and upper bounds of each interval are element numbers in the flattened tensor.
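The mapping structure described above can be pictured with a small standalone model. This is only a sketch: `ElementInterval`, `TileMapping` and `makeLinearMapping` are hypothetical names introduced here for illustration and are not part of Poplar. The sketch assumes half-open intervals of element numbers in the flattened tensor, and splits the elements into equally sized chunks, one chunk per tile.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for an interval type (hypothetical): a half-open range
// [begin, end) of element numbers in the flattened tensor.
struct ElementInterval {
  std::size_t begin;
  std::size_t end;
};

// Same layout as the mapping described above: the outer vector is indexed by
// tile number, and each entry lists the intervals placed on that tile.
using TileMapping = std::vector<std::vector<ElementInterval>>;

// Hypothetical helper: give each tile one contiguous chunk of
// ceil(numElements / numTiles) elements until the elements run out.
TileMapping makeLinearMapping(std::size_t numElements, std::size_t numTiles) {
  TileMapping mapping(numTiles);
  if (numTiles == 0) {
    return mapping;
  }
  const std::size_t perTile = (numElements + numTiles - 1) / numTiles;
  for (std::size_t tile = 0; tile < numTiles; ++tile) {
    const std::size_t begin = tile * perTile;
    if (begin >= numElements) {
      break; // Remaining tiles stay empty.
    }
    const std::size_t end = std::min(begin + perTile, numElements);
    mapping[tile].push_back({begin, end});
  }
  return mapping;
}
```

With 10 elements and 4 tiles this model places elements [0, 3), [3, 6), [6, 9) and [9, 10) on tiles 0 to 3. Tiles that receive no elements keep an empty interval list.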

 Tensor
getVariable
(VariableRef v) const¶ 
Get a tensor representing an entire variable.
 Return

A Tensor object representing that variable.
 Parameters


v
: The variable to retrieve.

 bool
isConstant
(VariableRef v) const¶ 
Check whether a variable reference represents a constant.
When Graph::addConstant() is called a variable is created to represent that constant. This call checks whether a variable was created by that method or by Graph::addVariable().
 Return

True if and only if the variable refers to a constant.
 Parameters


v
: The variable to examine.

 std::vector<std::vector<Interval>>
getSortedContiguousRegions
(const Tensor &t, ArrayRef<Interval> regions, bool removeAliasedIntervals = false, std::vector<std::size_t> *aliases = nullptr) const¶ 
Get a list of sequences of intervals over a tensor such that each sequence represents a contiguous region of memory.
 Return

A list of sequences of intervals. The intervals will cover the same elements of the tensor as provided as input.
 Parameters


t
: The tensor to get intervals over. 
regions
: A list of intervals representing the elements to sort into memory contiguous sequences. 
removeAliasedIntervals
: If true, remove intervals which alias others in the given regions from the result. 
aliases
: Optional list of indices for each region in the returned intervals where an index is always the same for a region representing the same underlying elements in memory. If this is nullptr, then no aliases will be returned.

 void
reorderToSimplify
(Tensor *t, ArrayRef<Tensor *> ts) const¶ 
Reorder a set of tensors in order to simplify the view on data.
This function will update ‘t’ to be a (simpler) reordered view on the same data. The same reordering will be applied to all elements of ‘ts’, so order-invariant or element-wise operations on ‘t’ and ‘ts’ can still be performed.
The main purpose of this function is to provide a way to implement more efficient graph construction of element-wise or order-invariant operations.
After execution, t will consist of the minimum possible number of contiguous regions.
All the tensors provided to this function must be of rank 1 (that is, flattened tensors) and have the same number of elements.
 void
serializeTensors
(std::ostream &out, ArrayRef<Tensor> tensors, SerializationFormat format) const¶ 
Serialize a set of tensors to JSON or CapnProto.
The tensors must all be from this graph or an exception is thrown. The information saved is:

The type, shape and expression of the tensors.

The type and number of elements of any variables used.
This is intended to be used for debugging, testing and visualisation.
 Parameters


out
: Stream to write to. 
tensors
: A set of tensors to serialize. 
format
: Serialize in JSON or CapnProto format. JSON is pretty printed.

 Exceptions


poplar_error
: if any tensor is not from this graph. CapnProto may also throw an exception if serialization fails.


 std::vector<Tensor>
deserializeTensors
(std::istream &in, SerializationFormat format)¶ 
Deserialize a set of tensors from a CapnProto message.
JSON deserialization is not currently supported and an exception will be thrown if format is SerializationFormat::JSON.
This will recreate the tensors in this graph. It throws an exception on failure (for example, if the tensor type does not match the variable types). Whenever a variable is used by a tensor a new variable is added to the graph.
The layout of the tensors and variables should be the same as when they were serialized.
This function is primarily intended for testing and benchmarks. You should not use it as a general method of creating tensors.
 Return

The deserialized set of tensors.
 Parameters


in
: A stream from which serialised tensor data can be read. 
format
: Must be SerializationFormat::Binary

 Graph
createVirtualGraph
(unsigned numTilesPerIPU)¶ 
Create a “virtual” graph working over a subset of the target’s tiles.
This method returns a graph object that references the same state as this graph but has a virtual target that only uses a subset of the target’s tiles.
If the getTarget() method is called on the new graph it will return a target with the new number of tiles.
 Return

The virtual graph object
 Parameters


numTilesPerIPU
: The number of tiles per IPU for the new graph to work over.

 Graph
createVirtualGraph
(unsigned lowerTile, unsigned upperTile)¶ 
Create a “virtual” graph working over a subset of the target’s tiles.
This method returns a graph object that references the same state as this graph but has a virtual target that only uses a subset of the target’s tiles.
This variant of the method takes a tile range for the new virtual graph to work over. This is the range [lowerTile:upperTile). This tile range must be contained within a single IPU.
If the getTarget() method is called on the new graph it will return a target with the new number of tiles.
 Return

The virtual graph object
 Parameters


lowerTile
: The starting tile of the tile range for the virtual graph to work over. 
upperTile
: The upper bound of the tile range for the virtual graph to work over. This is a non-inclusive upper bound.

 Graph
createVirtualGraph
(const std::vector<unsigned> &perIpuTiles)¶ 
Create a “virtual” graph working over a subset of the target’s tiles.
This method returns a graph object that references the same state as this graph but has a virtual target that only uses a subset of the target’s tiles.
This variant of the method takes the set of tiles in each IPU that should be included in the new graph.
If the getTarget() method is called on the new graph it will return a target with the new number of tiles.
 Return

The virtual graph object
 Parameters


perIpuTiles
: The tiles to include in the graph. Tiles are specified by their index in the IPU. Each tile index must be unique and less than the number of tiles per IPU.

 Graph
createReplicatedGraph
(unsigned replicationFactor)¶ 
Create a replicated graph.
The replicated graph is a view on replicationFactor virtual subgraphs. Operations on the replicated graph are implicitly applied to each virtual subgraph; for example, adding a variable to the replicated graph implicitly creates a variable in all of the underlying subgraphs.
The replication factor must divide the number of tiles in the graph. If n is the number of tiles in this graph, the first subgraph contains tiles [0, n / replicationFactor), the second subgraph contains tiles [n / replicationFactor, 2n / replicationFactor), and so on.
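The tile ranges described above follow a simple arithmetic pattern, which the following standalone sketch reproduces. `replicaTileRange` is a hypothetical helper written for illustration; it does not call Poplar and only models the stated rule that subgraph k covers tiles [k * n / replicationFactor, (k + 1) * n / replicationFactor).

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not part of Poplar): returns the half-open tile range
// [first, last) covered by subgraph `index` when a graph with `numTiles`
// tiles is split into `replicationFactor` equally sized subgraphs.
std::pair<unsigned, unsigned> replicaTileRange(unsigned numTiles,
                                               unsigned replicationFactor,
                                               unsigned index) {
  // The replication factor must divide the number of tiles.
  assert(replicationFactor > 0 && numTiles % replicationFactor == 0);
  assert(index < replicationFactor);
  const unsigned tilesPerReplica = numTiles / replicationFactor;
  return {index * tilesPerReplica, (index + 1) * tilesPerReplica};
}
```

For example, with 1216 tiles and a replication factor of 4, each subgraph covers 304 tiles, so the second subgraph covers tiles [304, 608).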
 Graph
getTopLevelGraph
()¶ 
Return the top level graph.
The createVirtualGraph() and createReplicatedGraph() methods can be used to create graph objects that are views on an underlying graph. If this is a virtual or replicated graph then this function returns the top level underlying graph, otherwise it returns the current graph.
 unsigned
getReplicationFactor
() const¶ 
Return the replication factor of the graph.
 Tensor
addReplicationIndexConstant
()¶ 
Add a constant that is initialized with the replication index.
 Tensor
getNonReplicatedTensor
(const Tensor &t) const¶ 
Given a replicated tensor return the underlying tensors in this graph that the replicated tensor is a placeholder for.
The tensor returned by this function has an extra outer dimension equal to the replication factor of the tensor in this graph and it is formed by concatenating the underlying tensors for each replicated subgraph in this dimension.
This function can only be used with replicated graphs created by the createReplicatedGraph function, not when the Graph is constructed.
 void
serialize
(std::ostream &out, SerializationFormat format) const¶ 
Serialize a graph to JSON or binary (CapnProto) format.
This is intended to be used for debugging, testing and visualisation.
 Parameters


out
: Stream to write to. 
format
: Serialize in JSON or CapnProto format. JSON is pretty printed.

 Function
addFunction
(const program::Program &program)¶ 
Add a function to the graph.
A function is a partial control program that can be reused. By registering a repeated program as a function and calling it, less control code is generated than repeating the sequence.
 Return

The Function object that can be used by a Call program.
 Parameters


program
: The control program to register as a callable function

 unsigned
convertVirtualTileToPhysicalTile
(unsigned virtualTileId) const¶ 
Convert Virtual Tile ID into Physical Tile ID.
This function provides a conversion interface required by the Graphcore communication library to know what exchange block context a tile is associated with.
 Return

Physical Tile ID
 Parameters


virtualTileId
: Virtual Tile ID

 unsigned
convertPhysicalTileToVirtualTile
(unsigned physicalTileId) const¶ 
Convert Physical Tile ID to Virtual Tile ID.
This function provides a conversion interface required by the Graphcore communication library to know what exchange block context a tile is associated with.
 Return

Virtual Tile ID
 Parameters


physicalTileId
: Physical Tile ID

 unsigned
convertPhysicalTileToVirtualTile
(unsigned ipuId, unsigned physicalTileId) const¶ 
Convert Physical Tile ID to Virtual Tile ID.
This function returns the Virtual Tile ID for a given pair of IPU ID and Physical Tile ID. This conversion interface is required by the Graphcore communication library to know what exchange block context a tile is associated with.
 Return

Virtual Tile ID
 Parameters


ipuId
: IPU ID 
physicalTileId
: Physical Tile ID

 core::GraphBuilder &
getImpl
() const¶
Private Functions
 void
setInitialValue
(FieldRef field, const void *val, const TypeTraits&)¶
 template<typename
T
>
void setInitCallback
(FieldRef field, LateInitCallback<T> callback, const TypeTraits&)¶
 void
setInitialValue
(const Tensor &t, const void *val, const TypeTraits&)¶
 void
connect
(FieldRef field, void *val, const TypeTraits&)¶
 class
ConnectionDesc
¶ 
Public Functions
 template<typename
T
>ConnectionDesc
(StringRef field, T v, typename std::enable_if<TypeTraits::isSimpleType<T>()>::type * = nullptr)¶
Private Members
 TypeTraits
traits
¶
Friends
 friend
poplar::Graph
poplar/GraphElements.hpp¶
 namespace
poplar

Typedefs
 typedef unsigned
vertex_id
¶ 
Vertex id.
The integral type of unique identifiers for vertices within a graph.
 class
ComputeSet
¶  #include <GraphElements.hpp>
A reference to a compute set within a graph.
This type provides a way to address compute sets within a graph.
Private Members
 unsigned
computeset_id
¶
 class
FieldRef
¶  #include <GraphElements.hpp>
A reference to a field within a vertex instance.
This type provides a way to address fields (inputs or internal state) within a vertex. FieldRefs are normally obtained using VertexRef::operator[](StringRef fieldName), for example:
VertexRef vertex = graph.addVertex(...);
FieldRef input = vertex["input"];
graph.connect(input, ...);
A FieldRef can also be indexed, for example:
FieldRef input_5 = vertex["input"][5];
This is used when a field is a list of regions, for example a Vector<Input<Vector<...>>> or an Input<VectorList<...>>.
Public Functions
FieldRef
()¶
 FieldRef
operator[]
(std::size_t index) const¶ 
Access an element of a vector field.
Subscript a vector field to access the element at position index.
 Return

A reference to the field.
 Parameters


index
: The subscript of the field

 bool
isIndexed
() const¶
Private Functions
FieldRef
(VertexRef vertex, StringRef fieldName)¶
FieldRef constructor from vertex id and field name.
Construct a FieldRef out of a vertex id and the name of the field.
Friends
 friend
poplar::VertexRef
 class
Function
¶  #include <GraphElements.hpp>
A reference to a function stored within a graph.
Private Members
 unsigned
function_id
¶
 class
VertexRef
¶  #include <GraphElements.hpp>
A reference to a vertex within a graph.
This type provides a way to address vertices within a graph.
Public Functions
VertexRef
()¶
Private Functions
VertexRef
(const core::GraphBuilder *graph, unsigned id)¶
Construct a vertex reference from an ID.
 Return

A reference to the vertex.
 Parameters


graph
: The graph containing the vertex. 
id
: The id of the vertex.

Friends
 friend
poplar::core::GraphBuilder
 friend
poplar::Graph
 friend
poplar::FieldRef
poplar/Tensor.hpp¶
 namespace
poplar

Functions
 Tensor
concat
(ArrayRef<Tensor> ts, unsigned dimension = 0)¶ 
Concatenate several tensors.
The tensors are concatenated along the specified dimension.
 Return

The result of the concatenation
 Parameters


ts
: The tensors to concatenate 
dimension
: The number of the dimension to concatenate across

 Tensor
concat
(const Tensor &first, const Tensor &second, unsigned dimension = 0)¶ 
Concatenate two tensors.
The tensors are concatenated along the specified dimension.
 Return

The result of the concatenation
 Parameters


first
: The first tensor to concatenate 
second
: The second tensor to concatenate 
dimension
: The number of the dimension to concatenate across

 Tensor
append
(const Tensor &first, const Tensor &second, unsigned dimension)¶ 
Append a tensor as an element to another tensor.
 Return

The extended tensor
 Parameters


first
: The tensor to append to 
second
: The tensor to add as an element in the specified dimension 
dimension
: The number of the dimension to append to

 class
Tensor
¶  #include <Tensor.hpp>
A reference to a subset of tensor elements.
Public Functions
Tensor
()¶
~Tensor
()¶
 Type
elementType
() const¶ 
Get the element type information for this tensor.
 Return

The element type.
 Tensor
operator[]
(std::size_t i) const &¶ 
Get the subtensor indexed by i in the first dimension of the tensor.
 Parameters


i
: The index into the first dimension of the tensor.

 Tensor
slice
(std::size_t begin, std::size_t end, unsigned dimension) const &¶ 
Get the subtensor given by a specific range [begin, end) in one dimension of the tensor.
 Parameters


begin
: The first element of the range 
end
: The upper bound to the range (the last element + 1) 
dimension
: The dimension to slice in

 Tensor
slice
(std::size_t begin, std::size_t end) const¶ 
Get the subtensor given by a specific range [begin, end) in the first dimension of the tensor.
 Parameters


begin
: The first element of the range 
end
: The upper bound to the range (the last element + 1)

 Tensor
slice
(const Interval &region, unsigned dimension = 0) const¶ 
Get the subtensor given by a specific range [begin, end) in one dimension of the tensor.
 Parameters


region
: The region to slice 
dimension
: The dimension to slice in

 Tensor
slice
(ArrayRef<std::size_t> begin, ArrayRef<std::size_t> end) const¶ 
Get the subtensor given by slicing the tensor in multiple dimensions, starting at dimension 0.
Each pair begin[i], end[i] specifies that the tensor is sliced in dimension i by the range [begin[i], end[i]). The rank of the returned tensor is the same as the input tensor.
 Parameters


begin
: The lower bounds of the ranges used to slice the tensor 
end
: The upper bounds of the ranges used to slice the tensor
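The effect of multi-dimensional slicing on the tensor shape can be sketched without Poplar. `slicedShape` below is a hypothetical helper for illustration only: it applies the range [begin[i], end[i]) to dimension i and leaves any remaining dimensions unchanged, so the rank is preserved as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Poplar): shape of the result when
// dimension i is sliced by [begin[i], end[i]) for each i < begin.size().
// Dimensions without a begin/end pair keep their full extent.
std::vector<std::size_t> slicedShape(std::vector<std::size_t> shape,
                                     const std::vector<std::size_t> &begin,
                                     const std::vector<std::size_t> &end) {
  assert(begin.size() == end.size() && begin.size() <= shape.size());
  for (std::size_t i = 0; i < begin.size(); ++i) {
    assert(begin[i] <= end[i] && end[i] <= shape[i]);
    shape[i] = end[i] - begin[i];
  }
  return shape;
}
```

For a tensor of shape {4, 6, 8}, slicing with begin = {1, 2} and end = {3, 6} gives a shape of {2, 4, 8} in this model.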

 std::vector<Tensor>
slices
(ArrayRef<Interval> intervals, unsigned dimension = 0) const¶ 
Get a vector of slices.
 Return

A vector of slices where each slice is obtained by slicing this tensor between the two points in the given interval list.
 Parameters


intervals
: A list of intervals. 
dimension
: The dimension to slice in

 std::vector<Tensor>
slices
(const std::vector<std::vector<Interval>> &intervals, unsigned dimension = 0) const¶ 
Get a vector of slices.
 Return

A vector of tensors where each tensor is the concatenation of a sequence of slices, each slice being this tensor between the two points of the corresponding interval in the sequences given as input.
 Parameters


intervals
: A list of sequences of intervals. 
dimension
: The dimension to slice in

 Tensor
index
(ArrayRef<std::size_t> indices) const¶ 
Get the subtensor indexed by the specified indices.
This is equivalent to repeatedly applying operator[] for each index in the vector of indices.
 Return

The subtensor indexed by the indices.
 Parameters


indices
: The indices used to index into the tensor.

 Tensor
flatten
() const¶ 
Flatten the tensor.
 Return

A tensor consisting of all elements of the original tensor but with a single dimension.
 Tensor
flatten
(unsigned dimBegin, unsigned dimEnd) const¶ 
Flatten a subset of the dimensions of a tensor.
 Return

A tensor consisting of all elements of the original tensor with the specified dimension range flattened into one dimension.
 Parameters


dimBegin
: The first dimension to flatten 
dimEnd
: One past the last dimension to flatten.
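As a rough illustration of the resulting shape, the standalone sketch below merges the dimensions in [dimBegin, dimEnd) into one. `flattenedShape` is a hypothetical helper, not a Poplar function, and it assumes a non-empty range (dimBegin < dimEnd).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Poplar): shape after the dimensions in
// the half-open range [dimBegin, dimEnd) are flattened into one dimension.
std::vector<std::size_t> flattenedShape(const std::vector<std::size_t> &shape,
                                        unsigned dimBegin, unsigned dimEnd) {
  assert(dimBegin < dimEnd && dimEnd <= shape.size());
  // Dimensions before the range are kept as they are.
  std::vector<std::size_t> result(shape.begin(), shape.begin() + dimBegin);
  // The range collapses to a single dimension holding the product.
  std::size_t merged = 1;
  for (unsigned d = dimBegin; d < dimEnd; ++d) {
    merged *= shape[d];
  }
  result.push_back(merged);
  // Dimensions after the range are kept as they are.
  result.insert(result.end(), shape.begin() + dimEnd, shape.end());
  return result;
}
```

In this model, flattening dimensions [1, 3) of a tensor with shape {2, 3, 4, 5} gives {2, 12, 5}.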

 Tensor
reshape
(ArrayRef<std::size_t> shape) const¶ 
Reshape the tensor.
The reshaping operation changes the shape of the tensor but cannot change the total number of elements.
 Return

A tensor consisting of all elements of the original but with new dimensions.
 Parameters


shape
: The new shape of the tensor.

 Tensor
dimShuffle
(ArrayRef<unsigned> permutation) const¶ 
Permute the dimensions of a tensor.
The dimShuffle operation reorders the tensor to a permutation of its dimensions. It can be seen as the generalized form of a matrix transpose.
Note that this operation does not create a copy of the tensor but returns a reordered view on this tensor’s data.
 Return

The shuffled tensor
 Parameters


permutation
: The permutation vector specifies a mapping from the output dimension to the input dimension. For example, the permutation {2, 0, 1} specifies that element [a][b][c] in the original tensor is remapped to element [c][a][b] in the new tensor.
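The output-to-input mapping can be checked with a small standalone model. `shuffled` is a hypothetical helper introduced for illustration (it does not call Poplar): entry i of the result is taken from entry permutation[i] of the input, which is the mapping described for the permutation parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Poplar): applies a dimension permutation
// to a per-dimension vector (a shape or an element index). Output position i
// takes its value from input position permutation[i].
std::vector<std::size_t> shuffled(const std::vector<std::size_t> &values,
                                  const std::vector<unsigned> &permutation) {
  assert(permutation.size() == values.size());
  std::vector<bool> seen(values.size(), false);
  std::vector<std::size_t> result(values.size());
  for (std::size_t out = 0; out < permutation.size(); ++out) {
    const unsigned in = permutation[out];
    // Each input dimension must be used exactly once.
    assert(in < values.size() && !seen[in]);
    seen[in] = true;
    result[out] = values[in];
  }
  return result;
}
```

Applied to a shape {2, 3, 4} with permutation {2, 0, 1}, the model gives {4, 2, 3}. Applied to an element index {a, b, c} it gives {c, a, b}, matching the example in the parameter description.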

 Tensor
dimShufflePartial
(ArrayRef<unsigned> source, ArrayRef<unsigned> destination) const¶ 
Permute some of a tensor’s dimensions.
dimShufflePartial reorders the tensor’s dimensions. The unspecified dimensions stay in the same relative order.
Note that this operation does not create a copy of the tensor but returns a reordered view on this tensor’s data.
 Return

The shuffled tensor.
 Parameters


source
: The dimensions to move. 
destination
: The index at which to move each source dimension.

 Tensor
dimRoll
(unsigned dimIdx, unsigned newIdx = 0) const¶ 
Roll a specified dimension to a specified position.
The other dimensions remain in the same relative order.
Note that this operation does not create a copy of the tensor but returns a reordered view on this tensor’s data.
 Return

The shuffled tensor.
 Parameters


dimIdx
: The dimension to move. 
newIdx
: Its new location, default 0.

 Tensor
reshapePartial
(unsigned beginIndex, unsigned endIndex, ArrayRef<std::size_t> newDims) const¶ 
Reshape a range of dimensions of a tensor.
reshapePartial reshapes the input tensor such that the total number of elements of the resultant tensor is the same as the input tensor.
Note that this operation does not create a copy of the tensor but returns a reshaped view on the input tensor’s data.
The following conditions define the valid use of this function:
1) beginIndex == endIndex
beginIndex and endIndex must each lie in the closed interval [0, rank()]. Singleton dimensions are added before beginIndex. The number of dimensions added is equal to the length of the newDims vector. For example:
reshapePartial(0, 0, {1, 1})
2) size(newDims) == 0 and beginIndex != endIndex
beginIndex must lie in the half-closed interval [0, rank()). endIndex must lie in the half-closed interval (0, rank()]. The product of vector newDims must be 1. For example:
reshapePartial(1, 3, {})
3) size(newDims) != 0 and beginIndex != endIndex
beginIndex must lie in the half-closed interval [0, rank()). endIndex must lie in the half-closed interval (0, rank()]. The product of vector newDims must be equal to the product of the number of elements in the interval [beginIndex, endIndex).
The input dimensions [0, beginIndex) and [endIndex, rank()) are placed before and after the new dimensions, respectively. For example:
reshapePartial(1, 3, {10, 20, 30})
reshapePartial(1, 3, {10})
 Return

Reshaped view of tensor
 Parameters


beginIndex
: Index of the dimension from which reshape starts 
endIndex
: Index of the first dimension after reshape ends 
newDims
: The new dimensions of the partial tensor
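The three cases above can be read as one rule: the dimensions in [beginIndex, endIndex) are replaced by newDims, and the element count of the replaced range must equal the product of newDims (an empty range or an empty newDims counts as 1). The standalone sketch below models that reading; `reshapedPartialShape` is a hypothetical helper, and the unified rule is an interpretation of the conditions listed here rather than Poplar's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Poplar): shape after the dimensions in
// [beginIndex, endIndex) are replaced by newDims.
std::vector<std::size_t>
reshapedPartialShape(const std::vector<std::size_t> &shape,
                     unsigned beginIndex, unsigned endIndex,
                     const std::vector<std::size_t> &newDims) {
  assert(beginIndex <= endIndex && endIndex <= shape.size());
  // Number of elements spanned by the replaced dimensions (1 if none).
  std::size_t oldCount = 1;
  for (unsigned d = beginIndex; d < endIndex; ++d) {
    oldCount *= shape[d];
  }
  // Number of elements spanned by the new dimensions (1 if none).
  std::size_t newCount = 1;
  for (std::size_t d : newDims) {
    newCount *= d;
  }
  // The total number of elements must not change.
  assert(oldCount == newCount);
  std::vector<std::size_t> result(shape.begin(), shape.begin() + beginIndex);
  result.insert(result.end(), newDims.begin(), newDims.end());
  result.insert(result.end(), shape.begin() + endIndex, shape.end());
  return result;
}
```

Under this model, a shape {3, 4} with arguments (0, 0, {1, 1}) becomes {1, 1, 3, 4}; a shape {3, 1, 1, 4} with (1, 3, {}) becomes {3, 4}; and a shape {2, 6, 4, 5} with (1, 3, {24}) becomes {2, 24, 5}.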

 Tensor
expand
(ArrayRef<std::size_t> indices) const¶ 
Expand tensor by adding singleton dimensions at specified indices of tensor.
The rank is expanded by the size of dimensions to be added. To add more than one dimension at a given position, the same index shall be repeated.
 Return

A view of expanded tensor
 Parameters


indices
: Dimension indices before which the singleton dimensions are added

 Tensor
squeeze
(ArrayRef<std::size_t> indices) const¶ 
Reduce dimension of tensor by removing singleton dimensions at specified indices of tensor.
 Return

A view of squeezed tensor
 Parameters


indices
: Indices of singleton dimensions which are removed
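The shape changes made by expand() and squeeze() can be sketched with two standalone helpers. Both `expandedShape` and `squeezedShape` are hypothetical names for illustration and do not call Poplar. The sketch assumes that the indices passed to expand refer to positions in the original shape (with rank() meaning "after the last dimension"), and that repeating an index adds one singleton dimension per repetition.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Poplar): inserts one singleton dimension
// before original dimension d for every occurrence of d in `indices`.
std::vector<std::size_t> expandedShape(const std::vector<std::size_t> &shape,
                                       const std::vector<std::size_t> &indices) {
  std::vector<std::size_t> result;
  for (std::size_t d = 0; d <= shape.size(); ++d) {
    for (std::size_t idx : indices) {
      assert(idx <= shape.size());
      if (idx == d) {
        result.push_back(1);
      }
    }
    if (d < shape.size()) {
      result.push_back(shape[d]);
    }
  }
  return result;
}

// Hypothetical helper (not part of Poplar): removes the dimensions listed in
// `indices`, each of which must have size 1.
std::vector<std::size_t> squeezedShape(const std::vector<std::size_t> &shape,
                                       const std::vector<std::size_t> &indices) {
  std::vector<std::size_t> result;
  for (std::size_t d = 0; d < shape.size(); ++d) {
    bool remove = false;
    for (std::size_t idx : indices) {
      if (idx == d) {
        remove = true;
      }
    }
    if (remove) {
      assert(shape[d] == 1); // Only singleton dimensions can be removed.
    } else {
      result.push_back(shape[d]);
    }
  }
  return result;
}
```

In this model, expanding a shape {3, 4} at indices {0, 0, 2} gives {1, 1, 3, 4, 1}, and squeezing that result at indices {0, 1, 4} returns the original {3, 4}.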

 Tensor
subSample
(unsigned stride, unsigned dimension) const¶ 
Subsample the tensor.
Subsample this tensor by selecting every stride-th element of the tensor in a specified dimension.
 Return

The subsampled tensor
 Parameters


stride
: The size of the stride 
dimension
: The dimension to subsample in
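To make the selection rule concrete, the standalone sketch below lists which positions along the chosen dimension would be kept. `subSampledIndices` is a hypothetical helper, not a Poplar function, and it assumes that sampling starts at element 0, so a dimension of size n keeps ceil(n / stride) elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Poplar): positions kept along one
// dimension of the given size when every stride-th element is selected,
// assuming selection starts at position 0.
std::vector<std::size_t> subSampledIndices(std::size_t size, unsigned stride) {
  assert(stride > 0);
  std::vector<std::size_t> kept;
  for (std::size_t i = 0; i < size; i += stride) {
    kept.push_back(i);
  }
  return kept;
}
```

Under that assumption, a dimension of size 10 subsampled with a stride of 3 keeps positions 0, 3, 6 and 9.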

 Tensor
broadcast
(unsigned N, unsigned dimension) const¶ 
Broadcast/repeat the tensor along a specified dimension.
Create a view with this tensor repeated N times along a specified dimension.
 Return

The broadcast tensor.
 Parameters


N
: The number of times to repeat. 
dimension
: The dimension to broadcast in.

 Tensor
reinterpret
(const Type &type) const¶ 
Reinterpret the tensor as a new type.
The new type must be the same size as the old type. See elementType() for a list of valid types and their sizes.
 Return

A tensor with the same shape and referencing the same data but of the new type.
 Parameters


type
: The type to reinterpret to

 Tensor
reverse
(unsigned dimensions) const¶ 
Reverse this tensor along a specified dimension.
 Return

The reversed tensor.
 Parameters


dimension
: The dimension to reverse.

 std::size_t
dim
(unsigned i) const¶ 
Get a dimension of the tensor.
 Parameters


i
: The index of the dimension to get.

 std::vector<std::size_t>
shape
() const¶ 
Get the shape of the tensor.
 Return

A vector of all the dimensions of the tensor.
 unsigned
rank
() const¶ 
Get the rank of the tensor.
 Return

The number of dimensions a tensor has.
 bool
isContiguous
() const¶ 
Get whether the tensor is contiguous.
 bool
containsAliases
() const¶ 
Get whether the tensor contains an alias to the same storage location.
 Return

True if the tensor contains an alias to the same storage location.
 bool
containsConstant
() const¶ 
Get whether the tensor contains any constant tensors.
 Return

True if the tensor contains any constant tensors.
 bool
isParallelWriteable
() const¶ 
Get whether the elements of this tensor can be written in parallel.
This is equivalent to !(containsAliases() || containsConstant()).
 Return

True if the tensor can be written in parallel.
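The relationship between the three predicates can be sketched with a toy model in which a tensor view is a list of storage locations. `ViewModel` and the free functions below are hypothetical and only mirror the names of the Tensor methods.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical model of a tensor view: each element names a storage
// location, and the view may additionally include constant data.
struct ViewModel {
  std::vector<std::size_t> storageIndex;
  bool hasConstant;
};

// A view contains aliases if two elements share a storage location.
bool containsAliases(const ViewModel &v) {
  std::set<std::size_t> seen;
  for (std::size_t location : v.storageIndex) {
    if (!seen.insert(location).second) return true;
  }
  return false;
}

bool isParallelWriteable(const ViewModel &v) {
  return !(containsAliases(v) || v.hasConstant);
}
```

A view that repeats the same data (locations `{0, 1, 0}` in this model) contains aliases, so two parallel writers could target the same location and the view is not parallel-writeable.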
 const std::vector<Interval>
getContiguousRegions
() const¶ 
Get the contiguous regions of a tensor.
 Return

A vector of intervals, in order, representing regions of the tensor that are contiguous in the tensor's storage ordering.
 const std::vector<VariableInterval>
getVarRegions
() const¶ 
Get the contiguous regions of a tensor with reference to the variables allocated in the graph.
 Return

A vector of variable intervals (variable id, interval pairs) representing the regions of the tensor.
 template<typename T> bool
getConstantValue
(T *val) const¶ 
Read a single element of data from a tensor if it is a constant.
 Return

True if tensor is constant and data is read
 Parameters


val
: Buffer to which the tensor data is copied

 bool
intersectsWith
(const Tensor &other) const¶ 
Return whether this tensor intersects with another tensor.
 Return

True if this tensor intersects with the other tensor.
 Parameters


other
: The tensor to compare with.

 std::ostream &
output
(std::ostream &os) const¶ 
Display the expression representing the tensor on a stream.
 Return

The ostream written to
 Parameters


os
: The ostream to output to

 std::ostream &
outputRegions
(std::ostream &os) const¶ 
Display the regions of the tensor on a stream.
 Return

The ostream written to
 Parameters


os
: The ostream to output to

 void
dump
() const¶ 
Display the expression representing the tensor.
 void
dumpRegions
() const¶ 
Display the regions of the tensor.
 core::Tensor &
getImpl
() const¶
 bool
valid
() const¶
Private Functions
 bool
getConstantData
(void *dst, const TypeTraits &traits) const¶
poplar/TensorCloneMethod.hpp¶
 namespace
poplar

Enums
 enum
TensorCloneMethod
¶ 
Define behaviour when a Tensor is cloned.
Values:
PRESERVE_ORDER_AND_ALIASES
¶
Preserve the ordering and aliasing within the original tensor reference.
CREATE_NEW_ORDER
¶
Create a new tensor with natural ordering based on the dimensions of the cloned tensor (in the same way as addTensor).
PRESERVE_ORDER_UNLESS_ALIASES
¶
Preserve the ordering of the original tensor unless it contains aliases.
In the case of aliases, create a new tensor ordering and duplicate the aliased elements.
poplar/Type.hpp¶
Defines
POPLAR_DECLARE_EQUIV_TYPE
(T1, T2)¶
 namespace
poplar

Variables
 template<typename T>
struct equivalent_device_type
¶  #include <Type.hpp>
Template structure to relate a host type to a device type.
This structure is specialized to allow a program to relate a host type to a corresponding device type. For example:
poplar::Type t = equivalent_device_type<int>().value;
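The specialization pattern can be sketched without the Poplar headers. Everything below is a hypothetical stand-in: `DeviceType`, `EquivalentDeviceType` and `DECLARE_EQUIV_TYPE` only imitate the shape of `poplar::Type`, `equivalent_device_type` and `POPLAR_DECLARE_EQUIV_TYPE`, and the real macro may expand differently.

```cpp
#include <cassert>

// Hypothetical stand-in for the device types; not the real poplar::Type.
enum class DeviceType { INT, UNSIGNED_INT, FLOAT, BOOL };

// The primary template is left undefined, so a host type without a
// declared equivalent fails to compile instead of silently mapping.
template <typename T> struct EquivalentDeviceType;

// Hypothetical macro that declares one host-to-device relation.
#define DECLARE_EQUIV_TYPE(HostT, DeviceT)                                     \
  template <> struct EquivalentDeviceType<HostT> {                             \
    const DeviceType value = DeviceType::DeviceT;                              \
  };

DECLARE_EQUIV_TYPE(int, INT)
DECLARE_EQUIV_TYPE(unsigned, UNSIGNED_INT)
DECLARE_EQUIV_TYPE(float, FLOAT)
DECLARE_EQUIV_TYPE(bool, BOOL)

// Convenience wrapper mirroring the usage shown in the text.
template <typename T> DeviceType deviceTypeOf() {
  return EquivalentDeviceType<T>().value;
}
```

In this sketch, `deviceTypeOf<double>()` does not compile, which matches the idea that a host type with no device equivalent has no mapping.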
 class
Type
¶  #include <Type.hpp>
Class representing device data types.
The following types are not supported on the IPU:

LONG

UNSIGNED_LONG

LONGLONG

UNSIGNED_LONGLONG

DOUBLE
For other types, the sizes on the IPU are:

BOOL: 1 byte

CHAR: 1 byte (signed)

SIGNED_CHAR: 1 byte

UNSIGNED_CHAR: 1 byte

SHORT: 2 bytes

SIGNED_SHORT: 2 bytes

UNSIGNED_SHORT: 2 bytes

INT: 4 bytes

SIGNED_INT: 4 bytes

SIGNED: 4 bytes

UNSIGNED_INT: 4 bytes

UNSIGNED: 4 bytes

HALF: 2 bytes

FLOAT: 4 bytes

poplar/VariableMappingMethod.hpp¶
 namespace
poplar

Enums
 enum
VariableMappingMethod
¶ 
When variables are added to the graph, a tile mapping can be created.
This class enumerates the method for creating that mapping.
Values:
NONE
¶
No mapping is created.
The tile mapping will be set later via Graph::setTileMapping.
LINEAR
¶
The variable will be spread evenly across the tiles with the element ordering matching the tile number ordering.
The tile mapping can also be overridden later via Graph::setTileMapping.
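One plausible reading of "spread evenly" is sketched below: contiguous element ranges are assigned to tiles in tile order, with sizes differing by at most one. The function `linearMapping` is hypothetical, and the real mapping may differ, for example in how it rounds or groups elements.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical even split of `numElements` over `numTiles`. Entry t of the
// result is the half-open element interval [begin, end) placed on tile t.
std::vector<std::pair<std::size_t, std::size_t>>
linearMapping(std::size_t numElements, std::size_t numTiles) {
  std::vector<std::pair<std::size_t, std::size_t>> mapping;
  if (numTiles == 0) return mapping;
  const std::size_t base = numElements / numTiles;
  const std::size_t extra = numElements % numTiles; // first tiles get one more
  std::size_t begin = 0;
  for (std::size_t tile = 0; tile < numTiles; ++tile) {
    const std::size_t size = base + (tile < extra ? 1 : 0);
    mapping.emplace_back(begin, begin + size);
    begin += size;
  }
  return mapping;
}
```

For 10 elements on 4 tiles this gives the intervals [0, 3), [3, 6), [6, 8) and [8, 10), so element order follows tile order.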
poplar/VariableRef.hpp¶
 template<>
struct hash
<poplar::VariableRef>¶ 
Public Functions
 size_t
operator()
(const poplar::VariableRef &v) const¶
 namespace
poplar

Functions
 bool
operator==
(const VariableInterval &a, const VariableInterval &b)¶
 bool
operator<
(const VariableInterval &a, const VariableInterval &b)¶
 struct
VariableInterval
¶  #include <VariableRef.hpp>
Type representing a segment of a particular variable.
Public Functions
VariableInterval
(VariableRef var, Interval interval)¶
VariableInterval
()¶
VariableInterval
(const VariableInterval &other)¶
VariableInterval
(VariableInterval &&other)¶
 VariableInterval &
operator=
(const VariableInterval &other)¶
 VariableInterval &
operator=
(VariableInterval &&other)¶
 class
VariableRef
¶  #include <VariableRef.hpp>
Type representing a reference to a variable in a graph.
Public Functions
VariableRef
(unsigned id, unsigned replicationFactor)¶
VariableRef
()¶
VariableRef
(const VariableRef &other)¶
VariableRef
(VariableRef &&other)¶
 VariableRef &
operator=
(const VariableRef &other)¶
 VariableRef &
operator=
(VariableRef &&other)¶
 namespace
std
¶ 
 template<>
struct hash
<poplar::VariableRef> 
Public Functions
 size_t
operator()
(const poplar::VariableRef &v) const
poplar/VertexIntrospector.hpp¶
 namespace
poplar

 class
FieldData
¶  #include <VertexIntrospector.hpp>
Information about a vertex field, including its size and its initial value if set.
This is used when calculating cycle estimates.
Vertex fields can be scalar, 1D or 2D. For example:

Scalar: float, Input<float>

1D: Vector<float>, Input<Vector<float>>

2D: Input<VectorList<float>>, Vector<Input<Vector<float>>>
Their sizes can always be returned, and the initial values can be returned for non-edge fields (float, Vector<float>) and edge fields (Input etc.) that are connected to constants.
Note that 2D fields are vectors of vectors; in other words, they are jagged 2D arrays.
Public Functions
 virtual
~FieldData
()¶
 unsigned
rank
() const¶ 
Return the rank of the field: 0 for scalar fields, 1 for 1D and 2 for 2D.
 size_t
size
() const¶ 
Return the size of the field.
For scalar fields it returns 1, for 1D fields it returns the size of the vector, and for 2D fields it returns the number of subvectors.
 size_t
getSizeAtIndex
(size_t i) const¶ 
For 2D fields, return the size of the subvector.
Throws an error if called on non-2D fields.
 Parameters


i
: Index of subvector to return size of

 SizeT
operator[]
(size_t i) const¶ 
Instead of field.getSizeAtIndex(i) you can alternatively use field[i].size().
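The equivalence between the two spellings can be illustrated with a small model of a jagged 2D field. `JaggedFieldModel`, `SizeModel` and `makeExampleField` are hypothetical and only mimic the interface described here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the object returned by operator[].
struct SizeModel {
  std::size_t value;
  std::size_t size() const { return value; }
};

// Hypothetical model of a 2D field: a vector of sub-vectors whose lengths
// may differ (a jagged array).
struct JaggedFieldModel {
  std::vector<std::vector<float>> rows;

  std::size_t size() const { return rows.size(); } // number of sub-vectors
  std::size_t getSizeAtIndex(std::size_t i) const { return rows.at(i).size(); }
  SizeModel operator[](std::size_t i) const {
    return SizeModel{getSizeAtIndex(i)};
  }
};

// Example field with sub-vectors of length 3 and 1.
JaggedFieldModel makeExampleField() {
  JaggedFieldModel field;
  field.rows = {{1.0f, 2.0f, 3.0f}, {4.0f}};
  return field;
}
```

In this model `field[0].size()` and `field.getSizeAtIndex(0)` both return 3, while `field.size()` returns 2.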
 template<typename T> T
getInitialValue
(const Target &target) const¶
Get the initial value for a scalar field.
T should be a scalar type. Throws an error if this is not a scalar field.
Private Functions
 template<typename T> void
getInitialValuesOverload
(const Target &target, std::vector<T> &result) const¶
 template<typename T> void
getInitialValuesOverload
(const Target &target, std::vector<std::vector<T>> &result) const¶
 void
getInitialValues
(const Target &target, void *dst, const TypeTraits &traits, size_t index = std::numeric_limits<size_t>::max()) const¶
 struct
SizeT
¶

 class
VertexIntrospector
¶  #include <VertexIntrospector.hpp>
Available to cycle estimators to inspect the shape and initial values of a vertex’s fields.
Public Functions
 ComputeSet
getComputeSet
() const¶ 
Return the compute set that this vertex is in.
 const core::VertexIntrospector &
getImpl
() const¶
Control program classes¶
poplar/Program.hpp¶
 namespace
poplar

 namespace
program
¶ 
Functions
 class
Call
: public poplar::program::Program¶  #include <Program.hpp>
A program to perform a function call to a previously stored program.
Public Functions
Call
(Function f)¶
Call the function.
 Parameters


f
: A program that has been added to the graph using Graph::addFunction.

 class
Copy
: public poplar::program::Program¶  #include <Program.hpp>
A program that copies data.
Public Functions
Copy
(Tensor src, Tensor dst, bool dontOutline = false)¶
Construct a program to copy data from one tensor to another.
This constructor creates a program that will copy data from the ‘src’ tensor to the ‘dst’ tensor.
 Parameters


src
: The tensor to copy from. 
dst
: The tensor to copy to. 
dontOutline
: Do not outline this copy as a function call. Default is false (i.e. outlined).

Copy
(const DataStream &stream, Tensor dst, bool rearrangeOnHost = false, Tensor offset = Tensor(), size_t repeats = 1)¶
Construct a program to copy from a data stream to a tensor.
 Parameters


stream
: The stream to copy from. 
dst
: The tensor to copy to. 
rearrangeOnHost
: Set to true to save some memory by offloading some work to the host. 
offset
: The index of the tensor multiple desired 
repeats
: The number of tensor multiples

Copy
(Tensor src, const DataStream &stream, bool rearrangeOnHost = false, Tensor offset = Tensor(), size_t repeats = 1)¶
Construct a program to copy a Tensor to a data stream.
 Parameters


src
: The tensor to copy from. 
stream
: The stream to copy to 
rearrangeOnHost
: Set to true to save some memory by offloading some work to the host. 
offset
: The index of the tensor multiple desired 
repeats
: The number of tensor multiples

Copy
(const RemoteBuffer &buffer, Tensor dst)¶
Construct a program to copy a remote buffer to a tensor.
 Parameters


buffer
: The remote buffer to copy from 
dst
: The Tensor to copy to 
offset
: The index of the tensor multiple desired

Copy
(const RemoteBuffer &buffer, Tensor dst, Tensor offset)¶
Copy
(Tensor src, const RemoteBuffer &buffer)¶
Construct a program to copy a tensor to a remote buffer.
 Parameters


src
: The tensor to copy from 
buffer
: The remote buffer to copy to 
offset
: The index of the tensor multiple desired

Copy
(Tensor src, const RemoteBuffer &buffer, Tensor offset)¶
 class
CrossReplicaCopy
: public poplar::program::Program¶  #include <Program.hpp>
A program that copies tensors between replicated subgraphs.
Public Functions
CrossReplicaCopy
(Tensor src, Tensor dst, std::map<unsigned, unsigned> replicaMap)¶
Constructor to create a program to copy a tensor to the equivalent tensor in a different replica subgraph.
When the replicated graphs are created, this will create a Copy program in each replica. Each replica sends to exactly one other replica and receives from exactly one other replica. A replica may not copy to itself.
 Parameters


src
: Replicated tensor to copy from. 
dst
: Replicated tensor to copy to. 
replicaMap
: Each key in this map specifies the subgraph or replica that contains the source tensor. The corresponding value is the replica that contains the destination tensor. The size of the replica map is equal to the graph replication factor.
Each replica must be represented once as a key (source) and once as a value (destination).
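The constraints on the replica map can be checked on the host before the program is constructed. The function `isValidReplicaMap` is a hypothetical helper, not part of Poplar, and assumes that replicas are numbered from 0 to the replication factor minus one.

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical validator for the constraints described above. Keys are
// source replicas and values are destination replicas.
bool isValidReplicaMap(const std::map<unsigned, unsigned> &replicaMap,
                       unsigned replicationFactor) {
  // One entry per replica; std::map already guarantees unique keys.
  if (replicaMap.size() != replicationFactor) return false;
  std::set<unsigned> destinations;
  for (const auto &entry : replicaMap) {
    const unsigned source = entry.first;
    const unsigned destination = entry.second;
    if (source >= replicationFactor || destination >= replicationFactor)
      return false;
    if (source == destination) return false; // no copy to itself
    if (!destinations.insert(destination).second)
      return false; // each replica may receive only once
  }
  return true;
}
```

A ring such as `{{0, 1}, {1, 2}, {2, 0}}` with a replication factor of 3 satisfies every constraint.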

 class
Execute
: public poplar::program::Program¶  #include <Program.hpp>
Program that executes a compute set in the graph.
Public Functions
Execute
(ComputeSet cs)¶
Construct a graph execution program.
 Parameters


cs
: The compute set to execute.

Execute
(ComputeSet cs, Tensor t)¶
Construct a graph execution program and write the exit status to a scalar tensor.
The exit status is the logical AND of the return values of the vertices in the compute set.
 Parameters


cs
: The compute set to execute. 
t
: The tensor to write the exit status to.

 class
If
: public poplar::program::Program¶  #include <Program.hpp>
A program that runs one of two programs depending on the value of a scalar tensor.
Public Functions
If
(Tensor predicate, const Program &trueBody, const Program &falseBody)¶
A program that executes the trueBody or falseBody depending on the value of the predicate.
You can pass an empty Sequence to either trueBody or falseBody if you don’t want either branch to do anything.
 Parameters


predicate
: The scalar tensor that determines which branch to execute 
trueBody
: This program is run if the predicate is true. 
falseBody
: This program is run if the predicate is false.

 class
PrintTensor
: public poplar::program::Program¶ 
Public Functions
PrintTensor
(Tensor t)¶
Print the contents of a Tensor.
You can send the output to a different stream by using the Engine::setPrintTensorStream function.
 Parameters


t
: The Tensor to print

 class
Program
¶  #include <Program.hpp>
This class represents a control program that executes operations on the graph.
The class should not be explicitly constructed but one of its subclasses should be constructed instead.
Subclassed by poplar::program::Call, poplar::program::Copy, poplar::program::CrossReplicaCopy, poplar::program::Execute, poplar::program::If, poplar::program::PrintTensor, poplar::program::Repeat, poplar::program::RepeatWhileFalse, poplar::program::RepeatWhileTrue, poplar::program::Sequence, poplar::program::Switch, poplar::program::Sync, poplar::program::WriteUndef
 class
Repeat
: public poplar::program::Program¶  #include <Program.hpp>
A program that repeatedly executes for a fixed number of iterations.
 class
RepeatWhileFalse
: public poplar::program::Program¶  #include <Program.hpp>
A program that evaluates the condition program, and if the predicate tensor is true it exits the loop.
If predicate tensor is false it evaluates the body program, and then loops to reevaluate the condition program. This is like a C while statement with an inverted condition.
Public Functions
RepeatWhileFalse
(const Program &cond, Tensor predicate, const Program &body)¶
Construct a repeat while false program.
 Parameters


cond
: The program evaluated before the body is evaluated 
predicate
: The scalar tensor that determines whether to execute the body 
body
: The body to execute when the predicate is false

 class
RepeatWhileTrue
: public poplar::program::Program¶  #include <Program.hpp>
A program that evaluates the condition program, and if the predicate tensor is false it exits the loop.
If predicate tensor is true it evaluates the body program, and then loops to reevaluate the condition program. This is like a C while statement.
Public Functions
RepeatWhileTrue
(const Program &cond, Tensor predicate, const Program &body)¶
Construct a repeat while true program.
 Parameters


cond
: The program evaluated before the body is evaluated 
predicate
: The scalar tensor that determines whether to execute the body 
body
: The body to execute when the predicate is true
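The control flow of the two loop programs can be written as ordinary host loops. This is an analogy in plain C++ with hypothetical function names, where `cond` stands for the condition program that updates the predicate.

```cpp
#include <cassert>
#include <functional>

// Analogue of RepeatWhileTrue: run cond, exit if the predicate is false,
// otherwise run the body and loop.
void repeatWhileTrue(const std::function<void()> &cond, const bool &predicate,
                     const std::function<void()> &body) {
  for (;;) {
    cond();
    if (!predicate) break;
    body();
  }
}

// Analogue of RepeatWhileFalse: the same loop with the test inverted.
void repeatWhileFalse(const std::function<void()> &cond, const bool &predicate,
                      const std::function<void()> &body) {
  for (;;) {
    cond();
    if (predicate) break;
    body();
  }
}

// Example: the body increments a counter while counter < limit.
int countIterations(int limit) {
  int counter = 0;
  bool predicate = false;
  repeatWhileTrue([&] { predicate = counter < limit; }, predicate,
                  [&] { ++counter; });
  return counter;
}
```

Note that the condition program always runs at least once, even when the body never runs.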

 class
Sequence
: public poplar::program::Program¶  #include <Program.hpp>
Program that executes a sequence of programs.
Public Functions
Sequence
()¶
Construct a sequence program.
 class
Switch
: public poplar::program::Program¶  #include <Program.hpp>
A program that runs one of many programs depending on the value of a tensor.
The controlling tensor must be a scalar of type INT or UNSIGNED_INT. A switch consists of a number of switch cases, each with a case value and a case body, and a default case. The case values must be unique. If the value of the controlling tensor matches the case value of a case, the corresponding case body is run; otherwise the default case is run.
Public Functions
Switch
(Tensor control, const std::vector<std::pair<std::int32_t, Program>> &cases)¶
Construct a switch with the specified set of cases and an empty default case.
 Parameters


control
: The controlling tensor 
cases
: The cases of the switch

Switch
(Tensor control, const std::vector<std::pair<std::int32_t, Program>> &cases, const Program &defaultCaseBody)¶
Construct a switch with the specified set of cases and default case.
 Parameters


control
: The controlling tensor 
cases
: The cases of the switch 
defaultCaseBody
: The body of the default case

Switch
(Tensor control)¶
Construct a switch with no cases and an empty default case.
The add() method can be used to add cases after the switch is constructed.
 Parameters


control
: The controlling tensor
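The dispatch rule can be sketched on the host with callables in place of programs. `runSwitch` and `selectExample` are hypothetical names; the sketch relies on the case values being unique, as the class description requires.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using Body = std::function<void()>;

// Run the body whose case value matches `control`; otherwise run the
// default body, which is empty unless one is supplied.
void runSwitch(std::int32_t control,
               const std::vector<std::pair<std::int32_t, Body>> &cases,
               const Body &defaultBody = Body([] {})) {
  for (const auto &c : cases) {
    if (c.first == control) {
      c.second();
      return;
    }
  }
  defaultBody();
}

// Example with two cases and a default case.
int selectExample(std::int32_t control) {
  int result = -1;
  std::vector<std::pair<std::int32_t, Body>> cases;
  cases.emplace_back(0, [&] { result = 10; });
  cases.emplace_back(5, [&] { result = 50; });
  runSwitch(control, cases, [&] { result = 0; });
  return result;
}
```

A control value that matches no case, such as 7 here, falls through to the default body.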

 class
Sync
: public poplar::program::Program¶  #include <Program.hpp>
A program to synchronise at a certain granularity dictated by the SyncType.
 class
WriteUndef
: public poplar::program::Program¶  #include <Program.hpp>
A program to mark a tensor as containing an undefined value.
This can be used to improve the liveness analysis of tensors and save memory in some situations.
Poplar does liveness analysis using the standard algorithm except that Poplar’s variables are not scalar values; they are arrays. In the standard analysis a variable is “killed” when it is written to with a new value. This means that it is dead immediately before that point because its value there can never be read.
int a = 1;
// a is dead here because its current value (1) can never be read.
a = 2; // a is killed here, which makes it dead on the line above.
In Poplar a variable is killed when all of its elements are written in the same compute set. Consider the pseudocode:
var = graph.addVariable(FLOAT, {2}, ...);
seq.add(Execute( var[0] = 1, var[1] = 2 ));
// var is dead here (it is killed on the line below) because none of its
// element values (1, 2) can ever be read.
seq.add(Execute( var[0] = 3, var[1] = 4 ));
If only some of the elements are written then the entire variable is still live before the write because we may still need the value of the elements that were not written to.
seq.add(Execute( var[0] = 1, var[1] = 2 ));
// var is alive here because the value 2 might be read later.
seq.add(Execute( var[0] = 3 ));
var is still alive because no compute set writes to every element. If the entire variable is overwritten but in separate compute sets, then it will still be considered to be live because Poplar does not track the liveness of each variable element, only of the entire variable.
seq.add(Execute( var[0] = 1, var[1] = 2 ));
// var is alive here even though 1 and 2 can never be read.
seq.add(Execute( var[0] = 3 ));
seq.add(Execute( var[1] = 4 ));
This means var is alive more than necessary which may lead to increased memory use. One solution is for Poplar to track the liveness of every variable element separately, but that would be prohibitively expensive.
Instead, this program provides a way to manually mark a tensor as being dead by writing an undefined value to it. Changing the above code to the following results in the correct liveness.
seq.add(Execute( var[0] = 1, var[1] = 2 ));
// Manually kill var because we know, even if Poplar does not, that
// it is about to be completely overwritten.
seq.add(WriteUndef(var));
seq.add(Execute( var[0] = 3 ));
seq.add(Execute( var[1] = 4 ));
For more information about liveness analysis see https://en.wikipedia.org/wiki/Live_variable_analysis and https://www.cl.cam.ac.uk/teaching/2006/OptComp/slides/lecture03.pdf
 Parameters


t
: The tensor to mark as undefined.

Device management¶
poplar/TargetType.hpp¶
 namespace
poplar

Enums
Functions
 std::string
toString
(TargetType t)¶ 
Convert the target type to a string.
Throws an exception if an undefined type is passed, e.g. static_cast<TargetType>(100).
poplar/Target.hpp¶
 namespace
poplar

Functions
 void
copyDeviceHalfToFloat
(const Target &target, const void *src, float *dst, std::size_t numElements)¶ 
Convert device half-precision values to floats.
 Parameters


target
: Target that the half-precision data is to be copied from. 
src
: Pointer to the start of the half-precision data. 
dst
: Pointer to the float data to write. 
numElements
: Number of items to convert.

 void
copyFloatToDeviceHalf
(const Target &target, const float *src, void *dst, std::size_t numElements)¶ 
Convert float values to device half-precision values.
 Parameters


target
: Target that the half-precision data is to be copied to. 
src
: Pointer to the float data to read. 
dst
: Pointer to the half-precision data to write. 
numElements
: Number of items to convert.

 void
copyDeviceHalfToDouble
(const Target &target, const void *src, double *dst, std::size_t numElements)¶ 
Convert device half-precision values to doubles.
 Parameters


target
: Target that the half-precision data is to be copied from. 
src
: Pointer to the start of the half-precision data. 
dst
: Pointer to the double-precision data to write. 
numElements
: Number of items to convert.

 void
copyDoubleToDeviceHalf
(const Target &target, const double *src, void *dst, std::size_t numElements)¶ 
Convert double-precision values to device half-precision values.
 Parameters


target
: Target that the half-precision data is to be copied to. 
src
: Pointer to the double-precision data to read. 
dst
: Pointer to the half-precision data to write. 
numElements
: Number of items to convert.
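To show what a half-to-float conversion involves, here is a host-only sketch that decodes one 16-bit pattern. It assumes the half-precision layout is IEEE 754 binary16 (1 sign bit, 5 exponent bits, 10 mantissa bits) and an IEEE 754 host float. `halfBitsToFloat` is a hypothetical helper and is not how the library functions above are implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Decode an IEEE 754 binary16 bit pattern into a float (assumed layout).
float halfBitsToFloat(std::uint16_t h) {
  const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
  const std::uint32_t exponent = (h >> 10) & 0x1Fu;
  std::uint32_t mantissa = h & 0x3FFu;
  std::uint32_t bits;
  if (exponent == 0x1Fu) {
    // Infinity or NaN: maximum exponent, mantissa carried over.
    bits = sign | 0x7F800000u | (mantissa << 13);
  } else if (exponent != 0) {
    // Normal number: rebias the exponent from 15 to 127.
    bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
  } else if (mantissa == 0) {
    bits = sign; // signed zero
  } else {
    // Subnormal half: shift until the implicit leading one appears.
    std::uint32_t e = 113; // biased float exponent of 2^-14
    while ((mantissa & 0x400u) == 0) {
      mantissa <<= 1;
      --e;
    }
    mantissa &= 0x3FFu;
    bits = sign | (e << 23) | (mantissa << 13);
  }
  float result;
  std::memcpy(&result, &bits, sizeof(result));
  return result;
}
```

Every binary16 value is exactly representable as a float, so this direction is lossless; the float-to-half direction additionally needs a rounding rule, which this sketch does not cover.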

 class
Target
¶  #include <Target.hpp>
A target representation.
The Target class holds characteristics of a compilation target and enables interaction with it.
Target creation options

ipuLinkConfiguration
(Default, BarleyTwist, SlidingWindow, None) [=None] The configuration used for the IPU to IPU connections (known as the Newmanry network). ‘None’ means that Poplar decides, based on the number of IPUs.
Note that ‘Default’ is not the default!
Public Functions
Target
()¶
~Target
()¶
 TargetType
getTargetType
() const¶ 
The target type.
 unsigned
getNumIPUs
() const¶ 
The number of IPUs.
 unsigned
getTilesPerIPU
() const¶ 
The number of tiles per IPU.
 unsigned
getNumWorkerContexts
() const¶ 
The number of worker contexts per tile.
 unsigned
getBytesPerTile
() const¶ 
Bytes of memory per tile.
 unsigned
getExchangeBytesPerCycle
() const¶ 
The bandwidth of internal IPU exchange in bytes per cycle.
 unsigned
getMemcpyBytesPerCycle
() const¶ 
The maximum bandwidth for internal data copies on a tile.
 unsigned
getMinIPUSyncDelay
() const¶ 
The IPU sync delay for the tile that is closest to the sync controller.
 unsigned
getGlobalSyncCycles
() const¶ 
The number of clock cycles required to synchronize all IPUs.
 unsigned
getInterleavedMemoryElementIndex
() const¶ 
Memory element offset index for interleaved memory.
 const std::vector<GlobalExchangeConstraint> &
getGlobalExchangeConstraints
() const¶ 
Set of constraints that provide a lower bound on the time it takes to send data between IPUs.
 unsigned
getNumStrideBits
() const¶
 unsigned
getDataPathWidth
() const¶ 
The width of the load/store data path within the tile.
 unsigned
getFp16ConvUnitMaxPipelineDepth
() const¶ 
The maximum pipeline depth of the convolution units within the tile for fp16.
 unsigned
getFp32ConvUnitMaxPipelineDepth
() const¶ 
The maximum pipeline depth of the convolution units within the tile for fp32.
 unsigned
getFp16ConvUnitInputLoadElemsPerCycle
() const¶ 
The number of input elements loaded per cycle in f16 convolution unit.
 unsigned
getFp32ConvUnitInputLoadElemsPerCycle
() const¶ 
The number of input elements loaded per cycle in f32 convolution unit.
 unsigned
getFp16InFp16OutConvUnitsPerTile
() const¶ 
The number of convolution units in the tile that can be used when partial results are output as 16 bits and inputs are 16 bits.
 unsigned
getFp16InFp32OutConvUnitsPerTile
() const¶ 
The number of convolution units in the tile that can be used when partial results are output as 32 bits and inputs are 16 bits.
 unsigned
getFp32InFp32OutConvUnitsPerTile
() const¶ 
The number of convolution units in the tile that can be used when accumulating to 32 bit values.
 unsigned
getConvUnitCoeffLoadBytesPerCycle
() const¶ 
The number of convolutional weights that can be loaded in a cycle.
 unsigned
getRptCountMax
() const¶
 bool
supportsExchangeBusSharing
() const¶ 
Whether tiles can share the local exchange bus during exchange.

The number of consecutive tiles that can share the exchange bus.
 unsigned
getNumTiles
() const¶ 
Get the total number of tiles for this target (tiles per IPU * number of IPUs).
 std::uint64_t
getMemoryBytes
() const¶ 
Get the total amount of memory on this target, across all IPUs.
 unsigned
getFloatVectorWidth
() const¶ 
How many floats can be processed in one vector operation.
Equivalent to getDataPathWidth() / 32.
 unsigned
getHalfVectorWidth
() const¶ 
How many halves can be processed in one vector operation.
Equivalent to getDataPathWidth() / 16.
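The two equivalences are the same division with a different element width. In the sketch below `vectorWidth` is a hypothetical helper, and the 64-bit data path width used in the note is an example value only, not a statement about any particular target.

```cpp
#include <cassert>

// Number of elements of `elementBits` bits that fit in one data path word.
unsigned vectorWidth(unsigned dataPathWidthBits, unsigned elementBits) {
  return dataPathWidthBits / elementBits;
}
```

With an example data path width of 64 bits, `vectorWidth(64, 32)` is 2 floats and `vectorWidth(64, 16)` is 4 halves.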
 unsigned
getVectorWidth
(const poplar::Type &type) const¶ 
How many of the given type can be processed in one vector operation.
 unsigned
getWeightsPerConvUnit
(bool floatActivations) const¶
 unsigned
getConvUnitInputLoadElemsPerCycle
(bool floatActivations) const¶
 unsigned
getMaxIPUSyncDelay
() const¶ 
Get the maximum number of cycles required for an IPU sync in the best case scenario (all tiles are immediately ready).
 double
getTileClockFrequency
() const¶ 
Get the tile clock frequency in Hertz.
 std::size_t
getAtomicStoreGranularity
() const¶ 
Get the granularity of atomic stores that can be made by independent parallel worker threads.
 Return

The granularity in bytes.
 uint32_t
makeFpIctlValue
(bool inv, bool div0, bool oflo, bool esr, bool nanoo) const¶ 
Generate a value that could be written to Floating Point Initial Control Value register CSR_S.FP_ICTL in order to configure it with the specified options.
 Parameters


inv
: If true, a floating-point invalid operation (defined by IEEE 754) will cause an exception. The invalid operations are:

Addition or subtraction where the operands are + or - infinity (inf) and the operation results in the subtraction of two infs; for example: (-inf)+(+inf) or (+inf)-(+inf).

Divisions: (+/-0)/(+/-0) and (+/-inf)/(+/-inf).

Multiplications: (+/-0)*(+/-inf) and (+/-inf)*(+/-0).

Remainder: x REM y where y=0 or x=(+/-inf)

Real operations with complex results such as the square root or logarithm of a negative number.

Operations with Not-a-Number as at least one operand.

Comparisons where one of the operands is Not-a-Number.
See also nanoo below.


div0
: If true, a floating-point divide-by-zero operation will cause an exception. 
oflo
: If true, a floating-point overflow will cause an exception. 
esr
: Enable stochastic rounding. 
nanoo
: Enable Not-a-Number on overflow mode. When enabled, half-precision calculations that have overflowed will produce a Not-a-Number result, rather than saturating to the half-precision max/min value, and the invalid operation (inv) flag will be set.
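Several of the invalid operations listed for `inv` can be reproduced on a host CPU, where the default floating-point environment returns a quiet NaN instead of raising an exception. The sketch below is host-only C++ with hypothetical function names; it assumes an IEEE 754 host compiled without fast-math options, and it says nothing about the register encoding produced by makeFpIctlValue.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Each function performs one IEEE 754 invalid operation. `volatile` keeps
// the compiler from folding the expression at compile time.
double infMinusInf() {
  volatile double inf = std::numeric_limits<double>::infinity();
  return inf - inf; // (+inf)-(+inf)
}

double zeroOverZero() {
  volatile double zero = 0.0;
  return zero / zero; // (+0)/(+0)
}

double zeroTimesInf() {
  volatile double zero = 0.0;
  volatile double inf = std::numeric_limits<double>::infinity();
  return zero * inf; // (+0)*(+inf)
}

double sqrtOfNegative() {
  volatile double x = -1.0;
  return std::sqrt(x); // real operation with a complex result
}
```

On the host each call returns NaN silently; with `inv` set, the text above says the corresponding operation on the device causes an exception instead.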

 unsigned
getFpIctlRegIndex
() const¶ 
Return the register index of the Floating Point Initial Control Value register CSR_S.FP_ICTL.
 unsigned
getDbgDataRegIndex
() const¶ 
Return the register index of CSR_C.DBG_DATA.
 core::Target &
getImpl
() const¶
Public Static Functions
 static Target
createCPUTarget
(bool accurateHalf = false)¶ 
Create a CPU target.
Create a target for executing the graph on the CPU. This target will have 1 tile and 1 worker.
 Return

A Target object that can be used to create a graph.
 static Target
createIPUTarget
(unsigned numIPUs, StringRef systemType, const OptionFlags &opts = {})¶ 
Create an IPU target.
Create an IPU target with a specified number of IPUs based on the given system type.
 Return

A Target object that can be used to create a graph.
 Parameters


numIPUs
: The number of IPUs the target should be for. 
systemType
: The ID of the system. 
opts
: The option passed to the target.

 static Target
createIPUTarget
(unsigned numIPUs, unsigned tilesPerIPU, StringRef systemType, const OptionFlags &opts = {})¶ 
Create an IPU target with a virtual number of tiles.
Create an IPU target with a specified number of IPUs based on the given system type. In addition, the number of tiles can be restricted to a smaller virtual number of observable tiles.
 Return

A Target object that can be used to create a graph.
 Parameters


numIPUs
: The number of IPUs the target should be for. 
tilesPerIPU
: The number of tiles per IPU. 
systemType
: The ID of the system. 
opts
: The option passed to the target.


poplar/Device.hpp¶
 namespace
poplar

 class
Device
¶  #include <Device.hpp>
A device refers to a physical entity that can execute code.
Devices should be obtained from a poplar::DeviceManager object or from the appropriate factory function, poplar::Device::createXXXDevice(). Devices cannot be copied but can be moved.
Public Functions
Device
()¶
 virtual
~Device
()¶
 unsigned
getId
() const¶ 
Get the numerical ID of this device as known by the DeviceManager.
 std::vector<unsigned>
getOverlappingDeviceIds
() const¶ 
Get the list of device IDs that this device overlaps with.
 bool
attach
() const¶ 
Try to acquire this device and lock it to the current process.
 void
detach
() const¶ 
Release this device to other processes.
 void
getDriverVersion
(unsigned &major, unsigned &minor, unsigned &point) const¶ 
Retrieve driver version of the attached device.
Throws if the device is not attached or is not an IPU device.
 std::vector<unsigned>
getDriverIDs
() const¶ 
Get the list of driver device IDs that make up this device.
 Device
createVirtualDevice
(unsigned tilesPerIPU)¶ 
Create a virtual device with a restricted number of tiles per IPU.
This method provides a smaller “virtual” device whose target only shows a subset of the tiles on the underlying device.
The calling object becomes a null device (the underlying device is moved into the returned Device object).
 core::Device &
getImpl
() const¶
Public Static Functions
 static Device
createSimulatorDevice
(const Target &target, const OptionFlags &options = {})¶ 
Create a device that runs code on the IPU simulator.
The current options are:

debug.trace
(true, false) [=false] Enables debug tracing.

sim.simulateFullDevice
(true, false) [=false] Controls whether simulation is over all tiles that the hardware would have, even if the target has fewer used tiles.
Options can be overridden with the environment variable POPLAR_SIMULATOR_OPTIONS. For example:
POPLAR_SIMULATOR_OPTIONS='{"sim.simulateFullDevice":"true"}'
 Parameters


target
: The target simulator. 
options
: Options for simulation.
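The environment variable takes the options as a JSON object of string pairs. The helper below is a hypothetical sketch that formats option-value pairs in that form; it assumes keys and values contain no characters that need JSON escaping.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: format option-value pairs as a JSON object of
// strings. Assumption: no key or value needs JSON escaping.
std::string toOptionsJson(const std::map<std::string, std::string> &options) {
  std::string json = "{";
  bool first = true;
  for (const auto &kv : options) {
    if (!first) json += ",";
    json += "\"" + kv.first + "\":\"" + kv.second + "\"";
    first = false;
  }
  json += "}";
  return json;
}
```

The resulting string can be exported as the value of POPLAR_SIMULATOR_OPTIONS in the shell before the program starts. Because `std::map` iterates in key order, the output is deterministic.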


poplar/DeviceManager.hpp¶
 namespace
poplar

 class
DeviceManager
¶  #include <DeviceManager.hpp>
A DeviceManager is able to enumerate and return groups of physical IPUs connected to an entity/host.
It returns such a group of IPUs as a single poplar::Device with a unique device manager id.
The physical devices within any returned Device may overlap with other Devices returned.
Any poplar::Device returned cannot be copied but can be moved for further use.
Public Functions
DeviceManager
()¶
DeviceManager
(const DeviceManager&)¶
 virtual
~DeviceManager
()¶
 std::vector<Device>
getDevices
(const OptionFlags &opts = {}) const¶ 
Get the list of all devices.
 std::vector<Device>
getDevices
(TargetType type, unsigned requiredNumIPUs, const OptionFlags &opts = {}) const¶ 
Get the list of all devices fulfilling the specified criteria.
 Return

A list of matching devices
 Parameters


type
: The desired target type (simulator, IPU, etc.) 
requiredNumIPUs
: Number of IPUs required 
opts
: The arguments passed to the target (optional)

 Device
getDevice
(unsigned deviceManagerId, const OptionFlags &opts = {}) const¶ 
Get a specific device by its device manager id.
 Return

A matching device
 Parameters


deviceManagerId
: The ID of the requested device. The ID is that returned by the gcinfo command. This can specify a single device or a group of devices. 
opts
: The arguments passed to the target (optional)

Public Static Functions
 static DeviceManager
createDeviceManager
()¶ 
Create a device manager for the current host.
Graph execution¶
poplar/Engine.hpp¶
 namespace
poplar

Functions
 Executable
compileGraph
(const Graph &graph, ArrayRef<program::Program> progs, const OptionFlags &opt = {}, ProgressFunc progressCallBack = ProgressFunc())¶ 
Compile the given graph and programs to make an executable that can be executed using a poplar::Engine.
 Parameters


graph
: The graph to compile. 
progs
: The list of programs to run over the graph. Each program can be run separately by calling the run() method of the Engine with the argument being the index of the program to run in this list. 
opt
: Options that can be used to control compilation and execution. The available options are listed under Engine. 
progressCallBack
: A function that will be called to indicate engine compilation progress. See
Engine::ProgressFunc for more information.

 Exceptions


invalid_option
: If any of the options passed in opt
were not recognised or improperly formatted. 
link_error
: If program linking fails; for example, due to undefined symbols or lack of memory on a tile.

 Executable
compileGraph
(Graph &&graph, ArrayRef<program::Program> progs, const OptionFlags &opt = {}, ProgressFunc progressCallBack = ProgressFunc())¶
 class
Engine
¶  #include <Engine.hpp>
A graph compute engine.
The Engine class provides the ability to execute a graph program.
Engine creation options
Options can be overridden with the environment variable
POPLAR_ENGINE_OPTIONS
. For example:
POPLAR_ENGINE_OPTIONS='{"target.workerStackSizeInBytes":"512"}'
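The example value is a JSON object in which both the option names and the values are strings. A minimal sketch of producing such a value from option/value pairs follows. It uses only the C++ standard library; the helper name makeEngineOptionsJson is hypothetical (it is not part of Poplar), and it assumes that names and values contain no characters that need JSON escaping.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper (not part of Poplar): formats option/value pairs as a
// JSON object of string pairs, the shape used in the example value.
// Assumption: no name or value contains a double quote or a backslash.
std::string makeEngineOptionsJson(
    const std::map<std::string, std::string> &opts) {
  std::string json = "{";
  bool first = true;
  for (const auto &kv : opts) {
    assert(kv.first.find_first_of("\"\\") == std::string::npos);
    assert(kv.second.find_first_of("\"\\") == std::string::npos);
    if (!first) {
      json += ",";
    }
    json += "\"" + kv.first + "\":\"" + kv.second + "\"";
    first = false;
  }
  json += "}";
  return json;
}
```

The resulting string would be placed in the environment before the program starts; how it is exported depends on the shell or launcher in use.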
Engine creation options: Debug

debug.allowOutOfMemory
(true, false) [=false] If true, allow out-of-memory while compiling and linking.

debug.computeInstrumentationLevel
(vertex, tile, device, ipu) [=tile] The granularity of compute instrumentation. This option has no effect unless debug.instrumentCompute is true.

vertex: Store the last cycle count of each vertex on every tile

tile: Store the last cycle count of each compute set on every tile

device: Store the last cycle count of each compute set on one tile. This saves memory compared to
tile
(since the cycle counts are always live and this needs to store them on only one tile), but it loses all per-tile cycle information. It works by adding a sync after each compute set and timing how long it takes to get to that sync, so effectively it measures the cycle time of the longest-running tile in the compute set. 
ipu: Similar to “device”, but instead of storing the cycle counts on a single tile across all IPUs, it stores them on one tile per IPU which avoids the need for global syncs.


debug.cpuMultiThreadExecution
(true, false) [=true] If true, operations are executed using multiple host threads for a CPU or IPU model target. Setting to false may simplify debugging at the cost of reduced performance.

debug.instrument
(true, false) [=false] If true, enable all instrument options (below). This will instruct the engine to add cycle counters to the compiled program to enable the execution profile to be retrieved after the program is run. This is only available for an IPU target (not an IPU Model target). Note that the more specific instrumentation options may override the default. For example,
{"debug.instrument":"true", "debug.instrumentExternalExchange":"false"}
will instrument everything apart from external exchange.

debug.instrumentCompute
(true, false) [=false] If true, enable instrumentation of compute sets. See
debug.instrument
. 
debug.instrumentExternalExchange
(true, false) [=false] If true, enable instrumentation of external exchanges. See
debug.instrument
. 
debug.outputAllSymbols
(true, false) [=false] If true, output additional symbols to the ELF files that are not required but aid debugging.

debug.profilingTile
Integer [=1215] The tile on which to store the cycle counter for every compute set. This has no effect unless
debug.computeInstrumentationLevel
is set to
device
. 
debug.runtimeVerify
(true, false) [=false] If true, expensive verification steps are enabled at runtime.

debug.trace
(true, false) [=false] If true, a trace is printed to the error stream with the state of every edge before and after the execution of a compute set or exchange.

debug.traceFile
String Only used if
debug.trace
is true. If set, the debug trace is output to the specified file instead of the error stream. 
debug.verify
(true, false) [=false] If true, expensive verification steps are enabled at compile time. The checks mostly focus on exchange code, including the following:

ensuring variables have been set,

ensuring section/instruction alignment is correct,

and ensuring the total number of bytes received is as expected.
In addition after laying out memory we verify the memory constraints on variables are satisfied.

Engine creation options: Optimisations

opt.maxCompilationThreads
Integer [=0] The maximum number of threads to use during compilation. A value of 0 means the hardware will be fully utilised.
Engine creation options: Target

target.deterministicWorkers
(true, false) [=true] Ensure that the mapping of vertices to worker threads is the same for repeated execution. This guarantee does not hold following breakpoints or exceptions.

target.saveArchive
String If set, the binary archive will be saved to the specified filename during graph compilation. This archive contains the ELF files for each tile. No archive will be saved unless this option is set.

target.supervisorStackSizeInBytes
Integer [=96] The stack size allocated to supervisor threads (in bytes).

target.workerStackSizeInBytes
Integer [=256] The stack size allocated to worker threads (in bytes). If a stack overflow exception occurs, it may be possible to increase the stack size, provided there is sufficient memory available.

target.syncMethod
(polling, hybrid, default) [=default] Controls how the host determines when an IPU wants to sync.

polling: Use polling to determine when an IPU wants to sync.

hybrid: Use a mixture of interrupts and polling to determine when an IPU wants to sync.

default: Choose a sensible default method based on the device type.


target.syncPollPeriodUs
Integer [=0] The period to use when polling for a host sync, in microseconds.
Public Types
 using
ProgressFunc
= std::function<void(int, int)>¶ 
Callback function used to indicate engine compilation progress.
The function is passed two integers. The first is the progress value and the second is the maximum value for the progress.
If a progress callback is used, the function should not block. All calls to the callback function will be made in a single dedicated thread so blocking in the callback will block the receipt of further notifications (but will not block compilation from progressing). The callback should not use Poplar objects or functions relating to the Graph, Engine or Device that are being compiled.
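As a hedged illustration of a non-blocking callback, the sketch below only stores the latest progress as a percentage in an atomic variable and returns immediately. To stay compilable without the Poplar headers it declares a local alias with the same shape as ProgressFunc; the helper name makeProgressRecorder is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Local alias with the same shape as Engine::ProgressFunc, so that this
// sketch compiles without the Poplar headers.
using ProgressFunc = std::function<void(int, int)>;

// Hypothetical helper: returns a callback that only records the latest
// progress as a percentage. It does no I/O and takes no locks, so it
// returns quickly. The atomic must outlive the returned callback.
ProgressFunc makeProgressRecorder(std::atomic<int> &percent) {
  return [&percent](int progress, int total) {
    assert(progress >= 0 && total >= 0);
    // Guard against a zero maximum before dividing.
    const long long scaled = static_cast<long long>(progress) * 100;
    percent.store(total > 0 ? static_cast<int>(scaled / total) : 0);
  };
}
```

Another host thread can then read the atomic at its own pace (for example to draw a progress bar) without ever delaying the notification thread.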
Public Functions
Engine
(const Graph &graph, ArrayRef<program::Program> progs, const OptionFlags &opt = {}, ProgressFunc progressCallBack = ProgressFunc())¶
Construct the engine from a graph and a list of programs.
 Parameters


graph
: The graph to compile into the engine. 
progs
: The list of programs to run over the graph. Each program can be run separately by calling the run() method of the Engine with the argument being the index of the program to run in this list. 
opt
: Options that can be used to control compilation and execution. The available options are listed under Engine. 
progressCallBack
: A function that will be called to indicate engine compilation progress. See
Engine::ProgressFunc for more information.

 Exceptions


invalid_option
: If any of the options passed in opt
were not recognised or improperly formatted. 
link_error
: If program linking fails; for example, due to undefined symbols or lack of memory on a tile.

Engine
(Graph &&graph, ArrayRef<program::Program> progs, const OptionFlags &opt = {}, ProgressFunc progressCallBack = ProgressFunc())¶
Engine
(const Graph &graph, program::Program prog, const OptionFlags &opt = {}, ProgressFunc progressCallBack = ProgressFunc())¶
Construct the engine from a graph and a program.
 Parameters


graph
: The graph to compile into the engine. 
prog
: The program to run over the graph. This program is run when the run() method is called on the Engine. 
opt
: Options that can be used to control compilation and execution. The available options are listed under Engine. 
progressCallBack
: A function that will be called to indicate engine compilation progress. See
Engine::ProgressFunc for more information.

 Exceptions


invalid_option
: If any of the options passed in opt
were not recognised or improperly formatted. 
link_error
: If the program linking fails; for example, due to undefined symbols or lack of memory on a tile.

Engine
(Graph &&graph, program::Program prog, const OptionFlags &opt = {}, ProgressFunc progressCallBack = ProgressFunc())¶
Engine
(Executable &&exe, const OptionFlags &opt = {})¶
Construct the engine from a precompiled executable.
 Parameters


exe
: The precompiled executable. This can be created using poplar::compileGraph(). 
opt
: Options that can be used to control execution. These must be the same as the flags passed to compileGraph(). The available options are listed under Engine.

 Exceptions


invalid_option
: If any of the options passed in opt
were not recognised or improperly formatted.

~Engine
()¶
 void
load
(const Device &device)¶ 
Load the compiled program/graph onto a device.
This function will load all binary code and data onto the device ready for execution.
 Parameters


device
: The device to load onto.

 void
run
(unsigned prog = 0)¶ 
Run the graph program.
This function will execute the graph program. Note that the program needs to have already been loaded onto a device otherwise an exception will occur.
 Parameters


prog
: The index of the program to run. If this is greater than or equal to the number of programs given in the constructor then an exception is thrown.

 void
loadAndRun
(const Device &device, unsigned prog = 0)¶ 
Load and run the graph program.
This function will load the program/graph onto the device and then execute the graph program.
 Parameters


device
: The device to load onto. 
prog
: The index of the program to run. If this is greater than or equal to the number of programs given in the constructor then an exception is thrown.

 TimerTimePoint
getTimeStamp
()¶ 
Get a record of the current host and device time.
Details depend on the underlying device used.
 const ProfileValue &
getGraphProfile
() const¶ 
Get a report containing profiling data for the graph on the underlying device.
This is only valid to call if the underlying device of the graph is an IPU model device.
 Return

A reference to an internal profile.
 Exceptions


profiling_disabled
: If the device is not an IPU or IPU model.

 const ProfileValue &
getExecutionProfile
()¶ 
Get a report containing profiling data for programs executed with this engine since this engine was constructed/the execution report was last reset.
See the Poplar SDK User Guide for details of the data in the report.
 Return

A reference to an internal profile. Be aware if you store a reference to this, rather than copying it, then it may change when you run further programs.
 Exceptions


profiling_disabled
: If the device is not an IPU or IPU model.

 ProfileValue
getProfile
()¶ 
Get a report containing profiling data for both the graph and the programs executed with this engine.
This is equivalent to getting both the graph profile and the execution profile in a single ProfileValue.
See the Poplar SDK User Guide for details of the data in the report.
 Return

A copy of the internal profile.
 Exceptions


profiling_disabled
: If the device is not an IPU or IPU model.

 void
resetExecutionProfile
()¶ 
Reset execution profile.
When programs are run their profiles are appended to the execution profile. This discards profiling information for previously executed programs.
 void
disableExecutionProfiling
()¶ 
Pause execution profiling.
Subsequent engine.run() calls are executed without being profiled until a subsequent call to
enableExecutionProfiling
. So, you can exclude individual programs from a profile like this:
engine.disableExecutionProfiling(); engine.run(...); engine.enableExecutionProfiling();
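The disable/run/enable sequence can also be wrapped in a scope guard so that profiling is re-enabled even if the run throws. The following is a sketch only: ScopedProfilingPause is a hypothetical helper, not part of Poplar, and FakeEngine is a stand-in used so the example compiles without the Poplar headers. The guard is templated on any type that provides disableExecutionProfiling() and enableExecutionProfiling().

```cpp
#include <cassert>

// Hypothetical RAII helper (not part of Poplar). EngineT is any type with
// disableExecutionProfiling() and enableExecutionProfiling() members.
template <class EngineT>
class ScopedProfilingPause {
public:
  explicit ScopedProfilingPause(EngineT &engine) : engine_(engine) {
    engine_.disableExecutionProfiling();
  }
  ~ScopedProfilingPause() { engine_.enableExecutionProfiling(); }
  ScopedProfilingPause(const ScopedProfilingPause &) = delete;
  ScopedProfilingPause &operator=(const ScopedProfilingPause &) = delete;

private:
  EngineT &engine_;
};

// Stand-in engine used only to make the sketch self-contained. It counts
// how many runs happened while profiling was enabled.
struct FakeEngine {
  bool profiling = true;
  int profiledRuns = 0;
  void disableExecutionProfiling() { profiling = false; }
  void enableExecutionProfiling() { profiling = true; }
  void run(unsigned = 0) {
    if (profiling) {
      ++profiledRuns;
    }
  }
};

// Usage: the first run is excluded from the profile, the second is not.
inline int profiledRunsInDemo() {
  FakeEngine engine;
  {
    ScopedProfilingPause<FakeEngine> pause(engine);
    engine.run(); // not profiled
  }
  assert(engine.profiling); // re-enabled when the guard left scope
  engine.run();             // profiled
  return engine.profiledRuns;
}
```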
 void
enableExecutionProfiling
()¶ 
Enable execution profiling.
Subsequent engine.run() calls are profiled when executed.
 void
printProfileSummary
(std::ostream &outputStream, const OptionFlags &opt = {})¶ 
Get and print the summary of a report with the given options.
This is equivalent to getting and printing the summary of both the graph and execution reports using poplar::printProfileSummary().
 Parameters


outputStream
: A stream to write the summary to. 
opt
: A set of option flags configuring the contents of the report. All can be “true” or “false”. The default is “false”. The available options are:

showVarStorage
(true, false) 
showOptimizations
(true, false) 
showExchangeInstructionBreakdown
(true, false) 
showExecutionSteps
(true, false)


 Exceptions


profiling_disabled
: If the device is not an IPU model device. 
invalid_option
: If any of the options passed in opt
were not recognised or improperly formatted.

 void
reportIntervals
(std::ostream &outputStream)¶ 
Write a CSV data file to a specified output stream containing the number of tiles active over time in cycles for compute, synchronisation and exchange phases.
Each row contains the following entries:

begin time in cycles

end time in cycles

number of tiles participating in compute

number of tiles participating in exchange

number of tiles participating in synchronisation
Because tiles execute a number of threads (up to 6) in parallel a single “thread cycle” may only be executed every 6 tile clock cycles. The cycles reported by this function are tile clock cycles rather than thread cycles.
 Parameters


outputStream
: An output stream for the CSV data to be written to.

 Exceptions


profiling_disabled
: If the device has no profiling enabled.
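A minimal sketch of consuming one row of the output written by reportIntervals() follows. It assumes that each data row holds exactly the five integer fields listed above, in that order, separated by commas; the exact delimiter and whether a header row is present are assumptions, not something this reference states. The names IntervalRow, parseIntervalRow and approxThreadCycles are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// One row of the interval report (field order as listed above).
struct IntervalRow {
  unsigned long long beginCycle;
  unsigned long long endCycle;
  unsigned computeTiles;
  unsigned exchangeTiles;
  unsigned syncTiles;
};

// Parses "begin,end,compute,exchange,sync". Returns false for anything that
// does not start with five comma-separated integers (for example, a header).
bool parseIntervalRow(const std::string &line, IntervalRow &row) {
  std::istringstream in(line);
  char c1 = 0, c2 = 0, c3 = 0, c4 = 0;
  if (!(in >> row.beginCycle >> c1 >> row.endCycle >> c2 >>
        row.computeTiles >> c3 >> row.exchangeTiles >> c4 >>
        row.syncTiles)) {
    return false;
  }
  return c1 == ',' && c2 == ',' && c3 == ',' && c4 == ',';
}

// Rough per-thread cycle count for the interval, using the relationship
// described above: one thread cycle at most every 6 tile clock cycles.
unsigned long long approxThreadCycles(const IntervalRow &row) {
  assert(row.endCycle >= row.beginCycle);
  return (row.endCycle - row.beginCycle) / 6;
}
```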


 void
readTensor
(StringRef handle, void *buf)¶ 
Synchronous copy of a buffer of data from a specific tensor in the device into a host-side buffer.
The tensor must have been marked as an output tensor. The buffer must have room for all of the tensor data. The handle should match the one passed to Graph::createHostRead().
 Parameters


handle
: The source host copy handle. 
buf
: The destination of the read.

 void
readTensor
(StringRef handle, void *buf, void *bufEnd)¶ 
Synchronous copy of a buffer of data from a specific tensor in the device into a host-side buffer.
The tensor must have been marked as an output tensor. The buffer must have room for all of the tensor data. The buffer end address is required for size verification. The handle should match the one passed to Graph::createHostRead().
 Parameters


handle
: The source host copy handle. 
buf
: The destination of the read. 
bufEnd
: The end address of destination space
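The begin/end pair for the size-checked overload can be derived from a typed host container. The sketch below is an illustration in standard C++ only; HostRange and hostRange are hypothetical names, not Poplar types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Poplar): an untyped begin/end pair.
struct HostRange {
  void *begin;
  void *end;
  // Number of bytes between begin and end.
  std::size_t bytes() const {
    return static_cast<std::size_t>(static_cast<char *>(end) -
                                    static_cast<char *>(begin));
  }
};

// Builds the pair from a non-empty vector of plain element types.
template <class T>
HostRange hostRange(std::vector<T> &v) {
  assert(!v.empty());
  return {v.data(), v.data() + v.size()};
}
```

With the Poplar headers available, the pair would be passed as the buf and bufEnd arguments, for example engine.readTensor(handle, range.begin, range.end), where handle is whatever name was given to Graph::createHostRead().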

 void
writeTensor
(StringRef handle, const void *buf)¶ 
Synchronous copy of a buffer of data from the host to a specific tensor in the device.
The tensor must have been marked as an input tensor. The buffer must have enough data for the whole tensor. The handle should match the one passed to Graph::createHostWrite().
 Parameters


handle
: The destination host copy handle. 
buf
: The source of the write.

 void
writeTensor
(StringRef handle, const void *buf, const void *bufEnd)¶ 
Synchronous copy of a buffer of data from the host to a specific tensor in the device.
The tensor must have been marked as an input tensor. The buffer end address is required for size verification. The handle should match the one passed to Graph::createHostWrite().
 Parameters


handle
: The destination host copy handle. 
buf
: The source of the write. 
bufEnd
: The end address of source space.

 void
connectStream
(StringRef handle, void *begin, void *end)¶ 
Connect a stream to a circular buffer in memory.
Each time data is copied to/from the stream the pointer for the next transfer is incremented within the bounds given.
 Parameters


handle
: The name of the stream to connect to 
begin
: Pointer to the start of the circular buffer 
end
: Pointer to the end of the circular buffer.
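The pointer behaviour described for the circular buffer can be modelled on the host as follows. This is a sketch, not Poplar code: CircularCursor is a hypothetical class, and it assumes that every transfer has the same size and that the buffer holds a whole number of transfers, so the pointer wraps exactly at the end.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model (not Poplar code) of a transfer pointer that advances
// through [begin, end) and wraps back to begin.
class CircularCursor {
public:
  CircularCursor(char *begin, char *end, std::size_t transferBytes)
      : begin_(begin), end_(end), next_(begin), bytes_(transferBytes) {
    assert(begin < end && transferBytes > 0);
    // Assumption: the buffer holds a whole number of transfers.
    assert(static_cast<std::size_t>(end - begin) % transferBytes == 0);
  }

  // Returns the location used for this transfer, then advances the pointer
  // for the next one, wrapping when the end of the buffer is reached.
  char *nextTransfer() {
    char *current = next_;
    next_ += bytes_;
    if (next_ == end_) {
      next_ = begin_;
    }
    return current;
  }

private:
  char *begin_;
  char *end_;
  char *next_;
  std::size_t bytes_;
};
```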

 void
connectStream
(const DataStream &stream, void *begin, void *end)¶ 
Connect a stream to a circular buffer in memory.
Each time data is copied to/from the stream the pointer for the next transfer is incremented within the bounds given.
 Parameters


stream
: The stream to connect to 
begin
: Pointer to the start of the circular buffer 
end
: Pointer to the end of the circular buffer.

 void
connectStream
(StringRef handle, void *p)¶ 
Connect a stream to a fixed location in memory.
Each time data is copied to/from the stream this location will be read/written.
 Parameters


handle
: The name of the stream to connect to 
p
: The pointer to the memory buffer

 void
connectStream
(const DataStream &stream, void *p)¶ 
Connect a stream to a fixed location in memory.
Each time data is copied to/from the stream this location will be read/written.
 Parameters


stream
: The stream to connect to 
p
: The pointer to the memory buffer

 void
connectStreamToCallback
(StringRef handle, StreamCallbackHandle f)¶ 
Connect a stream to a callback taking a pointer to the location in memory to copy into/from.
This will be called whenever the stream will be read/was written by the device. The given memory location will only be valid to read from/write to for the duration of the callback.
 Parameters


handle
: The name of the stream to connect to. 
f
: Callback to be called whenever the stream is to be read/was written by the device.

 void
connectStreamToCallback
(const DataStream &stream, StreamCallbackHandle f)¶ 
Connect a stream to a callback taking a pointer to the location in memory to copy into/from.
This will be called whenever the stream will be read/was written by the device. The given memory location will only be valid to read from/write to for the duration of the callback.
 Parameters


stream
: The stream to connect to. 
f
: Callback to be called whenever the stream is to be read/was written by the device.

 void
connectStreamToCallback
(StringRef handle, unsigned index, StreamCallbackHandle f)¶ 
Connect a replicated stream to a callback taking a pointer to the location in memory to copy into/from.
This will be called whenever the stream will be read/was written by the device. The given memory location will only be valid to read from/write to for the duration of the callback.
 Parameters


handle
: The name of the stream to connect to. 
index
: The replicated index to connect to. 
f
: Callback to be called whenever the stream is to be read/was written by the device.

 void
connectStreamToCallback
(const DataStream &stream, unsigned index, StreamCallbackHandle f)¶ 
Connect a replicated stream to a callback taking a pointer to the location in memory to copy into/from.
This will be called whenever the stream will be read/was written by the device. The given memory location will only be valid to read from/write to for the duration of the callback.
 Parameters


stream
: The stream to connect to. 
index
: The replicated index to connect to. 
f
: Callback to be called whenever the stream is to be read/was written by the device.

 void
copyFromRemoteBuffer
(const RemoteBuffer &buffer, void *w, int repeat_index, unsigned replication_index = 0)¶ 
Copy from a remote buffer to a user buffer w.
 Parameters


buffer
: The remote buffer to copy from 
w
: The user buffer to copy to 
repeat_index
: The index in the remote buffer to copy from 
replication_index
: The replicated graph index

 void
copyToRemoteBuffer
(void *w, const RemoteBuffer &buffer, int repeat_index, unsigned replication_index = 0)¶ 
Copy from a user buffer w to a remote buffer.
 Parameters


w
: The user buffer to copy from 
buffer
: The remote buffer to copy to 
repeat_index
: The index in the remote buffer to copy to 
replication_index
: The replicated graph index

 std::vector<std::string>
listStreams
() const¶ 
Return a list of all streams in the engine.
 Return

Vector of strings, each of which is a stream’s handle postfixed with ‘+’ or ‘-’, indicating whether the stream is a host-write or a host-read respectively.
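A small sketch of splitting such an entry into its handle and direction follows. It assumes the two suffix characters are '+' (host-write) and '-' (host-read); StreamDirection and splitStreamEntry are hypothetical names, not Poplar identifiers.

```cpp
#include <cassert>
#include <string>
#include <utility>

enum class StreamDirection { HostWrite, HostRead };

// Splits an entry such as "input+" into ("input", HostWrite).
// Assumption: the last character is always '+' or '-'.
std::pair<std::string, StreamDirection>
splitStreamEntry(const std::string &entry) {
  assert(!entry.empty());
  const char suffix = entry.back();
  assert(suffix == '+' || suffix == '-');
  return {entry.substr(0, entry.size() - 1),
          suffix == '+' ? StreamDirection::HostWrite
                        : StreamDirection::HostRead};
}
```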
 void
setPrintStream
(std::ostream &stream)¶ 
Set output stream for printf commands.
 Parameters


stream
: The output stream to use.

 void
setPrintTensorStream
(std::ostream &stream)¶ 
Set the output stream for PrintTensor programs.
By default tensors are printed to stderr.
 Parameters


stream
: The output stream to use.

 const core::Engine &
getImpl
() const¶
Public Static Functions
 static std::string
reportTiming
(const TimerTimePoint &start, const TimerTimePoint &end)¶ 
Get a timing report for the measured interval.
Details depend on the underlying device used.
 Parameters


start
: Start time of report 
end
: End time of report


poplar/StreamCallback.hpp¶
 namespace
poplar

 class
LegacyStreamCallback
: public poplar::StreamCallback¶  #include <StreamCallback.hpp>
Convenience StreamCallback specialization for implementations that do not support prefetch/complete operations.
Public Functions
 virtual Result
prefetch
(void *p)¶ 
Callback function to fill the host buffer (host-to-device streams only).
This function is called speculatively, which means it might still be called even if no additional copies for this stream exist for the remaining execution of the program.
The following situations are possible during the invocation:
a) There is more data available for consumption.
b) Data is temporarily not available at the point in time this function is called.
c) The stream has reached the end and thus has no more data available.
The return value indicates if the invocation resulted in the buffer being successfully filled. In the first case (a), the function shall return
Result::Success
. A call to
complete
will follow if the program ends up transferring the data. Otherwise (scenarios b and c), it must return
Result::NotAvailable
. Calls to
fetch
and then
complete
will follow if the transfer takes place.
 Return

Result::Success
if the function was able to fill the buffer with data, or
Result::NotAvailable
otherwise.
 Parameters


p
: Location of the buffer. It will only be valid for the duration of the function.

 virtual void
complete
()¶ 
Notifies that the data involved in the last prefetch/fetch invocation is used by the device.
It usually means that a speculative read was a hit, and the callback can move on to the next piece of input.
 class
StreamCallback
¶  #include <StreamCallback.hpp>
Interface used during stream copies to produce/consume the data being exchanged between the host and the device.
In regular stream copies, the
fetch
and
complete
functions are called as a result of the device requesting the data transfer.
If the following engine options are set, the
prefetch
function will be called after an ongoing host-to-device transfer of the same stream completes:
exchange.streamBufferOverlap=none

exchange.enablePrefetch=true
Subclassed by poplar::LegacyStreamCallback
Public Functions
 virtual
~StreamCallback
()¶
 virtual Result
prefetch
(void *p) = 0¶ 
Callback function to fill the host buffer (host-to-device streams only).
This function is called speculatively, which means it might still be called even if no additional copies for this stream exist for the remaining execution of the program.
The following situations are possible during the invocation:
a) There is more data available for consumption.
b) Data is temporarily not available at the point in time this function is called.
c) The stream has reached the end and thus has no more data available.
The return value indicates if the invocation resulted in the buffer being successfully filled. In the first case (a), the function shall return
Result::Success
. A call to
complete
will follow if the program ends up transferring the data. Otherwise (scenarios b and c), it must return
Result::NotAvailable
. Calls to
fetch
and then
complete
will follow if the transfer takes place.
 Return

Result::Success
if the function was able to fill the buffer with data, or
Result::NotAvailable
otherwise.
 Parameters


p
: Location of the buffer. It will only be valid for the duration of the function.

 virtual void
complete
() = 0¶ 
Notifies that the data involved in the last prefetch/fetch invocation is used by the device.
It usually means that a speculative read was a hit, and the callback can move on to the next piece of input.
 virtual void
fetch
(void *) = 0¶ 
Callback function to fill the host buffer.
This function is called as a result of a stream copy, unless the last
prefetch
invocation was successful. It must always fill the buffer with more data, and it is followed by a call to
complete
.
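The prefetch/fetch/complete contract can be illustrated with a small host-side source that serves fixed-size blocks from a queue. This is a sketch under stated assumptions: StreamCallbackLike is a local stand-in that mirrors the three member functions described above so the code compiles without the Poplar headers, and QueueSource is a hypothetical implementation. Zero-filling the buffer when fetch is called on an empty queue is a choice made for this sketch, not behaviour specified by this reference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <utility>
#include <vector>

// Local stand-in mirroring the interface described above.
struct StreamCallbackLike {
  enum class Result { Success, NotAvailable };
  virtual ~StreamCallbackLike() = default;
  virtual Result prefetch(void *p) = 0;
  virtual void fetch(void *p) = 0;
  virtual void complete() = 0;
};

// Hypothetical source that hands out fixed-size blocks in FIFO order.
class QueueSource : public StreamCallbackLike {
public:
  explicit QueueSource(std::size_t blockBytes) : blockBytes_(blockBytes) {}

  void push(std::vector<char> block) {
    assert(block.size() == blockBytes_);
    queue_.push_back(std::move(block));
  }

  Result prefetch(void *p) override {
    if (queue_.empty()) {
      return Result::NotAvailable; // cases (b) and (c)
    }
    // Case (a): copy the front block but keep it queued, because a
    // speculative call may never be followed by complete().
    std::memcpy(p, queue_.front().data(), blockBytes_);
    return Result::Success;
  }

  void fetch(void *p) override {
    // fetch must always fill the buffer; this sketch zero-fills when the
    // queue is empty and remembers that no queued block was handed out.
    if (queue_.empty()) {
      std::memset(p, 0, blockBytes_);
      servedFiller_ = true;
      return;
    }
    std::memcpy(p, queue_.front().data(), blockBytes_);
  }

  void complete() override {
    // The device used the last block handed out: move on to the next one.
    if (servedFiller_) {
      servedFiller_ = false;
      return;
    }
    if (!queue_.empty()) {
      queue_.pop_front();
    }
  }

  std::size_t pending() const { return queue_.size(); }

private:
  std::size_t blockBytes_;
  std::deque<std::vector<char>> queue_;
  bool servedFiller_ = false;
};
```

Keeping the block queued until complete() is what makes a speculative prefetch harmless: if the device never consumes it, the same block is served again on the next call.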

 class
StreamCallbackHandle
¶  #include <StreamCallback.hpp>
Wrapper for StreamCallback instances.
Provides backwards compatibility with C++ lambda expressions and
std::function
instances.
Public Functions
 template<class
CallbackImpl
, typename = typename std::enable_if<std::is_base_of<StreamCallback, CallbackImpl>::value>::type>
StreamCallbackHandle
(std::unique_ptr<CallbackImpl> f)¶ 
Constructs a handle from an instance of a stream callback implementation.
This constructor only participates in overload resolution if CallbackImpl is derived from poplar::StreamCallback (i.e. it is an implementation of the callback interface).
 template<class
F
, typename = typename std::enable_if<traits::is_callback<F>::value>::type>
StreamCallbackHandle
(F &&f)¶ 
Constructs a handle from a callable instance.
This constructor only participates in overload resolution if F satisfies the requirements of a Function Object. It transforms
f
into a LegacyStreamCallback implementation.
StreamCallbackHandle
(const StreamCallbackHandle&)¶
StreamCallbackHandle
(StreamCallbackHandle&&)¶
operator std::unique_ptr<StreamCallback>
() &&¶
Extracts the callback implementation from the handle.
Private Members
 std::unique_ptr<StreamCallback>
callback
¶
Private Static Functions
 template<class
F
>
static std::unique_ptr<StreamCallback>
makeCallback
(F &&f)¶
Serializing executable state¶
poplar/Executable.hpp¶
 namespace
poplar

 class
Executable
¶  #include <Executable.hpp>
An instance of poplar::Executable contains all of the information needed to run a program on an IPU device.
It can be saved to or loaded from disk.
Public Functions
~Executable
()¶
Executable
(Executable &&other)¶
 Executable &
operator=
(Executable &&other)¶
 void
serialize
(std::ostream &out) const¶ 
Serialize an executable to a stream.
All of the binary files and metadata needed to run a Poplar executable will be written to the stream. Currently the format is opaque, and compatibility between different versions of Poplar is not guaranteed.
 Parameters


out
: The stream to write to. It must be seekable.

 Exceptions


poplar_error
: If the target is not an IPU. This cannot be used to serialise CPU or IPU_MODEL executables.

Public Static Functions
 static Executable
deserialize
(std::istream &in)¶ 
Load an executable from a stream.
 Parameters


in
: The stream to read from. It must be seekable.

Friends
 friend
poplar::Engine
Profiling & performance modelling¶
poplar/ProfileValue.hpp¶
 namespace
poplar

Functions
 void
serializeToJSON
(std::ostream &out, const ProfileValue &val, bool prettyPrint = false)¶
 void
serializeToCBOR
(std::ostream &out, const ProfileValue &val)¶
 void
printGraphSummary
(std::ostream &out, const ProfileValue &graphProfile, const OptionFlags &opts)¶ 
Print a summary of the static graph profiling information, primarily memory use.
The available options are:

showOptimizations
(true, false) [=false] If true, information about the optimisations performed is included in the summary output.

showPerIpuMemoryUsage
(true, false) [=false] If true, total memory usage per IPU is included in the summary output in addition to memory usage for the whole device.

showVarStorage
(true, false) [=false] If true, information about variable storage liveness is included in the summary output. This is provided for some tiles with the highest maximum live bytes as well as a total for all tiles. The maximum live bytes is output along with information about always-live variables.

colours
(true, false) Specify whether colours should be displayed in the profile report. If not set, colours will be displayed only if outputting to a supported terminal. In that case, setting the environment variable
CLICOLOR_FORCE=1
forces colours to be displayed, while
CLICOLOR=0
disables colours.

 void
printExecutionSummary
(std::ostream &out, const ProfileValue &graphProfile, const ProfileValue &executionProfile, const OptionFlags &opts)¶ 
Print a summary of the execution profiling information, primarily cycle counts.
The information printed depends on the target and the execution profiling mode. IPUModel always prints a simulation of execution.
The available options are:

showExecutionSteps
(true, false) [=false] If true, the program execution sequence with cycle estimates is included in the summary output.

colours
(true, false) See printGraphSummary().

 void
printProfileSummary
(std::ostream &out, const ProfileValue &graphProfile, const ProfileValue &executionProfile, const OptionFlags &opts = {})¶
 class
ProfileValue
¶  #include <ProfileValue.hpp>
ProfileValue represents a read-only JSON-like tree of values that are used to store the output of the profiler.
Each value can be one of:

A string

A double-precision number

A vector<> of child values

A map<string, …> of child values. Only string keys are supported.
If an invalid access is made, for example an out-of-range access or accessing the wrong type, then an exception is thrown. It is possible to write code that should never throw an exception by using type().
See the Poplar SDK User Guide for more information.
Public Functions
 bool
asBool
() const¶
 double
asDouble
() const¶
 const ProfileValue &
operator[]
(StringRef s) const¶
 const std::map<std::string, ProfileValue> &
asMap
() const¶
 const ProfileValue &
operator[]
(std::size_t i) const¶
 const std::vector<ProfileValue> &
asVector
() const¶
 double
sumDouble
() const¶
ProfileValue
()¶
~ProfileValue
()¶
ProfileValue
(const ProfileValue &other)¶
ProfileValue
(ProfileValue &&other)¶
 ProfileValue &
operator=
(const ProfileValue &other)¶
 ProfileValue &
operator=
(ProfileValue &&other)¶
Friends
 friend
poplar::core::MutableProfileValue

poplar/IPUModel.hpp¶
 namespace
poplar

 struct
IPUModel
¶  #include <IPUModel.hpp>
A model of an IPU to create an IPUModel Device.
Public Types
Public Functions
IPUModel
(char const *IPUVersion = "ipu1")¶
 Device
createDevice
(OptionFlags opts = {}, bool accurateHalf = false, unsigned deviceManagerId = std::numeric_limits<unsigned>::max())¶ 
Create a device that runs code on the CPU and models its performance on an IPU.
Public Members
 unsigned
numIPUs
¶ 
The number of IPUs.
 unsigned
tilesPerSuperTile
¶ 
The number of tiles per supertile.
 unsigned
tilesPerIPU
¶ 
The number of tiles per IPU.
 unsigned
numWorkerContexts
¶ 
The number of worker contexts per tile.
 unsigned
memoryBytesPerTile
¶ 
Memory bytes per tile.
 double
tileClockFrequency
¶ 
Clock frequency in Hz.
 unsigned
exchangeBytesPerCycle
¶ 
The bandwidth of internal IPU exchange in bytes per cycle.
 unsigned
memcpyBytesPerCycle
¶ 
The number of bytes per cycle that can be copied from one location to another using a memcpy.
 unsigned
instructionBytes
¶ 
The size of an instruction in bytes.
 bool
supportsSuperTileSendReceive
¶ 
Whether a tile in a supertile can use all the exchange bandwidth of the supertile to send or receive, when the other tile is idle or receiving the same data.
 unsigned
interleavedMemoryElementIndex
¶ 
Index in the memoryElementOffsets table (returned by Target::getMemoryElementOffsets) which gives the start of the interleaved memory region.
Any value greater than or equal to the size of the offsets table is interpreted as the machine not having interleaved memory elements. Note that, by definition, interleaved memory is always in the upper part of memory.
 unsigned
minIPUSyncDelay
¶ 
The IPU sync delay for the tile that is closest to the sync controller.
 unsigned
globalSyncCycles
¶ 
The number of clock cycles required to synchronize all IPUs.
 std::vector<GlobalExchangeConstraint>
globalExchangeConstraints
¶ 
Set of constraints that provide a lower bound on the time it takes to send data between IPUs.
 unsigned
globalExchangePacketBytes
¶ 
Size of the packet used to transfer data between tiles in bytes.
 unsigned
tileLocalSyncSyncDelay
¶ 
Number of cycles from issuing a sync instruction to the earliest time that instructions can resume.
 unsigned
tileLocalSyncExitDelay
¶ 
Number of cycles after a worker has issued its exit instruction that the supervisor can resume.
 unsigned
numStrideBits
¶ 
Number of stride bits.
 unsigned
dataPathWidth
¶ 
The width of the load/store data path within the tile.
 unsigned
fp16ConvUnitMaxPipelineDepth
¶ 
The maximum pipeline depth of the convolution units within the tile for fp16.
 unsigned
fp32ConvUnitMaxPipelineDepth
¶ 
The maximum pipeline depth of the convolution units within the tile for fp32.
Only allow a maximum of 4 cycle AMP loop.
 unsigned
fp16ConvUnitInputLoadElemsPerCycle
¶ 
The number of input elements loaded per cycle for fp16 convolutions.
 unsigned
fp32ConvUnitInputLoadElemsPerCycle
¶ 
The number of input elements loaded per cycle for fp32 convolutions.
 unsigned
fp16InFp16OutConvUnitsPerTile
¶ 
The number of convolution units in the tile that can be used when partial results are output as 16 bits and inputs are 16 bits.
 unsigned
fp16InFp32OutConvUnitsPerTile
¶ 
The number of convolution units in the tile that can be used when partial results are output as 32 bits and inputs are 16 bits.
 unsigned
fp32InFp32OutConvUnitsPerTile
¶ 
The number of convolution units in the tile that can be used when accumulating to 32-bit values.
 unsigned
convUnitCoeffLoadBytesPerCycle
¶ 
The number of convolutional weights that can be loaded in a cycle.
 unsigned
supervisorInstrFetchDelay
¶ 
Number of bytes supervisor contexts may be loading instructions from memory ahead of current PC.
 unsigned
workerInstrFetchDelay
¶ 
Number of bytes worker context may be loading instructions from memory ahead of current PC.
 unsigned
rptCountMax
¶
 unsigned
atomicStoreGranularity
¶ 
The atomic store granularity.
 bool
compileIPUCode
¶ 
Whether or not to actually compile real IPU code for modelling.
poplar/GlobalExchangeConstraints.hpp¶
 namespace
poplar

 struct
GlobalExchangeConstraint
¶ 
Public Functions
GlobalExchangeConstraint
(double bandwidth, ArrayRef<GlobalExchangeFlow> flows)¶
 bool
operator==
(const GlobalExchangeConstraint &other) const¶
Public Members
 double
bandwidth
¶ 
Bandwidth in bits per second.
 std::vector<GlobalExchangeFlow>
flows
¶ 
The flows that the constraint applies to.
 struct
GlobalExchangeFlow
¶ 
Public Functions
GlobalExchangeFlow
(unsigned src, unsigned dst)¶
 bool
operator==
(const GlobalExchangeFlow &other) const¶
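One way to read these constraints: the total traffic over the flows listed in a constraint cannot move faster than that constraint's `bandwidth` (in bits per second), so each constraint gives a minimum transfer time and the largest of these is the overall lower bound. The following is a minimal sketch under that assumption. `Flow`, `Constraint`, `Transfer` and `lowerBoundSeconds` are hypothetical stand-ins written for this illustration; they are not Poplar types or functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for GlobalExchangeFlow: traffic from one IPU to another.
struct Flow {
  unsigned src; // source IPU
  unsigned dst; // destination IPU
  bool operator==(const Flow &o) const { return src == o.src && dst == o.dst; }
};

// Hypothetical stand-in for GlobalExchangeConstraint.
struct Constraint {
  double bandwidth;        // bits per second shared by all listed flows
  std::vector<Flow> flows; // flows that the constraint applies to
};

// A transfer of `bytes` bytes along one flow (assumed for this example).
struct Transfer {
  Flow flow;
  std::uint64_t bytes;
};

// Lower bound in seconds: for each constraint, sum the bits sent over its
// flows and divide by its bandwidth; the bound is the largest such value.
double lowerBoundSeconds(const std::vector<Constraint> &constraints,
                         const std::vector<Transfer> &transfers) {
  double bound = 0.0;
  for (const auto &c : constraints) {
    assert(c.bandwidth > 0.0);
    double bits = 0.0;
    for (const auto &t : transfers) {
      if (std::find(c.flows.begin(), c.flows.end(), t.flow) != c.flows.end())
        bits += 8.0 * static_cast<double>(t.bytes);
    }
    bound = std::max(bound, bits / c.bandwidth);
  }
  return bound;
}
```

The actual time taken can be longer than this value; the sketch only shows how a set of bandwidth constraints combines into a bound.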
poplar/CycleCount.hpp¶
 namespace
poplar

Functions
 poplar::Tensor
cycleCount
(poplar::Graph &graph, poplar::program::Sequence &prog, unsigned tile, const std::string &debugPrefix = "")¶ 
Time a sequence program and return the 64-bit cycle count in a tensor of two unsigned integers.
The sequence is timed by adding sync and timing programs around the original sequence. You must also specify the tile on which the program is timed.
 Return

An unsigned integer tensor of length 2
 Parameters


graph
: The Poplar graph 
prog
: The program sequence to time 
tile
: The tile on which the program is timed

 poplar::Tensor
cycleStamp
(poplar::Graph &graph, poplar::program::Sequence &prog, unsigned tile, const std::string &debugPrefix = "")¶ 
Add a sequence program to record an absolute hardware cycle stamp on a given tile.
The stamp is a snapshot of a continuously running hardware counter on a tile. For consistent results, measurements must be taken on the same tile.
The result is a tensor containing the two 32-bit elements of a 64-bit snapshot of the hardware counter. The first element of the tensor is the lower 32 bits and the second is the upper 32 bits.
The timestamp is added after an internal sync is executed.
 Return

An unsigned integer tensor of length 2
 Parameters


graph
: The Poplar graph 
prog
: The program sequence to which the time stamp is added 
tile
: The tile on which the time stamp is added
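On the host, the two 32-bit elements read back from the tensor have to be recombined before two stamps can be compared. The sketch below shows that arithmetic only; it does not call Poplar. The helper names are hypothetical, and the clock frequency passed to `cyclesToSeconds` is assumed to be supplied by the caller (for example, the `tileClockFrequency` of the model in use).

```cpp
#include <cassert>
#include <cstdint>

// Combine the two 32-bit tensor elements into one 64-bit counter value.
// Element 0 holds the lower 32 bits and element 1 holds the upper 32 bits.
std::uint64_t combineCycleWords(std::uint32_t lower, std::uint32_t upper) {
  return (static_cast<std::uint64_t>(upper) << 32) | lower;
}

// Cycles elapsed between two stamps taken on the same tile.
std::uint64_t elapsedCycles(std::uint64_t startStamp, std::uint64_t endStamp) {
  assert(endStamp >= startStamp);
  return endStamp - startStamp;
}

// Convert a cycle count to seconds for a given tile clock frequency in Hz.
double cyclesToSeconds(std::uint64_t cycles, double tileClockFrequencyHz) {
  assert(tileClockFrequencyHz > 0.0);
  return static_cast<double>(cycles) / tileClockFrequencyHz;
}
```

Because the counters run independently on each tile, the subtraction is only meaningful for two stamps recorded on the same tile.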

 std::vector<poplar::Tensor>
cycleStamp
(poplar::Graph &graph, poplar::program::Sequence &prog, const std::vector<unsigned> &tiles, const std::string &debugPrefix = "")¶ 
Add a compute set to record an absolute hardware cycle stamp on the specified tiles.
 Return

A vector of tensors, each containing 2 integers
 Parameters


graph
: The Poplar graph 
prog
: The program sequence to which the time stamp is added 
tiles
: The tiles on which the time stamp is added

Poplibs API reference¶
The Poplibs libraries provide application-level functions that can be used in Poplar programs for the IPU.
| Library | Depends on | Description |
| --- | --- | --- |
| poputil | | General utility functions for building graphs |
| popops | | Operations on tensors in control programs (elementwise functions and reductions) |
| poplin | | Linear algebra functions (matrix multiplications, convolutions) |
| poprand | | Functions for populating tensors with random numbers |
| popnn | | Functions used in neural networks (for example, non-linearities, pooling and loss functions) |
| popsolver | | Model solving functions |
Utility functions (poputil)¶
General utility functions for building graphs.
poputil/Broadcast.hpp¶
 namespace
poputil
¶ 
Functions
 void
expandToMatchRanks
(poplar::Tensor &a, poplar::Tensor &b)¶ 
Insert singleton dimensions into either of two tensors so that their ranks match, following NumPy-style expansion rules.
The tensor with the lower rank has singleton dimensions inserted as outermost dimensions.
 Parameters


a
: First tensor to match. 
b
: Second tensor to match.
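The rank-matching rule can be illustrated on plain shape vectors. This is a minimal sketch of the rule as described above, not the library implementation; `Shape` and `expandShapesToMatchRanks` are hypothetical names, and the real function operates on `poplar::Tensor` objects rather than shapes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Shape = std::vector<std::size_t>;

// Prepend size-1 (singleton) dimensions to the lower-rank shape, as the
// outermost dimensions, until both shapes have the same rank.
void expandShapesToMatchRanks(Shape &a, Shape &b) {
  if (a.size() < b.size())
    a.insert(a.begin(), b.size() - a.size(), std::size_t{1});
  else if (b.size() < a.size())
    b.insert(b.begin(), a.size() - b.size(), std::size_t{1});
  assert(a.size() == b.size());
}
```

For example, shapes `{3}` and `{2, 4, 3}` become `{1, 1, 3}` and `{2, 4, 3}`.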

 void
broadcastToMatch
(poplar::Tensor &a, const std::vector<std::size_t> &shape)¶ 
Match dimensions of a tensor to a shape by broadcasting using NumPy-style broadcast rules:
1) If the rank of the tensor is less than the rank of the required shape, extend the dimensions to the left with dimensions of size 1 to match the rank of the required shape.
2) For each dimension, the size of the dimension in the tensor must be the same as in the required shape or must be 1. Where it is 1, the tensor is broadcast in that dimension to match the shape. If neither of these conditions holds then an exception is thrown.
 Parameters


a
: The tensor to broadcast to match the shape. This will be updated in place with broadcast dimensions. 
shape
: The shape to match.

 void
broadcastToMatch
(poplar::Tensor &a, poplar::Tensor &b)¶ 
Match dimensions of two tensors by broadcasting using NumPy-style broadcast rules:
1) If the rank of one tensor is less than the other then extend the dimensions to the left with dimensions of size 1.
2) For each dimension, the size of the dimension in both tensors must be the same or one of them must have size 1. Where one has size 1, that tensor is broadcast in that dimension to match the other. If neither of these conditions holds then an exception is thrown.
 Parameters


a
: First tensor to match. This will be updated in place with broadcast dimensions. 
b
: Second tensor to match. This will be updated in place with broadcast dimensions.
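The two rules can be checked on shapes alone. The sketch below is an illustration under stated assumptions rather than the library code: `Shape` and `broadcastShapes` are hypothetical names, and a mismatch is reported with a `false` return value where the documented functions throw an exception.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Shape = std::vector<std::size_t>;

// Compute the common shape that `a` and `b` broadcast to.
// Returns false (leaving `out` untouched) if the shapes cannot be matched.
bool broadcastShapes(Shape a, Shape b, Shape &out) {
  // Rule 1: left-pad the lower-rank shape with dimensions of size 1.
  const std::size_t rank = std::max(a.size(), b.size());
  a.insert(a.begin(), rank - a.size(), std::size_t{1});
  b.insert(b.begin(), rank - b.size(), std::size_t{1});
  assert(a.size() == rank && b.size() == rank);

  // Rule 2: sizes must be equal, or one of them must be 1.
  Shape result(rank);
  for (std::size_t d = 0; d < rank; ++d) {
    if (a[d] == b[d] || b[d] == 1)
      result[d] = a[d];
    else if (a[d] == 1)
      result[d] = b[d];
    else
      return false;
  }
  out = result;
  return true;
}
```

With this sketch, `{4, 1, 3}` and `{5, 1}` broadcast to `{4, 5, 3}`, while `{2, 3}` and `{4, 3}` do not match.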

 void
broadcastToMatch
(poplar::Tensor &a, poplar::Tensor &b, poplar::Tensor &c)¶ 
Match dimensions of three tensors by broadcasting using NumPy-style broadcast rules:
1) If the rank of one tensor is less than that of the others then extend the dimensions to the left with dimensions of size 1.
2) For each dimension, the size of the dimension in all three tensors must be the same or must be 1. A tensor with size 1 in a dimension is broadcast in that dimension to match the others. If neither of these conditions holds then an exception is thrown.
 Parameters


a
: First tensor to match. This will be updated in place with broadcast dimensions. 
b
: Second tensor to match. This will be updated in place with broadcast dimensions. 
c
: Third tensor to match. This will be updated in place with broadcast dimensions.

 bool
canBroadcastToMatch
(const poplar::Tensor &a, const poplar::Tensor &b)¶ 
Test if the given tensors can be broadcast to match one another using the rules for broadcastToMatch.
 Return

True if the two tensors may be broadcast to match one another and false if they do not match following the broadcastToMatch broadcast rules.
 Parameters


a
: First tensor to match. 
b
: Second tensor to match.

poputil/GraphFunction.hpp¶
 namespace
poputil

 namespace
graphfn
¶ 
Functions
 struct
ArgSig
¶
 class
ProgramFunction
¶ 
Public Functions
Private Members
 VoidFunction
voidFunc
¶
poputil/TileMapping.hpp¶
 namespace
poputil

Functions
 std::vector<std::vector<poplar::Interval>>
calcLinearTileMapping
(const poplar::Graph &graph, std::vector<std::size_t> shape, unsigned minElementsPerTile, unsigned grainSize)¶ 
Calculate a tile mapping that spreads the tensor evenly over the tiles in a linear manner (that is, with the indices of the flattened tensor mapped across from low to high tile numbers).
 std::vector<std::vector<poplar::Interval>>
calcLinearTileMapping
(const poplar::Graph &graph, const poplar::Tensor &t)¶ 
Calculate a tile mapping that spreads the tensor evenly over the tiles in a linear manner (that is, with the indices of the flattened tensor mapped across from low to high tile numbers).
In this case the elements are divided so that vectors of the device's natural vector width are not split, and so that at least 128 bytes of data are kept on each tile where possible, to avoid high exchange costs.
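To make the roles of `grainSize` and `minElementsPerTile` concrete, here is a simplified sketch of a linear mapping over flattened element indices. It is an assumption about how such a mapping can be built, not the Poplibs algorithm: `Range` is a hypothetical stand-in for `poplar::Interval`, the number of tiles is passed in directly instead of being read from the graph, and each tile receives at most one contiguous range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open element range [first, second); stand-in for poplar::Interval.
using Range = std::pair<std::size_t, std::size_t>;

// Assign consecutive ranges of the flattened tensor to tiles 0, 1, 2, ...
// Elements are handed out in whole grains, and every tile that receives
// data gets at least minElementsPerTile elements (except possibly the last).
std::vector<std::vector<Range>>
linearMappingSketch(std::size_t numElements, unsigned numTiles,
                    unsigned minElementsPerTile, unsigned grainSize) {
  assert(numTiles > 0 && grainSize > 0);
  auto ceilDiv = [](std::size_t a, std::size_t b) { return (a + b - 1) / b; };

  const std::size_t numGrains = ceilDiv(numElements, grainSize);
  // An even share of grains per tile, but no less than the requested minimum.
  const std::size_t grainsPerTile =
      std::max(ceilDiv(numGrains, numTiles),
               ceilDiv(minElementsPerTile, grainSize));

  std::vector<std::vector<Range>> mapping(numTiles);
  std::size_t begin = 0;
  for (unsigned tile = 0; tile < numTiles && begin < numElements; ++tile) {
    const std::size_t end =
        std::min(numElements, begin + grainsPerTile * grainSize);
    mapping[tile].push_back({begin, end});
    begin = end;
  }
  return mapping;
}
```

With 10 elements, 4 tiles and a grain size of 2, the sketch places elements 0 to 3 on tile 0, 4 to 7 on tile 1 and 8 to 9 on tile 2. Raising `minElementsPerTile` to 8 concentrates the same data on tiles 0 and 1.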
 void
mapTensorLinearly
(poplar::Graph &graph, const poplar::Tensor &t, unsigned minElementsPerTile, unsigned grainSize)¶
 unsigned
getTileImbalance
(const poplar::Graph::TileToTensorMapping &mapping, unsigned minElementsPerTile = 0, unsigned grainSize = 1)¶ 
Determine how unevenly a tensor is mapped over tiles.
 Return

The maximum number of elements on any tile in excess of the expected number.
 Parameters


mapping
: The tile mapping of the tensor 
minElementsPerTile
: The expected minimum number of elements per tile. 
grainSize
: The expected “grain size”, that is, the atomic unit in which elements are split over tiles.

 unsigned
getTileImbalance
(const poplar::Graph &graph, const poplar::Tensor &t, unsigned minElementsPerTile = 0, unsigned grainSize = 1)¶ 
Determine how unevenly a tensor is mapped over tiles.
 Return

The maximum number of elements on any tile in excess of the expected number.
 Parameters


graph
: The graph. 
t
: The tensor to be inspected. 
minElementsPerTile
: The expected minimum number of elements per tile. 
grainSize
: The expected “grain size”, that is, the atomic unit in which elements are split over tiles.
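One plausible reading of this measure is sketched below: take the per-tile element counts, work out the share a tile would hold if the tensor were spread evenly (but never less than `minElementsPerTile` or one grain), and report how far the fullest tile exceeds that share. This is an assumption made for illustration, not the library's definition; `tileImbalanceSketch` and its input format are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// elementsPerTile[i] is the number of tensor elements mapped to tile i.
// Returns how many elements the fullest tile holds beyond the expected share.
unsigned tileImbalanceSketch(const std::vector<unsigned> &elementsPerTile,
                             unsigned minElementsPerTile = 0,
                             unsigned grainSize = 1) {
  assert(grainSize > 0);
  if (elementsPerTile.empty())
    return 0;

  unsigned total = 0;
  unsigned maxOnTile = 0;
  for (unsigned n : elementsPerTile) {
    total += n;
    maxOnTile = std::max(maxOnTile, n);
  }

  // Expected share: an even split rounded up, bounded below by the
  // minimum number of elements per tile and by one grain.
  const unsigned numTiles = static_cast<unsigned>(elementsPerTile.size());
  unsigned expected = (total + numTiles - 1) / numTiles;
  expected = std::max({expected, minElementsPerTile, grainSize});

  return maxOnTile > expected ? maxOnTile - expected : 0;
}
```

A value of this kind is what a threshold such as the `imbalanceThreshold` parameter of `rebalanceTensor` would be compared against.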

 void
rebalanceTensor
(poplar::Graph &graph, const poplar::Tensor &t, unsigned minElementsPerTile, unsigned grainSize, unsigned imbalanceThreshold)¶ 
Update a tensor’s tile mapping to be balanced over tiles.
 Parameters


graph
: The graph to which the tensor belongs. 
t
: The tensor to rebalance. 
minElementsPerTile
: The minimum number of elements per tile. 
grainSize
: The “grain size”, that is, the atomic unit in which elements are split over tiles. 
imbalanceThreshold
: This value is checked against the current tensor tile imbalance and if the imbalance is less than this value, the tile mapping will not be altered.

 void
rebalanceTensor
(poplar::Graph &graph, const poplar::Tensor &t)¶ 
Update a tensor’s tile mapping to be balanced over tiles.
 Parameters


graph
: The graph to which the tensor belongs. 
t
: The tensor to rebalance.

 void
mapOutputForElementWiseOp
(poplar::Graph &graph, const std::vector<poplar::Tensor> &inputs, const poplar::Tensor &output, unsigned grainSize = 1, unsigned minGrainsPerTile = 0)¶ 
Update a tensor’s tile mapping so that it is suitable for use as the output of an elementwise operation (an operation with no dependency between more than one element of the output and any given element of any input tensor).
Using the resulting tensor to map elementwise operations to tiles produces an operation that is computationally balanced across tiles and that minimises exchange.
 Parameters


graph
: A graph which the given inputs/output belong to. 
inputs
: List of input tensors for the operation. 
output
: Output tensor for the operation. 
grainSize
: Grain size for elements mapped to each tile. 
minGrainsPerTile
: Minimum number of grains mapped to a tile.

 poplar::Tensor
cloneToIpu
(poplar::Graph &graph, const poplar::Tensor &t, unsigned dstIPU, poplar::StringRef name = "", poplar::TensorCloneMethod method = poplar::TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)¶ 
Create a clone of the specified tensor.
Elements of the cloned tensor are mapped to the specified IPU such that the index of the tile an element is mapped to within an IPU is preserved.
 Return

The cloned tensor.
 Parameters


graph
: The graph representing the entire multi-IPU device. 
t
: The tensor to clone. 
dstIPU
: The index of the IPU to clone the tensor onto. 
name
: A debug name to give to any new tensors allocated in the graph during the clone. If this is empty then the debug names will be derived from existing tensor debug names. 
method
: The method to use for the cloning.
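Preserving the within-IPU tile index amounts to a small index calculation. The sketch below assumes that tiles are numbered contiguously IPU by IPU (tiles `0` to `tilesPerIPU - 1` on IPU 0, and so on); that numbering and the helper name `tileOnDestinationIpu` are assumptions made for this illustration.

```cpp
#include <cassert>

// Map a global tile index to the tile with the same within-IPU index
// on the destination IPU.
unsigned tileOnDestinationIpu(unsigned tile, unsigned dstIPU,
                              unsigned tilesPerIPU) {
  assert(tilesPerIPU > 0);
  const unsigned indexWithinIpu = tile % tilesPerIPU;
  return dstIPU * tilesPerIPU + indexWithinIpu;
}
```

For instance, with 1216 tiles per IPU, an element on global tile 1219 (tile 3 of IPU 1) would map to global tile 2435 (tile 3 of IPU 2) when cloned to IPU 2.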

 poplar::Tensor
copyToIpu
(poplar::Graph &masterGraph, const poplar::Tensor &t, poplar::program::Sequence &prog, unsigned dstIPU, poplar::StringRef name = "", poplar::TensorCloneMethod method = poplar::TensorCloneMethod::PRESERVE_ORDER_UNLESS_ALIASES)¶ 
Move a tensor from one IPU to another by duplicating it, mapping the clone onto another IPU, and copying the original to the new one.
 Return

The new tensor on the specified IPU.
 Parameters


masterGraph
: The graph representing the entire multi-IPU device. 
t
: The tensor to move from one IPU to another. 
prog
: A program sequence to which the Copy will be added. 
dstIPU
: The index of the IPU onto which the Tensor will be moved. 
name
: A debug name to give to the tensor created on dstIPU. If this is empty then the debug names will be derived from existing tensor debug names. 
method
: The method to use for cloning of the tensor on the destination IPU.

 poplar::Tensor
createIpuCopy
(poplar::Graph &graph, const poplar::Tensor &t, unsigned dstIpu, poplar::Tensor &copySrc, poplar::Tensor &copyDst, poplar::StringRef name = "", poplar::TensorCloneMethod method = poplar::TensorCloneMethod::PRESERVE_ORDER_AND_ALIASES)¶ 
Move a tensor from one IPU to another by duplicating it and mapping the clone onto another IPU, and provide the source and destination tensors of an inter-IPU copy (but do not add that copy to a program at this point).
 Return

The new tensor on the specified IPU.
 Parameters


graph
: The graph representing the entire multi-IPU device. 
t
: The tensor to move from one IPU to another. 
dstIpu
: The index of the IPU onto which the Tensor will be moved. 
copySrc
: A tensor that can be used as the source of the copy. 
copyDst
: A tensor that can be used as the destination of the copy. 
name
: A debug name to give to the tensor created on dstIPU. If this is empty then the debug names will be derived from existing tensor debug names. 
method
: The method to use for cloning of the tensor on the destination IPU.

 bool
dimIsSplitOverTiles
(const poplar::Graph &graph, const poplar::Tensor &t, unsigned dimension)¶ 
Check whether the tile mapping of the given tensor splits the given dimension over more than one tile.
 Return

true if any slice of the given dimension is spread over more than one Tile.
 Parameters


graph
: The graph to introspect. 
t
: The tensor to introspect. 
dimension
: The dimension to check.

 bool
dimIsSplitOverIPUs
(const poplar::Graph &graph, const poplar::Tensor &t, unsigned dimension)¶ 
Check whether the tile mapping of the given tensor splits the given dimension over more than one IPU.
 Return

true if any slice of the given dimension is spread over more than one IPU.
 Parameters


graph
: The graph to introspect. 
t
: The tensor to introspect. 
dimension
: The dimension to check.

 class
TensorUseTracker
¶  #include <TileMapping.hpp>
Class that tracks the usage of data on different tiles.
If data is broadcast to many tiles, it is sometimes efficient to map the data to be spread evenly amongst the tiles that use it.
This class can collect uses of data and then calculate such a tile mapping.
Public Functions
TensorUseTracker
(unsigned numTiles)¶
TensorUseTracker
(const TensorUseTracker &other)¶
TensorUseTracker
(TensorUseTracker &&other)¶
 TensorUseTracker &
operator=
(const TensorUseTracker &other)¶
 TensorUseTracker &
operator=
(TensorUseTracker &&other)¶
~TensorUseTracker
()¶
 void
add
(const poplar::Graph &graph, unsigned tile, const poplar::Tensor &t)¶ 
Add a data use case.
 Parameters


graph
: The Poplar graph 
tile
: The tile that the use occurs on. 
t
: The tensor representing the data being used.

 void
add
(TensorUseTracker other)¶ 
Add data use cases from another tracker.
 Parameters


other
: The TensorUseTracker from which to merge data uses.

 void
resolve
(const poplar::Graph &graph, unsigned grainSize, unsigned minElementsPerTile, bool optimizeHaloRegions = false, bool extendPartialUsage = false)¶ 
Resolve data uses for mapping.
Data used on multiple tiles will have their uses spread across those tiles.
 Parameters


grainSize
: The number of elements that cannot be split amongst tiles. 
minElementsPerTile
: The minimum number of elements that must be mapped to a tile. 
optimizeHaloRegions
: Map “halo regions” to single tiles. These are regions that are used by multiple tiles but have neighbouring regions used by subsets of those tiles. 
extendPartialUsage
: When set, partial uses of tensors will be extended to cover the entire tensor, based on the usage of neighbouring regions.

 void
mapTensorsByUse
(poplar::Graph &graph, unsigned grainSize, unsigned minElementsPerTile, bool optimizeHaloRegions = false, bool extendPartialUsage = false)¶ 
Map data according to use.
This function will set the tile mapping of variable regions based on tracked data uses. Variable regions with uses on multiple tiles will have their elements spread across those tiles.
 Parameters


graph
: The Poplar graph 
grainSize
: The number of elements that cannot be split amongst tiles. 
minElementsPerTile
: The minimum number of elements that must be mapped to a tile. 
optimizeHaloRegions
: Map “halo regions” to single tiles. These are regions that are used by multiple tiles but have neighbouring regions used by subsets of those tiles. 
extendPartialUsage
: When set, partial uses of tensors will be extended to cover the entire tensor, based on the usage of neighbouring regions before mapping.
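The idea of spreading a region across the tiles that use it can be sketched for a single region. This is a simplified illustration under stated assumptions, not the tracker's algorithm: `Range` stands in for `poplar::Interval`, `spreadRegionOverUsers` is a hypothetical helper, the user tiles are assumed to be distinct, and halo-region optimisation and partial-usage extension are not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Half-open element range [first, second); stand-in for poplar::Interval.
using Range = std::pair<std::size_t, std::size_t>;

// Split one region evenly, in whole grains, over the tiles that use it.
// Returns the sub-range assigned to each tile that receives data.
std::map<unsigned, Range>
spreadRegionOverUsers(Range region, const std::vector<unsigned> &userTiles,
                      unsigned grainSize, unsigned minElementsPerTile) {
  assert(!userTiles.empty() && grainSize > 0);
  assert(region.first <= region.second);
  auto ceilDiv = [](std::size_t a, std::size_t b) { return (a + b - 1) / b; };

  const std::size_t numElements = region.second - region.first;
  const std::size_t numGrains = ceilDiv(numElements, grainSize);
  // An even share of grains per user tile, bounded below by the minimum.
  const std::size_t grainsPerTile =
      std::max(ceilDiv(numGrains, userTiles.size()),
               ceilDiv(minElementsPerTile, grainSize));

  std::map<unsigned, Range> result;
  std::size_t begin = region.first;
  for (unsigned tile : userTiles) {
    if (begin >= region.second)
      break; // all elements have been assigned
    const std::size_t end =
        std::min(region.second, begin + grainsPerTile * grainSize);
    result[tile] = {begin, end};
    begin = end;
  }
  return result;
}
```

For a region covering elements 100 to 119 that is used on tiles 3, 7 and 9, with a grain size of 4, the sketch assigns 100 to 107 to tile 3, 108 to 115 to tile 7 and 116 to 119 to tile 9.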

 bool
empty
() const
 std::vector<std::vector<poplar::Interval>>