Graphcore Command Line Tools

This document describes Graphcore command line tools for accessing and controlling Graphcore IPU devices.

Introduction

The IPU driver software includes a number of software tools that provide information on the current status of the connected hardware.

These tools are provided as part of the Poplar SDK and as a standalone driver package. The tools are in the bin directory of the installation, typically gc_drivers-ubuntu-[ver]/bin. To set up your environment to use the tools, you will need to source the enable.sh script provided in that directory:

 $ source enable.sh

The commands available are listed below. Note that not all commands are appropriate for all IPU systems. For example, some commands may not be available, or may have no effect, on cloud-based systems. Refer to the Getting Started Guide for the IPU system you are using for a list of supported commands.

gc-info
Determines what IPU cards are present in the system.
gc-inventory
Lists device IDs, physical parameters and firmware version numbers.
gc-reset
Resets an IPU device after reboot.
gc-monitor
Monitors IPU activity on shared systems.
gc-exchangetest
Allows you to test the internal exchange fabric in an IPU.
gc-memorytest
Tests all the memory in an IPU, reporting any tiles that fail.
gc-links
Displays the status and connectivity of each of the IPU-Links that connect the C2 IPU-Processor cards together. See also IPU-Link channel mapping.
gc-powertest
Tests power consumption and temperature of the C2 IPU-Processor cards.
gc-hosttraffictest
Allows you to test the data transfer between the host machine and the IPUs (in both directions).
gc-iputraffictest
Allows you to test the data transfer between IPUs.
gc-docker
Allows you to use IPU devices in Docker containers.

Many of these commands reference IPU devices by their ID. The ID mapping is shown in Device IDs and Channel Map.

The use and output of each of these tools is described in the following sections.

gc-docker

This tool generates IPU device definitions for Docker, so the IPU devices can be used inside a container. It can be used either to start a container with docker run or to display the Docker command.

Start a container with IPU devices

The default action is to start a Docker container. All options and arguments specified after -- are passed directly to docker run.

Here are some examples where {docker_opts} are options passed to docker run:

 $ gc-docker -- {docker_opts} # start a container with all IPU devices
$ gc-docker --device-id {id} -- {docker_opts} # start a container with specified device
$ gc-docker --binary {docker_bin} -- {docker_opts} # specify path to the docker binary

You can specify multiple --device-id options. You can find the available device IDs with the gc-info command. If a device ID refers to a MultiIPU device, all the IPUs that are part of the MultiIPU will be available in the container.

When you select a subset of device IDs, please note that inside the container the device ID sequence starts from 0.
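
As a rough illustration of how the options above combine, the following sketch assembles a gc-docker command line in Python. The helper name build_gc_docker_cmd and the example Docker options are assumptions made for this illustration; only the --device-id, --binary and --echo options and the -- separator come from the description above.

```python
def build_gc_docker_cmd(docker_opts, device_ids=(), binary=None, echo=False):
    """Return the argument list for a gc-docker call (hypothetical helper)."""
    cmd = ["gc-docker"]
    if echo:
        # Only display the docker command, do not start a container.
        cmd.append("--echo")
    if binary is not None:
        cmd += ["--binary", binary]
    # --device-id may be repeated, once per device.
    for device_id in device_ids:
        cmd += ["--device-id", str(device_id)]
    # Everything after "--" is passed directly to docker run.
    return cmd + ["--"] + list(docker_opts)


# Example Docker options chosen for illustration only.
cmd = build_gc_docker_cmd(["--rm", "-it", "ubuntu:18.04"], device_ids=[4, 5])
print(" ".join(cmd))
```

With devices 4 and 5 selected on the host, the device IDs seen inside the container start from 0 rather than 4, as noted above.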

Show Docker command

The --echo option displays the Docker command with IPU device definitions.

An example:

 $ gc-docker --echo -- {docker_opts}  # display docker command and don't start container

Usage

Commands

-e, --echo Display docker command and don’t start a container
-h, --help Produce help message
--version Version number

Command options

-d ARG, --device-id ARG Device id
-b ARG, --binary ARG Docker binary (default: docker)
--command arg

Examples

 gc-docker -- {docker_opts}                       # start a container with all available IPU devices
gc-docker --device-id {id} -- {docker_opts}      # start a container with specified IPU device(s)
gc-docker --binary {docker_bin} -- {docker_opts} # specify path to the docker binary
gc-docker --echo -- {docker_opts}                # display docker command and don't start container

Notes

  • This command generates device definitions for the docker run command.
  • By default, a container is started with ‘docker run {docker_opts}’.
  • If the --echo option is specified, the docker command is displayed instead.
  • You can set the path to docker with the --binary option.
  • Everything after ‘--’ is passed directly to docker run.

gc-exchangetest

This tool tests the internal exchange fabric inside an IPU, ensuring that data can move properly between all the tile processors and memory locations. To use it, run:

 gc-exchangetest -d [device_id]

where [device_id] is the id number returned by the gc-inventory tool.

If the test finds no issues with the device exchange, then the output looks like this:

 {
    "result": "pass"
}

If the test finds any issues with the device exchange, the output shows which tiles have failed. For example:

 {
    "tile_results": [
        {
            "12": "fail"
        },
        {
            "1000": "fail"
        },
        {
            "0": "fail"
        }
    ],
    "result": "fail"
}
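
If the report needs to be processed by a script, the failing tiles can be collected with a few lines of Python. This is a minimal sketch that assumes the report has exactly the shape shown in the two examples above; the function name failed_tiles is hypothetical.

```python
import json


def failed_tiles(report_text):
    """Return (passed, sorted list of failing tile numbers)."""
    report = json.loads(report_text)
    tiles = []
    # "tile_results" is only present in the failing example above.
    for entry in report.get("tile_results", []):
        for tile_id, status in entry.items():
            if status == "fail":
                tiles.append(int(tile_id))
    return report["result"] == "pass", sorted(tiles)


sample = """
{
    "tile_results": [
        {"12": "fail"},
        {"1000": "fail"},
        {"0": "fail"}
    ],
    "result": "fail"
}
"""
passed, tiles = failed_tiles(sample)
print(passed, tiles)
```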

Usage

Allowed options

-d ID, --device-id ID Device id
-h, --help Produce help message
-v, --verbose Verbose output
--version Version number

gc-hosttraffictest

This tool tests the data transfer between the host machine and the IPUs (in both directions). To use it, run:

 gc-hosttraffictest -d [device_id] -j

where [device_id] is the id number returned by the gc-inventory tool.

If JSON output is selected, the output will look something like this:

 {
    "tile_to_host": {
        "duration_sec": "1.8759827170000001",
        "kbytes_transferred": "12800000",
        "gbytes_transferred": "12.20703125",
        "gbps": "52.056049938545357"
    },
    "host_to_tile": {
        "duration_sec": "1.855793126",
        "kbytes_transferred": "12800000",
        "gbytes_transferred": "12.20703125",
        "gbps": "52.622379419245675"
    }
}
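
Note that every value in this report is a JSON string, so it must be converted before doing arithmetic. The sketch below recomputes the derived fields from the raw ones. The relationships it uses (1 GB = 1024 × 1024 KB, and Gbps = GB × 8 / seconds) are inferred from the sample figures above rather than taken from a specification, and the function name recompute is hypothetical.

```python
import json

sample = json.loads("""
{
    "tile_to_host": {
        "duration_sec": "1.8759827170000001",
        "kbytes_transferred": "12800000",
        "gbytes_transferred": "12.20703125",
        "gbps": "52.056049938545357"
    },
    "host_to_tile": {
        "duration_sec": "1.855793126",
        "kbytes_transferred": "12800000",
        "gbytes_transferred": "12.20703125",
        "gbps": "52.622379419245675"
    }
}
""")


def recompute(stats):
    """Derive (gbytes, gbps) from the raw counters of one direction."""
    seconds = float(stats["duration_sec"])
    # Assumption inferred from the sample: 1 GB = 1024 * 1024 KB.
    gbytes = float(stats["kbytes_transferred"]) / (1024 * 1024)
    # Assumption inferred from the sample: bits per second = bytes * 8 / time.
    return gbytes, gbytes * 8 / seconds


results = {name: recompute(stats) for name, stats in sample.items()}
for name, (gbytes, gbps) in results.items():
    print(f"{name}: {gbytes} GB, {gbps:.3f} Gbps")
```

For a simple pass/fail check, the --min-ipu-host-bandwidth and --min-host-ipu-bandwidth options listed under Usage make the tool itself fail when a threshold is not reached.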

The output will be plain text if the -j option is not specified. For example, the related gc-iputraffictest tool prints a summary like this:

 $ gc-iputraffictest --device0 5 --device1 7
 3.9062 GB in 0.038411 seconds, 813.56 Gbps, 0 errors
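
A plain-text summary line like the one above can be picked apart with a regular expression when JSON output is not an option. This sketch assumes the line always follows the layout of the sample; the pattern and the function name parse_summary are illustrative, not part of the tools.

```python
import re

# Pattern modelled on the single sample line; other layouts are not handled.
SUMMARY_RE = re.compile(
    r"(?P<gbytes>[\d.]+) GB in (?P<seconds>[\d.]+) seconds, "
    r"(?P<gbps>[\d.]+) Gbps, (?P<errors>\d+) errors"
)


def parse_summary(line):
    """Convert one summary line into a dict of numbers."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised summary line: {line!r}")
    return {
        "gbytes": float(match["gbytes"]),
        "seconds": float(match["seconds"]),
        "gbps": float(match["gbps"]),
        "errors": int(match["errors"]),
    }


summary = parse_summary(" 3.9062 GB in 0.038411 seconds, 813.56 Gbps, 0 errors")
print(summary)
```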

If an error occurs during the test, then gc-hosttraffictest will return a non-zero exit code, and output an error message to the terminal.

Usage

Allowed options

-j, --json-output Emit JSON output
-d ID, --device-id ID Device id
-n ARG, --num-tiles ARG Number of tiles: 1 to 32 (default: 32)
-m ARG, --mode ARG r|w|rw|cc (default: rw)
-p ARG, --payload-blocks ARG Number of 64 byte blocks per transfer: 2|4 (default: 4)
-i ARG, --iterations ARG Number of 4KB transfers per tile (default: 100000)
--min-ipu-host-bandwidth ARG Minimum bandwidth (Gbps) IPU to host expected - fails if not reached
--min-host-ipu-bandwidth ARG Minimum bandwidth (Gbps) host to IPU expected - fails if not reached
-v, --verbose Verbose output
-h, --help Produce help message
--version Version number
-g, --graph-streaming Use graph streaming

gc-info

This tool lists detailed information about the IPUs present in the hardware platform. To extract some of the information, gc-info needs to lock access to the IPUs. Therefore, it cannot be used for IPUs that are already in use.

Sub-commands

A number of sub-commands are available as command line options to gc-info. The most useful are listed below.

List devices

The --list-devices and --list-all-devices command options will list the IPUs in the system.

Register dump

The --tile-status command dumps the register state from the individual tiles in the IPU. The --context and --register options control which contexts and registers are dumped. This is useful for low-level debugging of an application or IPU fault.

Some examples:

 $ gc-info --device-id 0 --tile-status 0  # dumps out all registers for tile 0 on device 0
$ gc-info --device-id 0 --tile-status 0 --context SU # dumps out all registers for the supervisor context on tile 0
$ gc-info --device-id 0 --tile-status 0 --context SU --register PC # dumps out the PC for the supervisor context on tile 0
$ gc-info --device-id 0 --tile-status - # dumps out the tile status for all tiles
$ gc-info --device-id 0 --tile-status - -c SU -r PC # dumps out the PC for the supervisor context from every tile

There are also commands to display various SoC registers for low-level debugging; for example, --xb-status, --gsp-status and --nlc-status.

Dump tile memory

The --dump-mem command displays the contents of memory on the specified tile.

For example, the following command dumps 16 bytes of memory from address 0x42000 on tile 0:

 $ gc-info --device-id 0 --dump-mem 0 0x42000 16

Usage

Commands

-l, --list-devices List devices
-a, --list-all-devices List all devices
-t TILE_ID, --tile-status TILE_ID Tile register dump
-k, --tile-clock-speed Tile clock speed
-i, --device-info Device info
-m ARG, --dump-mem ARG {tile_num} {start_address} {size_in_bytes}
--tr-status Trunk Router status
-x, --xb-status XB status
--gsp-status GSPs status
--nlc-status NLCs status
--pci-status SoC PCI status registers
--ss-status System services registers
--ipu-arch Display IPU arch name
--ipu-count Display the number of IPUs installed
-r ARG, --register ARG Select register to print from tiles ('-' is all registers) (default: -)
-c ARG, --context ARG Select register context ('-' is supervisor and TDI) (default: -)
--group-output Set to group tile status output by value. Only valid with --register
--phy-summary PCI PHY summary
--phy-dump PCI PHY dump
--show-insn List the instruction at the current supervisor’s $PC, for all tiles.
-h, --help Produce help message
--version Version number

Command options

-s, --disassemble Disassemble memory dump
-d ID, --device-id ID Device id

Examples

 gc-info --list-devices
gc-info --device-id {id} --tile-status {tile_num}
gc-info --device-id {id} --tile-status {tile_num} --register {reg}
gc-info --device-id {id} --tile-status {tile_num} --context - --register {reg}
gc-info --device-id {id} --tile-status {tile_num} --context SU --register {reg}
gc-info --device-id {id} --tile-status {tile_num} --context W0 --register {reg}
gc-info --device-id {id} --tile-status {tile_num} --context TDI --register {reg}
gc-info --device-id {id} --tile-status {tile_num} --context 0 --register {reg}
gc-info --device-id {id} --device-info
gc-info --device-id {id} --dump-mem {tile_num} {start_address} {size_in_bytes} [--disassemble]
gc-info --device-id {id} --xb-status
gc-info --device-id {id} --gsp-status
gc-info --device-id {id} --nlc-status
gc-info --device-id {id} --show-insn

gc-inventory

This tool lists device IDs, physical parameters and firmware version numbers of the IPUs present in the hardware platform.

Unlike gc-info, this command will not attempt to lock access to any IPUs. This means that it can be used to gather information about a system even if some of the IPUs are in use. However, it also means that it cannot provide as much information as gc-info.

The output will be similar to this:

 "device": {
        "id": "15",
        "type": "C2",
        "Firmware Major Version": "1",
        "Firmware Minor Version": "0",
        "Firmware Patch Version": "0",
        "IPU": "1",
        "IPU version": "ipu0",
        "PCI Id": "0000:b8:00.0",
        "link speed": "8 GT\/s",
        "link width": "8",
        "physical slot": "PCIe Slot 9",
        "serial number": "0050.0004.072728"
    }
...
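
When post-processing this output, the three firmware fields are usually more convenient as a single version string. The sketch below works on one device record with the field names shown above; how the records are arranged at the top level of the full output is not shown here, so the example starts from a single record that has already been loaded. The function names are hypothetical.

```python
def firmware_version(device):
    """Join the three firmware fields into a 'major.minor.patch' string."""
    return ".".join(
        device[f"Firmware {part} Version"] for part in ("Major", "Minor", "Patch")
    )


def describe(device):
    """One-line summary of a device record."""
    return (
        f"device {device['id']} ({device['type']}, {device['physical slot']}): "
        f"firmware {firmware_version(device)}, PCI {device['PCI Id']}"
    )


# Field values copied from the sample output above.
device = {
    "id": "15",
    "type": "C2",
    "Firmware Major Version": "1",
    "Firmware Minor Version": "0",
    "Firmware Patch Version": "0",
    "PCI Id": "0000:b8:00.0",
    "physical slot": "PCIe Slot 9",
}
print(describe(device))
```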

Usage

Allowed options

-j, --json-output Emit JSON output
-h, --help Produce help message
--version Version number

gc-iputraffictest

This tool tests the data transfer between IPUs. To use it (for example, to test data transfer between device 0 and device 1), run:

 gc-iputraffictest --device0 0 --device1 1 -j

The device numbers used are those returned by the gc-inventory tool.

The output will be plain text if the -j option is not specified. The JSON output will look something like this:

 {
    "tile_to_tile": {
        "device0": "0",
        "device1": "1",
        "duration_sec": "0.038185906999999998",
        "kbytes_transferred": "2048000",
        "gbytes_transferred": "1.953125",
        "gbps": "409.18237191537708",
        "errors": "0"
    }
}

If an error occurs during the test, then gc-iputraffictest will return a non-zero exit code, and output an error message.

You can use this tool to run a ‘soak test’ of all the IPU-to-IPU links by running:

 $ gc-iputraffictest --all-links

There is also a --forever option that will run either a point-to-point or all-link soak forever. The -j option is not available when --forever is used.

Important note: The device0 and device1 arguments must be IDs for single IPUs, not groups of IPUs. (The test will connect to the smallest group containing both the sender and receiver.)
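
To test several point-to-point connections in turn, the command lines can be generated from a list of single-IPU device IDs. The following is a sketch only: the helper names are hypothetical, the device IDs are placeholders, and it builds the argument lists without running anything. It also enforces the restriction that -j cannot be combined with --forever.

```python
from itertools import combinations


def traffic_test_cmd(device0, device1, json_output=True, forever=False):
    """Build the argument list for one point-to-point test."""
    if forever and json_output:
        raise ValueError("-j is not available when --forever is used")
    cmd = ["gc-iputraffictest", "--device0", str(device0), "--device1", str(device1)]
    if json_output:
        cmd.append("-j")
    if forever:
        cmd.append("--forever")
    return cmd


def pairwise_cmds(single_ipu_ids):
    """One command per unordered pair of single-IPU device IDs."""
    return [traffic_test_cmd(a, b) for a, b in combinations(single_ipu_ids, 2)]


# Placeholder IDs; use IDs of single IPUs, not groups of IPUs.
cmds = pairwise_cmds([0, 1, 2, 3])
for cmd in cmds:
    print(" ".join(cmd))
```

Each list can then be passed to subprocess.run, checking the return code, since the tool returns a non-zero exit code when an error occurs.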

Usage

Allowed options

-j, --json-output Emit JSON output
--device0 ARG First IPU of exchange
--device1 ARG Second IPU of exchange
-i ARG, --iterations ARG Number of (64)KB transfers per tile (default: 1000)
-v, --verbose Verbose output
--all-links Exercise every link at once
--min-bandwidth ARG Minimum bandwidth (Gbps) expected - fails if not reached
--max-errcount ARG Maximum error count expected - fails if exceeded
--forever Transmit data forever
-h, --help Produce help message
--version Version number

gc-memorytest

This tool tests all the memory in an IPU, reporting any tiles that fail. To use it, run:

 gc-memorytest -d [device_id]

where [device_id] is the id number returned by the gc-inventory tool. If the test reveals no issues with the device memory, the output will look like this:

 {
  "result": "pass"
}

If the test reveals any issues with the device memory, the output shows which tiles have failed, and looks like this:

 {
  "tile_results": [
    {
        "12": "fail"
    },
    {
        "1000": "fail"
    },
    {
        "0": "fail"
    }
  ],
  "result": "fail"
}

Usage

Allowed options

-d ID, --device-id ID Device id
-v, --verbose Verbose output
--vddcheck Check that the IPU works at different VDD levels
-h, --help Produce help message
--version Version number

gc-monitor

You can use this command to monitor IPU activity without affecting users of the IPUs. This can be used to:

  • Check / monitor what’s currently running on which IPU in shared systems.
  • Verify that code is actually running on an IPU.
  • Monitor performance: the power and temperature will increase, and the clock rate will drop, when an IPU is heavily loaded.

To get a continuously updated display, use it with the watch command:

 $ watch gc-monitor

The output shows information about the IPU cards in the system and information on the processes running on the machine.

The card information table shows:

  1. Physical PCIe slot location, if available.
  2. Serial number of the board.
  3. ICU firmware revision.
  4. Installed kernel module driver version number.
  5. IPU card type.
  6. PCIe width and speed.
  7. ID of the IPU as used by other tools to address the IPUs.
  8. Which IPUs are on the same card.
  9. PCIe ID.
  10. IPU number (which IPU on a card it is).

The process information displayed includes:

  1. The PID using the IPU.
  2. The process name using the IPU.
  3. The username of the user using the IPU.
  4. The ID of the IPU in use.
  5. The current measured IPU clock rate.
  6. The current average IPU temperature.
  7. The current average board temperature.
  8. The current average board power consumption.

Typical output from the tool is shown below.

 +---------------+-----------------------------------------------------------+
| gc-monitor    | Installed driver: 1.0.27                                  |
+------+--------+--------+------+-------+-------+----+--------------+-------+
| Slot | Serial | ICU FW | Type | Speed | Width | ID | PCIe ID      | IPU # |
+------+--------+--------+------+-------+-------+----+--------------+-------+
|6     |0174.   |1.0.26  |C2    |8 GT/s |8      |0   |0000:8e:00.0  |0      |
|      |0004.   |        |      |       |       +---------------------------+
|      |919052  |        |      |       |       |1   |0000:8b:00.0  |1      |
+------+--------+--------+------+-------+-------+----+--------------+-------+
+----------------------------------+----------------------+-----------------+
| Attached processes               | IPU                  | Board           |
|------+----------------+----------|----------------------|-----------------|
| PID  | Command        | User     | ID | Clock  | Temp   | Temp   | Power  |
+------+----------------+----------+----+--------+--------+--------+--------+
|55988 |gc-powertest    |daves     |0   |1600MHz |27.8 C  |30.0 C  |110.6 W |
+------+----------------+----------+----+--------+--------+--------+--------+

By default, the temperature and power data are not available. To enable these, the process running on the IPU must be launched with the GCDA_MONITOR environment variable set. For example, to add the temperature and power data to the output when monitoring gc-powertest, the following commands could be used:

 $ GCDA_MONITOR=1 gc-powertest -d 0
$ watch gc-monitor
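
When the IPU process is started from a script rather than a shell, the same effect can be obtained by adding the variable to the child's environment. This is a minimal Python sketch; the helper names monitored_env and launch_monitored are hypothetical, and the command passed to launch_monitored is whatever IPU program you want to observe.

```python
import os
import subprocess


def monitored_env(base_env=None):
    """Copy an environment and set GCDA_MONITOR=1 in the copy."""
    env = dict(os.environ if base_env is None else base_env)
    env["GCDA_MONITOR"] = "1"
    return env


def launch_monitored(argv):
    """Start a process whose temperature and power data gc-monitor can show."""
    return subprocess.Popen(argv, env=monitored_env())


# Demonstrate the environment handling without starting a process.
base = {"PATH": "/usr/bin"}
env = monitored_env(base)
print(env)
```

For example, launch_monitored(["gc-powertest", "-d", "0"]) corresponds to the first shell command above.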

Usage

Allowed options

--no-card-info Don’t display card information
-j, --json-output Emit JSON output
--version Version number
-h, --help Produce help message

Notes

  • By default, gc-monitor shows the processes running on attached IPUs, the users running them, the processes’ PIDs and the IPUs’ IDs & clock speeds.

  • Also by default, the temperature and power data is not available. To enable it for a specific process running on the IPU, that process must be launched with the GCDA_MONITOR environment variable set to 1. For example, to add the temperature and power data to the output when monitoring a gc-powertest process, the following commands could be used:

    $ GCDA_MONITOR=1 gc-powertest -d 0
    $ watch gc-monitor

Examples

 $ GCDA_MONITOR=1 python main.py # Run main.py, enabling gc-monitor power and temp output for the IPUs it uses
$ watch -n1 gc-monitor          # Continually monitor IPUs every second, visually
$ gc-monitor -j >> data.json    # Append one JSON gc-monitor reading to a data file

gc-powertest

This tool is used to test power consumption and temperature of the C2 IPU-Processor cards.

Note that:

  • The program runs indefinitely until killed (with Ctrl-C).
  • It can be run multiple times to start code on other IPUs/C2 cards.
  • Temperatures are reported in degrees Celsius.
  • After running gc-powertest it’s a good idea to reset the IPUs (using the gc-reset tool) otherwise they’ll be left running at high power.

To use this tool, run:

 gc-powertest -d [device_id]

where [device_id] is the id number returned by the gc-inventory tool.

Some typical output is shown below (split into three separate columns here). It is constantly updated until the process is killed. The first column shows power readings, in watts, taken from the C2 card itself:

 |                            CARD LEVEL
----------------------------------------------
|                   INA3221                  |
----------------------------------------------
| 0001: 017.57W  027.84W  046.08W  T091.49W  |
| 0002: 017.66W  032.16W  052.42W  T102.24W  |
| 0003: 017.66W  032.16W  052.42W  T102.24W  |
| 0004: 017.66W  032.26W  052.32W  T102.24W  |
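
In the sample rows above, the value prefixed with T equals the sum of the three preceding readings, which suggests that it is a total. That interpretation is inferred from the sample and is not stated by the tool. The sketch below parses one row on that assumption; the function name parse_power_row is hypothetical.

```python
def parse_power_row(row):
    """Split one row into (sample number, individual readings, T value), in watts."""
    fields = row.strip("| \n").split()
    sample = int(fields[0].rstrip(":"))
    readings = [float(f.rstrip("W")) for f in fields[1:] if not f.startswith("T")]
    # Assumption: the field prefixed with "T" is the total of the other readings.
    total = next(float(f[1:].rstrip("W")) for f in fields[1:] if f.startswith("T"))
    return sample, readings, total


sample, readings, total = parse_power_row("| 0001: 017.57W  027.84W  046.08W  T091.49W  |")
print(sample, readings, total, round(sum(readings), 2))
```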

The second column shows temperature readings, in degrees Celsius, also taken from the C2 card itself:

   CARD LEVEL                          |
---------------------------------------
| B_IN     B_OUT    B_COL0   B_COL1   |
---------------------------------------
| 037C     040C     039C     040C     |
| 036C     040C     039C     039C     |
| 036C     040C     039C     039C     |
| 035C     040C     039C     039C     |

The third column shows temperature readings measured on each of the IPUs:

 |                  IPU1                 |
-----------------------------------------
| 0:PVT0    0:PVT1    0:PVT2    0:PVTE  |
-----------------------------------------
| 036.7C    037.4C    037.6C    036.0C  |
| 036.7C    036.5C    036.5C    036.0C  |
| 036.2C    036.5C    035.5C    035.3C  |
| 036.9C    036.5C    037.2C    035.3C  |

By default, the power test will run with 0% IPU processing load. A higher load can be requested with the -p option (the accepted percentages are listed under Usage below), for example:

 gc-powertest -d [device_id] -p 20

Usage

Allowed options

-p ARG, --percent-load ARG IPU load percentage 20|50|100
-d ID, --device-id ID Device id
-b BINARY, --binary BINARY Use custom IPU binary file
-v, --verbose Verbose mode
-j, --json-output Emit JSON output
-t TIME, --time TIME Time in seconds of the measurements (0 is infinite) (default: 0)
-h, --help Produce help message
--version Version number

gc-reset

This tool resets the IPU devices. For example:

 gc-reset -d [device_id]

where [device_id] is the id number returned by the gc-inventory tool. This can refer to one IPU or a group of IPUs.

Usage

Allowed options

-t, --teardown-links Teardown IPU links
-m, --reset-memory Reset tile memory and registers (for debugging)
-d ID, --device-id ID Device id
-h, --help Produce help message
--version Version number

Examples

 gc-reset           # reset all IPUs
gc-reset -d 0      # reset a single IPU
gc-reset -d 30 -t  # tear down a link
gc-reset -d 0 -m   # reinitialize memory and registers

Notes

  • When tearing down links, device ID is the MultiIPU group you want to untrain the links for. Typically you would want to use the biggest MultiIPU group.

Device IDs and Channel Map

Device IDs

The diagram below shows how single IPUs and then, hierarchically, groups of IPUs are numbered in a system. The letters A to J represent individual IPUs; the numbers are the IDs used by tools such as gc-info or gc-exchangetest.

https://www.graphcore.ai/hubfs/public_docs/_images/ipu-numbering-scheme-541.png

IPU device IDs

PCIe ID to slot mapping

The GC driver tools typically deduce which IPU cards are in which PCIe slot via a table in the SMBIOS. Alternatively a JSON configuration file may be provided by setting the environment variable GCDA_CONFIG_FILE. The file must have the following format:

 {
  "ipu_card_mapping": [
   "0000:0f:00.0",
   "0000:0e:00.0",
   "0000:0d:00.0",
   "0000:0c:00.0",
   "0000:0b:00.0",
   "0000:0a:00.0",
   "0000:09:00.0",
   "0000:08:00.0",
   "0000:07:00.0",
   "0000:06:00.0",
   "0000:05:00.0",
   "0000:04:00.0",
   "0000:03:00.0",
   "0000:02:00.0",
   "0000:01:00.0",
   "0000:00:00.0"
  ]
}

Where ipu_card_mapping is an ordered list of the PCIe IDs for the Graphcore cards, from one side of the chassis to the other. Each pair of IDs should correspond to a single card, with the next pair belonging to the card in the next physical slot.
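
A file in this format can be generated and sanity-checked with a short script. The sketch below rebuilds the example list above and groups it into one pair of PCIe IDs per card. The helper names are hypothetical, and the check for an even number of IDs simply reflects the pairing rule described above.

```python
import json


def build_config(pcie_ids):
    """Wrap an ordered list of PCIe IDs in the expected JSON structure."""
    if len(pcie_ids) % 2 != 0:
        raise ValueError("expected two PCIe IDs per card")
    return {"ipu_card_mapping": list(pcie_ids)}


def cards(config):
    """Group the ordered IDs into one (first, second) pair per physical card."""
    ids = config["ipu_card_mapping"]
    return [tuple(ids[i:i + 2]) for i in range(0, len(ids), 2)]


# Reproduce the example above: bus numbers 0x0f down to 0x00.
pcie_ids = [f"0000:{bus:02x}:00.0" for bus in range(0x0F, -1, -1)]
config = build_config(pcie_ids)
print(json.dumps(config, indent=2))
print(cards(config)[0])
```

The printed JSON can be saved to a file, and GCDA_CONFIG_FILE set to that file's path, before running the tools.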