Text summarization with BART Large on IPU

Text summarization is one of the best examples of AI natural language processing (NLP) being put to practical use.

With vast amounts of information produced every day, the ability to quickly understand, evaluate and act on that information can be extremely valuable, both in the commercial world and in other fields such as scientific research.

Summarization is the task of producing a shorter version of a document while preserving its important information. Fundamentally, it involves extracting text from the original input and then generating new text that describes the essence of the original. In some cases the two parts may be managed by different AI models.
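
To make the extraction step more concrete, here is a minimal sketch of a frequency-based extractive summarizer in plain Python. It is not how BART works and is not part of the demo that follows; the function name `extract_top_sentences` and the scoring rule are assumptions made purely for illustration.

```python
import re
from collections import Counter


def extract_top_sentences(text: str, k: int = 1) -> list[str]:
    """Return the k highest-scoring sentences, kept in their original order.

    Illustrative only: a sentence's score is the sum of the document-wide
    frequencies of its words, a crude stand-in for "importance".
    """
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    word_counts = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> int:
        return sum(word_counts[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    # Rank sentence indices by score, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


doc = (
    "A compiler translates source code. "
    "A compiler translates code from a source language into a target language. "
    "Cats are nice."
)
print(extract_top_sentences(doc, k=1))
```

A generative model such as BART goes further than this: rather than copying sentences, it writes new text conditioned on the input.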

In this blog, we will demonstrate how to run the entire summarization process using BART-Large on Graphcore IPUs.

What is BART and why is it good for text summarization?

When Google launched BERT (Bidirectional Encoder Representations from Transformers) in 2018, it was described as a model for "language understanding", defined as a broad range of applications including sentiment analysis, text classification and question answering. Summarization was not explicitly called out as a use case at that time.

In the same year, OpenAI further advanced the field of natural language understanding by proposing the concept of Generative Pre-Training (GPT).

In late 2019, Facebook AI researchers proposed combining a bidirectional encoder (like BERT) with a left-to-right decoder (like GPT) and named the result BART, which stands for Bidirectional and Auto-Regressive Transformers.

According to the original paper, the novelty in pretraining is a noising approach that randomly shuffles the order of the original sentences and applies a new in-filling scheme, in which spans of text are replaced with a single mask token. The authors claimed that BART is particularly effective when fine-tuned for text generation, but also works well for comprehension tasks; both are needed for text summarization.
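
The two corruption steps mentioned above can be sketched in a few lines of plain Python. This is a toy illustration, not the authors' implementation: the helper names are hypothetical, the "tokens" are whitespace-separated words, and the span position and length are fixed for reproducibility, whereas the paper samples span lengths from a Poisson distribution.

```python
import random

MASK = "<mask>"


def permute_sentences(sentences: list[str], rng: random.Random) -> list[str]:
    """Sentence permutation: return the sentences in a random order."""
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    return shuffled


def infill_span(tokens: list[str], start: int, length: int) -> list[str]:
    """Text infilling: replace tokens[start:start+length] with ONE mask token.

    A zero-length span inserts a mask without removing anything, so the model
    also has to work out how many tokens are missing.
    """
    return tokens[:start] + [MASK] + tokens[start + length:]


rng = random.Random(0)
sentences = ["The prince wanted a princess.", "A storm came.", "She felt the pea."]
noisy_sentences = permute_sentences(sentences, rng)
noisy_tokens = infill_span("the prince wanted a real princess".split(), start=2, length=3)
print(noisy_sentences)
print(noisy_tokens)  # ['the', 'prince', '<mask>', 'princess']
```

During pretraining the model sees the corrupted text and is trained to reconstruct the original, which is why it has to both understand the input and generate fluent output.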

Text summarization on Graphcore IPUs with Hugging Face pipeline

BART is one of the many NLP models supported within Optimum Graphcore, which is an interface between Hugging Face and Graphcore IPUs.

Here we demonstrate a text summarization task running BART-Large inference on Graphcore IPUs.


For each code block below, you can simply click to run the block in Paperspace, modifying the code or parameters where relevant. At the end of this blog, we explain how to run the process in environments other than Paperspace Gradient Notebooks.

Install dependencies


%pip install optimum-graphcore==0.7.1 wikipedia "graphcore-cloud-tools[logger]@git+https://github.com/graphcore/graphcore-cloud-tools"

%load_ext graphcore_cloud_tools.notebook_logging.gc_logger


import os

exec_cache_dir = os.getenv("POPLAR_EXECUTABLE_CACHE_DIR", "/tmp/exe_cache/")

Model preparation

We start by preparing the model. First, we define the configuration needed to run the model on the IPU. IPUConfig is a class that specifies the attributes and configuration parameters needed to compile the model and place it on the device:


from optimum.graphcore import IPUConfig

ipu_config = IPUConfig(
    layers_per_ipu=[12, 12],
    matmul_proportion=0.15,
    executable_cache_dir=exec_cache_dir,
    inference_parallelize_kwargs={
        "max_length": 150,
        "num_beams": 3,
        "use_encoder_output_buffer": True,
        "on_device_generation_steps": 16,
    }
)
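
As a rough mental model of `layers_per_ipu`, the sketch below expands a list such as `[12, 12]` into a layer-to-IPU assignment. The helper `assign_layers_to_ipus` is hypothetical and not part of Optimum Graphcore; it assumes that layers fill the IPUs in order, and that the 24 layers are BART-Large's 12 encoder layers followed by its 12 decoder layers.

```python
def assign_layers_to_ipus(layers_per_ipu: list[int]) -> list[int]:
    """Return, for each layer index, the IPU that would hold it.

    Hypothetical helper for illustration; assumes layers fill IPUs in order.
    """
    assignment = []
    for ipu_id, num_layers in enumerate(layers_per_ipu):
        assignment.extend([ipu_id] * num_layers)
    return assignment


placement = assign_layers_to_ipus([12, 12])
print(len(placement))                # total number of layers placed
print(placement[0], placement[11])   # IPU of the first and twelfth layer
print(placement[12], placement[23])  # IPU of the thirteenth and last layer
```

Under these assumptions, the configuration above would put the encoder on one IPU and the decoder on the other.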

Next, let's import pipeline from optimum.graphcore and create our summarization pipeline:


from optimum.graphcore import pipeline

summarizer = pipeline(
    "summarization",
    model="facebook/bart-large-cnn",
    tokenizer="facebook/bart-large-cnn",
    ipu_config=ipu_config,
    config="facebook/bart-large-cnn",
    max_input_length=1024,
    truncation=True
)

We define an input to test the model.


input_test = 'In computing, a compiler is a computer program that translates computer code written in one programming language (the source language) into another language (the target language). The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language (e.g. assembly language, object code, or machine code) to create an executable program.'
input_test

Compilation time for the first run: ~2:30


%%time
summarizer(input_test, max_length=150, num_beams=3)
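
The call above returns a structured result rather than a bare string. Assuming the Optimum Graphcore pipeline keeps the standard Hugging Face summarization output (a list with one dictionary per input, each holding a `summary_text` key), the summary can be pulled out as sketched below. The helper `summary_texts` and the example output are made up for illustration.

```python
def summary_texts(pipeline_output: list[dict]) -> list[str]:
    """Pull the summary strings out of a summarization pipeline result."""
    return [item["summary_text"] for item in pipeline_output]


# Stand-in for the value returned by `summarizer(input_test, ...)`;
# the text here is invented, not real model output.
example_output = [{"summary_text": "A compiler translates code between languages."}]
print(summary_texts(example_output)[0])
```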

Faster fairy tales

The first call to the pipeline was slow because the model is compiled the first time it runs. Subsequent calls are much faster:


the_princess_and_the_pea = 'Once upon a time there was a prince who wanted to marry a princess; but she would have to be a real princess. He travelled all over the world to find one, but nowhere could he get what he wanted. There were princesses enough, but it was difficult to find out whether they were real ones. There was always something about them that was not as it should be. So he came home again and was sad, for he would have liked very much to have a real princess. One evening a terrible storm came on; there was thunder and lightning, and the rain poured down in torrents. Suddenly a knocking was heard at the city gate, and the old king went to open it. It was a princess standing out there in front of the gate. But, good gracious! what a sight the rain and the wind had made her look. The water ran down from her hair and clothes; it ran down into the toes of her shoes and out again at the heels. And yet she said that she was a real princess. Well, we\'ll soon find that out, thought the old queen. But she said nothing, went into the bed-room, took all the bedding off the bedstead, and laid a pea on the bottom; then she took twenty mattresses and laid them on the pea, and then twenty eider-down beds on top of the mattresses. On this the princess had to lie all night. In the morning she was asked how she had slept. "Oh, very badly!" said she. "I have scarcely closed my eyes all night. Heaven only knows what was in the bed, but I was lying on something hard, so that I am black and blue all over my body. It\'s horrible!" Now they knew that she was a real princess because she had felt the pea right through the twenty mattresses and the twenty eider-down beds. Nobody but a real princess could be as sensitive as that. So the prince took her for his wife, for now he knew that he had a real princess; and the pea was put in the museum, where it may still be seen, if no one has stolen it. There, that is a true story.'
the_princess_and_the_pea


%%time
summarizer(the_princess_and_the_pea, max_length=150, num_beams=3)
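
`%%time` is a notebook magic, so it is not available in a plain Python script. A minimal sketch of an equivalent measurement with `time.perf_counter` follows. The helper `timed` and the stand-in `fake_summarizer` are hypothetical; the stand-in only exists so the sketch runs without an IPU, and you would pass the real `summarizer` instead.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_summarizer(text, **generate_kwargs):
    """Stand-in so the sketch runs without an IPU; swap in `summarizer`."""
    return [{"summary_text": text[:20]}]


result, seconds = timed(
    fake_summarizer,
    "Once upon a time there was a prince.",
    max_length=150,
    num_beams=3,
)
print(f"call took {seconds:.4f} s")
```

Timing the first and second call separately makes the one-off compilation cost visible.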

Summarization of Wikipedia articles

Now let's use the Wikipedia API to fetch a long article that can be summarized:


import wikipedia

# TRY IT YOURSELF BY CHANGING THE PAGE TITLE BELOW
page_title = "Queen (band)"
text = wikipedia.page(page_title).content
text


%%time
summarizer(
    text,  # NOTE: the input text is truncated to max_input_length=1024 tokens
    max_length=150,
    num_beams=3,
)
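
Because the input is truncated, only the beginning of a long article contributes to the summary. One common workaround, sketched below, is to split the text into chunks, summarize each chunk, and join the results. This is not part of the original demo: `chunk_words` and `summarize_long` are hypothetical helpers, and counting whitespace-separated words is only a rough proxy for the tokenizer's token count.

```python
def chunk_words(text: str, max_words: int) -> list[str]:
    """Split text into consecutive chunks of at most max_words words.

    Word count is only a rough proxy for the tokenizer's token count.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long(text: str, summarize, max_words: int = 500) -> str:
    """Summarize each chunk with `summarize` (str -> str) and join the results."""
    return " ".join(summarize(chunk) for chunk in chunk_words(text, max_words))


def first_three_words(chunk: str) -> str:
    """Stand-in summarizer that keeps the first three words of a chunk."""
    return " ".join(chunk.split()[:3])


print(summarize_long("one two three four five six seven eight", first_three_words, max_words=4))
```

With the real pipeline, `summarize` could be a small wrapper that calls `summarizer` on one chunk and returns its summary string, assuming the standard Hugging Face output format.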

Summarization of medical health records

Summarization may also be useful for condensing medical health records (MHR). Let's import an open-source dataset with some medical samples.


from datasets import load_dataset

dataset = load_dataset("rungalileo/medical_transcription_40")
dataset

We focus on the medical report stored in the "text" field and select a random patient ID from the training split.


import random

# RUN THIS CELL AGAIN TO SELECT ANOTHER REPORT
# randrange excludes the upper bound, so the index is always valid
random_patient_id = random.randrange(len(dataset["train"]))

exemplary_medical_report = dataset["train"][random_patient_id]["text"]
exemplary_medical_report


%%time
summarizer(exemplary_medical_report, max_length=150, num_beams=3)

Running BART-Large on IPUs in non-Paperspace environments

To run the demo using other IPU hardware, you need to have the Poplar SDK enabled and the relevant PopTorch wheels installed. Refer to the getting started guide for your system for details on how to enable the Poplar SDK and install the PopTorch wheels.
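
Before running the demo outside Paperspace, a quick check like the one below can confirm that PopTorch is visible to your Python environment. It is a minimal sketch using only the standard library: the helper name `poptorch_available` is hypothetical, and the check only looks for an importable `poptorch` package without verifying the SDK version or that IPU hardware is attached.

```python
import importlib.util


def poptorch_available() -> bool:
    """Return True if the `poptorch` package can be imported in this environment."""
    return importlib.util.find_spec("poptorch") is not None


if poptorch_available():
    print("PopTorch found.")
else:
    print("PopTorch not found: enable the Poplar SDK and install the PopTorch wheel first.")
```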