One of the most exciting open-source chatbot alternatives to ChatGPT, OpenAssistant's OASST1 fine-tuned Pythia-12B, is now available to run for free on Paperspace, powered by Graphcore IPUs. This truly open-source model can be used commercially without restrictions.
oasst-sft-4-pythia-12b is a variant of EleutherAI's Pythia model family, fine-tuned using the Open Assistant Conversations (OASST1) dataset, a crowdsourced "human-generated, human-annotated assistant-style conversation corpus".
The OASST1 dataset consists of 161,443 messages in 35 different languages, annotated with 461,292 quality ratings, resulting in over 10,000 fully annotated conversation trees.
Running OASST1 Fine-tuned Pythia-12B inference on Paperspace
OpenAssistant's fine-tuned Pythia can easily be run on Graphcore IPUs using a Paperspace Gradient notebook. New users can try out Pythia on an IPU-POD4 with Paperspace's six-hour free trial. For a higher-performance implementation, you can scale up to an IPU-POD16.
The notebook guides you through creating and configuring an inference pipeline and running the pipeline to build a turn-by-turn conversation.
Because the OpenAssistant model uses the same underlying Pythia-12B model as Dolly, we run it using the Dolly pipeline.
Let's begin by loading the inference config. We use the same configuration file as Dolly and manually modify the vocab size, which is the only difference between the two model graphs. A configuration suitable for your instance will be selected automatically.
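As a rough sketch of what that cell does (the loader function below is a hypothetical stand-in for the notebook's own helper, not a confirmed API), the only manual change on top of the Dolly configuration is the vocabulary size:

```python
# Hypothetical sketch: load_dolly_inference_config stands in for the
# notebook's config-loading helper, which picks a configuration suitable
# for your instance automatically.
config = load_dolly_inference_config()

# The OASST1 model graph differs from Dolly's only in vocabulary size,
# so we override that one field (50,288 tokens, as noted under `k` below).
config.model.embedding.vocab_size = 50288
```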
Next, we create our inference pipeline. Here we define the maximum sequence length and the maximum micro batch size. Before a model can be executed on IPUs, it must be compiled into an executable format; this happens when the pipeline is created. All input shapes must be known before compiling, so if the maximum sequence length or micro batch size is changed, the pipeline will need to be recompiled.
Selecting a longer sequence length or larger batch size will use more IPU memory. This means that increasing one may require you to decrease the other.
This cell will take approximately 18 minutes to complete, which includes downloading the model weights.
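In outline, the pipeline-creation cell looks something like the sketch below; the class and argument names are illustrative assumptions rather than the notebook's exact API:

```python
# Illustrative sketch only: DollyPipeline and its argument names are
# assumptions, not a confirmed API. Compilation happens inside this call,
# and all input shapes must be known at that point, so changing either
# value below means the pipeline has to be recompiled.
oasst_pipeline = DollyPipeline(
    config,
    sequence_length=1024,   # maximum tokens per conversation; longer uses more IPU memory
    micro_batch_size=1,     # prompts processed in parallel; larger also uses more memory
)
```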
Call the `oasst_pipeline` object you have just created to generate text from a prompt.
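For example, assuming the pipeline object created above:

```python
# Generate a reply to a single prompt.
answer = oasst_pipeline("Are alpacas related to llamas?")
print(answer)
```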
To make a chatbot, we will take user input and feed it to the model. So that the notebook can be tested automatically, we create a function similar to the Python built-in function `input()` that collects input from the user and returns it, but which will instead return canned input from the test environment variable `EXAMPLE_PROMPTS` if that variable is set. The variable should be set to a JSON list of strings, for example:
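A minimal, self-contained sketch of such a helper (the function name is illustrative):

```python
import json
import os

def user_input(prompt: str = "") -> str:
    """Like input(), but returns canned prompts when EXAMPLE_PROMPTS is set.

    EXAMPLE_PROMPTS should hold a JSON list of strings, for example:
        export EXAMPLE_PROMPTS='["Are alpacas vegan?", "Do alpacas spit?"]'
    """
    canned = os.environ.get("EXAMPLE_PROMPTS")
    if canned is None:
        return input(prompt)
    prompts = json.loads(canned)
    if not prompts:
        return ""
    # Consume one prompt per call, writing the remainder back so the
    # next call returns the next canned prompt.
    os.environ["EXAMPLE_PROMPTS"] = json.dumps(prompts[1:])
    return prompts[0]
```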
A chatbot conversation is built up from a number of turns of user input and the model writing a reply. As a conversation develops, the prompt should be extended turn by turn, so the model has access to the full context.
The model has been trained on a specific prompt template to represent the conversation as it is built up:
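For the OpenAssistant Pythia models this template wraps each turn in special tokens, so a conversation in progress looks along these lines:

```
<|prompter|>user input 1<|endoftext|><|assistant|>model reply 1<|endoftext|><|prompter|>user input 2<|endoftext|><|assistant|>
```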
There are some optional parameters to the pipeline call you can use to control the generation behaviour:

- `temperature` – Indicates whether you want more or less creative output. A value of 1.0 corresponds to the model's default behaviour. Smaller values sharpen the next-token distribution, making the model more likely to pick a highly probable next token; a value of 0.0 means the model will always pick the most probable token. Temperatures greater than 1.0 flatten the next-token distribution, making more unusual next tokens more likely. Temperature must be zero or positive.
- `k` – Restricts sampling to the `k` most probable tokens. This is known as "top k" sampling. Set it to 0 to disable top-k sampling and sample from all possible tokens. The value for `k` must be between 0 and `config.model.embedding.vocab_size`, which is 50,288. The default is 5.
- `output_length` – Sets a maximum output length in tokens. Generation normally stops when the model generates its `end_key` text, but it can be made to stop earlier by specifying this option. A value of `None` disables the limit, and is the default.
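Putting those together, a call using all three options might look like this (the keyword names follow the descriptions above; treat the exact call signature as an assumption):

```python
answer = oasst_pipeline(
    "How do alpacas differ from llamas?",
    temperature=0.6,     # sharper than the default 1.0, so less unusual output
    k=5,                 # sample only from the 5 most probable tokens (the default)
    output_length=None,  # no hard cap; stop when the model emits its end_key text
)
```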
You can start with any user input. For instance: "What other animals are similar to Alpacas?"
See image below for an example of the model in action, using a different prompt.
Remember to detach your pipeline when you are finished to free up resources:
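```python
# Release the IPUs held by the pipeline; the method name here is an
# assumption based on the notebook's description.
oasst_pipeline.detach()
```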
Running OASST1 Fine-tuned Pythia on non-Paperspace IPU environments
To run the demo on IPU hardware outside of Paperspace, you need to have the Poplar SDK enabled.
Refer to the Getting Started guide for your system for details on how to enable the Poplar SDK. Also refer to the Jupyter Quick Start guide for how to set up Jupyter so that you can run this notebook on a remote IPU machine.