Written by Carlo Luschi
Posted Jan 23, 2018
In 2018 we are excitedly looking forward to the innovations that new hardware will make possible in machine learning. So with our IPU processor coming very soon, now is a good time to look forward to some of the most interesting areas of research that we are tracking at Graphcore. We believe that our IPU will be able to accelerate new innovations in machine learning and in deep learning, to push forward what is possible in Artificial Intelligence.
Learning to Learn/Meta-Learning
Up to now, the choice of machine learning algorithm and underlying model structure has been based on time-consuming investigation and research by human experts. Often, the only way to select the best algorithm and the most appropriate model architecture, with the correct hyper-parameters, is through trial and error. Meta-Learning is the idea of exploring algorithms and model structures using machine learning methods themselves, so that the selection of models and algorithms can be automated and targeted at a specific practical objective.
For human beings, learning happens at multiple scales: our brains are born with the ability to learn new concepts and tasks. In a similar way, with Meta-Learning, or Learning to Learn, AI algorithms are applied hierarchically. This means first learning the best network architecture, together with the optimization algorithm and hyper-parameters most appropriate for that model. The model selected through this outer process is then used to learn the relevant task itself.
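The two-level structure described above can be sketched as an outer search loop wrapped around an inner training run. This is a minimal, purely illustrative sketch: the `train_and_evaluate` function is a stand-in for a real training job, and the toy score function inside it is an assumption for demonstration only.

```python
import random

def train_and_evaluate(width, lr):
    # Stand-in for a real inner training run that returns a validation
    # score. The analytic expression below is a toy assumption: it peaks
    # at width=64 and lr=0.01, purely so the sketch has something to find.
    return -((width - 64) ** 2) / 1000.0 - (lr - 0.01) ** 2 * 100.0

def meta_search(num_trials=50, seed=42):
    """Outer 'learning to learn' loop: random search over a model
    structure choice (layer width) and an optimizer hyper-parameter
    (learning rate), keeping the configuration with the best score."""
    rng = random.Random(seed)
    best = None
    for _ in range(num_trials):
        width = rng.choice([16, 32, 64, 128, 256])
        lr = 10 ** rng.uniform(-4, -1)          # log-uniform learning rate
        score = train_and_evaluate(width, lr)
        if best is None or score > best[0]:
            best = (score, width, lr)
    return best

score, width, lr = meta_search()
```

Random search is the simplest possible outer loop; the evolutionary and reinforcement-learning approaches cited below replace it with far more sample-efficient search strategies, but the nesting of an outer model-selection loop around an inner training loop is the same.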
This direction of research has already achieved remarkable successes using a range of optimization techniques. One example is the use of evolutionary strategies for architecture search (Liu et al., 2017 https://arxiv.org/abs/1711.00436).
Meta-Learning can also be applied to Reinforcement Learning (RL) algorithms. With a Meta-Reinforcement Learning algorithm, the objective is not only to learn a policy, but to learn the entire Reinforcement Learning agent, including both the RL algorithm and the policy. This topic was covered well by Pieter Abbeel at the Meta-Learning Symposium held during NIPS 2017. See: http://metalearning-symposium.ml.
Humans are able to learn new concepts with very little supervision, from just a few examples. A deep learning network, by contrast, typically requires a huge amount of labelled training data, and today's deep neural networks are not able to quickly recognize a new object they have only seen once or twice. Recent work has considered model-based, metric-based and optimization-based Meta-Learning approaches to define network architectures that are able to learn from just a few data examples. This topic was also covered at the NIPS Meta-Learning Symposium, by Oriol Vinyals: http://metalearning-symposium.ml.
The work on One-Shot Learning by Vinyals et al. gives significant improvements over previous baseline one-shot accuracy for video and language tasks. The approach uses a model that learns a classifier based on an attention kernel to map a small labelled support set and an unlabelled example to its corresponding label (O. Vinyals et al., 2016 https://arxiv.org/abs/1606.04080).
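The attention-kernel idea behind Matching Networks can be sketched in a few lines: the query is classified by a softmax over its similarities to the support examples, which weights the support labels. This is a toy sketch in the spirit of the paper, with hand-made 2-D embeddings standing in (as an assumption) for the learned embedding network the real method trains end-to-end.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def matching_classify(query, support_x, support_y, num_classes):
    """One-shot classification in the spirit of Matching Networks:
    an attention kernel (softmax over cosine similarities to the small
    labelled support set) maps the query to a weighted combination of
    the support labels."""
    sims = np.array([cosine(query, x) for x in support_x])
    attn = np.exp(sims) / np.exp(sims).sum()   # softmax attention weights
    probs = np.zeros(num_classes)
    for a, y in zip(attn, support_y):
        probs[y] += a                          # accumulate label mass
    return probs

# Toy 2-way, one-shot episode with hand-made embeddings.
support_x = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
support_y = [0, 1]
query = np.array([0.9, 0.1])                   # closer to class 0
probs = matching_classify(query, support_x, support_y, 2)
```

Because classification is a weighted lookup into the support set rather than a set of fixed output weights, adding a new class at test time only requires adding its labelled example to the support set.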
For Reinforcement Learning applications, One-Shot Imitation Learning is showing the possibility of learning from just a few demonstrations of a given task. It is possible to generalize to new instances of the same task by applying a Meta-Learning approach to train robust policies that can be applied to a wide variety of tasks (Y. Duan et al., 2017 https://arxiv.org/abs/1703.07326; P. Abbeel, 2017 http://metalearning-symposium.ml).
Many existing Reinforcement Learning (RL) systems already rely on simulations to explore the solution space and solve complex problems. These include systems based on Self-Play for gaming applications. Self-Play is an essential part of the algorithms used by Google DeepMind in AlphaGo and in the more recent AlphaGo Zero reinforcement learning systems. These are the breakthrough approaches that have defeated the world champion at the ancient Chinese game of Go (D. Silver et al., 2017 https://www.nature.com/articles/nature24270; D. Silver et al., 2017 https://arxiv.org/abs/1712.01815v1). The newer AlphaGo Zero system achieved a significant step forward compared to the original AlphaGo system: it was trained entirely by Self-Play RL, starting from completely random play, and received no human data or supervision input. The system is effectively self-learning.
Simulation for Reinforcement Learning training has also been used in imagination-augmented RL algorithms. The recent Imagination-Augmented Agents (I2A) approach improves on earlier model-based RL algorithms by combining model-free and model-based policy rollouts. This allows the policy improvement operator itself to be learned, resulting in a significant improvement in performance (T. Weber et al., 2017 https://arxiv.org/abs/1707.06203).
Today, data augmentation is commonly used to help in training deep neural networks. This approach increases the amount of available training data and improves robustness and generalization performance. It works by applying random transformations to the training data under which the task labels are known to be invariant. Traditionally this has been implemented by manually generating new examples through random transformations such as translations, rotations and flips. Recent research has shown the advantages of applying Meta-Learning approaches to data augmentation, using a Generative Adversarial Network (or GAN) to learn a potentially much larger space of label-preserving transformations (A. Antoniou et al., 2017 https://arxiv.org/abs/1711.04340).
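The traditional, hand-designed form of augmentation described above can be sketched in a few lines of numpy. The specific transformations chosen here (flip, 90-degree rotation, small wrap-around translation) are illustrative assumptions; a learned, GAN-based augmenter would replace this fixed set with transformations learned from the data itself.

```python
import numpy as np

def augment(image, rng):
    """Classic hand-designed augmentation: random flip, random 90-degree
    rotation and a small translation, each assumed label-preserving for
    the task at hand."""
    if rng.random() < 0.5:
        image = np.fliplr(image)                  # horizontal flip
    image = np.rot90(image, k=rng.integers(4))    # 0, 90, 180 or 270 degrees
    shift = rng.integers(-2, 3)                   # small translation, -2..2
    image = np.roll(image, shift, axis=1)         # wrap-around shift
    return image

rng = np.random.default_rng(0)
img = np.arange(64, dtype=np.float32).reshape(8, 8)  # toy 8x8 "image"
aug = augment(img, rng)
```

Note that each transformation here is a pure permutation of the pixels, so the augmented image contains exactly the same values rearranged; a real pipeline would typically also include non-permutation transforms such as crops and color jitter.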
GANs have also been successfully applied to data generation for semi-supervised learning (T. Salimans et al., 2016 https://arxiv.org/abs/1606.03498).
Finding sources of labelled training data is often difficult. Sometimes the data is simply not available, or labelling it requires input from highly skilled people and is therefore too expensive and time consuming. Deep generative models, including Variational Autoencoders (or VAEs) (Z. Hu et al., 2017 https://arxiv.org/abs/1706.00550), have shown the potential to learn the features of a dataset. This approach has wide applicability and opens up the possibility of exploiting unlabelled data, which is often much more readily accessible.
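Two ingredients of the VAE objective can be sketched concretely: the KL-divergence term that regularizes the learned latent distribution towards a standard normal prior, and the reparameterization trick that keeps the sampling step differentiable. This is a fragment of the full method, shown with small hand-picked values rather than outputs of a trained encoder.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, I) ): the regularization term in
    the VAE objective (the ELBO), in closed form for diagonal Gaussians."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

def reparameterize(mu, logvar, rng):
    """Reparameterization trick: sample z = mu + sigma * eps with
    eps ~ N(0, I), so z is a differentiable function of mu and logvar."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

# Toy encoder outputs for a single example (hand-picked for illustration).
mu = np.array([0.0, 0.5])
logvar = np.array([0.0, -1.0])

rng = np.random.default_rng(0)
z = reparameterize(mu, logvar, rng)   # latent sample fed to the decoder
kl = gaussian_kl(mu, logvar)          # zero only when mu=0, logvar=0
```

In a full VAE this KL term is added to a reconstruction loss from the decoder, and both encoder and decoder are trained jointly by backpropagation through the reparameterized sample.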
One of the areas that I am personally most excited about is Evolution Strategies (or ES). ES is a black-box optimization technique that was first introduced in the 1960s and 1970s. ES has recently been shown to provide learning performance competitive with optimization based on backpropagation. However, ES is easier to scale in a distributed setting (T. Salimans et al., 2017 https://supercomputersfordl2017.github.io/Presentations/Salimans_ES.pdf).
For Reinforcement Learning tasks, the use of Evolution Strategies allows direct policy search by injecting noise in the parameter space rather than in the action space. This corresponds to gradient estimates that can be interpreted as randomized finite differences in a high dimensional space (T. Salimans et al., 2017 https://arxiv.org/abs/1703.03864). This method has been shown to be highly parallelizable, enabling effective and robust distributed implementation over thousands of machines (https://blog.openai.com/evolution-strategies).
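The randomized finite-difference interpretation above translates directly into code: perturb the parameters with Gaussian noise, evaluate the black-box objective at each perturbation, and combine the noise vectors weighted by the (normalized) returns. This is a minimal single-process sketch in the style of Salimans et al. (2017); the quadratic toy objective stands in, as an assumption, for an RL episode return.

```python
import numpy as np

def es_step(theta, f, rng, npop=50, sigma=0.1, alpha=0.05):
    """One Evolution Strategies update: Gaussian perturbations in
    parameter space give a randomized finite-difference estimate of the
    gradient of E[f(theta + sigma * eps)]."""
    eps = rng.standard_normal((npop, theta.size))          # noise vectors
    returns = np.array([f(theta + sigma * e) for e in eps])
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    grad = eps.T @ returns / (npop * sigma)                # gradient estimate
    return theta + alpha * grad                            # ascent step

# Toy objective: maximize -||theta - target||^2 (stand-in for a return).
target = np.array([0.5, -0.3, 0.8])
f = lambda th: -np.sum((th - target) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(3)
for _ in range(300):
    theta = es_step(theta, f, rng)
```

The distributed efficiency comes from the fact that workers only need to exchange scalar returns and shared random seeds, not parameter vectors or gradients, which is why the method scales to thousands of machines.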
More recent work has explored alternative, entirely gradient-free evolutionary algorithms. In particular, simple population-based Genetic Algorithms (GA) have also achieved competitive performance on complex deep Reinforcement Learning benchmarks. In one example, a Genetic Algorithm successfully evolved deep neural networks with over four million parameters, using a large-scale but practical parallel implementation (F. Petroski Such et al., 2017 https://arxiv.org/abs/1712.06567).
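A minimal sketch of such a population-based GA, in the spirit of Petroski Such et al. (2017): truncation selection keeps the fittest individuals as parents, and offspring are produced by Gaussian mutation alone, with no crossover and no gradient information. The toy fitness function is an assumption standing in for an episode return computed from a policy network's parameters.

```python
import random

def genetic_algorithm(fitness, dim, pop_size=40, generations=60,
                      elite=5, sigma=0.1, seed=0):
    """Simple population-based GA: truncation selection plus Gaussian
    mutation. Elites are carried over unchanged, so the best fitness in
    the population never decreases."""
    rng = random.Random(seed)
    pop = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)   # best individuals first
        parents = pop[:elite]                 # truncation selection
        pop = parents + [
            [g + rng.gauss(0, sigma) for g in rng.choice(parents)]
            for _ in range(pop_size - elite)  # mutated offspring
        ]
    return max(pop, key=fitness)

# Toy fitness: closeness to a target parameter vector.
target = [0.2, -0.7, 0.4]
fit = lambda x: -sum((a - b) ** 2 for a, b in zip(x, target))
best = genetic_algorithm(fit, dim=3)
```

Like ES, this parallelizes naturally: each fitness evaluation is independent, so the per-generation evaluations can be farmed out across many workers.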
Looking Further Forward
As new hardware becomes available, further innovations will become possible. For example, today it is very difficult and computationally expensive to use stochastic rounding in arithmetic calculations or to generate large numbers of random samples. Graphcore's IPU will support the implementation of these techniques with far greater efficiency. We might start to see more statistical processing approaches, or Bayesian methods, being combined with current deep neural network approaches. Today's hardware is also not efficient at operating on sparse data structures; however, we know that sparse networks would often be more appropriate.
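To make the stochastic rounding idea concrete: instead of always rounding to the nearest value, a number is rounded up with probability equal to its fractional part, so the rounding is unbiased in expectation. This property is what makes low-precision training viable, since rounding errors average out rather than accumulating. A minimal sketch, rounding to integers for simplicity (real hardware applies the same idea at the precision of the number format):

```python
import numpy as np

def stochastic_round(x, rng):
    """Round each element down or up at random, with the probability of
    rounding up equal to the fractional part, so E[round(x)] = x."""
    floor = np.floor(x)
    frac = x - floor
    return floor + (rng.random(x.shape) < frac)  # up with probability frac

rng = np.random.default_rng(0)
x = np.full(100_000, 0.25)
r = stochastic_round(x, rng)     # each element is 0.0 or 1.0
mean = r.mean()                  # close to 0.25: unbiased on average
```

Deterministic round-to-nearest would map every 0.25 to 0.0, a systematic bias of 0.25 per element; the stochastic version trades per-element exactness for zero bias in aggregate.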
Finally, as AI is applied in more and more applications, with intelligence being deployed in real-time systems such as natural language understanding, language translation and autonomous cars, fast response and low latency will become an absolute requirement. Processors like our IPU that can operate efficiently on small batch sizes, or even on a mini-batch of one, will become extremely important.
The rate of progress in Machine Learning over the last few years has been phenomenal. We should expect this to continue with the rate of progress increasing as new hardware becomes available that will allow innovators to achieve the next breakthroughs in AI.