Gaussian Processes
Mutual Information

Published Aug 22, 2021

The machine learning consultancy: https://truetheta.io

For Machine Learning, Gaussian Processes enable flexible models with the richest output you could ask for: an entire predictive distribution, rather than a single number. In this video, I break down what they are, how they work, and how to model with them. My intention is that this will help you join the large group of people successfully applying GPs to real-world problems.

SOCIAL MEDIA

LinkedIn: / dj-rich-90b91753
Twitter: / duanejrich

Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: / mutualinformation

SOURCES

Chapter 17 from [2] is the most significant reference for this video. That's where I discovered the generalization from Bayesian Linear Regression to GPs, the list of valid ways to adjust a kernel, and the Empirical Bayes approach to hyperparameter optimization. It's also where I get most of the notation. (In fact, for all my videos, Kevin Murphy's notation is what I follow most closely.)
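
To make the core computations concrete, here is a minimal NumPy sketch of exact GP regression: the posterior predictive mean and covariance, plus the log marginal likelihood that Empirical Bayes maximizes over hyperparameters. The RBF kernel, hyperparameter values, and toy data below are my own illustrative choices, not anything from the video.

import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0, variance=1.0):
    # Squared-exponential (RBF) kernel matrix between two sets of 1-D inputs.
    sq_dists = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dists / length_scale ** 2)

# Toy 1-D data, purely for illustration
X = np.array([-2.0, -1.0, 0.0, 1.5, 2.5])
y = np.sin(X)
X_star = np.linspace(-3, 3, 100)  # test inputs
noise_var = 0.1                   # assumed observation noise variance

K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
K_s = rbf_kernel(X, X_star)
K_ss = rbf_kernel(X_star, X_star)

K_inv = np.linalg.inv(K)
post_mean = K_s.T @ K_inv @ y            # posterior predictive mean
post_cov = K_ss - K_s.T @ K_inv @ K_s    # posterior predictive covariance

# Log marginal likelihood: the Empirical Bayes objective for tuning
# length_scale, variance, and noise_var.
_, logdet = np.linalg.slogdet(K)
log_marginal = -0.5 * (y @ K_inv @ y + logdet + len(X) * np.log(2 * np.pi))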

[1] is a thorough practical and theoretical analysis of GPs. When I first modeled with GPs, this book was a frequent reference. It offers a lot of practical advice on designing kernels, optimizing hyperparameters, and interpreting results.

[5] offers a useful tutorial on how to design kernels. I credit this source with my intuitive understanding of how to combine kernels; a small sketch of the idea follows.
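
The key fact is that sums and products of valid kernels are themselves valid kernels, and each combination encodes a different belief about the function. Here's a small illustrative NumPy sketch; the specific kernels and settings are my own choices:

import numpy as np

def rbf(X1, X2, ls=1.0):
    return np.exp(-0.5 * (X1[:, None] - X2[None, :]) ** 2 / ls ** 2)

def periodic(X1, X2, period=1.0, ls=1.0):
    d = np.abs(X1[:, None] - X2[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / ls ** 2)

X = np.linspace(0, 5, 200)

# Sums and products of kernels are kernels.
K_sum = rbf(X, X, ls=2.0) + periodic(X, X, period=1.0)   # smooth trend + repetition
K_prod = rbf(X, X, ls=2.0) * periodic(X, X, period=1.0)  # locally periodic

# Draw prior function samples to see what each combination "believes".
rng = np.random.default_rng(0)
jitter = 1e-8 * np.eye(len(X))
f_sum = rng.multivariate_normal(np.zeros(len(X)), K_sum + jitter)
f_prod = rng.multivariate_normal(np.zeros(len(X)), K_prod + jitter)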

Neil Lawrence's talks on GPs ([4]) were also influential. They helped me develop much of my intuition for how GPs work.

[3] is a beautiful tutorial on GPs. I'd recommend it to anyone learning about GPs for the first time.
---------------------------
[1] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. MIT Press, 2006.

[2] K. P. Murphy, Probabilistic Machine Learning (Second Edition). MIT Press, 2021.

[3] J. Görtler, et al., "A Visual Exploration of Gaussian Processes", Distill, 2019. https://distill.pub/2019/visual-explo...

[4] N. Lawrence, Gaussian Processes talks at MLSS Africa: • Neil Lawrence - Gaussian Processes Pa... , • Neil Lawrence Gaussian Processes Part 2

[5] D. K. Duvenaud, The Kernel Cookbook: Advice on Covariance Functions, University of Cambridge, https://www.cs.toronto.edu/~duvenaud/...

[6] K. Weinberger, "Gaussian Processes", Cornell University: • Machine Learning Lecture 26 "Gaussian... and • Machine Learning Lecture 27 "Gaussian...

RESOURCES

GPyTorch provides an extensive suite of PyTorch-based tools for GP modeling: efficient tensor handling, fast variance calculations, multi-task learning tools, integrations with Pyro, and Deep Kernel Learning, among other things. Exploring this toolset is a great way to become a competent GP modeler. Link: https://gpytorch.ai/
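
To give a taste of what modeling with it looks like, here's a minimal exact-GP regression sketch in the style of GPyTorch's introductory tutorials. The toy data, kernel choice, and training settings are my own assumptions:

import torch
import gpytorch

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x))

# Toy data, made up for illustration
train_x = torch.linspace(0, 1, 50)
train_y = torch.sin(train_x * 6.0) + 0.1 * torch.randn(50)

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x, train_y, likelihood)

# Empirical Bayes: maximize the marginal log likelihood over hyperparameters.
model.train(); likelihood.train()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
for _ in range(100):
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)
    loss.backward()
    optimizer.step()

# Predictive distribution at new points: the "entire distribution" output.
model.eval(); likelihood.eval()
with torch.no_grad():
    preds = likelihood(model(torch.linspace(0, 1, 100)))
    mean = preds.mean
    lower, upper = preds.confidence_region()  # roughly 2 standard deviations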

Also, I'd recommend source [5] for getting familiar with how to model with GPs. Understanding the relationship between kernel space and function space takes time, but it takes less with this guide. It also links to Duvenaud's PhD thesis, which is a very deep dive on the subject (though don't ask me about it - I didn't read it!).

EXTRA

Why is it OK to act as though a sample from a multiplied kernel comes from multiplying the function samples from the two component kernels?

The problem comes from the fact that if x1 is a sample from a multivariate Normal with mean zero and covariance matrix S1, and the same is true for x2 and S2, then the element-wise product x1*x2 is not distributed as a multivariate Normal. However, whatever distribution x1*x2 has, its covariance is still the element-wise (Hadamard) product S1*S2: since x1 and x2 are independent with mean zero, the (i, j) entry is E[x1_i x1_j] E[x2_i x2_j] = (S1)_ij (S2)_ij. (I've also verified this experimentally.) That means it wiggles similarly to a sample from the product kernel.
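
Here's a quick NumPy check of that claim; the two covariance matrices below are arbitrary RBF constructions I chose for illustration:

import numpy as np

rng = np.random.default_rng(0)
n, trials = 5, 200_000

# Two arbitrary valid covariance matrices (RBF with different length scales).
X = np.linspace(0, 1, n)
S1 = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / 0.5 ** 2)
S2 = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / 0.1 ** 2)

x1 = rng.multivariate_normal(np.zeros(n), S1, size=trials)
x2 = rng.multivariate_normal(np.zeros(n), S2, size=trials)

# Covariance of the element-wise product vs. the Hadamard product S1*S2.
emp = np.cov((x1 * x2).T)               # empirical covariance, shape (n, n)
print(np.max(np.abs(emp - S1 * S2)))    # small, and shrinks with more trials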

For background: I accidentally believed the exact statement for quite a while, and it was helpful for modeling; I certainly could never tell from samples that it wasn't true. While creating this video, I discovered it isn't in fact true, but merely a useful approximation.

Wallpaper: https://github.com/Duane321/mutual_in...

Timestamps
0:00 Pros of GPs
1:06 Bayesian Linear Regression to GPs
3:52 Controlling the GP
7:31 Modeling by Combining Kernels
8:52 Modeling Example
11:55 The Math behind GPs
18:42 Hyperparameter Selection
21:58 Cons of GPs
22:58 Resources for Learning More
