<aside> πŸ“Œ By Dr. Nir Regev

</aside>


July 5th, 2024


This is a detailed tutorial on Variational Inference and the Evidence Lower Bound (ELBO), including mathematical derivations, intuitive explanations, and documented Python code to illustrate the key concepts.

Introduction

Variational Inference (VI) is a powerful technique in Bayesian machine learning used to approximate intractable posterior distributions. The main idea behind VI is to transform the complex problem of posterior inference into an optimization problem by introducing a family of simpler, tractable distributions and finding the one that best approximates the true posterior. This is achieved by maximizing a lower bound on the log evidence, known as the Evidence Lower Bound (ELBO).

Variational Inference Setup

Let's consider a generative model with latent variables $\mathbf{z}$ and observed variables $\mathbf{x}$. The joint probability distribution can be factorized as:

$$ p(\mathbf{x}, \mathbf{z}) = p(\mathbf{x} | \mathbf{z}) \, p(\mathbf{z}) $$
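To make the factorization concrete, here is a minimal ancestral-sampling sketch. The specific prior and likelihood are my own toy choices for illustration, not part of the tutorial: $z \sim \mathcal{N}(0, 1)$ and $x \mid z \sim \mathcal{N}(z, 1)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy generative model (an assumption for illustration):
#   z ~ N(0, 1)       -- prior p(z)
#   x | z ~ N(z, 1)   -- likelihood p(x | z)
# Sampling z first, then x given z, draws from the joint
# p(x, z) = p(x | z) p(z).
z = rng.standard_normal(5)        # z ~ p(z)
x = rng.normal(loc=z, scale=1.0)  # x ~ p(x | z)
print(np.c_[z, x])                # pairs (z_i, x_i) from the joint
```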

Our goal is to infer the posterior distribution $p(\mathbf{z} | \mathbf{x})$, which represents the probability of the latent variables given the observed data. However, computing the exact posterior is often intractable due to the normalization constant:

$$ p(\mathbf{x}) = \int p(\mathbf{x} | \mathbf{z}) \, p(\mathbf{z}) \, d\mathbf{z} $$
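The integral above can be estimated by Monte Carlo as $p(\mathbf{x}) = \mathbb{E}_{p(\mathbf{z})}[p(\mathbf{x}|\mathbf{z})]$. As a sketch, take a toy conjugate model of my own choosing (an assumption, not from the tutorial): $z \sim \mathcal{N}(0,1)$, $x \mid z \sim \mathcal{N}(z,1)$, for which the evidence is available in closed form, $p(x) = \mathcal{N}(x; 0, 2)$, so the estimate can be checked.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x_obs = 1.5  # an arbitrary observed data point

# Monte Carlo estimate of the evidence: p(x) = E_{p(z)}[p(x | z)]
z_samples = rng.standard_normal(100_000)            # z ~ p(z) = N(0, 1)
evidence_mc = norm.pdf(x_obs, loc=z_samples, scale=1.0).mean()

# Closed form for this conjugate toy model: p(x) = N(x; 0, 2)
evidence_exact = norm.pdf(x_obs, loc=0.0, scale=np.sqrt(2.0))
print(evidence_mc, evidence_exact)  # agree up to Monte Carlo noise
```

In higher dimensions this naive estimator degrades quickly, which is exactly why the evidence is treated as intractable and VI is used instead.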

Variational Inference introduces a variational distribution $q_{\phi}(\mathbf{z}|\mathbf{x})$ to approximate the true posterior. The goal is to find the optimal variational distribution $q_{\phi}^*(\mathbf{z}|\mathbf{x})$ that minimizes the Kullback-Leibler (KL) divergence between $q_{\phi}(\mathbf{z}|\mathbf{x})$ and $p(\mathbf{z} | \mathbf{x})$:

$$ q_{\phi}^*(\mathbf{z}|\mathbf{x}) = \arg\min_{\phi} \, \mathrm{KL}\left( q_{\phi}(\mathbf{z}|\mathbf{x}) \,\|\, p(\mathbf{z} | \mathbf{x}) \right) $$
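For one-dimensional Gaussians the KL divergence has a closed form, which makes the objective easy to inspect. The sketch below (parameter values are arbitrary assumptions) compares the closed form against the Monte Carlo definition $\mathrm{KL}(q \| p) = \mathbb{E}_{q}[\log q(z) - \log p(z)]$.

```python
import numpy as np

def kl_gauss(m1, s1, m2, s2):
    # Closed-form KL( N(m1, s1^2) || N(m2, s2^2) )
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

rng = np.random.default_rng(0)
m1, s1, m2, s2 = 0.5, 0.8, 0.0, 1.0  # arbitrary example parameters

# Monte Carlo check: KL = E_q[log q(z) - log p(z)] with z ~ q
z = rng.normal(m1, s1, 200_000)
log_q = -0.5 * np.log(2 * np.pi * s1**2) - (z - m1)**2 / (2 * s1**2)
log_p = -0.5 * np.log(2 * np.pi * s2**2) - (z - m2)**2 / (2 * s2**2)
kl_mc = (log_q - log_p).mean()

print(kl_gauss(m1, s1, m2, s2), kl_mc)  # agree up to Monte Carlo noise
```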

Evidence Lower Bound (ELBO)

The KL divergence between $q_{\phi}(\mathbf{z}|\mathbf{x})$ and $p(\mathbf{z} | \mathbf{x})$ can be written as:

$$ \mathrm{KL}\left( q_{\phi}(\mathbf{z}|\mathbf{x}) \,\|\, p(\mathbf{z} | \mathbf{x}) \right) = \mathbb{E}_{q_{\phi}(\mathbf{z}|\mathbf{x})}\left[ \log q_{\phi}(\mathbf{z}|\mathbf{x}) - \log p(\mathbf{x}, \mathbf{z}) \right] + \log p(\mathbf{x}) $$
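This decomposition can be verified numerically. The sketch reuses a toy conjugate model of my own choosing (an assumption, not from the tutorial): $z \sim \mathcal{N}(0,1)$, $x \mid z \sim \mathcal{N}(z,1)$, whose exact posterior is $p(z|x) = \mathcal{N}(x/2, 1/2)$ and whose evidence is $p(x) = \mathcal{N}(x; 0, 2)$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x_obs = 1.5             # an arbitrary observed data point
mu_q, s_q = 0.3, 0.9    # an arbitrary variational Gaussian q(z|x)

# KL(q || posterior) in closed form for two 1-D Gaussians
m_post, s_post = x_obs / 2, np.sqrt(0.5)
kl_closed = (np.log(s_post / s_q)
             + (s_q**2 + (mu_q - m_post)**2) / (2 * s_post**2) - 0.5)

# Same quantity via the decomposition:
#   KL = E_q[log q(z) - log p(x, z)] + log p(x)
z = rng.normal(mu_q, s_q, 200_000)
log_q = norm.logpdf(z, mu_q, s_q)
log_joint = norm.logpdf(x_obs, z, 1.0) + norm.logpdf(z, 0.0, 1.0)
log_px = norm.logpdf(x_obs, 0.0, np.sqrt(2.0))
kl_decomp = (log_q - log_joint).mean() + log_px

print(kl_closed, kl_decomp)  # agree up to Monte Carlo noise
```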

However, this expression is still intractable due to the presence of the log evidence term $\log p(\mathbf{x})$. To circumvent this issue, we can derive the Evidence Lower Bound (ELBO) by applying Jensen's inequality:

$$ \log p(\mathbf{x}) = \log \int p(\mathbf{x}, \mathbf{z}) \, d\mathbf{z} = \log \mathbb{E}_{q_{\phi}(\mathbf{z}|\mathbf{x})}\left[ \frac{p(\mathbf{x}, \mathbf{z})}{q_{\phi}(\mathbf{z}|\mathbf{x})} \right] $$

$$ \log p(\mathbf{x}) \geq \mathbb{E}_{q_{\phi}(\mathbf{z}|\mathbf{x})}\left[ \log \frac{p(\mathbf{x}, \mathbf{z})}{q_{\phi}(\mathbf{z}|\mathbf{x})} \right] \triangleq \mathrm{ELBO}(\phi) $$
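The bound can be checked numerically. The sketch below again assumes a toy conjugate model of my own choosing (not from the tutorial): $z \sim \mathcal{N}(0,1)$, $x \mid z \sim \mathcal{N}(z,1)$, with exact posterior $p(z|x) = \mathcal{N}(x/2, 1/2)$ and evidence $p(x) = \mathcal{N}(x; 0, 2)$. A mismatched $q$ yields an ELBO strictly below $\log p(\mathbf{x})$, while setting $q$ to the true posterior closes the gap (the gap is exactly the KL divergence).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x_obs = 1.5  # an arbitrary observed data point

def elbo(mu, sigma, n=200_000):
    # Monte Carlo ELBO = E_q[log p(x, z) - log q(z)] with z ~ q
    z = rng.normal(mu, sigma, n)
    log_joint = norm.logpdf(x_obs, z, 1.0) + norm.logpdf(z, 0.0, 1.0)
    log_q = norm.logpdf(z, mu, sigma)
    return (log_joint - log_q).mean()

log_evidence = norm.logpdf(x_obs, 0.0, np.sqrt(2.0))

elbo_bad = elbo(0.0, 1.0)                    # mismatched q: strict gap
elbo_opt = elbo(x_obs / 2, np.sqrt(0.5))     # q = true posterior: tight
print(elbo_bad, elbo_opt, log_evidence)
```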