Stanfun

Posted on Apr 8, 2022

Fun with Stan

Some notes on trying to fit mixture of multivariate Gaussian models in Stan.

Setting things up

Get CmdStan. Unzip it, cd into the cmdstan-x.xx.x directory, and run make. This doesn’t build anything yet; it just prints information on how to compile and use CmdStan.

CmdStan is a command-line interface to Stan: it compiles a statistical model description written in the Stan language (extension .stan) into a binary that you can then use to fit the model on some data, diagnose the fit, and so on.

Model

The model itself is very simple. Given $ \mathbf{y} \in \mathbb{R}^{N \times 2} $, i.e. $ N $ observations of 2d vectors, for some $ K \in \mathbb{N} $, we define the parameters $ \pi \in \mathbb{R}^K, \{ \boldsymbol{\mu}_k \} _{k \in [K]} \in \mathbb{R}^{K \times 2}, \{ \boldsymbol{\Sigma}_k \} _{k \in [K]} \in \mathbb{R}^{K \times 2 \times 2} $ by:

$$ \begin{align*} \sum_{k=1}^K \pi_k &= 1 \\ p ( \mathbf{y}^{(i)} ) &\propto \sum_{k=1}^K p (z^{(i)} = k) p ( \mathbf{y}^{(i)} \mid \boldsymbol{\mu} _{z^{(i)}}, \boldsymbol{\Sigma} _{z^{(i)}} ) \end{align*} $$

where

$$ \begin{align*} p (z^{(i)} = k) &= \pi_k \\ \mathbf{y}^{(i)} \mid \boldsymbol{\mu} _{z^{(i)}}, \boldsymbol{\Sigma} _{z^{(i)}} &\sim \mathcal{N} ( \boldsymbol{\mu} _{z^{(i)}}, \boldsymbol{\Sigma} _{z^{(i)}} ) \end{align*} $$
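To make the generative story concrete, it can be simulated directly in numpy. Everything below ($ K $, the weights, the means, the covariances) is made up for illustration and is not tied to any particular dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up mixture: K = 2 components in 2d
pi = np.array([0.3, 0.7])                       # mixing proportions, sum to 1
mu = np.array([[0.0, 0.0], [3.0, 3.0]])         # component means
Sigma = np.array([np.eye(2), 0.5 * np.eye(2)])  # component covariances

N = 1000
# z^(i) ~ Categorical(pi)
z = rng.choice(len(pi), size=N, p=pi)
# y^(i) | z^(i) ~ N(mu_{z^(i)}, Sigma_{z^(i)})
y = np.stack([rng.multivariate_normal(mu[k], Sigma[k]) for k in z])
```

Data generated this way (with the true parameters known) is handy for checking that the fitted model recovers something sensible.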

Here’s the code for the model:

data {
    //  Number of observations
    int<lower=0> N;

    //  Number of Gaussians
    int<lower=1> K;

    //  Observations (2d vectors)
    array[N] vector[2] y;
}

parameters {
    //  Mixing coefficients
    simplex[K] pi;

    //  Gaussian means
    array[K] vector[2] mu;

    //  Gaussian covariance matrices
    array[K] cholesky_factor_cov[2] L;
}

model {
    vector[K] log_pi = log(pi);

    for (n in 1:N) {
        vector[K] log_probs = log_pi;
        for (k in 1:K) {
            log_probs[k] += multi_normal_cholesky_lpdf(y[n] | mu[k], L[k]);
        }
        target += log_sum_exp(log_probs);
    }
}

The data { ... } and parameters { ... } blocks define the (observed) data and the (unobserved) parameters.
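CmdStan reads the contents of the data block from a file supplied at run time, e.g. in JSON format, with one key per variable declared in data. A minimal sketch of preparing such a file in Python (the observations here are random placeholders, not real data):

```python
import json

import numpy as np

rng = np.random.default_rng(1)
y = rng.standard_normal((100, 2))  # placeholder observations

# Keys must match the variable names in the data block: N, K, y
data = {"N": y.shape[0], "K": 3, "y": y.tolist()}
with open("mixture_data.json", "w") as f:
    json.dump(data, f)
```

The compiled model binary is then pointed at this file on the command line, along the lines of ./model sample data file=mixture_data.json (run the binary with no arguments to see the exact interface).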

The model { ... } block defines the joint log probability, which is accumulated by the statement target += .... The variable log_pi is the component-wise log of the mixing proportions $ \boldsymbol{\pi} $, and the function multi_normal_cholesky_lpdf(y[n] | mu[k], L[k]) returns the log density of the multivariate normal distribution $ \mathcal{N} ( \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k ) $ at $ \mathbf{y}^{(n)} $, parameterized by the Cholesky factor $ \mathbf{L}_k $ of $ \boldsymbol{\Sigma}_k $. You can see that this code is actually accumulating the joint log probability of all $ N $ observations:

$$ \begin{align*} p ( \mathbf{y}^{(i)} ) &\propto \sum_{k=1}^K p (z^{(i)} = k) p ( \mathbf{y}^{(i)} \mid \boldsymbol{\mu} _{z^{(i)}}, \boldsymbol{\Sigma} _{z^{(i)}} ) \\ \log p ( \mathbf{y}^{(i)} ) &= - \log Z + \log \sum _{k=1}^K p (z^{(i)} = k) p ( \mathbf{y}^{(i)} \mid \boldsymbol{\mu} _{z^{(i)}}, \boldsymbol{\Sigma} _{z^{(i)}} ) \\ &= - \log Z + \log \sum _{k=1}^K \exp \log ( p (z^{(i)} = k) p ( \mathbf{y}^{(i)} \mid \boldsymbol{\mu} _{z^{(i)}}, \boldsymbol{\Sigma} _{z^{(i)}} ) ) \\ &= - \log Z + \log \sum _{k=1}^K \exp ( \log p (z^{(i)} = k) + \log p ( \mathbf{y}^{(i)} \mid \boldsymbol{\mu} _{z^{(i)}}, \boldsymbol{\Sigma} _{z^{(i)}} ) ) \end{align*} $$

where $ Z $ is the normalizing constant (ignored in the code).
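The model block computes exactly the last line of this derivation (dropping $ Z $), using log_sum_exp for numerical stability. The identity is easy to check numerically with scipy; the mixture parameters below are made up:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

# Made-up 2-component mixture and a single observation y_i
pi = np.array([0.3, 0.7])
mu = np.array([[0.0, 0.0], [3.0, 3.0]])
Sigma = np.array([np.eye(2), 0.5 * np.eye(2)])
y_i = np.array([1.0, 2.0])

# Direct: log of the mixture density, sum inside the log
direct = np.log(sum(p * multivariate_normal(m, S).pdf(y_i)
                    for p, m, S in zip(pi, mu, Sigma)))

# Log-sum-exp of (log pi_k + log N(y_i | mu_k, Sigma_k)), as in the model block
lse = logsumexp([np.log(p) + multivariate_normal(m, S).logpdf(y_i)
                 for p, m, S in zip(pi, mu, Sigma)])
```

The two quantities agree; the log-sum-exp form is preferred because the individual densities can underflow to zero in the direct form.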

Instead of declaring the covariance matrices $ \{ \boldsymbol{\Sigma}_k \} $ directly as covariance matrices (cov_matrix), we declare them as Cholesky factors of covariance matrices (cholesky_factor_cov). The Cholesky parameterization is more efficient: multi_normal_cholesky_lpdf can use the factor directly, instead of factorizing a full covariance matrix on every log-density evaluation. I tried using cov_matrix but the sampler gave me a bunch of errors, so I’m using cholesky_factor_cov here.
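For intuition on why the factor suffices: any covariance matrix factors as $ \boldsymbol{\Sigma} = \mathbf{L} \mathbf{L}^\top $ with $ \mathbf{L} $ lower-triangular, and both the determinant and the quadratic form in the Gaussian log density can be computed from $ \mathbf{L} $ alone. A numpy/scipy sketch (the covariance matrix is made up):

```python
import numpy as np
from scipy.stats import multivariate_normal

Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])    # a made-up 2x2 covariance matrix
L = np.linalg.cholesky(Sigma)     # lower-triangular factor, Sigma = L @ L.T

mu = np.zeros(2)
y_i = np.array([1.0, -0.5])

# Log density via the full covariance matrix ...
lp_cov = multivariate_normal(mu, Sigma).logpdf(y_i)

# ... and via the Cholesky factor alone:
# solve L w = (y - mu) by substitution, so w' w = (y - mu)' Sigma^{-1} (y - mu),
# and log|Sigma| = 2 * sum(log(diag(L)))
w = np.linalg.solve(L, y_i - mu)
lp_chol = -0.5 * (w @ w) - np.log(np.diag(L)).sum() - np.log(2 * np.pi)
```

The two log densities agree exactly; the Cholesky route replaces a matrix inverse and determinant with a triangular solve and a sum of logs.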