Stanfun
Fun with Stan
Some notes on trying to fit a mixture of multivariate Gaussians in Stan.
Setting things up
Get CmdStan. Unzip it, `cd` into the `cmdstan-x.xx.x` directory, and run `make`. This doesn't build anything by itself; it just prints some info on how to compile and use CmdStan. CmdStan is a compiler that takes a statistical model description written in the Stan language (extension `.stan`) and builds a binary that you can use to fit the model to some data, diagnose the model, etc.
Model
The model itself is very simple. Given $ \mathbf{y} \in \mathbb{R}^{N \times 2} $, i.e. $ N $ observations of 2d vectors, and some $ K \in \mathbb{N} $, we define the parameters $ \boldsymbol{\pi} \in \mathbb{R}^K, \{ \boldsymbol{\mu}_k \} _{k \in [K]} \in \mathbb{R}^{K \times 2}, \{ \boldsymbol{\Sigma}_k \} _{k \in [K]} \in \mathbb{R}^{K \times 2 \times 2} $ by:
$$ \begin{align*} \sum_{k=1}^K \pi_k &= 1 \\ p ( \mathbf{y}^{(i)} ) &\propto \sum_{k=1}^K p (z^{(i)} = k) p ( \mathbf{y}^{(i)} \mid \boldsymbol{\mu} _{z^{(i)}}, \boldsymbol{\Sigma} _{z^{(i)}} ) \end{align*} $$
where
$$ \begin{align*} p (z^{(i)} = k) &= \pi_k \\ \mathbf{y}^{(i)} \mid \boldsymbol{\mu} _{z^{(i)}}, \boldsymbol{\Sigma} _{z^{(i)}} &\sim \mathcal{N} ( \boldsymbol{\mu} _{z^{(i)}}, \boldsymbol{\Sigma} _{z^{(i)}} ) \end{align*} $$
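To make the generative story concrete, here's a minimal NumPy sketch of evaluating this mixture density at a point. The parameter values ($K = 2$, the means, the covariances) are made up purely for illustration:

```python
import numpy as np

# Hypothetical parameters for a K = 2 mixture of 2d Gaussians.
pi = np.array([0.3, 0.7])                       # mixing coefficients, sum to 1
mu = np.array([[0.0, 0.0], [3.0, 3.0]])         # component means
Sigma = np.array([np.eye(2), 2.0 * np.eye(2)])  # component covariances

def gauss_pdf(y, mean, cov):
    """Density of the multivariate normal N(mean, cov) at y."""
    d = y - mean
    norm = np.sqrt((2.0 * np.pi) ** len(y) * np.linalg.det(cov))
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm

def mixture_pdf(y):
    """p(y) = sum_k pi_k * N(y | mu_k, Sigma_k)."""
    return sum(pi[k] * gauss_pdf(y, mu[k], Sigma[k]) for k in range(len(pi)))

print(mixture_pdf(np.array([1.0, 1.0])))
```

Each component contributes its density weighted by its mixing coefficient $\pi_k$, which is exactly the sum over $k$ in the equation above.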
Here’s the code for the model:
```stan
data {
  // Number of observations
  int<lower=0> N;
  // Number of Gaussians
  int<lower=1> K;
  // Observations (2d vectors)
  array[N] vector[2] y;
}
parameters {
  // Mixing coefficients
  simplex[K] pi;
  // Gaussian means
  array[K] vector[2] mu;
  // Cholesky factors of the Gaussian covariance matrices
  array[K] cholesky_factor_cov[2] L;
}
model {
  vector[K] log_pi = log(pi);
  for (n in 1:N) {
    vector[K] log_probs = log_pi;
    for (k in 1:K) {
      log_probs[k] += multi_normal_cholesky_lpdf(y[n] | mu[k], L[k]);
    }
    target += log_sum_exp(log_probs);
  }
}
```
The `data { ... }` and `parameters { ... }` blocks define the (observed) data and the (unobserved) parameters. The `model { ... }` block defines the joint log probability, which is accumulated by the statement `target += ...`. The variable `log_pi` is the component-wise log of the class probabilities $ \boldsymbol{\pi} $, and the function `multi_normal_cholesky_lpdf(y[n] | mu[k], L[k])` returns the log density of the multivariate normal distribution $ \mathcal{N} ( \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k ) $ evaluated at $ \mathbf{y}^{(n)} $. You can see that this code is accumulating the log probability of each observation, and hence the joint log probability of all $ N $ observations:
$$ \begin{align*} p ( \mathbf{y}^{(i)} ) &\propto \sum_{k=1}^K p (z^{(i)} = k) p ( \mathbf{y}^{(i)} \mid \boldsymbol{\mu} _{z^{(i)}}, \boldsymbol{\Sigma} _{z^{(i)}} ) \\ \log p ( \mathbf{y}^{(i)} ) &= - \log Z + \log \sum _{k=1}^K p (z^{(i)} = k) p ( \mathbf{y}^{(i)} \mid \boldsymbol{\mu} _{z^{(i)}}, \boldsymbol{\Sigma} _{z^{(i)}} ) \\ &= - \log Z + \log \sum _{k=1}^K \exp \log ( p (z^{(i)} = k) p ( \mathbf{y}^{(i)} \mid \boldsymbol{\mu} _{z^{(i)}}, \boldsymbol{\Sigma} _{z^{(i)}} ) ) \\ &= - \log Z + \log \sum _{k=1}^K \exp ( \log p (z^{(i)} = k) + \log p ( \mathbf{y}^{(i)} \mid \boldsymbol{\mu} _{z^{(i)}}, \boldsymbol{\Sigma} _{z^{(i)}} ) ) \end{align*} $$
where $ Z $ is the normalizing constant (ignored in the code).
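This is why the code works with `log_pi` and `log_sum_exp` instead of multiplying raw densities: per-component log probabilities can be so negative that exponentiating them underflows to zero. Here's a small NumPy sketch of the trick Stan's `log_sum_exp` uses (the values are made up to force the underflow):

```python
import numpy as np

def log_sum_exp(a):
    """Numerically stable log(sum(exp(a))): shift by the max before exponentiating."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

# Per-component terms log pi_k + log p(y | mu_k, Sigma_k); with many
# observations or dimensions these are often hugely negative.
log_probs = np.array([-1000.0, -1001.0])

naive = np.log(np.sum(np.exp(log_probs)))  # exp underflows to 0, so this is -inf
stable = log_sum_exp(log_probs)            # approx -999.687
print(naive, stable)
```

Subtracting the maximum before exponentiating guarantees the largest term becomes $\exp(0) = 1$, so the sum never underflows to zero.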
Instead of defining and using the covariance matrices $ \{ \boldsymbol{\Sigma}_k \} $ directly as covariance matrices (`cov_matrix`), we define them as Cholesky factors of covariance matrices (`cholesky_factor_cov`). Using `cholesky_factor_cov` instead of `cov_matrix` is apparently more efficient. I tried using `cov_matrix`, but the sampler gave me a bunch of errors, so I'm using `cholesky_factor_cov` here.
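In case the Cholesky parameterization is unfamiliar: a covariance matrix $\boldsymbol{\Sigma}$ factors as $\boldsymbol{\Sigma} = \mathbf{L} \mathbf{L}^\top$ with $\mathbf{L}$ lower-triangular and positive on the diagonal, and any such $\mathbf{L}$ yields a valid (positive-definite) covariance matrix. A quick NumPy check, with a made-up $2 \times 2$ covariance:

```python
import numpy as np

# A hypothetical symmetric positive-definite covariance matrix.
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Its Cholesky factor: lower-triangular L with Sigma = L @ L.T.
L = np.linalg.cholesky(Sigma)

assert np.allclose(np.tril(L), L)   # L is lower-triangular
assert np.all(np.diag(L) > 0)       # with a positive diagonal
assert np.allclose(L @ L.T, Sigma)  # and it reconstructs Sigma
```

This also hints at why the factored form can be cheaper for the sampler: evaluating the normal log density needs $\boldsymbol{\Sigma}^{-1}$ and $\det \boldsymbol{\Sigma}$, both of which are easy to get from a triangular $\mathbf{L}$.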