This vignette demonstrates how to write a Stan program that computes
and stores the pointwise log-likelihood required for using the
**loo** package. The other vignettes included with the
package demonstrate additional functionality.

Some sections from this vignette are excerpted from our papers

Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC.

*Statistics and Computing*. 27(5), 1413–1432. :10.1007/s11222-016-9696-4. Links: published | arXiv preprint.Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling.

*Journal of Machine Learning Research*, 25(72):1-58. PDF

which provide important background for understanding the methods implemented in the package.

This example comes from a survey of residents from a small area in Bangladesh that was affected by arsenic in drinking water. Respondents with elevated arsenic levels in their wells were asked if they were interested in getting water from a neighbor’s well, and a series of logistic regressions were fit to predict this binary response given various information about the households (Gelman and Hill, 2007). Here we fit a model for the well-switching response given two predictors: the arsenic level of the water in the resident’s home, and the distance of the house from the nearest safe well.

The sample size in this example is \(N=3020\), which is not huge but is large enough that it is important to have a computational method for LOO that is fast for each data point. On the plus side, with such a large dataset, the influence of any given observation is small, and so the computations should be stable.

Here is the Stan code for fitting the logistic regression model,
which we save in a file called `logistic.stan`

:

```
// Note: some syntax used in this program requires RStan >= 2.26 (or CmdStanR)
// To use an older version of RStan change the line declaring `y` to:
// int<lower=0,upper=1> y[N];
data {
int<lower=0> N; // number of data points
int<lower=0> P; // number of predictors (including intercept)
matrix[N,P] X; // predictors (including 1s for intercept)
array[N] int<lower=0,upper=1> y; // binary outcome
}
parameters {
vector[P] beta;
}
model {
beta ~ normal(0, 1);
y ~ bernoulli_logit(X * beta);
}
generated quantities {
vector[N] log_lik;
for (n in 1:N) {
log_lik[n] = bernoulli_logit_lpmf(y[n] | X[n] * beta);
}
}
```

We have defined the log likelihood as a vector named
`log_lik`

in the generated quantities block so that the
individual terms will be saved by Stan. After running Stan,
`log_lik`

can be extracted (using the
`extract_log_lik`

function provided in the
**loo** package) as an \(S \times
N\) matrix, where \(S\) is the
number of simulations (posterior draws) and \(N\) is the number of data points.

Next we fit the model in Stan using the **rstan**
package:

```
library("rstan")
# Prepare data
url <- "http://stat.columbia.edu/~gelman/arm/examples/arsenic/wells.dat"
wells <- read.table(url)
wells$dist100 <- with(wells, dist / 100)
X <- model.matrix(~ dist100 + arsenic, wells)
standata <- list(y = wells$switch, X = X, N = nrow(X), P = ncol(X))
# Fit model
fit_1 <- stan("logistic.stan", data = standata)
print(fit_1, pars = "beta")
```

```
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
beta[1] 0.00 0 0.08 -0.16 -0.05 0.00 0.05 0.15 1964 1
beta[2] -0.89 0 0.10 -1.09 -0.96 -0.89 -0.82 -0.68 2048 1
beta[3] 0.46 0 0.04 0.38 0.43 0.46 0.49 0.54 2198 1
```

We can then use the **loo** package to compute the
efficient PSIS-LOO approximation to exact LOO-CV:

```
library("loo")
# Extract pointwise log-likelihood
# using merge_chains=FALSE returns an array, which is easier to
# use with relative_eff()
log_lik_1 <- extract_log_lik(fit_1, merge_chains = FALSE)
# as of loo v2.0.0 we can optionally provide relative effective sample sizes
# when calling loo, which allows for better estimates of the PSIS effective
# sample sizes and Monte Carlo error
r_eff <- relative_eff(exp(log_lik_1), cores = 2)
# preferably use more than 2 cores (as many cores as possible)
# will use value of 'mc.cores' option if cores is not specified
loo_1 <- loo(log_lik_1, r_eff = r_eff, cores = 2)
print(loo_1)
```

```
Computed from 4000 by 3020 log-likelihood matrix
Estimate SE
elpd_loo -1968.5 15.6
p_loo 3.2 0.1
looic 3937.0 31.2
------
Monte Carlo SE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.5, 1.3]).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
```

The printed output from the `loo`

function shows the
estimates \(\widehat{\mbox{elpd}}_{\rm
loo}\) (expected log predictive density), \(\widehat{p}_{\rm loo}\) (effective number
of parameters), and \({\rm looic} =-2\,
\widehat{\mbox{elpd}}_{\rm loo}\) (the LOO information
criterion).

The line at the bottom of the printed output provides information
about the reliability of the LOO approximation (the interpretation of
the \(k\) parameter is explained in
`help('pareto-k-diagnostic')`

and in greater detail in
Vehtari, Simpson, Gelman, Yao, and Gabry (2019)). In this case the
message tells us that all of the estimates for \(k\) are fine.

To compare this model to an alternative model for the same data we
can use the `loo_compare`

function in the
**loo** package. First we’ll fit a second model to the
well-switching data, using `log(arsenic)`

instead of
`arsenic`

as a predictor:

```
standata$X[, "arsenic"] <- log(standata$X[, "arsenic"])
fit_2 <- stan(fit = fit_1, data = standata)
log_lik_2 <- extract_log_lik(fit_2, merge_chains = FALSE)
r_eff_2 <- relative_eff(exp(log_lik_2))
loo_2 <- loo(log_lik_2, r_eff = r_eff_2, cores = 2)
print(loo_2)
```

```
Computed from 4000 by 3020 log-likelihood matrix
Estimate SE
elpd_loo -1952.3 16.2
p_loo 3.1 0.1
looic 3904.6 32.4
------
Monte Carlo SE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 1.2]).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
```

We can now compare the models on LOO using the
`loo_compare`

function:

This new object, `comp`

, contains the estimated difference
of expected leave-one-out prediction errors between the two models,
along with the standard error:

```
elpd_diff se_diff
model2 0.0 0.0
model1 -16.3 4.4
```

The first column shows the difference in ELPD relative to the model
with the largest ELPD. In this case, the difference in `elpd`

and its scale relative to the approximate standard error of the
difference) indicates a preference for the second model
(`model2`

).

Gelman, A., and Hill, J. (2007). *Data Analysis Using Regression
and Multilevel Hierarchical Models.* Cambridge University Press.

Stan Development Team (2017). *The Stan C++ Library, Version
2.17.0.* https://mc-stan.org/

Stan Development Team (2018) *RStan: the R interface to Stan,
Version 2.17.3.* https://mc-stan.org/

Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian
model evaluation using leave-one-out cross-validation and WAIC.
*Statistics and Computing*. 27(5), 1413–1432.
:10.1007/s11222-016-9696-4. online,
arXiv preprint
arXiv:1507.04544.

Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024).
Pareto smoothed importance sampling. *Journal of Machine Learning
Research*, 25(72):1-58. PDF