Part 1: Getting R up to speed

Getting R up to speed

It’s more likely that installing fancy Bayesian things will actually work if you take a bit to update your R installation first.

R

R is the statistical programming language we’ll use for the course. (If you’re deeply attached to, say, Python, let’s talk about it!)

It sounds like most of you are familiar with R, but if you need a refresher, Kruschke Chapter 3 has one. A sociology colleague of mine, Charles Lanfear, also has some great resources online from his CSSS 508 class: https://clanfear.github.io/CSSS508/

Download R for your OS here: https://cloud.r-project.org/

You’ll want at least R version 3.4.0. The latest release is 3.5.3.

RStudio

If you don’t use it already, RStudio is a development environment that will make your R programming life happier and easier.

Install the desktop version here: https://www.rstudio.com/products/rstudio/download/

When you open RStudio, it’ll show you the R version it finds in the console. Make sure it’s the right one!

The latest stable release of RStudio is 1.1.463. However, RStudio version 1.2, which is available in a preview version, is supposed to have even better Stan support, so you might consider checking that out: https://www.rstudio.com/products/rstudio/download/preview/

RMarkdown

Rproj

A convenient way to bundle R, Rmd, jags, and stan files together is by using RStudio projects (.Rproj files):

https://r4ds.had.co.nz/workflow-projects.html#rstudio-projects

If you open the 01-setup.Rproj file after extracting your zip file, R will be able to find your data and your JAGS and Stan models using relative file paths. That’s a good thing!

Useful packages

I use the Tidyverse for my data processing and plotting needs. You don’t have to use it yourself, but I’d recommend installing it.

install.packages("tidyverse")

Part 2: Bayesian software

Bayesian software

We’ll use two additional programming languages for specifying Bayesian models and fitting them using MCMC. Each one has a “backend” that isn’t written in R, and an R package that works as an interface.

JAGS and rjags

JAGS (“Just Another Gibbs Sampler”) is one program for specifying Bayesian models and fitting them using MCMC. Chapter 8 of Kruschke describes JAGS.

First, install the appropriate binary for JAGS from the link here: http://mcmc-jags.sourceforge.net/

Then, install two R packages to interface with JAGS:

install.packages("rjags")
install.packages("runjags")

Did it work?

Load rjags and generate some data to test with:

library(rjags)

# invent some coin flip data
jags_data <- list(
  y = c(0, 1, 0, 0, 0, 1, 1, 0, 0)
)
jags_data$N <- length(jags_data$y)

A test model

jags_model <- jags.model(file = "jags/bernbeta.txt", 
                         data = jags_data, 
                         n.chains = 3, 
                         n.adapt = 500)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 9
##    Unobserved stochastic nodes: 1
##    Total graph size: 12
## 
## Initializing model

Draw from model

# burn-in
update(jags_model, n.iter = 500)

# sampling
coda_samples <- coda.samples(jags_model, variable.names =  c("theta"), 
                             n.iter = 1000)

Note: This probably isn’t enough iterations, and we aren’t checking to see if the chains are any good!

Stan and RStan

C++ toolchain

Stan is written in C++. It’ll try to install a C++ toolchain so you can compile Stan programs, but that might not work. Check to see if it did:

pkgbuild::has_build_tools(debug = TRUE)

If the check doesn’t return TRUE, you’ll need to follow the instructions linked above to get your C++ toolchain working. This is a different process for Windows, Mac, or Linux, and you may need my help with it.

Afterward, there’s a bit of configuration you should do, also in the link above.

Did it work?

Let’s load Stan and find out!

library(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

A test model

We’ll fit some fake data to a model called 8schools:

schools_dat <- list(J = 8, 
                    y = c(28,  8, -3,  7, -1,  1, 18, 12),
                    sigma = c(15, 10, 16, 11,  9, 11, 10, 18))

Fitting the model

This will take a bit to compile, then start sampling. If you see a bunch of output about chains, that means it worked!

fit <- stan(file = "stan/8schools.stan", data = schools_dat)

I actually ran into an error myself while updating to rstan 2.18.2! I had to delete everything in my Makevars file and start over to fix it.

Bernoulli model, Stan version

This is the same model as the JAGS model above, and we can use the same data structure.

NOTE: you can compile the model and sample from it separately. stan_model + sampling is the same as just calling stan.

mod_bern <- stan_model("stan/bernbeta.stan")
mod_bern
## S4 class stanmodel 'bernbeta' coded as follows:
## data {
##   int N;
##   int y[N];
## }
## parameters {
##   real<lower=0, upper=1> theta;
## }
## model {
##   // likelihood
##   y ~ bernoulli(theta);
##   // prior
##   theta ~ beta(1, 1);
## }
fit_bern <- sampling(mod_bern, data = jags_data)
print(fit_bern)
## Inference for Stan model: bernbeta.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##        mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## theta  0.36    0.00 0.14  0.12  0.26  0.36  0.46  0.65  1369    1
## lp__  -7.75    0.02 0.76 -9.81 -7.92 -7.47 -7.27 -7.21  1654    1
## 
## Samples were drawn using NUTS(diag_e) at Thu Apr 11 16:50:19 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).
stan_plot(fit_bern, show_density = TRUE)
## ci_level: 0.8 (80% intervals)
## outer_level: 0.95 (95% intervals)

Part 3: Extras

Extras

You’ll need some of these packages later, but you don’t have to worry about setting them up now if you don’t want to.

Bayesian workflow packages

Later in the quarter, we’ll learn about the entire Bayesian workflow—after you build and fit a model, you’ll need to check it and critique it in an iterative fashion.

Two packages will help with this:

bayesplot (http://mc-stan.org/bayesplot/) makes plots for visually checking model diagnostics, posterior distributions, and fits to data.

loo (http://mc-stan.org/loo/) approximates leave-one-out cross-validation, for model comparison and selection.

install.packages("bayesplot")
install.packages("loo")

R-style modeling

These are convenient shortcuts to common Stan models, using R syntax. I’m more familiar with rstanarm, but brms is similar and also well-regarded.

As you move into applied social science problems that you want to use Bayesian models, you’ll probably want to use these packages. However, I’d encourage you to use rstan and your own Stan models at the beginning, because that’s a better way to develop an intuition and understanding of what they’re actually doing.

install.packages("rstanarm")
install.packages("brms")

Tidybayes

If you’re all in on tidy data, tidybayes can make dealing with the output from rstan and rjags easier.

install.packages("tidybayes")

Appendix

Session Info

sessionInfo()
## R version 3.5.3 (2019-03-11)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.2
## 
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] rstan_2.18.2       StanHeaders_2.18.1 ggplot2_3.1.0     
## [4] rjags_4-8          coda_0.19-1       
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_0.2.4   purrr_0.2.5        lattice_0.20-38   
##  [4] colorspace_1.4-0   htmltools_0.3.6    stats4_3.5.3      
##  [7] base64enc_0.1-3    loo_2.1.0          yaml_2.1.19       
## [10] rlang_0.3.0.1      pkgbuild_1.0.3     pillar_1.2.3      
## [13] glue_1.3.1         withr_2.1.2        bindrcpp_0.2.2    
## [16] matrixStats_0.53.1 bindr_0.1.1        plyr_1.8.4        
## [19] stringr_1.3.1      munsell_0.5.0      gtable_0.2.0      
## [22] codetools_0.2-16   evaluate_0.10.1    labeling_0.3      
## [25] inline_0.3.15      knitr_1.20         callr_2.0.4       
## [28] parallel_3.5.3     bayesplot_1.6.0    Rcpp_1.0.0        
## [31] readr_1.1.1        KernSmooth_2.23-15 scales_1.0.0      
## [34] backports_1.1.2    mime_0.6           gridExtra_2.3     
## [37] hms_0.4.2          digest_0.6.18      stringi_1.2.3     
## [40] processx_3.1.0     dplyr_0.7.5        grid_3.5.3        
## [43] rprojroot_1.3-2    cli_1.0.0          tools_3.5.3       
## [46] magrittr_1.5       lazyeval_0.2.1     tibble_1.4.2      
## [49] crayon_1.3.4       pkgconfig_2.0.1    rsconnect_0.8.8   
## [52] prettyunits_1.0.2  ggridges_0.5.0     assertthat_0.2.0  
## [55] rmarkdown_1.10     rstudioapi_0.7     R6_2.3.0          
## [58] compiler_3.5.3