Goals

  • understand the relationship between priors, likelihoods, and posteriors
  • approximate a simple posterior using grid approximation
  • analytically calculate that posterior using conjugacy

If you’ve opened the accompanying R file, please close it now :)

Example: coin flips

Our main example in this lab will be a collection of coin flips, modeled using the binomial distribution:

\[y \sim \text{Binomial}(N, \theta)\]

\(y\) is the number of heads, \(N\) is the number of flips, and \(\theta\) is the probability of heads. We’d like to estimate \(\theta\). Kruschke Chapter 6 is a good reference for the math here (though his example uses the Bernoulli distribution).

Prior: beta distribution

A good prior distribution for \(\theta\) in the binomial distribution is a beta distribution. One reason is that the beta distribution is continuous and bounded between 0 and 1, just as \(\theta\) should be. There’s another reason, which we’ll introduce below.

# hyperparameters
a_prior <- 1
b_prior <- 1

# plot prior from dbeta
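One possible way to fill in that plotting step (this is a sketch using base graphics; the lab’s own answer is in the accompanying R file):

```r
# hyperparameters (as above)
a_prior <- 1
b_prior <- 1

# evaluate the Beta(a_prior, b_prior) density over a fine grid of theta values
theta <- seq(0, 1, by = 0.01)
prior_density <- dbeta(theta, a_prior, b_prior)

# plot the prior density; Beta(1, 1) is flat (uniform) over [0, 1]
plot(theta, prior_density, type = "l",
     xlab = expression(theta), ylab = "density", main = "Prior")
```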

Likelihood: binomial distribution

Let’s say we flip the coin 10 times and get 5 heads:

# data
trials <- 10
successes <- 5

# plot likelihood from dbinom
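A sketch of that step: evaluate the binomial likelihood of the observed data at many candidate values of \(\theta\), then plot it.

```r
# data (as above)
trials <- 10
successes <- 5

# likelihood of 5 heads in 10 flips, as a function of theta
theta <- seq(0, 1, by = 0.01)
likelihood <- dbinom(successes, size = trials, prob = theta)

# the likelihood peaks at the sample proportion, 5/10 = 0.5
plot(theta, likelihood, type = "l",
     xlab = expression(theta), ylab = "likelihood", main = "Likelihood")
```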

Posterior: ???

Grid approximation

For simple problems, one way to obtain the posterior is to approximate its shape over a grid of candidate parameter values.

# grid approximation ----

# set up a probability grid

# generate prior values at points on the grid

# generate likelihood values at points on the grid

# multiply the likelihood by the prior at each point on the grid

# standardize the posterior to sum to 1

# plot the standardized posterior
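One possible fill-in of the commented steps above, assuming a Beta(1, 1) prior and the data from earlier (try the steps yourself first):

```r
# data and hyperparameters
trials <- 10
successes <- 5
a_prior <- 1
b_prior <- 1

# set up a probability grid
theta_grid <- seq(0, 1, length.out = 1001)

# generate prior values at points on the grid
prior <- dbeta(theta_grid, a_prior, b_prior)

# generate likelihood values at points on the grid
likelihood <- dbinom(successes, size = trials, prob = theta_grid)

# multiply the likelihood by the prior at each point on the grid
unstd_posterior <- likelihood * prior

# standardize the posterior to sum to 1
posterior <- unstd_posterior / sum(unstd_posterior)

# plot the standardized posterior
plot(theta_grid, posterior, type = "l",
     xlab = expression(theta), ylab = "posterior probability")
```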

You can sample from the grid (with replacement!) using the posterior probabilities as weights, then compute summary quantities from those samples.

# sample from posterior
set.seed(20190425)
nsims <- 1e4
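A sketch of the sampling step (it recomputes the grid posterior so it stands alone; `sample()` with `prob =` does the weighted draw):

```r
set.seed(20190425)
nsims <- 1e4

# recompute the grid posterior from the previous step
trials <- 10
successes <- 5
theta_grid <- seq(0, 1, length.out = 1001)
prior <- dbeta(theta_grid, 1, 1)
likelihood <- dbinom(successes, size = trials, prob = theta_grid)
posterior <- likelihood * prior / sum(likelihood * prior)

# draw theta values from the grid, weighted by posterior probability
samples <- sample(theta_grid, size = nsims, replace = TRUE, prob = posterior)

# summary quantities: posterior mean and a 95% interval
mean(samples)
quantile(samples, c(0.025, 0.975))
```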

Stop! Don’t look below until we’ve done the grid approximation.









































Analytic solution

The problem above turns out to be one we can solve analytically, because the beta distribution is the conjugate prior for the binomial likelihood.

What does that mean? A prior and likelihood are conjugate distributions if the posterior distribution comes from the same family as the prior.

In this case, the posterior distribution is also a beta distribution!

Remember the parameter values for our prior:

# analytic solution ----
a_prior <- 1
b_prior <- 1

We can think of the prior in terms of previously observed data: 1 head, 1 tail. (A Beta(1, 1) distribution is also the uniform distribution on [0, 1].)

When we see new data, we update those values accordingly. These were our new data:

trials <- 10
successes <- 5

We add the number of successes to a, and the number of failures to b:

failures <- trials - successes

a_posterior <- a_prior + successes 
b_posterior <- b_prior + failures

Then, we can plot the new beta distribution:

# plot the analytic posterior distribution
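A sketch of that plot, using the updated hyperparameters from above (here \(a = 1 + 5 = 6\) and \(b = 1 + 5 = 6\)):

```r
# posterior hyperparameters: a_prior + successes, b_prior + failures
a_posterior <- 1 + 5
b_posterior <- 1 + 5

# plot the Beta(6, 6) posterior density
theta <- seq(0, 1, by = 0.01)
plot(theta, dbeta(theta, a_posterior, b_posterior), type = "l",
     xlab = expression(theta), ylab = "density", main = "Posterior")
```

Compare this curve to the standardized grid posterior from before; they should match closely.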

Updating: posterior as prior

What happens if we do a second set of coin flips? We can take our previous data into account by starting with our old posterior as a new prior.

# make some new data

# update the parameters

# plot the new posterior
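One possible fill-in, with hypothetical new data (12 heads in 20 flips; any counts you like will do). The old posterior’s parameters become the new prior’s:

```r
# old posterior becomes new prior: Beta(6, 6) from the first round
a_prior2 <- 6
b_prior2 <- 6

# make some new data (hypothetical: 12 heads in 20 flips)
new_trials <- 20
new_successes <- 12
new_failures <- new_trials - new_successes

# update the parameters the same way as before
a_posterior2 <- a_prior2 + new_successes
b_posterior2 <- b_prior2 + new_failures

# plot the new posterior
theta <- seq(0, 1, by = 0.01)
plot(theta, dbeta(theta, a_posterior2, b_posterior2), type = "l",
     xlab = expression(theta), ylab = "density")
```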

MCMC (optional)

The jags/ and stan/ folders in this project contain models you’ve seen before. These are models for a Bernoulli distribution: repeated observations of a single coin flip, not a collection of coin flips.

What we want to do is make changes to those models so we can use them on the binomial data introduced above (N = 10, y = 5). Ultimately, the following code should work:

library(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

m_binomial <- stan_model("stan/binomial.stan")

d <- list(
  N = 10, 
  y = 5, 
  a = 1, 
  b = 1
)

fit_binomial <- sampling(m_binomial, data = d)
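One way the modified Stan model could look; this is a sketch that assumes the data names match the R list above (`N`, `y`, `a`, `b`), and the project’s stan/ folder has the actual answer:

```stan
// binomial.stan (sketch): one binomial observation with a beta prior
data {
  int<lower=0> N;              // number of flips
  int<lower=0, upper=N> y;     // number of heads
  real<lower=0> a;             // prior hyperparameter
  real<lower=0> b;             // prior hyperparameter
}
parameters {
  real<lower=0, upper=1> theta;  // probability of heads
}
model {
  theta ~ beta(a, b);
  y ~ binomial(N, theta);
}
```

The main change from the Bernoulli version is that `y` is a single count modeled with `binomial(N, theta)` rather than a vector of 0/1 outcomes.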

Weakly informative priors (optional)

Think back to Problem 4 of Assignment 1: Flat priors aren’t really uninformative. Can we use the logit transformation to assign a weakly informative prior somehow? How might we go about doing that?

Here’s a hint: what’s different about the binomial and binomial_logit functions in Stan?

rstan::lookup("^binomial")
##                StanFunction                      Arguments ReturnType Page
## 52             binomial_cdf  (ints n, ints N, reals theta)       real   97
## 53 binomial_coefficient_log               (real x, real y)       real   35
## 54           binomial_lccdf (ints n | ints N, reals theta)       real   98
## 55            binomial_lcdf (ints n | ints N, reals theta)       real   98
## 56      binomial_logit_lpmf (ints n | ints N, reals alpha)       real   99
## 57           binomial_logit                              ~       real   99
## 58            binomial_lpmf (ints n | ints N, reals theta)       real   97
## 59                 binomial                              ~       real   97
## 60             binomial_rng          (ints N, reals theta)          R   98

On your own: the normal distribution

If they’re so convenient, why don’t we use analytic solutions and conjugate priors all the time? To answer that question, think about the normal distribution.

Remember, the normal distribution can be parameterized in 3 ways:

With the standard deviation (aka scale):

\[y \sim \text{Normal}(\mu, \sigma)\]

With the variance:

\[y \sim \text{Normal}(\mu, \sigma^2)\]

With the precision, \(\tau = \frac{1}{\sigma^2}\):

\[y \sim \text{Normal}(\mu, \tau)\]

The key difference from the binomial or Bernoulli distribution is that the normal distribution has 2 parameters we might want to estimate.

If you assume the scale/variance/precision is known, then the only parameter you have to worry about is the mean. In that case, the conjugate prior is also a normal distribution! You can think about the posterior mean as a weighted average of the mean of the data and the prior mean. (Weighted how? Using the precisions of the prior and the sample!)
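For instance, in the precision parameterization with known precision \(\tau\) and prior \(\mu \sim \text{Normal}(\mu_0, \tau_0)\), the standard conjugate update after observing \(n\) data points with mean \(\bar{y}\) is:

\[\mu \mid y \sim \text{Normal}\!\left(\frac{\tau_0 \mu_0 + n \tau \bar{y}}{\tau_0 + n \tau},\ \tau_0 + n \tau\right)\]

The posterior precision is the sum of the prior precision and the data precision, and the posterior mean is exactly the precision-weighted average described above.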

See this page for different conjugate priors for the normal distribution, depending on which parameters you hold fixed or let vary: https://en.wikipedia.org/wiki/Conjugate_prior#When_likelihood_function_is_a_continuous_distribution

Appendix

The complete code for this lab is contained in 03-priors.R. Try to do the activities first before you look at it!

sessionInfo()
## R version 3.5.3 (2019-03-11)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.2
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.0         compiler_3.5.3     pillar_1.2.3      
##  [4] plyr_1.8.4         bindr_0.1.1        prettyunits_1.0.2 
##  [7] tools_3.5.3        digest_0.6.18      pkgbuild_1.0.3    
## [10] evaluate_0.10.1    tibble_1.4.2       gtable_0.2.0      
## [13] pkgconfig_2.0.1    rlang_0.3.0.1      cli_1.0.0         
## [16] parallel_3.5.3     yaml_2.1.19        loo_2.1.0         
## [19] bindrcpp_0.2.2     gridExtra_2.3      stringr_1.3.1     
## [22] dplyr_0.7.5        knitr_1.20         tidyselect_0.2.4  
## [25] stats4_3.5.3       rprojroot_1.3-2    grid_3.5.3        
## [28] glue_1.3.1         inline_0.3.15      R6_2.3.0          
## [31] processx_3.1.0     rmarkdown_1.10     rstan_2.18.2      
## [34] purrr_0.2.5        callr_2.0.4        ggplot2_3.1.0     
## [37] magrittr_1.5       matrixStats_0.53.1 backports_1.1.2   
## [40] scales_1.0.0       StanHeaders_2.18.1 htmltools_0.3.6   
## [43] assertthat_0.2.0   colorspace_1.4-0   stringi_1.2.3     
## [46] lazyeval_0.2.1     munsell_0.5.0      crayon_1.3.4