The functions described on this page are used to specify the prior-related arguments of the various modeling functions in the rstap package (to
view the priors used for an existing model see
The default priors used in the various rstap modeling functions are
intended to be weakly informative in that they provide moderate
regularization and help stabilize computation. For many applications the
defaults will perform well, but prudent use of more informative priors is
encouraged. All of the priors here are informed by the priors in rstanarm, although the hierarchical shape and lkj priors are not included.
normal(location = 0, scale = NULL, autoscale = TRUE)
student_t(df = 1, location = 0, scale = NULL, autoscale = TRUE)
cauchy(location = 0, scale = NULL, autoscale = TRUE)
laplace(location = 0, scale = NULL, autoscale = TRUE)
lasso(df = 1, location = 0, scale = NULL, autoscale = TRUE)
product_normal(df = 2, location = 0, scale = 1)
exponential(rate = 1, autoscale = TRUE)
log_normal(location = 0, scale = 1)
decov(regularization = 1, concentration = 1, shape = 1, scale = 1)
Prior location. In most cases, this is the prior mean, but
Prior scale. The default depends on the family (see Details).
A logical scalar, defaulting to
Prior degrees of freedom. The default is \(1\) for
Prior rate for the exponential distribution. Defaults to
Exponent for an LKJ prior on the correlation matrix in
Concentration parameter for a symmetric Dirichlet distribution. The default is \(1\), implying a joint uniform prior.
Shape parameter for a gamma prior on the scale parameter in the
A named list to be used internally by the rstap model fitting functions.
The details depend on the family of the prior being used:
student_t(df, location, scale)
Each of these functions also takes an argument
autoscale, which is relevant only
when the prior is used for one of the non-stap-related parameters. It is ignored otherwise.
For the prior distribution for the intercept,
df should be scalars. As the
degrees of freedom approaches infinity, the Student t distribution
approaches the normal distribution and if the degrees of freedom are one,
then the Student t distribution is the Cauchy distribution.
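The two limiting cases above can be checked numerically with base R density functions. This is only an illustrative sketch: the grid `x` and the use of `df = 1e6` as a stand-in for "infinite" degrees of freedom are assumptions made for the example and are not part of rstap.

```r
# Grid of evaluation points (arbitrary, chosen for illustration)
x <- seq(-5, 5, by = 0.5)

# With df = 1 the Student t density coincides with the Cauchy density
max_diff_cauchy <- max(abs(dt(x, df = 1) - dcauchy(x)))

# With a very large df the Student t density is close to the standard normal
max_diff_normal <- max(abs(dt(x, df = 1e6) - dnorm(x)))

print(c(cauchy = max_diff_cauchy, normal = max_diff_normal))
```

Both differences should be negligible, which is why cauchy() can be thought of as student_t() with one degree of freedom.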
If scale is not specified, it will default to \(10\) for the
intercept and \(2.5\) for the other coefficients, unless the probit link
function is used, in which case these defaults are scaled by a factor of
dnorm(0)/dlogis(0), which is roughly \(1.6\).
If the autoscale argument is
TRUE (the default), then the
scales will be further adjusted as described above in the documentation of
the autoscale argument in the Arguments section.
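The probit adjustment factor quoted above can be reproduced in base R. The sketch below assumes that "scaled by" means the default scales are multiplied by the factor, and it ignores any further adjustment made by autoscale; the names `probit_factor` and `default_scales` are introduced here purely for illustration.

```r
# Ratio of the standard normal density to the standard logistic density at zero
probit_factor <- dnorm(0) / dlogis(0)
print(round(probit_factor, 3))  # about 1.596

# Default scales quoted in the text, before any autoscale adjustment
default_scales <- c(intercept = 10, coefficients = 2.5)

# Assumed form of the probit adjustment: multiply by the factor
print(default_scales * probit_factor)
```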
lasso(df, location, scale)
Each of these functions also takes an argument autoscale.
The Laplace distribution is also known as the double-exponential distribution. It is a symmetric distribution with a sharp peak at its mean / median / mode and fairly long tails. This distribution can be motivated as a scale mixture of normal distributions and the remarks above about the normal distribution apply here as well.
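The scale-mixture representation mentioned above can be illustrated by simulation. The sketch below relies on the standard result that if W is exponential with rate 1 and, given W, X is normal with mean 0 and variance 2 b^2 W, then X follows a Laplace distribution with location 0 and scale b. The value of `b` and the sample size are arbitrary choices for the example, and nothing here reflects how rstap implements the prior internally.

```r
set.seed(1)
n <- 1e5
b <- 2  # Laplace scale (hypothetical value for illustration)

# Mixing step: draw a random variance multiplier, then a normal with that variance
w <- rexp(n, rate = 1)
x <- rnorm(n, mean = 0, sd = b * sqrt(2 * w))

# Compare with known properties of the Laplace(0, b) distribution
sample_var  <- var(x)             # theory: 2 * b^2
prob_within <- mean(abs(x) <= b)  # theory: 1 - exp(-1)

print(c(var = sample_var, theory_var = 2 * b^2))
print(c(prob = prob_within, theory_prob = 1 - exp(-1)))
```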
The lasso approach to supervised learning can be expressed as finding the
posterior mode when the likelihood is Gaussian and the priors on the
coefficients have independent Laplace distributions. It is commonplace in
supervised learning to choose the tuning parameter by cross-validation,
whereas a more Bayesian approach would be to place a prior on “it”,
or rather its reciprocal in our case (i.e. smaller values correspond
to more shrinkage toward the prior location vector). We use a chi-square
prior with degrees of freedom equal to that specified in the call to
lasso or, by default, 1. The expectation of a chi-square random
variable is equal to this degrees of freedom and the mode is equal to the
degrees of freedom minus 2, if this difference is positive.
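The stated mean and mode of the chi-square distribution can be verified numerically in base R. The value `df = 5` is a hypothetical choice made so that the mode formula applies; with the default of 1 the difference is not positive and the formula above does not apply.

```r
df <- 5  # hypothetical value; lasso() defaults to df = 1

# Expectation by numerical integration of x times the density
chisq_mean <- integrate(function(x) x * dchisq(x, df = df), 0, Inf)$value

# Mode by numerical maximization of the density over a wide interval
chisq_mode <- optimize(function(x) dchisq(x, df = df),
                       interval = c(0, 50), maximum = TRUE)$maximum

# Expect a mean near df and a mode near df - 2
print(c(mean = chisq_mean, mode = chisq_mode))
```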
It is also common in supervised learning to standardize the predictors
before training the model. We do not recommend doing so. Instead, it is
better to specify
autoscale = TRUE (the default value), which
will adjust the scales of the priors according to the dispersion in the
variables. See the documentation of the
autoscale argument above
and also the
prior_summary page for more information.
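One way to see why rescaling the prior can stand in for standardizing the predictors is that a coefficient on a standardized predictor equals the raw coefficient times the predictor's standard deviation. The sketch below demonstrates only that general relationship with simulated data and `lm()`; the exact rule applied by autoscale is not spelled out in this section, so dividing the scale by `sd(x)` is an assumption made for illustration.

```r
set.seed(2)
n <- 200
x <- rnorm(n, mean = 50, sd = 12)  # predictor on its raw scale (simulated)
y <- 1 + 0.3 * x + rnorm(n)

# Coefficient for the raw predictor and for the standardized predictor
beta_raw <- unname(coef(lm(y ~ x))["x"])
z <- (x - mean(x)) / sd(x)
beta_std <- unname(coef(lm(y ~ z))["z"])

# The two coefficients differ exactly by a factor of sd(x)
print(c(beta_std = beta_std, beta_raw_times_sd = beta_raw * sd(x)))

# So a prior scale s on the standardized coefficient corresponds to
# s / sd(x) on the raw coefficient (assumed form of the adjustment)
s <- 2.5
print(s / sd(x))
```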
product_normal(df, location, scale)
The product-normal distribution is the product of at least two independent
normal variates each with mean zero, shifted by the
location parameter. It can be shown that the density of a product-normal variate is
symmetric and infinite at
location, so this prior resembles a
“spike-and-slab” prior for sufficiently large values of the
scale parameter. For better or for worse, this prior may be
appropriate when it is strongly believed (by someone) that a regression
coefficient “is” equal to the
location parameter, even though
no true Bayesian would specify such a prior.
Each element of
df must be an integer of at least \(2\) because
these “degrees of freedom” are interpreted as the number of normal
variates being multiplied and then shifted by
location to yield the
regression coefficient. Higher degrees of freedom produce a sharper
Each element of
scale must be a non-negative real number that is
interpreted as the standard deviation of the normal variates being
multiplied and then shifted by
location to yield the regression
coefficient. In other words, the elements of
scale may differ, but
the k-th standard deviation is presumed to hold for all the normal deviates
that are multiplied together and shifted by the k-th element of
location to yield the k-th regression coefficient. The elements of
scale are not the prior standard deviations of the regression
coefficients. The prior variance of each regression coefficient is equal to
the corresponding scale raised to the power of \(2\) times the corresponding element of
df, that is, scale^(2 * df). Thus, larger values of
scale put more prior volume on
values of the regression coefficient that are far from zero.
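The variance statement above can be checked by simulating the construction directly: multiply `df` independent normal variates with mean zero and standard deviation `scale`, then shift by `location`. The parameter values and the sample size below are hypothetical, and this base R sketch is not how rstap draws from the prior.

```r
set.seed(3)
n <- 1e5
df <- 2          # number of normal variates multiplied together
scale <- 1.5     # standard deviation of each normal variate
location <- 0.5  # shift applied to the product

# Each row holds the df normal variates for one simulated coefficient
draws <- matrix(rnorm(n * df, mean = 0, sd = scale), nrow = n, ncol = df)
beta <- location + apply(draws, 1, prod)

# The shift does not affect the variance, which should be scale^(2 * df)
sample_var <- var(beta)
theory_var <- scale^(2 * df)
print(c(sample = sample_var, theory = theory_var))
```

Note that `scale` itself (1.5) is not the prior standard deviation of the coefficient; in this example the implied standard deviation is `scale^df`, or 2.25.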
Gelman, A., Jakulin, A., Pittau, M. G., and Su, Y. (2008). A weakly informative default prior distribution for logistic and other regression models. Annals of Applied Statistics. 2(4), 1360–1383.
# Can assign priors to names
N05 <- normal(0, 5)
The various vignettes for the rstanarm and rstap packages also discuss and demonstrate the use of the supported prior distributions.