priors.Rd
The functions described on this page are used to specify the prior-related
arguments of the various modeling functions in the rstap package (to view
the priors used for an existing model see prior_summary).
The default priors used in the various rstap modeling functions are
intended to be weakly informative in that they provide moderate
regularization and help stabilize computation. For many applications the
defaults will perform well, but prudent use of more informative priors is
encouraged. All of the priors here are informed by the priors in rstanarm, though the hierarchical shape and lkj priors are not included.
normal(location = 0, scale = NULL, autoscale = TRUE)
student_t(df = 1, location = 0, scale = NULL, autoscale = TRUE)
cauchy(location = 0, scale = NULL, autoscale = TRUE)
laplace(location = 0, scale = NULL, autoscale = TRUE)
lasso(df = 1, location = 0, scale = NULL, autoscale = TRUE)
product_normal(df = 2, location = 0, scale = 1)
exponential(rate = 1, autoscale = TRUE)
log_normal(location = 0, scale = 1)
decov(regularization = 1, concentration = 1, shape = 1, scale = 1)
Argument | Description |
---|---|
location | Prior location. In most cases, this is the prior mean, but for cauchy the mean does not exist, so location is the prior median. |
scale | Prior scale. The default depends on the family (see Details). |
autoscale | A logical scalar, defaulting to TRUE. |
df | Prior degrees of freedom. The default is \(1\) for student_t and lasso, and \(2\) for product_normal. |
rate | Prior rate for the exponential distribution. Defaults to \(1\). |
regularization | Exponent for an LKJ prior on the correlation matrix in the decov prior. |
concentration | Concentration parameter for a symmetric Dirichlet distribution. The default is \(1\), implying a joint uniform prior. |
shape | Shape parameter for a gamma prior on the scale parameter in the decov prior. |
A named list to be used internally by the rstap model fitting functions.
The details depend on the family of the prior being used:
Family members:
normal(location, scale)
student_t(df, location, scale)
cauchy(location, scale)
Each of these functions also takes an argument autoscale, which is relevant
if used for any of the non-stap related parameters. It is not used otherwise.
For the prior distribution for the intercept, location, scale, and df
should be scalars. As the degrees of freedom approach infinity, the
Student t distribution approaches the normal distribution, and if the
degrees of freedom are one, then the Student t distribution is the Cauchy
distribution.
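The two limiting cases of the Student t distribution can be checked numerically. The following is a minimal sketch in base R (it does not call any rstap function); the grid `x` and the value `1e6` standing in for "very many" degrees of freedom are arbitrary choices made for illustration.

```r
# Compare Student t densities with the Cauchy and normal densities.
x <- seq(-5, 5, by = 0.5)

# With one degree of freedom the Student t density is the Cauchy density.
max_diff_cauchy <- max(abs(dt(x, df = 1) - dcauchy(x)))

# With very many degrees of freedom it is close to the standard normal density.
max_diff_normal <- max(abs(dt(x, df = 1e6) - dnorm(x)))

print(c(cauchy = max_diff_cauchy, normal = max_diff_normal))
```

Both differences should be negligible, which is why cauchy can be viewed as a special case of student_t and normal as its limit.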
If scale is not specified it will default to \(10\) for the intercept and
\(2.5\) for the other coefficients, unless the probit link function is
used, in which case these defaults are scaled by a factor of
dnorm(0)/dlogis(0), which is roughly \(1.6\).
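The probit rescaling factor can be reproduced directly in base R. This sketch only illustrates the arithmetic stated above; the vector `default_scales` is a hypothetical name introduced here, and no autoscale adjustment is applied.

```r
# Ratio of the standard normal and standard logistic densities at zero.
probit_factor <- dnorm(0) / dlogis(0)

# Default scales stated in the text, before any autoscale adjustment.
default_scales <- c(intercept = 10, coefficients = 2.5)

# Defaults after the probit rescaling.
probit_scales <- default_scales * probit_factor

print(round(probit_factor, 3))
print(round(probit_scales, 2))
```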
If the autoscale argument is TRUE (the default), then the scales will be
further adjusted as described above in the documentation of the autoscale
argument in the Arguments section.
Family members:
laplace(location, scale)
lasso(df, location, scale)
Each of these functions also takes an argument autoscale.
The Laplace distribution is also known as the double-exponential distribution. It is a symmetric distribution with a sharp peak at its mean / median / mode and fairly long tails. This distribution can be motivated as a scale mixture of normal distributions and the remarks above about the normal distribution apply here as well.
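One way to see the scale-mixture representation is to integrate a normal density over an exponentially distributed variance and compare the result with the Laplace density. The sketch below uses base R only and assumes a particular, standard parameterization: if the variance w is exponential with rate 1 / (2 b^2), the mixture is Laplace with scale b. The names `b`, `dlaplace`, and `dmixture` are hypothetical helpers, not part of rstap.

```r
# Laplace(0, b) density written out directly; b is an arbitrary scale.
b <- 1.5
dlaplace <- function(x) exp(-abs(x) / b) / (2 * b)

# Mixture density: x | w ~ Normal(0, sd = sqrt(w)),
# with w ~ Exponential(rate = 1 / (2 * b^2)).
dmixture <- function(x) {
  integrand <- function(w) {
    dnorm(x, mean = 0, sd = sqrt(w)) * dexp(w, rate = 1 / (2 * b^2))
  }
  integrate(integrand, lower = 0, upper = Inf)$value
}

x <- c(0.5, 1, 2, 4)
mix <- vapply(x, dmixture, numeric(1))
print(cbind(x = x, laplace = dlaplace(x), mixture = mix))
```

The two density columns should agree up to the tolerance of the numerical integration.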
The lasso approach to supervised learning can be expressed as finding the
posterior mode when the likelihood is Gaussian and the priors on the
coefficients have independent Laplace distributions. It is commonplace in
supervised learning to choose the tuning parameter by cross-validation,
whereas a more Bayesian approach would be to place a prior on “it”,
or rather its reciprocal in our case (i.e. smaller values correspond
to more shrinkage toward the prior location vector). We use a chi-square
prior with degrees of freedom equal to that specified in the call to
lasso or, by default, 1. The expectation of a chi-square random
variable is equal to this degrees of freedom and the mode is equal to the
degrees of freedom minus 2, if this difference is positive.
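The stated mean and mode of the chi-square distribution can be verified numerically. This is a small base R sketch; `df = 5` is an arbitrary value chosen so that the difference df - 2 is positive, and `chisq_mean` and `chisq_mode` are hypothetical helper names.

```r
# Expectation of a chi-square variate by numerical integration.
chisq_mean <- function(df) {
  integrate(function(x) x * dchisq(x, df = df), lower = 0, upper = Inf)$value
}

# Mode of the chi-square density by one-dimensional optimization.
chisq_mode <- function(df) {
  optimize(function(x) dchisq(x, df = df),
           interval = c(0, 50), maximum = TRUE)$maximum
}

df <- 5
print(c(mean = chisq_mean(df), mode = chisq_mode(df)))
```

For this choice the mean should be close to df and the mode close to df - 2. With the default of one degree of freedom the difference is not positive, and the density is largest at the boundary at zero.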
It is also common in supervised learning to standardize the predictors
before training the model. We do not recommend doing so. Instead, it is
better to specify autoscale = TRUE (the default value), which will adjust
the scales of the priors according to the dispersion in the variables. See
the documentation of the autoscale argument above and also the
prior_summary page for more information.
Family members:
product_normal(df, location, scale)
The product-normal distribution is the product of at least two independent
normal variates each with mean zero, shifted by the location parameter. It
can be shown that the density of a product-normal variate is symmetric and
infinite at location, so this prior resembles a “spike-and-slab” prior for
sufficiently large values of the scale parameter. For better or for worse,
this prior may be appropriate when it is strongly believed (by someone)
that a regression coefficient “is” equal to the location parameter, even
though no true Bayesian would specify such a prior.
Each element of df must be an integer of at least \(2\) because these
“degrees of freedom” are interpreted as the number of normal variates
being multiplied and then shifted by location to yield the regression
coefficient. Higher degrees of freedom produce a sharper spike at
location.
Each element of scale must be a non-negative real number that is
interpreted as the standard deviation of the normal variates being
multiplied and then shifted by location to yield the regression
coefficient. In other words, the elements of scale may differ, but the
k-th standard deviation is presumed to hold for all the normal deviates
that are multiplied together and shifted by the k-th element of location
to yield the k-th regression coefficient. The elements of scale are not
the prior standard deviations of the regression coefficients. The prior
variance of a regression coefficient is equal to the corresponding element
of scale raised to the power of twice the corresponding element of df,
that is \(\mathrm{scale}^{2 \cdot \mathrm{df}}\). Thus, larger values of
scale put more prior volume on values of the regression coefficient that
are far from zero.
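The variance formula can be illustrated by simulation. The sketch below uses base R only and builds the product-normal draws by hand rather than through rstap; the values of `df`, `scale`, `location`, and `n_draws` are arbitrary choices for illustration.

```r
set.seed(1)
n_draws  <- 1e6
df       <- 2L    # number of normal variates multiplied together
scale    <- 1.5   # common standard deviation of those variates
location <- 0.5   # shift applied after multiplying

# Draw df independent Normal(0, scale) vectors and multiply them elementwise.
draws <- replicate(df, rnorm(n_draws, mean = 0, sd = scale), simplify = FALSE)
beta  <- location + Reduce(`*`, draws)

# Variance implied by the formula: scale raised to the power 2 * df.
theoretical_var <- scale^(2 * df)

print(c(simulated = var(beta), theoretical = theoretical_var))
```

The simulated variance should be close to the theoretical one, and it is clearly different from scale squared, which is why scale should not be read as the prior standard deviation of the coefficient.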
Gelman, A., Jakulin, A., Pittau, M. G., and Su, Y. (2008). A weakly informative default prior distribution for logistic and other regression models. Annals of Applied Statistics, 2(4), 1360–1383.
# Can assign priors to names
N05 <- normal(0, 5)
The various vignettes for the rstanarm and rstap packages also discuss and demonstrate the use of the supported prior distributions.