Bayesian generalized spatial-temporal aggregated predictor (STAP) models via Stan

Generalized linear modeling with spatial-temporal aggregated predictors, using prior distributions for the coefficients, intercept, spatial-temporal scales, and auxiliary parameters.

stap_glm(formula, family = gaussian(), subject_data = NULL,
  distance_data = NULL, time_data = NULL, subject_ID = NULL,
  max_distance = NULL, max_time = NULL, weights, offset = NULL,
  model = TRUE, y = TRUE, contrasts = NULL, ..., prior = normal(),
  prior_intercept = normal(), prior_stap = normal(),
  prior_theta = log_normal(location = 1L, scale = 1L),
  prior_aux = exponential(), adapt_delta = NULL)

stap_lm(formula, subject_data = NULL, distance_data = NULL,
  time_data = NULL, subject_ID = NULL, max_distance = NULL,
  max_time = NULL, weights, offset = NULL, model = TRUE, y = TRUE,
  contrasts = NULL, ..., prior = normal(),
  prior_intercept = normal(), prior_stap = normal(),
  prior_theta = log_normal(location = 1L, scale = 1L),
  prior_aux = exponential(), adapt_delta = NULL)

Arguments

formula

Same as for glm. Note that in-formula transformations will not be passed to the final design matrix. Covariate names containing "_scale" or "_shape" are not advised, because the final model fit is parsed for this text.

family

Same as glm for gaussian, binomial, and poisson families.

subject_data

A data.frame that contains data specific to the subject or subjects on whom the outcome is measured. Must contain one column holding the subject_ID on which to join the distance_data and time_data.

distance_data

A data.frame with at least three columns containing (1) an id_key, (2) the sap/tap/stap features, and (3) the distances between the subject with a given id and the built environment feature named in column (2). The distance column must be the only column of type "double", and the sap/tap/stap features must be spelled in the data.frame exactly as they are in the formula.
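
As a minimal sketch of the layout described above, the following base R snippet builds a hypothetical distance_data frame and checks the "only one double column" requirement. The column names subj_id, BEF, and Distance, and all values, are assumptions made for illustration; this page does not fix them.

```r
# Hypothetical distance data: one row per subject/feature pair.
# Column names and values are illustrative assumptions.
distance_data <- data.frame(
  subj_id  = c(1L, 1L, 2L, 3L),            # integer id key
  BEF      = rep("Fast_Food", 4),          # feature name, spelled as in the formula
  Distance = c(0.4, 1.7, 2.3, 0.9),        # the only column of type "double"
  stringsAsFactors = FALSE
)

# Find the columns stored as doubles; there should be exactly one.
double_cols <- names(distance_data)[vapply(distance_data, is.double, logical(1))]
print(double_cols)
```

Storing the id key as an integer (note the L suffix) rather than a plain numeric keeps it from being counted as a second "double" column.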

time_data

Same as distance_data, except that it holds the time for which the subject has been exposed to the built environment feature instead of the distance.

subject_ID

Name of the column(s) to join on between subject_data and bef_data (the built environment feature data, i.e., distance_data and/or time_data).
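
The role of the join column can be sketched with base R's merge. This only illustrates joining on a shared id column; how rstap performs the join internally is not stated on this page, and the data frames and the column name subj_id below are made up for the example.

```r
# Hypothetical subject-level and feature-level data sharing the key "subj_id".
subject_data <- data.frame(
  subj_id = 1:3,
  sex     = c("F", "M", "F"),
  y       = c(24.1, 26.3, 25.0)
)
distance_data <- data.frame(
  subj_id  = c(1L, 1L, 2L),
  BEF      = "Fast_Food",
  Distance = c(0.4, 1.7, 2.3)
)

# Inner join on the id column: one row per subject/feature pair.
joined <- merge(subject_data, distance_data, by = "subj_id")
print(joined)
```

In this sketch subject 3 has no nearby features and so drops out of the inner join; whether such subjects are retained by rstap is a separate question that this example does not answer.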

max_distance

The inclusion distance; upper bound for all elements of dists_crs.

max_time

The inclusion time; upper bound for all elements of times_crs.

offset, weights

Same as glm.

model

Logical denoting whether to return the fixed covariates model frame in the fitted object.

y

In stap_glm, logical scalar indicating whether to return the response vector. In stan_glm.fit, a response vector.

contrasts

Same as glm, but rarely specified.

...

Further arguments passed to the underlying function in the rstap package to specify iter, chains, cores, refresh, etc.

prior

The prior distribution for the regression coefficients. prior should be a call to one of the various functions provided by rstap for specifying priors. The subset of these functions that can be used for the prior on the coefficients can be grouped into several "families":

Family	Functions
Student t family	`normal`, `student_t`, `cauchy`
Hierarchical shrinkage family	`hs`, `hs_plus`
Laplace family	`laplace`, `lasso`
Product normal family	`product_normal`

See the priors help page for details on the families and how to specify the arguments for all of the functions in the table above. To omit a prior (i.e., to use a flat, improper uniform prior), set prior to NULL, although this is rarely a good idea.

Note: If prior is from the Student t family or Laplace family, and if the autoscale argument to the function used to specify the prior (e.g. normal) is left at its default and recommended value of TRUE, then the default or user-specified prior scale(s) may be adjusted internally based on the scales of the predictors. See the priors help page and the Prior Distributions vignette for details on the rescaling and the prior_summary function for a summary of the priors used for a particular model.

prior_intercept

The prior distribution for the intercept. prior_intercept can be a call to normal, student_t or cauchy. See the priors help page for details on these functions. To omit a prior on the intercept (i.e., to use a flat, improper uniform prior), set prior_intercept to NULL.

Note: The prior distribution for the intercept is set so it applies to the value when all predictors are centered. If you prefer to specify a prior on the intercept without the predictors being auto-centered, then you have to omit the intercept from the formula and include a column of ones as a predictor, in which case some element of prior specifies the prior on it, rather than prior_intercept. Regardless of how prior_intercept is specified, the reported estimates of the intercept always correspond to a parameterization without centered predictors (i.e., same as in glm).
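
The relationship between the two intercept parameterizations can be checked with ordinary least squares. The sketch below uses base R's lm on simulated data rather than rstap itself, and all names and values are illustrative. It shows that the intercept of the model with centered predictors equals the uncentered intercept plus the coefficient times the predictor mean, which is why a prior on the centered intercept is a prior on the expected outcome at average predictor values.

```r
set.seed(1)
x <- rnorm(50, mean = 3)
y <- 2 + 0.5 * x + rnorm(50)

fit_raw      <- lm(y ~ x)            # intercept refers to x = 0
x_centered   <- x - mean(x)
fit_centered <- lm(y ~ x_centered)   # intercept refers to x = mean(x)

alpha_raw      <- unname(coef(fit_raw)[1])
beta           <- unname(coef(fit_raw)[2])
alpha_centered <- unname(coef(fit_centered)[1])

# The two intercepts differ by beta * mean(x).
print(all.equal(alpha_centered, alpha_raw + beta * mean(x)))
```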

prior_stap

Prior for the spatial-temporal aggregated predictors. Note that the prior is set on the standardized latent covariates.

prior_theta

Prior for the spatial-temporal aggregated predictors' scale (and shape, if the Weibull weight function is selected). Can be either a single prior or a prior nested within a list of lists for separate BEFs/space-time components.

prior_aux

The prior distribution for the "auxiliary" parameter (if applicable). The "auxiliary" parameter refers to a different parameter depending on the family. For Gaussian models prior_aux controls "sigma", the error standard deviation. For negative binomial models prior_aux controls "reciprocal_dispersion", which is similar to the "size" parameter of rnbinom: smaller values of "reciprocal_dispersion" correspond to greater dispersion. For gamma models prior_aux sets the prior on the "shape" parameter (see e.g., rgamma), and for inverse-Gaussian models it is the so-called "lambda" parameter (which is essentially the reciprocal of a scale parameter). Binomial and Poisson models do not have auxiliary parameters.

prior_aux can be a call to exponential to use an exponential distribution, or normal, student_t or cauchy, which results in a half-normal, half-t, or half-Cauchy prior. See priors for details on these functions. To omit a prior (i.e., to use a flat, improper uniform prior), set prior_aux to NULL.
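
To see what a "half" prior means for a positive parameter such as sigma, the base R sketch below draws from a half-normal distribution by taking absolute values of normal draws. It assumes a location of 0 and an arbitrary scale of 5, and it does not call rstap; it only illustrates the shape of the resulting distribution.

```r
set.seed(123)
prior_scale <- 5  # illustrative value, not an rstap default

# A half-normal(0, prior_scale) draw is the absolute value of a normal draw.
half_normal_draws <- abs(rnorm(1000, mean = 0, sd = prior_scale))

# For comparison, draws from an exponential distribution with rate 1.
exponential_draws <- rexp(1000, rate = 1)

# Both sets of draws are restricted to non-negative values.
print(range(half_normal_draws))
print(range(exponential_draws))
```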

adapt_delta

See the adapt_delta help page for details.

Value

A stapreg object is returned for stap_glm.

A stapfit object (or a slightly modified stapfit object) is returned if stan_glm.fit is called directly.

Details

The stap_glm function is similar in syntax to stan_glm, except that, rather than performing full Bayesian inference for a standard generalized linear model only, stap_glm also incorporates spatial-temporal covariates.
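
To make the idea of a spatial aggregated predictor concrete, here is a rough sketch in base R. It assumes an exponential decay weight exp(-d / theta), chosen only for illustration; the weight functions rstap actually uses are not described on this page, and the helper name aggregate_exposure is hypothetical.

```r
# Hypothetical helper: for each subject, sum the distance-decayed weights of
# all features that lie within the inclusion distance.
aggregate_exposure <- function(ids, dists, theta, max_distance = Inf) {
  keep <- dists <= max_distance          # apply the inclusion distance
  w <- exp(-dists[keep] / theta)         # assumed decay kernel, illustration only
  tapply(w, ids[keep], sum)              # aggregate per subject
}

ids   <- c(1, 1, 2, 3, 3, 3)
dists <- c(0.4, 1.7, 2.3, 0.9, 0.2, 5.2)

exposure <- aggregate_exposure(ids, dists, theta = 1, max_distance = 3)
print(exposure)
```

In a model of this kind the aggregated exposure enters the linear predictor like any other covariate, while the scale theta is treated as an unknown parameter with its own prior (prior_theta) instead of being fixed as it is here.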

References

Gelman, A. and Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, Cambridge, UK.

Muth, C., Oravecz, Z., and Gabry, J. (2018). User-friendly Bayesian regression modeling: A tutorial with rstanarm and shinystan. The Quantitative Methods for Psychology, 14(2), 99-119. https://www.tqmp.org/RegularArticles/vol14-2/p099/p099.pdf

Examples


fit_glm <- stap_glm(formula = y ~ sex + sap(Fast_Food),
                    subject_data = homog_subject_data[1:100, ], # for speed of example only
                    distance_data = homog_distance_data,
                    family = gaussian(link = 'identity'),
                    subject_ID = 'subj_id',
                    prior = normal(location = 0, scale = 5, autoscale = FALSE),
                    prior_intercept = normal(location = 25, scale = 5, autoscale = FALSE),
                    prior_stap = normal(location = 0, scale = 3, autoscale = FALSE),
                    prior_theta = log_normal(location = 1, scale = 1),
                    prior_aux = cauchy(location = 0, scale = 5),
                    max_distance = max(homog_distance_data$Distance),
                    chains = 1, iter = 300, # for speed of example only
                    refresh = -1, verbose = FALSE)
#> Chain 1: 
#> Chain 1: Gradient evaluation took 0.00039 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 3.9 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: 
#> Chain 1:  Elapsed Time: 0.811819 seconds (Warm-up)
#> Chain 1:                0.795173 seconds (Sampling)
#> Chain 1:                1.60699 seconds (Total)
#> Chain 1: 
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> http://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> http://mc-stan.org/misc/warnings.html#tail-ess