Spatial-temporal aggregated predictor (STAP) models extension in the longitudinal setting incorporates heirarchical variance components to account for within subject or group-level correlation.

This setting also involves a substantially different data structure, since exposure to built environment features now occurs across time and subjects can move across time or space. Additionally since subjects may drop out of a study and built-environment features - stores - may close or open, the staps are neccessarily different at each measurement. The inclusion of these repeated measures, requires a secondary id to join on in addition to the typical id_key for the distance_data and time_data arguments to the stap_glmer function. We’ll demonstrate this and the stap model with random effects using the simple case of homogenously distributed built environment features, with subjects measured across two time points.

library(dplyr)
library(tidyr)
library(ggplot2)
library(rstap)
library(plot3D)

Measurement Data

Below we can see what the data, when initially measured, might look like before preparing it for model fitting.

knitr::kable(head(subj_data))

x	y	date	ID	class
0.7253479	-0.7452062	1974-05-11	1	Subject
-0.6051202	-0.2405906	1976-03-04	2	Subject
0.1696653	0.5845662	1976-04-15	3	Subject
-0.3834797	-0.9895510	1986-12-24	4	Subject
0.2920221	0.7254065	1987-07-17	5	Subject
0.8772785	-0.7254489	1974-01-31	6	Subject

knitr::kable(head(bef_data))

x	y	date_open	date_close	ID	class
0.7851891	0.3844256	1967-07-31	NA	1	Coffee_Shop
-0.0703001	-0.7953893	1961-11-26	NA	2	Coffee_Shop
0.0178176	-0.6283173	1968-02-10	NA	3	Coffee_Shop
-0.5009678	-0.6053075	1968-11-16	NA	4	Coffee_Shop
0.5744385	0.5685900	1969-07-13	NA	5	Coffee_Shop
0.7722542	-0.7553629	1962-07-05	NA	6	Coffee_Shop

The third and final dataset involves the subject specific data and the dates at which the measurements were taken.

subj_fdata %>% arrange(ID) %>% head() %>% knitr::kable()

ID	sex	DOB	Income	measure_date	Age	measure_ID
1	0	1969-04-19	55.82780	2000-07-30	31.30137	1
1	0	1969-04-19	164.21804	2003-11-08	34.57808	2
2	1	1953-05-04	73.82412	2000-04-11	46.96986	1
2	1	1953-05-04	168.65774	2002-04-16	48.98356	2
3	1	1965-08-01	36.23697	2000-12-05	35.36986	1
3	1	1965-08-01	131.82313	2002-12-04	37.36712	2

Transforming the Measured Data

In order to join these datasets, a new ID has to be created to associate the spatial-temporal data with a specific measurement date. We create this ID to include all spatial data up to and including the measurement date for each subject.

Note that for subjects one and two there are two rows, corresponding to the two measurements. In contrast, subject nine has four rows, corresponding to the fact that subject 9’s move before the two measurements results in both locations of 9’s residence being associated with all measurements subsequently measured.

subj_data %>% left_join(subj_fdata,by="ID") %>% 
    filter(ID %in% c(1,2,9)) %>% 
    select(ID,measure_ID,x,y,date,measure_date) %>% knitr::kable()

ID	measure_ID	x	y	date	measure_date
1	1	0.7253479	-0.7452062	1974-05-11	2000-07-30
1	2	0.7253479	-0.7452062	1974-05-11	2003-11-08
2	1	-0.6051202	-0.2405906	1976-03-04	2000-04-11
2	2	-0.6051202	-0.2405906	1976-03-04	2002-04-16
9	1	-0.7904145	-0.8712954	1985-09-21	2000-10-24
9	2	-0.7904145	-0.8712954	1985-09-21	2001-01-04

Stores Opening and Closing, Subjects Moving in the Data

Here we can see the effect of a store closing (BEF_ID 10) and a subject moving (subj_ID 9) on the data structure. Note that first row’s time is ~25.8 years which reflects the time spent since the subject first moved to their location. Since neither the subject nor the business moves by the subject’s second measurement in July of 2001, we see that the distances betwen subject 1 and Coffee Shop 1 stay constant, and the time increases according to the difference in measurement date, only.

In contrast, we see the effect of a subject moving for subj_ID 9. Here (s)he has twice as many measurements to show her/his exposure to the same coffee shop at the different locations where subj 9 lived. This is shown most evidently in the constant exposure for for coffee shops at the 1985 location for subject 9. Since subject 9 moved after ~6 years, and coffee shop 1 hadn’t closed at that time, subject 9 only has 6 years of exposure to that coffee shop at that distance. Following this we can see a steadily increasing exposure until the coffee shop closes in 2002, roughly a year before the subject’s second measurement in ’03.

td_data  %>% 
    filter(subj_ID %in% c(1,9),bef_ID%in%c(1,10)) %>% 
    arrange(subj_ID,measure_ID,bef_ID) %>% 
    knitr::kable()

subj_ID	measure_ID	bef_ID	measure_date	date_open	date_close	date	class	dist	time
1	1	1	2000-07-30	1967-07-31	NA	1974-05-11	Coffee_Shop	1.131216	26.23836 days
1	1	10	2000-07-30	1963-09-19	NA	1974-05-11	Coffee_Shop	1.961466	26.23836 days
1	2	1	2003-11-08	1967-07-31	NA	1974-05-11	Coffee_Shop	1.131216	29.51507 days
1	2	10	2003-11-08	1963-09-19	NA	1974-05-11	Coffee_Shop	1.961466	29.51507 days
9	1	1	2000-10-24	1967-07-31	NA	1985-09-21	Coffee_Shop	2.014786	15.10137 days
9	1	10	2000-10-24	1963-09-19	NA	1985-09-21	Coffee_Shop	1.823751	15.10137 days
9	2	1	2001-01-04	1967-07-31	NA	1985-09-21	Coffee_Shop	2.014786	15.29863 days
9	2	10	2001-01-04	1963-09-19	NA	1985-09-21	Coffee_Shop	1.823751	15.29863 days

Simulating and Estimating a STAP model

If we were to assume the following model from the previous data set-up

\[ E[Y_{ij}|b_i] = \alpha + Z_1\delta_{Income} + Z_{2}\delta_{sex} + Z_{3}\delta_{Age} + X(\theta_s,\theta_t)\beta_{Coffee} + b_i \] Where \[ X(\theta_s,\theta_t) = \sum_{d\in\mathcal{D}} w_d(\frac{d}{\theta_s}) w_t(\frac{t}{\theta_t}) \]

and simulate it under the following fixed values

alpha <- 22
delta <- c(Income = -.35,sex = 1.1, Age = 1.2)
beta <- 1
theta_s <- .8
theta_t <- 18
sigma <- 2
tau <- 1.5
d <- seq(from = 0, to = max(td_data$dist), by = 0.01)
t <- seq(from = 0, to = as.numeric(max(td_data$time)), by = 0.05)
w_s <- pracma::erfc(d/theta_s)
w_t <- pracma::erf(t/theta_t)

We could see the individual spatial and temporal exposure functions

par(mfrow=c(1,2))
plot(d,w_s,type='l', main="Spatial Decay",xlab = "Distance", ylab ="Exposure")
plot(t,w_t,type = "l", main = "Temporal Accumulation",xlab='years', ylab = "Exposure")

As well as the joint spatial-temporal exposure surface:

This results in the following exposure density across the population. Note that the exposure density value increases between the two measurements. This reflects the increased impact as a result of spending more time in the same space. With real data this may not be the case, given that individuals

subj_fdata <- subj_fdata %>% left_join(cbind(lme_inf$fr,y,y_bern),by=c("ID","measure_ID"))  %>% 
    mutate(sex = sex.x, Coffee_Shop= Coffee_Shop.x,subj_ID=ID,
           centered_income = `I((Income - mean(Income))/sd(Income))`,
           centered_age = `I((Age - mean(Age))/sd(Age))`) %>%
    select(subj_ID,DOB,Coffee_Shop,sex,
           centered_income,centered_age,y,y_bern,
           ran_int,measure_ID,Age,Income)

The final datasets passed to the stap_glmer function will then be the subject specific data, and then the distance and/or time dataset corresponding to whether or not a spatial or temporal scale is estimated.

subj_ID	Income	measure_ID	Age	ran_int	y	sex	Coffee_Shop
1	55.82780	1	31.30137	0.2581927	27.79630	0	8.470124
2	73.82412	1	46.96986	-0.6825780	33.81810	1	8.489090
3	36.23697	1	35.36986	0.4801513	33.05382	1	8.991242
4	28.86876	1	36.47671	2.8948296	32.27050	1	3.582213
5	47.23422	1	39.40822	0.9040974	29.30665	0	6.115291
6	78.05369	1	32.98082	0.0097133	26.87130	1	6.953036

subj_ID	measure_ID	date	bef_ID	class	dist
1	1	1974-05-11	1	Coffee_Shop	1.1312156
2	1	1976-03-04	1	Coffee_Shop	1.5243376
3	1	1976-04-15	1	Coffee_Shop	0.6472447
4	1	1986-12-24	1	Coffee_Shop	1.8037734
5	1	1987-07-17	1	Coffee_Shop	0.5995679
6	1	1974-01-31	1	Coffee_Shop	1.1136884

subj_ID	measure_ID	date	bef_ID	class	time
1	1	1974-05-11	1	Coffee_Shop	26.23836 days
2	1	1976-03-04	1	Coffee_Shop	24.12055 days
3	1	1976-04-15	1	Coffee_Shop	24.65753 days
4	1	1986-12-24	1	Coffee_Shop	13.54795 days
5	1	1987-07-17	1	Coffee_Shop	13.05479 days
6	1	1974-01-31	1	Coffee_Shop	26.65753 days

We’ll fit the data with priors that reflect our relative uncertainty about the spatial and temporal scales and place a fairly uninformative prior on the subject-specific variance. For information on how priors are set on the subject specific covariance, see the rstanarm vignette’s write up here.

fit <- stap_lmer(y ~ centered_income +  sex + centered_age + stap(Coffee_Shop) + (1|subj_ID),
                  subject_data = homog_longitudinal_subject_data, ## names of datasets 
                  distance_data = homog_longitudinal_distance_data,##  simulated above
                  time_data = homog_longitudinal_time_data, ## in the rstap package
                  subject_ID = 'subj_ID', 
                  group_ID = 'measure_ID',
                  prior_intercept = normal(location = 25, scale = 4, autoscale = F),
                  prior = normal(location = 0, scale = 4,autoscale=F),
                  prior_stap = normal(location = 0, scale = 4),
                  prior_theta = list(Coffee_Shop = list(spatial = log_normal(location = 1,
                                                                             scale = 1),
                                                        temporal = log_normal(location = 1, 
                                                                              scale = 1))),
                  prior_covariance = decov(regularization = 1,concentration = 1,
                                          shape = 1,scale = 1),
                  prior_aux = cauchy(0,5),
                  max_distance = 3, 
                  max_time = 50,
                  chains = 4,
                  iter = 2E3,
                  cores = 4)

we can now look at our estimates.

We see that even though the temporal scale was quite a fair bit away from the bulk of the prior, the log normal tail and sufficiently data heavy likelihood allowed for a tight posterior interval around the temporal scale. Other parameter estimates were all accurate as well.

STAP in a Longitudinal Data Setting

Measurement Data

Transforming the Measured Data

Stores Opening and Closing, Subjects Moving in the Data

Simulating and Estimating a STAP model

Contents