Title: | Targeted Maximum Likelihood Estimation for Two-Stage Study Design |
---|---|
Description: | An inverse probability of censoring weighted (IPCW) targeted maximum likelihood estimator (TMLE) for evaluating a marginal point treatment effect from data where some variables were collected on only a subset of participants using a two-stage design (or marginal mean outcome for a single arm study). A TMLE for conditional parameters defined by a marginal structural model (MSM) is also available. |
Authors: | Susan Gruber [aut, cre], Mark van der Laan [aut] |
Maintainer: | Susan Gruber <[email protected]> |
License: | GPL-3 |
Version: | 1.0 |
Built: | 2024-11-06 05:11:51 UTC |
Source: | https://github.com/cran/twoStageDesignTMLE |
.evalAugW calls TMLE, using super learner to evaluate preliminary predictions for Q(0,W) and Q(1,W) conditional on stage 1 covariates
evalAugW(Y, A, W, Delta, id, family, SL.library)
Y |
outcome vector |
A |
binary treatment indicator |
W |
covariate matrix |
Delta |
outcome missingness indicator |
id |
identifier of i.i.d. unit |
family |
outcome regression family |
SL.library |
super learner library for outcome regression modeling |
W.Q, an n x 2 matrix of outcome predictions based on stage 1 covariates
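To make the shape of this return value concrete, the following is a minimal sketch of the idea in base R. It is not the package's implementation: a single `glm` stands in for the super learner, and the simulated data and the names `fit`, `Q0W`, and `Q1W` are assumptions introduced for illustration only.

```r
# Illustrative sketch only: a plain glm stands in for the super learner.
set.seed(1)
n <- 200
W <- data.frame(W1 = rnorm(n), W2 = rnorm(n))  # stage 1 covariates
A <- rbinom(n, 1, plogis(0.2 * W$W1))          # binary treatment
Y <- 1 + A + W$W1 + rnorm(n)                   # continuous outcome

# Preliminary outcome regression on treatment and stage 1 covariates
fit <- glm(Y ~ A + W1 + W2, data = data.frame(Y, A, W), family = gaussian)

# Predict every unit's outcome with treatment set to 0, then to 1
Q0W <- predict(fit, newdata = data.frame(A = 0, W), type = "response")
Q1W <- predict(fit, newdata = data.frame(A = 1, W), type = "response")

# n x 2 matrix of predictions, one column per treatment level
W.Q <- cbind(Q0W = Q0W, Q1W = Q1W)
dim(W.Q)
```

The two columns can then be appended to the stage 1 covariates, which is the role the `augmentW` option plays in `twoStageTMLE`.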
print.summary.twoStageTMLE
## S3 method for class 'summary.twoStageTMLE'
print(x, ...)
x |
an object of class summary.twoStageTMLE |
... |
additional arguments (ignored) |
print object
print.twoStageTMLE
## S3 method for class 'twoStageTMLE'
print(x, ...)
x |
an object of class twoStageTMLE |
... |
additional arguments (ignored) |
print tmle results using print.tmle method from tmle package
Utilities setV
Set the number of cross-validation folds as a function of effective sample size. See Phillips 2023, doi.org/10.1093/ije/dyad023
setV(n.effective)
n.effective |
the effective sample size |
the number of cross-validation folds
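As a rough illustration of what a rule of this kind looks like, the sketch below maps an effective sample size to a fold count with a step function. The helper name `choose_folds_sketch` is hypothetical, and the cut points are illustrative assumptions that may not match those used by `setV()`; consult the package source or the cited paper for the actual values.

```r
# Hypothetical helper, NOT the package's setV(): cut points are illustrative.
choose_folds_sketch <- function(n.effective) {
  breaks <- c(30, 500, 5000, 10000)  # assumed lower bounds of each tier
  folds  <- c(NA, 20, 10, 5, 2)      # NA marks the leave-one-out tier
  V <- folds[findInterval(n.effective, breaks) + 1]
  # Below the smallest cut point, use one fold per (effective) observation
  if (is.na(V)) V <- n.effective
  V
}

# Fewer folds as the effective sample size grows
sapply(c(20, 100, 1000, 7500, 20000), choose_folds_sketch)
```

The design intent is that small samples get many folds, so each training set keeps most of the data, while large samples get few folds to save computation.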
Summarizes estimation procedure for missing 2nd stage covariates
## S3 method for class 'twoStage'
summary(object, ...)
object |
An object of class |
... |
Other arguments passed to the tmle function in the tmle package |
A list containing the missingness model, terms, coefficients, type,
summary.twoStageTMLE
## S3 method for class 'twoStageTMLE'
summary(object, ...)
object |
an object of class twoStageTMLE |
... |
additional arguments (ignored) |
list summarizing the two-stage procedure components: a summary of the twoStage missingness estimation and a summary of the tmle for estimating the parameter
twoStageDesignTMLENews
Get news about recent updates and bug fixes
twoStageDesignTMLENews(...)
... |
ignored |
invisible character string giving the path to the file found.
Inverse probability of censoring weighted TMLE for evaluating parameters when the full set of covariates is available on only a subset of observations.
twoStageTMLE(
  Y, A, W, Delta.W, W.stage2,
  Z = NULL,
  Delta = rep(1, length(Y)),
  pi = NULL,
  piform = NULL,
  pi.SL.library = c("SL.glm", "SL.gam", "SL.glmnet", "tmle.SL.dbarts.k.5"),
  V.pi = 10,
  pi.discreteSL = TRUE,
  condSetNames = c("A", "W", "Y"),
  id = NULL,
  Q.family = "gaussian",
  augmentW = TRUE,
  augW.SL.library = c("SL.glm", "SL.glmnet", "tmle.SL.dbarts2"),
  rareOutcome = FALSE,
  verbose = FALSE,
  ...
)
Y |
outcome |
A |
binary treatment indicator |
W |
covariate matrix observed on everyone |
Delta.W |
binary indicator of whether second stage covariates were observed (1 = observed, 0 = missing) |
W.stage2 |
matrix of second stage covariates observed on subset of observations |
Z |
optional mediator of treatment effect for evaluating a controlled direct effect |
Delta |
binary indicator that the outcome is observed (1 = observed, 0 = missing) |
pi |
optional vector of missingness probabilities for |
piform |
parametric regression formula for estimating |
pi.SL.library |
super learner library for estimating |
V.pi |
number of cross validation folds for estimating |
pi.discreteSL |
Use discrete super learning when |
condSetNames |
Variables to include as predictors of missingness
in |
id |
Identifier of independent units of observation, e.g., clusters |
Q.family |
Regression family for the outcome |
augmentW |
When |
augW.SL.library |
super learner library for preliminary outcome
regression model (ignored when |
rareOutcome |
When |
verbose |
When |
... |
other parameters passed to the tmle function (not checked) |
object of class 'twoStageTMLE'.
tmle |
Treatment effect estimates and summary information |
twoStage |
IPCW weight estimation summary, |
augW |
Matrix of predicted outcomes based on stage 1 covariates only |
tmle::tmle() for details on customizing the estimation procedure
twoStageTMLEmsm() for estimating conditional effects
S Rose and MJ van der Laan. A Targeted Maximum Likelihood Estimator for Two-Stage Designs. Int J Biostat. 2011 Jan 1; 7(1): 17. doi:10.2202/1557-4679.1217
library(twoStageDesignTMLE)
n <- 1000
W1 <- rnorm(n)
W2 <- rnorm(n)
W3 <- rnorm(n)
A <- rbinom(n, 1, plogis(-1 + .2*W1 + .3*W2 + .1*W3))
Y <- 10 + A + W1 + W2 + A*W1 + W3 + rnorm(n)
d <- data.frame(Y, A, W1, W2, W3)
# Set 400 with data on W3, more likely if W1 > 1
n.sample <- 400
p.sample <- 0.5 + .2*(W1 > 1)
rows.sample <- sample(1:n, size = n.sample, prob = p.sample)
Delta.W <- rep(0, n)
Delta.W[rows.sample] <- 1
W3.stage2 <- cbind(W3 = W3[Delta.W == 1])
# 1. specify parametric models and do not augment W (fast, but not recommended)
result1 <- twoStageTMLE(Y = Y, A = A, W = cbind(W1, W2), Delta.W = Delta.W,
    W.stage2 = W3.stage2, piform = "Delta.W~ I(W1 > 0)", V.pi = 5,
    verbose = TRUE, Qform = "Y~A+W1", gform = "A~W1 + W2 +W3",
    augmentW = FALSE)
summary(result1)
# 2. specify a parametric model for conditional missingness probabilities (pi)
#    and use default values to estimate marginal effect using tmle
result2 <- twoStageTMLE(Y = Y, A = A, W = cbind(W1, W2), Delta.W = Delta.W,
    W.stage2 = cbind(W3)[Delta.W == 1], piform = "Delta.W~ I(W1 > 0)",
    V.pi = 5, verbose = TRUE)
result2
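The inverse probability of censoring weights at the heart of this estimator can be sketched in base R. This is an illustration under stated assumptions, not the package's code: the simulated data are hypothetical, the logistic model mirrors the `piform = "Delta.W~ I(W1 > 0)"` specification from the examples, and `pi.hat` and `obsWeights` are names chosen here for clarity.

```r
# Illustrative sketch only: estimate stage 2 inclusion probabilities with a
# parametric logistic model and convert them to IPCW observation weights.
set.seed(1)
n <- 1000
W1 <- rnorm(n)

# Stage 2 inclusion is more likely when W1 > 1 (1 = second stage observed)
Delta.W <- rbinom(n, 1, 0.3 + 0.2 * (W1 > 1))

# Model P(Delta.W = 1 | W1), analogous to supplying piform
pi.fit <- glm(Delta.W ~ I(W1 > 0), family = binomial)
pi.hat <- predict(pi.fit, type = "response")

# Units with stage 2 data get weight 1 / pi.hat; all others get weight 0
obsWeights <- Delta.W / pi.hat
summary(obsWeights[Delta.W == 1])
```

Each included unit stands in for roughly `1 / pi.hat` units from the full sample, which is why very small estimated probabilities lead to large, unstable weights.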
Inverse probability of censoring weighted TMLE for evaluating MSM parameters when the full set of covariates is available on only a subset of observations, as in a 2-stage design.
twoStageTMLEmsm(
  Y, A, W, V, Delta.W, W.stage2,
  Delta = rep(1, length(Y)),
  pi = NULL,
  piform = NULL,
  pi.SL.library = c("SL.glm", "SL.gam", "SL.glmnet", "tmle.SL.dbarts.k.5"),
  V.pi = 10,
  pi.discreteSL = TRUE,
  condSetNames = c("A", "V", "W", "Y"),
  id = NULL,
  Q.family = "gaussian",
  augmentW = TRUE,
  augW.SL.library = c("SL.glm", "SL.glmnet", "tmle.SL.dbarts2"),
  rareOutcome = FALSE,
  verbose = FALSE,
  ...
)
Y |
outcome of interest (missingness allowed) |
A |
binary treatment indicator |
W |
matrix or data.frame of covariates measured on entire population |
V |
vector, matrix, or dataframe of covariates used to define MSM strata |
Delta.W |
Indicator of inclusion in subset with additional information |
W.stage2 |
matrix or data.frame of covariates measured in subset population |
Delta |
binary indicator that outcome Y is observed |
pi |
optional vector of sampling probabilities |
piform |
optional parametric regression model for estimating pi |
pi.SL.library |
optional SL library specification for estimating pi (ignored when piform or pi is provided) |
V.pi |
optional number of cross-validation folds for super learning (ignored when piform or pi is provided) |
pi.discreteSL |
flag to indicate whether to use ensemble or discrete super learning (ignored when piform or pi is provided) |
condSetNames |
variables to condition on when estimating pi. Default is
covariates in |
id |
optional indicator of independent units of observation |
Q.family |
outcome regression family, "gaussian" or "binomial" |
augmentW |
set to |
augW.SL.library |
super learner library for preliminary outcome
regression model (ignored when |
rareOutcome |
when |
verbose |
when |
... |
other arguments passed to the |
Object of class "twoStageTMLE"
tmle |
Treatment effect estimates and summary information from call to tmleMSM function |
twoStage |
IPCW weight estimation summary: pi are the probabilities, coef are SL weights or coefficients from glm fit, type of estimation procedure, discreteSL flag indicating whether discrete super learning was used |
augW |
Matrix of predicted outcomes based on stage 1 covariates only |
tmle::tmleMSM() for details on customizing the estimation procedure
twoStageTMLE() for estimating marginal effects
library(twoStageDesignTMLE)
n <- 1000
set.seed(10)
W1 <- rnorm(n)
W2 <- rnorm(n)
W3 <- rnorm(n)
A <- rbinom(n, 1, plogis(-1 + .2*W1 + .3*W2 + .1*W3))
Y <- 10 + A + W1 + W2 + A*W1 + W3 + rnorm(n)
Y.bin <- rbinom(n, 1, plogis(-4.6 - 1.8*A + W1 + W2 - .3*A*W1 + W3))
# Set 400 obs with data on W3, more likely if W1 > 1
n.sample <- 400
p.sample <- 0.5 + .2*(W1 > 1)
rows.sample <- sample(1:n, size = n.sample, prob = p.sample)
Delta.W <- rep(0, n)
Delta.W[rows.sample] <- 1
W3.stage2 <- cbind(W3 = W3[Delta.W == 1])
# 1. specify parametric models, misspecified outcome model (not recommended)
result1.MSM <- twoStageTMLEmsm(Y = Y, A = A, V = cbind(W1), W = cbind(W2),
    Delta.W = Delta.W, W.stage2 = W3.stage2, augmentW = FALSE,
    piform = "Delta.W~ I(W1 > 0)", MSM = "A*W1", augW.SL.library = "SL.glm",
    Qform = "Y~A+W1", gform = "A~W1 + W2 +W3", hAVform = "A~1", verbose = TRUE)
summary(result1.MSM)
# 2. Call again, passing in previously estimated observation weights,
#    note that specifying a correct model for Q improves efficiency
result2.MSM <- twoStageTMLEmsm(Y = Y, A = A, V = cbind(W1), W = cbind(W2),
    Delta.W = Delta.W, W.stage2 = W3.stage2, augmentW = FALSE,
    pi = result1.MSM$twoStage$pi, MSM = "A*W1",
    Qform = "Y~ A + W1 + W2 + A*W1 + W3", gform = "A~W1 + W2 +W3",
    hAVform = "A~1")
cbind(SE.Qmis = result1.MSM$tmle$se, SE.Qcor = result2.MSM$tmle$se)
# Binary outcome, augmentW, rareOutcome
result3.MSM <- twoStageTMLEmsm(Y = Y.bin, A = A, V = cbind(W1), W = cbind(W2),
    Delta.W = Delta.W, W.stage2 = W3.stage2, augmentW = TRUE,
    piform = "Delta.W~ I(W1 > 0)", MSM = "A*W1", gform = "A~W1 + W2 +W3",
    Q.family = "binomial", rareOutcome = TRUE)
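The examples pass `MSM = "A*W1"`. In standard R formula syntax, `A*W1` expands to both main effects plus their interaction, so the working model has four coefficients. The sketch below shows that expansion with base R's `model.matrix`; how `tmleMSM` parses the string internally is not shown here, and the small data frame `d` is a hypothetical stand-in.

```r
# Illustrative sketch only: expand the right-hand side "A*W1" the way
# R's formula machinery does, to see which MSM terms it implies.
set.seed(10)
d <- data.frame(A = rbinom(5, 1, 0.5), W1 = rnorm(5))

X <- model.matrix(~ A * W1, data = d)

# Columns: intercept, treatment, stratum covariate, and their interaction
colnames(X)
```

The interaction column is the elementwise product of `A` and `W1`, so its coefficient describes how the treatment effect varies across levels of `W1`.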