Title: | Fit Additive Hazards Models for Survival Analysis |
---|---|
Description: | Contains tools to fit the additive hazards model to a cohort study, a case-cohort study, or a general two-phase sampling study. It allows weight calibration to incorporate additional auxiliary information from the full cohort into the estimation procedure in a case-cohort study or from the phase I sample in a two-phase sampling study. It can be used for both Bernoulli and finite population sampling. This package provides regression parameter estimates and their model-based and robust standard errors. It also offers tools to make prediction of individual specific hazards at a specified time. |
Authors: | Jie (Kate) Hu [aut, cre] |
Maintainer: | Jie (Kate) Hu <[email protected]> |
License: | GPL-2 | file LICENSE |
Version: | 1.2.0 |
Built: | 2025-02-18 04:10:22 UTC |
Source: | https://github.com/katehu/addhazard |
Fit a semiparametric additive hazard model '
The estimating procedures follow Lin & Ying (1994).
ah(formula, data, robust, weights, ties, seed = 20, ...)
ah(formula, data, robust, weights, ties, seed = 20, ...)
formula |
a formula object for the regression model of the form
response ~ predictors. The outcome is a survival object created by
|
data |
a data frame. Input dataset. |
robust |
a logical variable. Robust standard errors are provided if robust == TRUE. |
weights |
a numeric vector. The weight of each observation. |
ties |
a string. If there are ties in the survival time, when ties = 'break' a small random number is added to the survival time to break the ties. |
seed |
an integer. Seed number used to generate random increment when breaking ties. The default number is 20. |
... |
additional arguments to be passed to the low level regression fitting functions. |
An object of class 'ah' representing the fit.
The response variable is a survival object. The regression
model can be univariate or multivariate. This function is built upon the function
ahaz
by Anders Gorst-Rasmussen.
Lin, D.Y. & Ying, Z. (1994). Semiparametric analysis of the additive risk model. Biometrika; 81:61-71.
predict.ah
for prediction based on fitted
ah
model, nwtsco
for the description of nwtsco dataset
library(survival) ### using the first 100 rows in nwtsco to build an additive hazards model nwts<- nwtsco[1:100,] ### fit the additive hazards model to the data ### the model-based standard errors are reported when setting robust = FALSE fit1 <- ah(Surv(trel,relaps) ~ age + instit, data = nwts, robust = FALSE) summary(fit1) ### fit the additive hazards model to the data with robust standard errors fit2 <- ah(Surv(trel,relaps) ~ age + instit, data = nwts, robust = TRUE) summary(fit2) ### when there are ties, break the ties by setting ties = 'break' nwts_all <- nwtsco fit3 <- ah(Surv(trel,relaps) ~ age + instit, ties = 'break', data = nwts_all, robust = TRUE) summary(fit3) ### users could break the ties on their own by nwts_all$trel <- nwtsco$trel + runif(dim(nwts_all)[1],0,1)*1e-10 fit3 <- ah(Surv(trel,relaps) ~ age + instit, data = nwts_all, robust = TRUE) summary(fit3)
library(survival) ### using the first 100 rows in nwtsco to build an additive hazards model nwts<- nwtsco[1:100,] ### fit the additive hazards model to the data ### the model-based standard errors are reported when setting robust = FALSE fit1 <- ah(Surv(trel,relaps) ~ age + instit, data = nwts, robust = FALSE) summary(fit1) ### fit the additive hazards model to the data with robust standard errors fit2 <- ah(Surv(trel,relaps) ~ age + instit, data = nwts, robust = TRUE) summary(fit2) ### when there are ties, break the ties by setting ties = 'break' nwts_all <- nwtsco fit3 <- ah(Surv(trel,relaps) ~ age + instit, ties = 'break', data = nwts_all, robust = TRUE) summary(fit3) ### users could break the ties on their own by nwts_all$trel <- nwtsco$trel + runif(dim(nwts_all)[1],0,1)*1e-10 fit3 <- ah(Surv(trel,relaps) ~ age + instit, data = nwts_all, robust = TRUE) summary(fit3)
The function fits a semiparametric additive hazards model
to two-phase sampling data. The estimating procedures follow Hu (2014).
ah.2ph( formula, data, R, Pi = NULL, weights = NULL, ties, robust = FALSE, calibration.variables = NULL, seed = 20, ... )
ah.2ph( formula, data, R, Pi = NULL, weights = NULL, ties, robust = FALSE, calibration.variables = NULL, seed = 20, ... )
formula |
a formula object for the regression model of the form
response ~ predictors. The outcome is a survival object created by |
data |
a data frame. Input dataset. |
R |
a phase II membership indicator. A vector of values of 0 and 1. The subject is selected to phase II if R = 1. |
Pi |
the probability of a subject to be selected to the phase II subsample. |
weights |
weight assigned to each individual, inverse of the selection probability |
ties |
a string. If there are ties in the survival time, when ties = 'break' a small random number is added to the survival time to break the ties. |
robust |
a logical variable. Robust standard errors are provided if robust = TRUE. |
calibration.variables |
a vector of strings of some column names of the data. These are the variables available for every observation. They are used to calibrate the weight assigned to each subject |
seed |
an integer. Seed number used to generate random increment when breaking ties. The default number is 20. |
... |
additional arguments to be passed to the low level regression fitting functions. |
An object of class 'ah.2h' representing the fit.
This function estimates both model-based and robust standard errors. It can be used to analyze case-cohort studies with subsampling among cases. It allows weight calibration with auxiliary information from the full cohort (phase I sample). By this means, more information is used and thus weight calibration potentially could further improve the precision of prediction or our estimation on the regression coefficients.
Jie Hu (2014) A Z-estimation System for Two-phase Sampling with Applications to Additive Hazards Models and Epidemiologic Studies. Dissertation, University of Washington.
predict.ah.2ph
for prediction based on fitted additive
hazards model with two-phase sampling and nwtsco
for the description
of nwtsco dataset.
library(survival) ### fit an additive hazards model to two-phase sampling data without calibration fit1 <- ah.2ph(Surv(trel,relaps) ~ age + histol, data = nwts2ph, R = in.ph2, Pi = Pi, robust = FALSE, ties = 'break') summary(fit1) ### use weight instead of the selection probability Pi in the input fit1 <- ah.2ph(Surv(trel,relaps) ~ age + histol, data = nwts2ph, R = in.ph2, weights = wts, robust = FALSE, ties = 'break') summary(fit1) ### fit an additve hazards model with calibration on age fit2 <- ah.2ph(Surv(trel,relaps) ~ age + histol, data = nwts2ph, R = in.ph2, Pi = Pi, robust = FALSE, ties = 'break', calibration.variables = 'age') summary(fit2) ### calibrate on age square ### note if users create a calibration variable, then ### the new variable should be added to the original data frame nwts2ph$age2 <- nwts2ph$age^2 fit3 <- ah.2ph(Surv(trel,relaps) ~ age + histol, data = nwts2ph, R = in.ph2, Pi = Pi, robust = FALSE, ties = 'break', calibratio.variables = 'age2') summary(fit3) ############################################################################# ## When phase II samples are obtained by finite Sampling ############################################################################# ### calculating the sample size for each straum ### calculate the strata size strt.size <- table(nwts2ph$strt) ph2.strt.size <- table(subset(nwts2ph, in.ph2 == 1)$strt) ### fit an additve hazards model with finite stratified sampling ### calculate the sampling fractions frac <- ph2.strt.size/strt.size ### treating the problem as bernoulli sampling coupled with calibration on strata sizes ### using frac as the sampling probilities nwts2ph_by_FPSS <- nwts2ph nwts2ph_by_FPSS$Pi <- NULL for (i in 1:length(strt.size)){ nwts2ph_by_FPSS$Pi[nwts2ph_by_FPSS$strt ==i] <- frac[i] } ### create strt indicators, which become our calibration variables for (i in 1:length(strt.size)){ nwts2ph_by_FPSS$strt_ind <- as.numeric(nwts2ph_by_FPSS$strt ==i) names(nwts2ph_by_FPSS)[ncol(nwts2ph_by_FPSS)]= paste0('strt', i) } ### fit an additve hazards model with finate sampling fit4 <- ah.2ph(Surv(trel,relaps) ~ age + histol, data = nwts2ph_by_FPSS, R = in.ph2, Pi = Pi, robust = FALSE, ties = 'break', calibration.variables = c('strt1','strt2','strt3')) summary(fit4)
library(survival) ### fit an additive hazards model to two-phase sampling data without calibration fit1 <- ah.2ph(Surv(trel,relaps) ~ age + histol, data = nwts2ph, R = in.ph2, Pi = Pi, robust = FALSE, ties = 'break') summary(fit1) ### use weight instead of the selection probability Pi in the input fit1 <- ah.2ph(Surv(trel,relaps) ~ age + histol, data = nwts2ph, R = in.ph2, weights = wts, robust = FALSE, ties = 'break') summary(fit1) ### fit an additve hazards model with calibration on age fit2 <- ah.2ph(Surv(trel,relaps) ~ age + histol, data = nwts2ph, R = in.ph2, Pi = Pi, robust = FALSE, ties = 'break', calibration.variables = 'age') summary(fit2) ### calibrate on age square ### note if users create a calibration variable, then ### the new variable should be added to the original data frame nwts2ph$age2 <- nwts2ph$age^2 fit3 <- ah.2ph(Surv(trel,relaps) ~ age + histol, data = nwts2ph, R = in.ph2, Pi = Pi, robust = FALSE, ties = 'break', calibratio.variables = 'age2') summary(fit3) ############################################################################# ## When phase II samples are obtained by finite Sampling ############################################################################# ### calculating the sample size for each straum ### calculate the strata size strt.size <- table(nwts2ph$strt) ph2.strt.size <- table(subset(nwts2ph, in.ph2 == 1)$strt) ### fit an additve hazards model with finite stratified sampling ### calculate the sampling fractions frac <- ph2.strt.size/strt.size ### treating the problem as bernoulli sampling coupled with calibration on strata sizes ### using frac as the sampling probilities nwts2ph_by_FPSS <- nwts2ph nwts2ph_by_FPSS$Pi <- NULL for (i in 1:length(strt.size)){ nwts2ph_by_FPSS$Pi[nwts2ph_by_FPSS$strt ==i] <- frac[i] } ### create strt indicators, which become our calibration variables for (i in 1:length(strt.size)){ nwts2ph_by_FPSS$strt_ind <- as.numeric(nwts2ph_by_FPSS$strt ==i) names(nwts2ph_by_FPSS)[ncol(nwts2ph_by_FPSS)]= paste0('strt', i) } ### fit an additve hazards model with finate sampling fit4 <- ah.2ph(Surv(trel,relaps) ~ age + histol, data = nwts2ph_by_FPSS, R = in.ph2, Pi = Pi, robust = FALSE, ties = 'break', calibration.variables = c('strt1','strt2','strt3')) summary(fit4)
An hypothetical two-phase sampling dataset based on nwtsco dataset from the National Wilms Tumor Study (NWTS)
nwts2ph
nwts2ph
A data frame with 3915 rows and 15 variables:
We create a hypothetical two-phase sampling (stratified sampling) dataset. In this dataset, only the subjects who have relapses and some of the controls have their samples sent to the central laboratory for more accurate histology examination.
Institutional histology is examined in the local hospital. It is usually available for all the samples. Central histology is more expensive to obtain since the samples have to be sent to the central laboratory and the work requires experienced lab scientists. It is reasonable to assume only some samples were tested for central histology. Following the two-phase sampling design, we selected subjects for central histology measurements based on their outcomes and institutional histology results.
Time to relapse or last date seen (yr), continuous
Time to death or last date seen (yr), continuous
Indicator of relapse, 0 = Alive no prior relapse when last seen, 1 = Relapsed after trel years
Indicator of death, 0 = Alive when last seen, 1 = Died after tsur years
NWTS study, 3 = NWTS-3, 4 = NWTS-4
Stage of disease, 1=I, 2=II, 3=III, 4=IV
Central Path histology, 0 = Favorable (FH) and the subject is selected into the phase II subsample (in.ph2 = 1), 1 = Unfavorable (UH) and the subject is selected into the phase II subsample (in.ph2 = 1), NA = subject is NOT selected into the phase II subsample (in.ph2 = 1)
Institutional histology, 0 = Favorable (FH), 1 = Unfavorable (UH)
Age at diagnosis (yr), continuous
Year of diagnosis, calendar year
Weight of tumor bearing specimen, in grams (continuous)
Diameter of tumor, in centimeters (continuous)
Strata, 1 = Unfavorable Institutional histology and no relapse, 2 = favorable Institutional histology and no relapse, 3 = relapse
Sampling probability, 0.5 for strata =1, 0.9 for strata = 2, 0.9 for strata = 3
weight assigned to each individual, inverse of the selection probability
Phase II membership, 1 = selected into the phase II subsample, 0 = not selected into the phase II subsample
This dataset was created based on nwtsco dataset from the National Wilms Tumor Study (NWTS)
This file genreate the example dataset nwts2ph importFrom('stats', 'rbinom')
nwts2ph.generate(data, seed = 20)
nwts2ph.generate(data, seed = 20)
data |
the full cohort data |
seed |
the random seed we use for generating this dataset |
Dataset from the National Wilms Tumor Study (NWTS)
nwtsco
nwtsco
A data frame with 3915 rows and 12 variables:
Time to relapse orlast date seen (yr), continuous
Time to death or last date seen (yr), continuous
Indicator of relapse, 0 = Alive no prior relapse when last seen, 1 = Relapsed after trel years
Indicator of death, 0 = Alive when last seen, 1 = Died after tsur years
NWTS study, 3 = NWTS-3, 4 = NWTS-4
Stage of disease, 1=I, 2=II, 3=III, 4=IV
Central Path histology, 0 = Favorable (FH), 1 = Unfavorable (UH)
Institutional histology, 0 = Favorable (FH), 1 = Unfavorable (UH)
Age at diagnosis (yr), continuous
Year of diagnosis, calendar year
Weight of tumor bearing specimen, in grams (continuous)
Diameter of tumor, in centimeters (continuous)
Originally used by M. Kulich and D.Y. Lin: Improving the efficiency of relative-risk estimation in case-cohort studies. J Amer Statis Assoc 99:832-844, 2004.
This function predicts a subject's overall hazard rates at given time points
based on this subject's covariate values. The prediction function is an additive
hazards model fitted using ah
.
## S3 method for class 'ah' predict(object, newdata, newtime, ...)
## S3 method for class 'ah' predict(object, newdata, newtime, ...)
object |
an object of class inhering from |
newdata |
a dataframe of an individual's predictors. |
newtime |
a given sequence of time points at which the prediction is performed.
The time should be on the same scale as the survival time in |
... |
further arguments passed to or from other methods. |
A dataframe including the time points for prediction, predicted values and their standard errors.
ah
for fitting the additive hazards model, nwtsco
for
the description of nwtsco dataset
library(survival) ### fit the additive hazards model to the data nwts<- nwtsco[1:100,] fit <- ah(Surv(trel,relaps) ~ age + instit, data = nwts, ties = 'break', robust = FALSE) ### see the covariate names in the prediction function fit$call ### the newdata should be a dataframe ### the variable names are the same as the covariate names in ### the prediction function newdata <- data.frame(age=60, instit =1) ### an alternative way to give the newdata newdata <- nwtsco[101,] ### based on this subject's covariate values, the function predicts individual specific ### hazard rates at time points 3 and 5 predict(fit, newdata, newtime = c(3,5))
library(survival) ### fit the additive hazards model to the data nwts<- nwtsco[1:100,] fit <- ah(Surv(trel,relaps) ~ age + instit, data = nwts, ties = 'break', robust = FALSE) ### see the covariate names in the prediction function fit$call ### the newdata should be a dataframe ### the variable names are the same as the covariate names in ### the prediction function newdata <- data.frame(age=60, instit =1) ### an alternative way to give the newdata newdata <- nwtsco[101,] ### based on this subject's covariate values, the function predicts individual specific ### hazard rates at time points 3 and 5 predict(fit, newdata, newtime = c(3,5))
This function predicts a subject's overall hazard rates at given time points
based on this subject's covariate values. The prediction function is an object
from ah.2ph
. The estimating procedures follow Hu (2014).
## S3 method for class 'ah.2ph' predict(object, newdata, newtime, ...)
## S3 method for class 'ah.2ph' predict(object, newdata, newtime, ...)
object |
an object of class inhering from 'ah.2ph'. |
newdata |
a dataframe of an individual's predictors. |
newtime |
a given sequence of time points at which the prediction is performed. |
... |
further arguments passed to or from other methods. |
A dataframe including the given time points, predicted hazards, their standard errors, their variances, the phase I component of the variance for predicted hazards and the phase II component of the variance.
Jie Hu (2014) A Z-estimation System for Two-phase Sampling with Applications to Additive Hazards Models and Epidemiologic Studies. Dissertation, University of Washington.
ah.2ph
for fitting the additive hazards model with two-phase
library(survival) ### fit an additive hazards model to two-phase sampling data without calibration fit1 <- ah.2ph(Surv(trel,relaps) ~ age + histol, data = nwts2ph, R = in.ph2, Pi = Pi, robust = FALSE, ties = 'break') ### input the new data for prediction newdata <- nwtsco[101,] ### based on the fitted model fit1, perform prediction at time points t =3 and t= 5 predict(fit1, newdata, newtime = c(3,5)) ### fit an additve hazards model to two-phase sampling data with calibration ### The calibration variable is instit fit2 <- ah.2ph(Surv(trel,relaps) ~ age + histol, data = nwts2ph, R = in.ph2, Pi = Pi, ties = 'break', robust = FALSE, calibration.variables = "instit") ### based on the fitted model fit2, perform prediction at time points t =3 and t= 5 predict(fit2, newdata, newtime = c(3,5))
library(survival) ### fit an additive hazards model to two-phase sampling data without calibration fit1 <- ah.2ph(Surv(trel,relaps) ~ age + histol, data = nwts2ph, R = in.ph2, Pi = Pi, robust = FALSE, ties = 'break') ### input the new data for prediction newdata <- nwtsco[101,] ### based on the fitted model fit1, perform prediction at time points t =3 and t= 5 predict(fit1, newdata, newtime = c(3,5)) ### fit an additve hazards model to two-phase sampling data with calibration ### The calibration variable is instit fit2 <- ah.2ph(Surv(trel,relaps) ~ age + histol, data = nwts2ph, R = in.ph2, Pi = Pi, ties = 'break', robust = FALSE, calibration.variables = "instit") ### based on the fitted model fit2, perform prediction at time points t =3 and t= 5 predict(fit2, newdata, newtime = c(3,5))