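I am looking for a way to run a survival analysis on left-truncated, right-censored data using tree-based algorithms. I tried the packages ipred and pec, but the functions ipredbagg and pecCforest only seem to work when there is no left truncation.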
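Data description
My data look much like the heart dataset from the Stanford heart transplant data. In fact, subjects are at risk from t = 0, but some subjects (in my case the vast majority) only enter the study at a later time t1, so when they at t
The heart dataset looks like this: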
Surv(heart$start, heart$stop, heart$event)
[1] ( 0.0, 50.0] ( 0.0, 6.0] ( 0.0, 1.0+] ( 1.0, 16.0] ( 0.0, 36.0+] ( 36.0, 39.0]
[7] ( 0.0, 18.0] ( 0.0, 3.0] ( 0.0, 51.0+] ( 51.0, 675.0] ( 0.0, 40.0] ( 0.0, 85.0]
[13] ( 0.0, 12.0+] ( 12.0, 58.0] ( 0.0, 26.0+] ( 26.0, 153.0] ( 0.0, 8.0] ( 0.0, 17.0+]
[19] ( 17.0, 81.0] ( 0.0, 37.0+] ( 37.0,1387.0] ( 0.0, 1.0] ( 0.0, 28.0+] ( 28.0, 308.0]
[25] ( 0.0, 36.0] ( 0.0, 20.0+] ( 20.0, 43.0] ( 0.0, 37.0] ( 0.0, 18.0+] ( 18.0, 28.0]
[31] ( 0.0, 8.0+] ( 8.0,1032.0] ( 0.0, 12.0+] ( 12.0, 51.0] ( 0.0, 3.0+] ( 3.0, 733.0]
[37] ( 0.0, 83.0+] ( 83.0, 219.0] ( 0.0, 25.0+] ( 25.0,1800.0+] ( 0.0,1401.0+] ( 0.0, 263.0]
[43] ( 0.0, 71.0+] ( 71.0, 72.0] ( 0.0, 35.0] ( 0.0, 16.0+] ( 16.0, 852.0] ( 0.0, 16.0]
[49] ( 0.0, 17.0+] ( 17.0, 77.0] ( 0.0, 51.0+] ( 51.0,1587.0+] ( 0.0, 23.0+] ( 23.0,1572.0+]
[55] ( 0.0, 12.0] ( 0.0, 46.0+] ( 46.0, 100.0] ( 0.0, 19.0+] ( 19.0, 66.0] ( 0.0, 4.5+]
[61] ( 4.5, 5.0] ( 0.0, 2.0+] ( 2.0, 53.0] ( 0.0, 41.0+] ( 41.0,1408.0+] ( 0.0, 58.0+]
[67] ( 58.0,1322.0+] ( 0.0, 3.0] ( 0.0, 2.0] ( 0.0, 40.0] ( 0.0, 1.0+] ( 1.0, 45.0]
[73] ( 0.0, 2.0+] ( 2.0, 996.0] ( 0.0, 21.0+] ( 21.0, 72.0] ( 0.0, 9.0] ( 0.0, 36.0+]
[79] ( 36.0,1142.0+] ( 0.0, 83.0+] ( 83.0, 980.0] ( 0.0, 32.0+] ( 32.0, 285.0] ( 0.0, 102.0]
[85] ( 0.0, 41.0+] ( 41.0, 188.0] ( 0.0, 3.0] ( 0.0, 10.0+] ( 10.0, 61.0] ( 0.0, 67.0+]
[91] ( 67.0, 942.0+] ( 0.0, 149.0] ( 0.0, 21.0+] ( 21.0, 343.0] ( 0.0, 78.0+] ( 78.0, 916.0+]
[97] ( 0.0, 3.0+] ( 3.0, 68.0] ( 0.0, 2.0] ( 0.0, 69.0] ( 0.0, 27.0+] ( 27.0, 842.0+]
[103] ( 0.0, 33.0+] ( 33.0, 584.0] ( 0.0, 12.0+] ( 12.0, 78.0] ( 0.0, 32.0] ( 0.0, 57.0+]
[109] ( 57.0, 285.0] ( 0.0, 3.0+] ( 3.0, 68.0] ( 0.0, 10.0+] ( 10.0, 670.0+] ( 0.0, 5.0+]
[115] ( 5.0, 30.0] ( 0.0, 31.0+] ( 31.0, 620.0+] ( 0.0, 4.0+] ( 4.0, 596.0+] ( 0.0, 27.0+]
[121] ( 27.0, 90.0] ( 0.0, 5.0+] ( 5.0, 17.0] ( 0.0, 2.0] ( 0.0, 46.0+] ( 46.0, 545.0+]
[127] ( 0.0, 21.0] ( 0.0, 210.0+] (210.0, 515.0+] ( 0.0, 67.0+] ( 67.0, 96.0] ( 0.0, 26.0+]
[133] ( 26.0, 482.0+] ( 0.0, 6.0+] ( 6.0, 445.0+] ( 0.0, 428.0+] ( 0.0, 32.0+] ( 32.0, 80.0]
[139] ( 0.0, 37.0+] ( 37.0, 334.0] ( 0.0, 5.0] ( 0.0, 8.0+] ( 8.0, 397.0+] ( 0.0, 60.0+]
[145] ( 60.0, 110.0] ( 0.0, 31.0+] ( 31.0, 370.0+] ( 0.0, 139.0+] (139.0, 207.0] ( 0.0, 160.0+]
[151] (160.0, 186.0] ( 0.0, 340.0] ( 0.0, 310.0+] (310.0, 340.0+] ( 0.0, 28.0+] ( 28.0, 265.0+]
[157] ( 0.0, 4.0+] ( 4.0, 165.0] ( 0.0, 2.0+] ( 2.0, 16.0] ( 0.0, 13.0+] ( 13.0, 180.0+]
[163] ( 0.0, 21.0+] ( 21.0, 131.0+] ( 0.0, 96.0+] ( 96.0, 109.0+] ( 0.0, 21.0] ( 0.0, 38.0+]
[169] ( 38.0, 39.0+] ( 0.0, 31.0+] ( 0.0, 11.0+] ( 0.0, 6.0]
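So at the first time of each interval a subject enters my risk set and starts being "at risk". At the second time the subject leaves the set: some because the event of interest occurred (no '+'), others because they were censored (with '+').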
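To make the (start, stop] notation concrete, here is a minimal sketch of how the size of the risk set at a given time could be counted. The helper `n_at_risk` is a hypothetical name introduced only for illustration, and the sketch assumes the `heart` data shipped with the survival package.

```r
library(survival)

heart <- survival::heart

# Hypothetical helper: a row is at risk at time t if it entered before t
# and has not yet left, i.e. start < t <= stop.
n_at_risk <- function(t, start, stop) {
  sum(start < t & t <= stop)
}

# Number of rows at risk at day 50 in the heart data
n_at_risk(50, heart$start, heart$stop)

# Toy check with three intervals (0,4], (3,10], (6,9]:
# only the second interval covers t = 5
n_at_risk(5, start = c(0, 3, 6), stop = c(4, 10, 9))
```

A subject that enters late simply does not count towards the risk set before its entry time, which is what a method ignoring the start column gets wrong.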
Cox regression
For Cox regression everything works fine. The Surv object created above can be used to fit a Cox regression.
coxtime=coxph(Surv(heart$start, heart$stop, heart$event)~1,data=heart)
summary(coxtime)
Call: coxph(formula = Surv(heart$start, heart$stop, heart$event) ~
1, data = heart)
Null model
log likelihood= -298.1214
n= 172
I can also plot the survival function:
plot(survfit(coxtime),xscale=365.25, xlab = "Years", ylab="Survival")
[Figure: Survival function of heart dataset]
Now I would like to perform the same analysis with a tree-based algorithm.
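As a side note on the survival curve above, the following sketch contrasts an estimate that honours the entry times with one that drops the start column. It assumes the `heart` data from the survival package; the object names `km_entry` and `km_naive` are made up for this illustration.

```r
library(survival)

heart <- survival::heart

# Kaplan-Meier estimate using the counting-process form (start, stop]:
# a row only contributes to the risk set after its start time.
km_entry <- survfit(Surv(start, stop, event) ~ 1, data = heart)

# Naive estimate that ignores start and treats every row as at risk from t = 0
km_naive <- survfit(Surv(stop, event) ~ 1, data = heart)

# Compare the estimated survival and the numbers at risk at a few time points
summary(km_entry, times = c(30, 365))
summary(km_naive, times = c(30, 365))
```

The difference between the two risk-set sizes is what a tree-based method would also need to account for.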
ipredbagg
When I try the ipredbagg function, it works fine without left truncation:
library(survival)
library(ipred)
#without left truncation
ipredbagg(Surv(heart$stop, heart$event) ,X=heart$surgery)
I get the result
Bagging survival trees with 25 bootstrap replications
Because the heart dataset contains rows where the start value is 0, I get an error when I simply pass start and stop to the ipredbagg function.
#with left truncation
ipredbagg(Surv(heart$start, heart$stop, heart$event) ,X=heart$surgery)
Error in get(paste("rpart", method, sep = "."), envir = environment())(Y, :
Observation time must be > 0
So I added 1 to both the start and stop columns, but now I get a different error.
#with left truncation and start > 0
ipredbagg(Surv(heart$start+1, heart$stop+1, heart$event) ,X=heart$surgery)
Error:
Error in table(index2, levels = 1:ngrp) :
all arguments must have the same length
pecCforest
My second attempt was the pecCforest function from the pec package. This function also works on data without left truncation.
library(pec)
library(party)
library(survival)
#without left truncation
fitcforest <- pecCforest(Surv(stop, event) ~ .,data=heart[,-which(names(heart)=="start")],
controls = cforest_classical(ntree=100),mtry=2);
predictSurvProb(fitcforest,heart[1,],times=1)
I get the result
[,1]
[1,] 0.9890595
This time I can train the model on the start and stop columns without an error, but I cannot predict from it.
#with left truncation
fitcforest <- pecCforest(Surv((start), (stop), event) ~ .,data=heart,
controls = cforest_classical(ntree=100));
predictSurvProb(fitcforest,heart[1,],times=1)
This results in
Error in predict.survfit(object, newdata = newdata, times = times, bytimes = TRUE, :
Predictions only available
for class 'survfit', possibly stratified Kaplan-Meier fits.
For class 'cph' Cox models see survest.cph.
Adding 1 to both the start and stop columns leads to the same error.
#with left truncation and start > 0
fitcforest <- pecCforest(Surv((start+1), (stop+1), event) ~ .,data=heart,
controls = cforest_classical(ntree=100,mtry=2));
predictSurvProb(fitcforest,heart[1,],times=1)
Error in predict.survfit(object, newdata = newdata, times = times, bytimes = TRUE, :
Predictions only available
for class 'survfit', possibly stratified Kaplan-Meier fits.
For class 'cph' Cox models see survest.cph.
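Is there a way to make these functions work for left-truncated data? It seems that left truncation is not implemented in either function, but I could not find any information about it. Is there another way to do a survival analysis on left-truncated data in R with a tree-based algorithm (I did manage to do a standard Cox regression)?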
Answer 0 (score: -1)
You can try the recent package LTRCtrees on CRAN, which is designed specifically for left-truncated survival data.
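A rough sketch of how this might be tried on the heart data follows. The first part is a hand-rolled illustration of one known idea for such trees: convert each (start, stop] interval into an exposure on the cumulative-hazard scale and grow a Poisson regression tree with rpart. That this resembles what LTRCtrees does internally is an assumption, not something confirmed here. The second part calls `LTRCtrees::LTRCART` only if the package happens to be installed; the choice of covariates is arbitrary.

```r
library(survival)
library(rpart)

heart <- survival::heart

# Nelson-Aalen cumulative hazard that respects the entry times
sf <- survfit(Surv(start, stop, event) ~ 1, data = heart)
cum_haz <- cumsum(sf$n.event / sf$n.risk)

# Right-continuous step function H(t), with H(t) = 0 before the first time
H <- stepfun(sf$time, c(0, cum_haz))

# Exposure of each (start, stop] interval on the cumulative-hazard scale
heart$exposure <- H(heart$stop) - H(heart$start)

# rpart's Poisson method needs strictly positive exposure
stopifnot(all(heart$exposure > 0))

# Poisson regression tree: event count per unit of transformed exposure
tree <- rpart(cbind(exposure, event) ~ age + year + surgery + transplant,
              data = heart, method = "poisson")
print(tree)

# Optional: the packaged route suggested in the answer (skipped if not installed)
if (requireNamespace("LTRCtrees", quietly = TRUE)) {
  ltrc_fit <- LTRCtrees::LTRCART(
    Surv(start, stop, event) ~ age + year + surgery + transplant,
    data = heart
  )
  print(ltrc_fit)
}
```

The fitted nodes of the Poisson tree report event rates relative to the overall hazard, so a rate above 1 indicates a higher-risk group than average. This is a single tree, not a bagged ensemble or forest like ipredbagg or pecCforest.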