I am running a Cox proportional hazards model on a toy dataset of operation delays with two covariates:
> dat.op
delay censor cost demand
1 2.875000 1 3.10 0.1
2 1.569444 1 0.68 0.1
3 2.000000 1 6.05 0.2
4 1.750000 1 5.22 0.1
5 2.000000 1 4.67 0.3
6 3.000000 1 9.30 1.4
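To make the example reproducible, the data frame printed above can be rebuilt in base R. This is a minimal sketch: the values are copied directly from the printout, and the name dat.op matches the one used in the model calls.

```r
# Rebuild dat.op from the values printed above (all six delays are observed, censor = 1)
dat.op <- data.frame(
  delay  = c(2.875000, 1.569444, 2.000000, 1.750000, 2.000000, 3.000000),
  censor = c(1, 1, 1, 1, 1, 1),
  cost   = c(3.10, 0.68, 6.05, 5.22, 4.67, 9.30),
  demand = c(0.1, 0.1, 0.2, 0.1, 0.3, 1.4)
)
str(dat.op)
```

With this object loaded, the calls below (for example coxph(Surv(delay, censor) ~ cost + demand, data = dat.op) after library(survival)) can be rerun as written.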
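Under coxph, both covariates have the expected negative effect on the hazard rate, with coefficients of -0.0813 and -2.5490. In other words (ignoring the high p-values for now), both cost and demand increase the operation delay: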
> coxph(Surv(delay, censor) ~ cost + demand, data=dat.op)
Call:
coxph(formula = Surv(delay, censor) ~ cost + demand, data = dat.op)
coef exp(coef) se(coef) z p
cost -0.0813 0.9219 0.3909 -0.21 0.84
demand -2.5490 0.0782 3.6635 -0.70 0.49
Likelihood ratio test=3.35 on 2 df, p=0.187
n= 6, number of events= 6
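However, when I fit the same data with flexsurvreg to also get parameter estimates for the underlying hazard distribution (assuming the commonly used Weibull), I see different effects: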
> flexsurvreg(Surv(delay, censor) ~ cost + demand, data=dat.op, dist="weibull")
Call:
flexsurvreg(formula = Surv(delay, censor) ~ cost + demand, data = dat.op,
dist = "weibull")
Estimates:
data mean est L95% U95% se exp(est) L95% U95%
shape NA 5.2163 2.7040 10.0627 1.7487 NA NA NA
scale NA 2.5223 1.3762 4.6226 0.7796 NA NA NA
cost 4.8367 -0.0427 -0.2119 0.1264 0.0863 0.9582 0.8091 1.1348
demand 0.3667 0.3943 -0.4005 1.1891 0.4055 1.4834 0.6700 3.2842
N = 6, Events: 6, Censored: 0
Total time at risk: 13.19444
Log-likelihood = -3.914008, df = 4
AIC = 15.82802
Here the coefficient for demand is 0.3943, which suggests that demand reduces the operation delay. That makes no sense.
After switching to the Gompertz distribution, I now see that cost reduces the operation delay:
> flexsurvreg(Surv(delay, censor) ~ cost + demand, data=dat.op, dist="gompertz")
Call:
flexsurvreg(formula = Surv(delay, censor) ~ cost + demand, data = dat.op,
dist = "gompertz")
Estimates:
data mean est L95% U95% se exp(est) L95% U95%
shape NA 2.27e+00 7.05e-01 3.83e+00 7.97e-01 NA NA NA
rate NA 6.10e-03 2.36e-05 1.58e+00 1.73e-02 NA NA NA
cost 4.84e+00 2.56e-01 -6.92e-01 1.21e+00 4.84e-01 1.29e+00 5.00e-01 3.34e+00
demand 3.67e-01 -2.26e+00 -6.98e+00 2.46e+00 2.41e+00 1.04e-01 9.27e-04 1.17e+01
N = 6, Events: 6, Censored: 0
Total time at risk: 13.19444
Log-likelihood = -4.208994, df = 4
AIC = 16.41799
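Am I misinterpreting these flexsurvreg results? Is there a way to get, from a single model fit, Weibull parameter estimates that are more consistent with the coxph estimates?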
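One thing worth checking when comparing these outputs: the flexsurv documentation describes its "weibull" model with covariates acting on the scale parameter, an accelerated failure time (AFT) form in which a positive coefficient means longer times, not a higher hazard. Its "gompertz" model puts covariates on the rate parameter, which is a proportional hazards form with the same sign convention as coxph. Assuming that parameterization, the Weibull estimates can be mapped onto coxph's log-hazard scale with the identity beta_PH = -shape * beta_AFT. The sketch below only does that arithmetic on the numbers printed above; it does not refit anything.

```r
# Estimates copied from the flexsurvreg(..., dist = "weibull") output above
shape    <- 5.2163
aft_coef <- c(cost = -0.0427, demand = 0.3943)

# Assumed AFT form: scale_i = scale * exp(x_i' beta). The Weibull hazard is then
#   h(t | x) = h0(t) * exp(-shape * x' beta),
# so the implied log hazard ratio for each covariate is -shape * beta.
ph_coef <- -shape * aft_coef

print(round(ph_coef, 4))       # log hazard ratios, comparable in sign to coxph's coef
print(round(exp(ph_coef), 4))  # hazard ratios, comparable to coxph's exp(coef)
```

Under this reading, demand's implied log hazard ratio is negative, in the same direction as the coxph estimate. Depending on the installed flexsurv version, a "weibullPH" distribution may also be available for fitting the proportional hazards form directly.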