Question

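Good afternoon,

I can post reproducible code, and certainly will if everyone agrees that something is wrong, but right now I think my question is fairly simple and someone will point me in the right direction.

I'm working with a data set like this: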

created_as_free_user     t     c
                 <fctr> <int> <int>
1                  true    36     0
2                  true    36     0
3                  true     0     1
4                  true    28     0
5                  true     9     0
6                  true     0     1
7                  true    13     0
8                  true    19     0
9                  true     9     0
10                 true    16     0

I fitted a Cox regression model like this:

fit_train = coxph(Surv(time = t,event = c) ~ created_as_free_user ,data = teste)
summary(fit_train)

and got:

Call:
coxph(formula = Surv(time = t, event = c) ~ created_as_free_user, 
    data = teste)

  n= 9000, number of events= 1233 

                            coef exp(coef) se(coef)      z Pr(>|z|)    
created_as_free_usertrue -0.7205    0.4865   0.1628 -4.426 9.59e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                         exp(coef) exp(-coef) lower .95 upper .95
created_as_free_usertrue    0.4865      2.055    0.3536    0.6693

Concordance= 0.511  (se = 0.002 )
Rsquare= 0.002   (max possible= 0.908 )
Likelihood ratio test= 15.81  on 1 df,   p=7e-05
Wald test            = 19.59  on 1 df,   p=9.589e-06
Score (logrank) test = 20.45  on 1 df,   p=6.109e-06

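So far so good. Next step: predict the results on new data. I understand the different types of predictions that predict.coxph can give me (or at least I think I do). Let's use type = "lp":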

head(predict(fit_train,validacao,type = "lp"),n=20)

which gives:

     1           2           3           4           5           6           7           8           9          10 
-0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 
         11          12          13          14          15          16          17          18          19          20 
-0.01208854 -0.01208854  0.70842049 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854

OK. But when I look at the data I am trying to estimate:

# A tibble: 9,000 × 3
   created_as_free_user     t     c
                 <fctr> <int> <int>
1                  true    20     0
2                  true    12     0
3                  true     0     1
4                  true    10     0
5                  true    51     0
6                  true    36     0
7                  true    44     0
8                  true     0     1
9                  true    27     0
10                 true     6     0
# ... with 8,990 more rows

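it confuses me.

Isn't type = "lp" supposed to give you the linear predictions? For the data above, since the created_as_free_user variable equals true, am I wrong to expect the type = "lp" prediction to be exactly -0.7205 (the coefficient of the model above)? Where does -0.01208854 come from? I suspect it is some kind of scaling issue, but I couldn't find the answer online.

My ultimate goal is the h(t) given by the prediction type = "expected", but I am not comfortable using it because it relies on this -0.01208854 value that I don't completely understand.

Thanks a lot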

Answer 1

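The Details section of ?predict.coxph reads:

"The Cox model is a relative risk model; predictions of type "linear predictor", "risk", and "terms" are all relative to the sample from which they came. By default, the reference value for each of these is the mean covariate within strata."

To illustrate what this means, we can look at a simple example. Some fake data: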

library(survival)  # provides coxph() and Surv()

test1 <- list(time=c(4,3,1,1,1), 
             status=c(1,1,1,0,0), 
             x=c(0,2,1,1,0))

We fit a model and look at the predictions:

fit <- coxph(Surv(time, status) ~ x, test1) 
predict(fit, type = "lp")
# [1] -0.6976630  1.0464945  0.1744157  0.1744157 -0.6976630

The predictions are the same as:

(test1$x - mean(test1$x)) * coef(fit)
# [1] -0.6976630  1.0464945  0.1744157  0.1744157 -0.6976630
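A practical consequence, sketched below on the same toy data: because every prediction is shifted by the same reference value, the shift cancels whenever two observations are compared, and type = "risk" is simply exp() of the centered linear predictor. This is only an illustrative check, not output from your data; the variable names lp and risk are introduced here for the example.

```r
library(survival)

# Same toy data as in the example above
test1 <- list(time   = c(4, 3, 1, 1, 1),
              status = c(1, 1, 1, 0, 0),
              x      = c(0, 2, 1, 1, 0))
fit <- coxph(Surv(time, status) ~ x, test1)

lp   <- predict(fit, type = "lp")    # centered linear predictor
risk <- predict(fit, type = "risk")  # hazard ratio relative to the reference

# type = "risk" is exp(lp)
all.equal(unname(risk), unname(exp(lp)))

# Differences in lp do not depend on the reference value:
# lp[2] - lp[1] = (x[2] - x[1]) * coef, the centering term cancels
all.equal(unname(lp[2] - lp[1]),
          unname((test1$x[2] - test1$x[1]) * coef(fit)))

# The reference value actually used is stored in the fitted object
fit$means
```

As far as I know, newer releases of survival may leave 0/1 dummy columns uncentered (see the nocenter argument in ?coxph), so inspecting fit$means is a simple way to check which reference value your own model used.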

(Using this logic and some arithmetic, we can back out from your results that 8849 of the 9000 observations of your created_as_free_user variable are "true".)
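The arithmetic can be sketched as follows. It assumes that the 0.70842049 in position 13 of your output belongs to the other level of created_as_free_user, and that the dummy column was centered at the share of "true" rows in the training data; the names lp_true, lp_false, coef_hat, p_true and n_true are made up for this illustration.

```r
# Values copied from the predict() output and model summary in the question
lp_true  <- -0.01208854   # lp for created_as_free_user == "true"
lp_false <-  0.70842049   # lp for the other level (assumed: row 13)
n        <- 9000          # number of observations used in the fit

# With centering, lp = (x - p) * coef, where x is the 0/1 dummy and
# p is the share of "true" rows:
#   lp_true  = (1 - p) * coef
#   lp_false = (0 - p) * coef
coef_hat <- lp_true - lp_false     # recovers the fitted coefficient
p_true   <- -lp_false / coef_hat   # share of "true" rows
n_true   <- round(p_true * n)      # implied count of "true" rows

coef_hat   # about -0.7205, matching the reported coefficient
n_true     # 8849
```

So -0.01208854 is the coefficient multiplied by (1 - p_true), not a different estimate of the effect.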

Coxph predictions don't match the coefficients
