R - Cox hazard model not including levels of a factor

Time: 2014-01-26 18:12:43

Tags: r statistics survival-analysis cox-regression

I am fitting a Cox model to some data that is structured like this:

str(test)
'data.frame':   147 obs. of  8 variables:
 $ AGE              : int  71 69 90 78 61 74 78 78 81 45 ...
 $ Gender           : Factor w/ 2 levels "F","M": 2 1 2 1 2 1 2 1 2 1 ...
 $ RACE             : Factor w/ 5 levels "","BLACK","HISPANIC",..: 5 2 5 5 5 5 5 5 5 1 ...
 $ SIDE             : Factor w/ 2 levels "L","R": 1 1 2 1 2 1 1 1 2 1 ...
 $ LESION.INDICATION: Factor w/ 12 levels "CLAUDICATION",..: 1 11 4 11 9 1 1 11 11 11 ...
 $ RUTH.CLASS       : int  3 5 4 5 4 3 3 5 5 5 ...
 $ LESION.TYPE      : Factor w/ 3 levels "","OCCLUSION",..: 3 3 2 3 3 3 2 3 3 3 ...
 $ Primary          : int  1190 1032 166 689 219 840 1063 115 810 157 ...

The RUTH.CLASS variable is actually a factor, so I converted it to one:

> test$RUTH.CLASS <- as.factor(test$RUTH.CLASS)
> summary(test$RUTH.CLASS)
 3  4  5  6 
48 56 35  8 

Great.

After fitting the model:

stent.surv <- Surv(test$Primary)
> cox.ruthclass <- coxph(stent.surv ~ RUTH.CLASS, data=test )
> 
> summary(cox.ruthclass)
Call:
coxph(formula = stent.surv ~ RUTH.CLASS, data = test)

  n= 147, number of events= 147 

              coef exp(coef) se(coef)     z Pr(>|z|)   
RUTH.CLASS4 0.1599    1.1734   0.1987 0.804  0.42111   
RUTH.CLASS5 0.5848    1.7947   0.2263 2.585  0.00974 **
RUTH.CLASS6 0.3624    1.4368   0.3846 0.942  0.34599   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

            exp(coef) exp(-coef) lower .95 upper .95
RUTH.CLASS4     1.173     0.8522    0.7948     1.732
RUTH.CLASS5     1.795     0.5572    1.1518     2.796
RUTH.CLASS6     1.437     0.6960    0.6762     3.053

Concordance= 0.574  (se = 0.026 )
Rsquare= 0.045   (max possible= 1 )
Likelihood ratio test= 6.71  on 3 df,   p=0.08156
Wald test            = 7.09  on 3 df,   p=0.06902
Score (logrank) test = 7.23  on 3 df,   p=0.06478

> levels(test$RUTH.CLASS)
[1] "3" "4" "5" "6"

When I fit more variables in the model, something similar happens:

cox.fit <- coxph(stent.surv ~ RUTH.CLASS + LESION.INDICATION + LESION.TYPE, data=test )
> 
> summary(cox.fit)
Call:
coxph(formula = stent.surv ~ RUTH.CLASS + LESION.INDICATION + 
    LESION.TYPE, data = test)

  n= 147, number of events= 147 

                                          coef exp(coef) se(coef)      z Pr(>|z|)  
RUTH.CLASS4                            -0.5854    0.5569   1.1852 -0.494   0.6214  
RUTH.CLASS5                            -0.1476    0.8627   1.0182 -0.145   0.8847  
RUTH.CLASS6                            -0.4509    0.6370   1.0998 -0.410   0.6818  
LESION.INDICATIONEMBOLIC               -0.4611    0.6306   1.5425 -0.299   0.7650  
LESION.INDICATIONISCHEMIA               1.3794    3.9725   1.1541  1.195   0.2320  
LESION.INDICATIONISCHEMIA/CLAUDICATION  0.2546    1.2899   1.0189  0.250   0.8027  
LESION.INDICATIONREST PAIN              0.5302    1.6993   1.1853  0.447   0.6547  
LESION.INDICATIONTISSUE LOSS            0.7793    2.1800   1.0254  0.760   0.4473  
LESION.TYPEOCCLUSION                   -0.5886    0.5551   0.4360 -1.350   0.1770  
LESION.TYPESTEN                        -0.7895    0.4541   0.4378 -1.803   0.0714 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                                       exp(coef) exp(-coef) lower .95 upper .95
RUTH.CLASS4                               0.5569     1.7956   0.05456     5.684
RUTH.CLASS5                               0.8627     1.1591   0.11726     6.348
RUTH.CLASS6                               0.6370     1.5698   0.07379     5.499
LESION.INDICATIONEMBOLIC                  0.6306     1.5858   0.03067    12.964
LESION.INDICATIONISCHEMIA                 3.9725     0.2517   0.41374    38.141
LESION.INDICATIONISCHEMIA/CLAUDICATION    1.2899     0.7752   0.17510     9.503
LESION.INDICATIONREST PAIN                1.6993     0.5885   0.16645    17.347
LESION.INDICATIONTISSUE LOSS              2.1800     0.4587   0.29216    16.266
LESION.TYPEOCCLUSION                      0.5551     1.8015   0.23619     1.305
LESION.TYPESTEN                           0.4541     2.2023   0.19250     1.071

Concordance= 0.619  (se = 0.028 )
Rsquare= 0.137   (max possible= 1 )
Likelihood ratio test= 21.6  on 10 df,   p=0.01726
Wald test            = 22.23  on 10 df,   p=0.01398
Score (logrank) test = 23.46  on 10 df,   p=0.009161

> levels(test$LESION.INDICATION)
[1] "CLAUDICATION"          "EMBOLIC"               "ISCHEMIA"              "ISCHEMIA/CLAUDICATION"
[5] "REST PAIN"             "TISSUE LOSS"          
> levels(test$LESION.TYPE)
[1] ""          "OCCLUSION" "STEN" 

Truncated output of model.matrix below:

> model.matrix(cox.fit)
    RUTH.CLASS4 RUTH.CLASS5 RUTH.CLASS6 LESION.INDICATIONEMBOLIC LESION.INDICATIONISCHEMIA
1             0           0           0                        0                         0
2             0           1           0                        0                         0

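We can see that the first level of each factor is being excluded from the model. Any input would be greatly appreciated. I noticed that the blank level "" of the LESION.TYPE variable is not included either, but that is not by design: it should be NA or something similar.

I'm confused and could use some help. Thanks.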
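On the blank LESION.TYPE level: "" sorts first among the levels, so it acts as the reference level and is missing from the output for the same reason as RUTH.CLASS 3. A minimal sketch of recoding it to NA in base R, using a made-up vector `lesion` as a stand-in for test$LESION.TYPE (the values are illustrative, not the real data):

```r
# Hypothetical stand-in for test$LESION.TYPE, including the blank level.
lesion <- factor(c("STEN", "OCCLUSION", "", "STEN"))
print(levels(lesion))

# Recode the blank entries as missing, then drop the now-unused "" level.
lesion[lesion == ""] <- NA
lesion <- droplevels(lesion)

print(levels(lesion))      # only the real categories remain
print(sum(is.na(lesion)))  # number of entries that were blank
```

With the usual na.action setting (na.omit), rows that are NA in a model variable are left out of the fit, so the number of observations used would be expected to shrink accordingly.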

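1 Answer:

Answer 0: (score: 3)

Factors in any model return coefficients relative to a base level (a contrast). Your contrasts default to comparing each level against a base level, which is the first level of the factor. Computing a coefficient for the dropped level would add nothing: the levels are exhaustive and mutually exclusive for every observation, so when all the other dummy variables are 0 the observation must be in the dropped level, and the model already returns the prediction for that case. You can change the default contrasts by setting contrasts in options.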
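The dropped base level can be seen without fitting a Cox model at all. The sketch below uses a made-up factor `ruth` with the same four levels as RUTH.CLASS (the values are illustrative) and assumes the default contrasts options are in effect:

```r
# Toy factor with the same four levels as RUTH.CLASS (values are made up).
ruth <- factor(c(3, 4, 5, 6, 3, 5))

# Default treatment contrasts: an intercept plus one dummy column per
# non-reference level. Level "3" has no column of its own.
mm <- model.matrix(~ ruth)
print(colnames(mm))

# Rows where ruth == "3" are all zeros in the dummy columns.
print(mm[ruth == "3", -1])

# relevel() chooses a different reference level: "5" becomes the baseline
# and "3" now gets its own column.
ruth5 <- relevel(ruth, ref = "5")
print(colnames(model.matrix(~ ruth5)))
```

In a Cox model there is no intercept term, but the dummy columns are built the same way, which is why each coefficient reads as a log hazard ratio against the reference level.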

For coefficients compared to the average of all levels:

options(contrasts=c(unordered="contr.sum", ordered="contr.poly"))

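For coefficients compared to a specific reference level (treatment contrasts, which is what you have above and is your default):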

options(contrasts=c(unordered="contr.treatment", ordered="contr.poly"))
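To see what the two settings actually encode, it may help to print the contrast matrices side by side. This is a sketch in base R; the factor `f` is hypothetical, and contrasts.arg is used here so that the global options stay untouched:

```r
lev <- c("3", "4", "5", "6")

# Treatment contrasts: the first level is coded as all zeros (the baseline).
ct <- contr.treatment(lev)
print(ct)

# Sum-to-zero contrasts: the last level is coded as -1 in every column,
# so each column sums to zero.
cs <- contr.sum(lev)
print(cs)

# Contrasts can also be set for a single factor instead of globally.
f <- factor(c("3", "4", "5", "6", "4", "5"))
mm_sum <- model.matrix(~ f, contrasts.arg = list(f = "contr.sum"))
print(colnames(mm_sum))
```

Note that contr.sum still reports only three coefficients for four levels. The one for the last level is not printed; it equals minus the sum of the others.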

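As you can see, there are two types of factors in R: unordered (or categorical, e.g. red, green, blue) and ordered (e.g. strongly disagree, disagree, no opinion, agree, strongly agree).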
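A short sketch of how the two kinds of factor are coded differently, assuming the default contrasts options (contr.treatment for unordered, contr.poly for ordered). The vectors `colour` and `opinion` are made up for illustration:

```r
# Unordered factor: levels are sorted alphabetically, with no ranking implied.
colour <- factor(c("red", "green", "blue"))

# Ordered factor: the level order is given explicitly and carries meaning.
opinion <- factor(c("agree", "disagree", "no opinion"),
                  levels = c("disagree", "no opinion", "agree"),
                  ordered = TRUE)

print(is.ordered(colour))
print(is.ordered(opinion))

# Unordered: one dummy column per non-reference level.
print(colnames(model.matrix(~ colour)))

# Ordered: polynomial contrasts (linear .L, quadratic .Q) across the levels.
print(colnames(model.matrix(~ opinion)))
```

If RUTH.CLASS were declared as an ordered factor, the summary would show polynomial terms of this kind rather than one coefficient per level.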