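The corstr argument of gee has no effect on the correlation structure

Time: 2014-11-14 06:43:10

Tags: r statistics

Below is an excerpt of my data frame, which holds results from a longitudinal study (A is the outcome variable, measured at two time points):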

    wide <- structure(list(ID = c(9000296L, 9001104L, 9001400L, 9001695L,
    9001897L, 9002316L), BMI = c(29.8, 30.7, 23.5, 28.6, 25.9,
    25.1), A.1 = c(100, 70.83, 100, 89.29, 100, 92.86), A.5 = c(100,
    NA, 92.86, NA, 100, 89.29)), .Names = c("ID", "BMI", "A.1",
    "A.5"), class = "data.frame", row.names = c(2L, 5L, 6L,
    7L, 8L, 10L))

    wide
            ID  BMI   A.1   A.5
    2  9000296 29.8 100.0 100.0
    5  9001104 30.7  70.8    NA
    6  9001400 23.5 100.0  92.9
    7  9001695 28.6  89.3    NA
    8  9001897 25.9 100.0 100.0
    10 9002316 25.1  92.9  89.3

As you can see, A.1 and A.5 are correlated, as one would expect in a longitudinal study:

    library(psych)
    corr.test(wide[, c(3, 4)])
    Call:corr.test(x = wide[, c(3, 4)])
    Correlation matrix 
         A.1  A.5
    A.1 1.00 0.78
    A.5 0.78 1.00
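
The same coefficient can be reproduced in base R, which also makes explicit that it rests only on the subjects with both measurements. This is a minimal sketch with the values copied from the excerpt above; the variable names `complete`, `n_pairs` and `r` are illustrative and not part of the original analysis.

```r
# Outcome values copied from the data excerpt (NA = missing follow-up)
A.1 <- c(100, 70.83, 100, 89.29, 100, 92.86)
A.5 <- c(100, NA, 92.86, NA, 100, 89.29)

# Subjects that have both measurements
complete <- !is.na(A.1) & !is.na(A.5)
n_pairs <- sum(complete)

# Pearson correlation on the complete pairs only
r <- cor(A.1, A.5, use = "complete.obs")

print(n_pairs)      # 4
print(round(r, 2))  # 0.78
```

With only four complete pairs in this excerpt, the estimate is very imprecise, so it is best read as an illustration rather than as evidence about the full data.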

Then I converted the data to long format:

    long <- reshape(wide, varying = c(3, 4), direction = "long")
    long
             ID  BMI time     A id
    1.1 9000296 29.8    1 100.0  1
    2.1 9001104 30.7    1  70.8  2
    3.1 9001400 23.5    1 100.0  3
    4.1 9001695 28.6    1  89.3  4
    5.1 9001897 25.9    1 100.0  5
    6.1 9002316 25.1    1  92.9  6
    1.5 9000296 29.8    5 100.0  1
    2.5 9001104 30.7    5    NA  2
    3.5 9001400 23.5    5  92.9  3
    4.5 9001695 28.6    5    NA  4
    5.5 9001897 25.9    5 100.0  5
    6.5 9002316 25.1    5  89.3  6

Then I first tried fitting a gee model with an independence correlation structure:

    library(gee)
    model1 <- gee(A ~ time + BMI, id = ID, corstr = "independence", data = long)
    Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
    running glm to get initial regression estimate
    (Intercept)        time         BMI 
        122.389       0.508      -1.127 

    summary(model1)

     GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
     gee S-function, version 4.13 modified 98/01/27 (1998) 

    Model:
     Link:                      Identity 
     Variance to Mean Relation: Gaussian 
     Correlation Structure:     Independent 

    Call:
    gee(formula = A ~ time + BMI, id = ID, data = long, corstr = "independence")

    Summary of Residuals:
       Min     1Q Median     3Q    Max 
    -17.46  -4.62   1.11   5.79  10.69 

    Coefficients:
                Estimate Naive S.E. Naive z Robust S.E. Robust z
    (Intercept)  122.389      34.18   3.580       31.00    3.949
    time           0.508       1.60   0.317        1.12    0.453
    BMI           -1.127       1.23  -0.919        1.23   -0.913

    Estimated Scale Parameter:  93.6
    Number of Iterations:  1

    Working Correlation
         [,1] [,2]
    [1,]    1    0
    [2,]    0    1

And with an exchangeable correlation structure:

    model2 <- gee(A ~ time + BMI, id = ID, corstr = "exchangeable", data = long)
    Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
    running glm to get initial regression estimate
    (Intercept)        time         BMI 
        122.389       0.508      -1.127 

    summary(model2)

     GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
     gee S-function, version 4.13 modified 98/01/27 (1998) 

    Model:
     Link:                      Identity 
     Variance to Mean Relation: Gaussian 
     Correlation Structure:     Exchangeable 

    Call:
    gee(formula = A ~ time + BMI, id = ID, data = long, corstr = "exchangeable")

    Summary of Residuals:
       Min     1Q Median     3Q    Max 
    -17.46  -4.62   1.11   5.79  10.69 

    Coefficients:
                Estimate Naive S.E. Naive z Robust S.E. Robust z
    (Intercept)  122.389      34.18   3.580       31.00    3.949
    time           0.508       1.60   0.317        1.12    0.453
    BMI           -1.127       1.23  -0.919        1.23   -0.913

    Estimated Scale Parameter:  93.6
    Number of Iterations:  1

    Working Correlation
         [,1] [,2]
    [1,]    1    0
    [2,]    0    1

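As you can see, the output is identical even though the two gee models use different correlation structures. In both cases the off-diagonal entries of the working correlation matrix are zero.

In my real data I have more observations and more time points, and there is substantial within-subject correlation. Even so, none of my gee models (including ones with other dependent variables) shows any correlation in its working correlation matrix, and changing the corstr argument does not change the model output. This all looks very strange. Can you tell me what I am doing wrong?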

1 Answer:

Answer 0: (score: 0)

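I found the solution! The data have to be sorted by the ID variable! The gee documentation states that the data are assumed to be sorted so that the observations of each cluster are in contiguous rows, and after reshape() the rows are grouped by time instead.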

    long <- long[order(long$ID), ]
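
To check whether the rows of each subject are contiguous, one option is to count runs of identical consecutive IDs with `rle()`. The following is a minimal sketch on a made-up three-subject data frame laid out the way `reshape()` returns it; `long_demo`, `runs_unsorted` and `runs_sorted` are illustrative names, not objects from the question.

```r
# Toy long-format data: all rows for time 1 first, then all rows for time 5
ids <- c(9000296L, 9001104L, 9001400L)
long_demo <- data.frame(ID = rep(ids, times = 2),
                        time = rep(c(1, 5), each = 3))

# rle() returns the lengths of runs of identical consecutive values,
# i.e. the sizes of the blocks of adjacent rows that share an ID
runs_unsorted <- rle(long_demo$ID)$lengths

# Sort by ID, then by time within ID
long_sorted <- long_demo[order(long_demo$ID, long_demo$time), ]
runs_sorted <- rle(long_sorted$ID)$lengths

print(runs_unsorted)  # 1 1 1 1 1 1
print(runs_sorted)    # 2 2 2
```

If every run has length 1 even though subjects were measured repeatedly, the rows of a subject are not adjacent, which would be consistent with the corstr argument appearing to have no effect.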



    model1 <- gee(A ~ time + BMI, id = ID, corstr = "independence", data = long)
    Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
    running glm to get initial regression estimate
    (Intercept)        time         BMI 
        122.389       0.508      -1.127 

    model2 <- gee(A ~ time + BMI, id = ID, corstr = "exchangeable", data = long)
    Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
    running glm to get initial regression estimate
    (Intercept)        time         BMI 
        122.389       0.508      -1.127 
    Warning message:
    In gee(A ~ time + BMI, id = ID, corstr = "exchangeable", data = long) :
      Working correlation estimate not positive definite

    model1

     GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
     gee S-function, version 4.13 modified 98/01/27 (1998) 

    Model:
     Link:                      Identity 
     Variance to Mean Relation: Gaussian 
     Correlation Structure:     Independent 

    Call:
    gee(formula = A ~ time + BMI, id = ID, data = long, corstr = "independence")

    Number of observations :  10 

    Maximum cluster size   :  2 

    Coefficients:
    (Intercept)        time         BMI 
        122.389       0.508      -1.127 

    Estimated Scale Parameter:  93.6
    Number of Iterations:  1

    Working Correlation[1:4,1:4]
         [,1] [,2]
    [1,]    1    0
    [2,]    0    1

    Returned Error Value:
    [1] 0

    model2

     GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
     gee S-function, version 4.13 modified 98/01/27 (1998) 

    Model:
     Link:                      Identity 
     Variance to Mean Relation: Gaussian 
     Correlation Structure:     Exchangeable 

    Call:
    gee(formula = A ~ time + BMI, id = ID, data = long, corstr = "exchangeable")

    Number of observations :  10 

    Maximum cluster size   :  2 

    Coefficients:
    (Intercept)        time         BMI 
         180.00       -1.70       -3.16 

    Estimated Scale Parameter:  154
    Number of Iterations:  5

    Working Correlation[1:4,1:4]
         [,1] [,2]
    [1,]  1.0  2.8
    [2,]  2.8  1.0

    Returned Error Value:
    [1] 1000
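
The reshape and the sort can also be done in one go, so that the long data frame is in the expected order before it is passed to a model. This is a sketch under assumptions: it uses a shortened, made-up copy of the wide data (`wide_demo`), passes the existing `ID` column as `idvar` so that `reshape()` does not add a separate `id` column, and sorts by `ID` and then `time`. None of these names or choices come from the original post.

```r
# Shortened copy of the wide data (first three subjects only)
wide_demo <- data.frame(
  ID  = c(9000296L, 9001104L, 9001400L),
  BMI = c(29.8, 30.7, 23.5),
  A.1 = c(100, 70.83, 100),
  A.5 = c(100, NA, 92.86)
)

# Reshape to long format; "A.1"/"A.5" are split at the "." into
# the variable name A and the time values 1 and 5
long_demo <- reshape(wide_demo,
                     varying = c("A.1", "A.5"),
                     idvar = "ID",
                     direction = "long")

# Make the rows of each subject contiguous, ordered by time
long_demo <- long_demo[order(long_demo$ID, long_demo$time), ]
rownames(long_demo) <- NULL

print(long_demo)
```

Sorting by `time` within `ID` is an extra step beyond what the fix above does; it keeps the within-cluster order the same for every subject, which is a reasonable precaution when a correlation structure depends on the position of an observation within its cluster.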