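Below is an excerpt of my data frame, which holds the results of a longitudinal study (A is the outcome parameter, measured at two time points):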
wide <- structure(list(ID = c(9000296L, 9001104L, 9001400L, 9001695L,
9001897L, 9002316L), BMI = c(29.8, 30.7, 23.5, 28.6, 25.9,
25.1), A.1 = c(100, 70.83, 100, 89.29, 100, 92.86), A.5 = c(100,
NA, 92.86, NA, 100, 89.29)), .Names = c("ID", "BMI", "A.1",
"A.5"), class = "data.frame", row.names = c(2L, 5L, 6L,
7L, 8L, 10L))
wide
ID BMI A.1 A.5
2 9000296 29.8 100.0 100.0
5 9001104 30.7 70.8 NA
6 9001400 23.5 100.0 92.9
7 9001695 28.6 89.3 NA
8 9001897 25.9 100.0 100.0
10 9002316 25.1 92.9 89.3
As you can see, A.1 and A.5 are correlated, as they should be in a longitudinal study:
library (psych)
corr.test (wide [,c(3,4)] )
Call:corr.test(x = wide[, c(3, 4)])
Correlation matrix
A.1 A.5
A.1 1.00 0.78
A.5 0.78 1.00
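The same figure can be reproduced with base R alone, which also makes it visible that only the four subjects with both measurements contribute to it. This is a minimal sketch: the vectors are copied from the data frame above, and the names `a1`, `a5`, `n_pairs` and `r` are just illustrative.

```r
# Outcome at the two time points, copied from the data frame above
a1 <- c(100, 70.83, 100, 89.29, 100, 92.86)
a5 <- c(100, NA, 92.86, NA, 100, 89.29)

# Number of subjects with both measurements present
n_pairs <- sum(complete.cases(a1, a5))

# Pearson correlation using only the complete pairs
r <- cor(a1, a5, use = "pairwise.complete.obs")

print(n_pairs)
print(round(r, 2))
```

With so few complete pairs the estimate is very imprecise, so it is only a rough indication that the repeated measurements move together.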
Then I converted the data to long format:
long<- reshape (wide, varying = c(3,4), direction="long")
long
ID BMI time A id
1.1 9000296 29.8 1 100.0 1
2.1 9001104 30.7 1 70.8 2
3.1 9001400 23.5 1 100.0 3
4.1 9001695 28.6 1 89.3 4
5.1 9001897 25.9 1 100.0 5
6.1 9002316 25.1 1 92.9 6
1.5 9000296 29.8 5 100.0 1
2.5 9001104 30.7 5 NA 2
3.5 9001400 23.5 5 92.9 3
4.5 9001695 28.6 5 NA 4
5.5 9001897 25.9 5 100.0 5
6.5 9002316 25.1 5 89.3 6
Then I tried fitting a gee model, first with an independence correlation structure:
library (gee)
model1<- gee(A~time+BMI, id=ID, corstr= "independence", data = long)
Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
(Intercept) time BMI
122.389 0.508 -1.127
summary (model1)
GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
gee S-function, version 4.13 modified 98/01/27 (1998)
Model:
Link: Identity
Variance to Mean Relation: Gaussian
Correlation Structure: Independent
Call:
gee(formula = A ~ time + BMI, id = ID, data = long, corstr = "independence")
Summary of Residuals:
Min 1Q Median 3Q Max
-17.46 -4.62 1.11 5.79 10.69
Coefficients:
Estimate Naive S.E. Naive z Robust S.E. Robust z
(Intercept) 122.389 34.18 3.580 31.00 3.949
time 0.508 1.60 0.317 1.12 0.453
BMI -1.127 1.23 -0.919 1.23 -0.913
Estimated Scale Parameter: 93.6
Number of Iterations: 1
Working Correlation
[,1] [,2]
[1,] 1 0
[2,] 0 1
And with an exchangeable correlation structure:
model2<- gee(A~time+BMI, id=ID, corstr= "exchangeable", data = long)
Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
(Intercept) time BMI
122.389 0.508 -1.127
summary (model2)
GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
gee S-function, version 4.13 modified 98/01/27 (1998)
Model:
Link: Identity
Variance to Mean Relation: Gaussian
Correlation Structure: Exchangeable
Call:
gee(formula = A ~ time + BMI, id = ID, data = long, corstr = "exchangeable")
Summary of Residuals:
Min 1Q Median 3Q Max
-17.46 -4.62 1.11 5.79 10.69
Coefficients:
Estimate Naive S.E. Naive z Robust S.E. Robust z
(Intercept) 122.389 34.18 3.580 31.00 3.949
time 0.508 1.60 0.317 1.12 0.453
BMI -1.127 1.23 -0.919 1.23 -0.913
Estimated Scale Parameter: 93.6
Number of Iterations: 1
Working Correlation
[,1] [,2]
[1,] 1 0
[2,] 0 1
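As you can see, the output is identical even though the two gee models use different correlation structures. In both cases the off-diagonal correlations in the working correlation matrix are zero.
In my real data I have more observations and time points, and there is also substantial within-subject correlation. Yet every gee model (also with different dependent variables) shows no correlation in its working correlation matrix, and changing the corstr argument does not change the model output. This all looks very strange. Can you tell me what I am doing wrong?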
Answer 0 (score: 0)
I found the solution! The data must be sorted by the ID variable!
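As far as I understand the gee documentation, the data are assumed to be sorted so that all observations of one cluster sit in contiguous rows. reshape() returned all time-1 rows first and then all time-5 rows, so no two rows of the same subject were adjacent. Here is a small base-R sketch of how to check this. It uses a toy copy of the ID and time columns (`long_demo`), and `cluster_runs` is a hypothetical helper I made up for the check, not part of gee.

```r
# Toy copy of the ID/time layout that reshape() produced:
# all time-1 rows first, then all time-5 rows
ids <- c(9000296L, 9001104L, 9001400L, 9001695L, 9001897L, 9002316L)
long_demo <- data.frame(ID = rep(ids, times = 2),
                        time = rep(c(1, 5), each = length(ids)))

# Hypothetical helper: lengths of the runs of identical consecutive IDs
cluster_runs <- function(id) rle(as.vector(id))$lengths

# Before sorting: every run has length 1, so no subject has adjacent rows
runs_unsorted <- cluster_runs(long_demo$ID)

# After sorting by ID (and time within ID): one run of length 2 per subject
sorted_demo <- long_demo[order(long_demo$ID, long_demo$time), ]
runs_sorted <- cluster_runs(sorted_demo$ID)

print(runs_unsorted)
print(runs_sorted)
```

If the run lengths are all 1, there is no within-cluster pair from which a correlation could be estimated, which would fit the identical output I got for both corstr settings.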
long <- long[order(long$ID), ]
model1<- gee(A~time+BMI, id=ID, corstr= "independence", data = long)
Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
(Intercept) time BMI
122.389 0.508 -1.127
model2<- gee(A~time+BMI, id=ID, corstr= "exchangeable", data = long)
Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
(Intercept) time BMI
122.389 0.508 -1.127
Warning message:
In gee(A ~ time + BMI, id = ID, corstr = "exchangeable", data = long) :
Working correlation estimate not positive definite
model1
GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
gee S-function, version 4.13 modified 98/01/27 (1998)
Model:
Link: Identity
Variance to Mean Relation: Gaussian
Correlation Structure: Independent
Call:
gee(formula = A ~ time + BMI, id = ID, data = long, corstr = "independence")
Number of observations : 10
Maximum cluster size : 2
Coefficients:
(Intercept) time BMI
122.389 0.508 -1.127
Estimated Scale Parameter: 93.6
Number of Iterations: 1
Working Correlation[1:4,1:4]
[,1] [,2]
[1,] 1 0
[2,] 0 1
Returned Error Value:
[1] 0
model2
GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
gee S-function, version 4.13 modified 98/01/27 (1998)
Model:
Link: Identity
Variance to Mean Relation: Gaussian
Correlation Structure: Exchangeable
Call:
gee(formula = A ~ time + BMI, id = ID, data = long, corstr = "exchangeable")
Number of observations : 10
Maximum cluster size : 2
Coefficients:
(Intercept) time BMI
180.00 -1.70 -3.16
Estimated Scale Parameter: 154
Number of Iterations: 5
Working Correlation[1:4,1:4]
[,1] [,2]
[1,] 1.0 2.8
[2,] 2.8 1.0
Returned Error Value:
[1] 1000
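After sorting, the two models finally differ, but on this six-subject excerpt the exchangeable fit should be read with caution: the warning says the working correlation estimate is not positive definite, and an off-diagonal value of 2.8 lies outside the valid range of a correlation. A quick way to confirm this is to look at the eigenvalues of the printed matrix. This is only a sketch: the matrix is typed in by hand from the output above, and the name `wc` is illustrative.

```r
# Working correlation as printed for model2, entered by hand
wc <- matrix(c(1.0, 2.8,
               2.8, 1.0), nrow = 2, byrow = TRUE)

# A valid correlation matrix has no negative eigenvalues
ev <- eigen(wc, symmetric = TRUE)$values
is_positive_definite <- all(ev > 0)

print(ev)
print(is_positive_definite)
```

On a fitted model the same check can be run on the working correlation stored in the gee object instead of a hand-typed matrix. With the full data set (more subjects and time points) the estimate may well be fine.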