二项式时间GAMM不收敛(R :: mgcv)

时间:2015-06-04 22:58:02

标签: r mixed-models gam mgcv

我是混合效果和添加模型的新手,所以如果这里的答案很简单,我很抱歉。

我收集了几种代谢化学物质(M1,M2 ......),协变量(时间,种族,性别......)和疾病状态(D,D.binary)的数据。我正在尝试根据从GEE变量选择中选择的变量生成GAMM。

数据:

  • 8例,51个匹配对照
  • 每个主题大约10个时间点
  • ~630观察
  • M1,M2 ...... M3是代谢物,其中许多是由共同部分形成的,代谢物水平是相关的,因为它们竞争相同的组成部分
  • 协变量将受试者分为亚组

这是我现在的模型:

> b = gamm(D.binary ~ Time  + s(M1)  , 
      random = list(ParticipantID = ~ 1 + Time),  niterPQL=50,
      data = NEC_data, family=binomial(link="logit")) 

 Maximum number of PQL iterations:  50 
 iteration 1
 iteration 2
 ...
 iteration 49
 iteration 50
 Warning message:
 In gammPQL(y ~ X - 1, random = rand, data = strip.offset(mf), family = family,  :
  gamm not converged, try increasing niterPQL

> plot(b$gam,pages=1)

enter image description here

> summary(b$lme) # details of underlying lme fit
Linear mixed-effects model fit by maximum likelihood
 Data: data 
   AIC  BIC logLik
  -160 -124     88

Random effects:
 Formula: ~Xr - 1 | g
 Structure: pdIdnot
          Xr1   Xr2   Xr3   Xr4   Xr5   Xr6   Xr7   Xr8
StdDev: 0.812 0.812 0.812 0.812 0.812 0.812 0.812 0.812

 Formula: ~1 + Time | ParticipantID %in% g
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev  Corr  
(Intercept) 5.68324 (Intr)
Time        0.50739 -0.92 
Residual    0.00691       

Variance function:
 Structure: fixed weights
 Formula: ~invwt 
Fixed effects: list(fixed) 
                   Value Std.Error  DF t-value p-value
X(Intercept)       -2.81     0.729 573   -3.86  0.0001
XTime              -0.15     0.065 573   -2.30  0.0220
Xs(M1)Fx1 -1.60     0.066 573  -24.29  0.0000
 Correlation: 
                   X(Int) XTime  
XTime              -0.920       
Xs(M1)Fx1  0.004  0.000

Standardized Within-Group Residuals:
    Min      Q1     Med      Q3     Max 
-2.3472 -0.0692 -0.0117  0.0305 20.7271 

Number of Observations: 636
Number of Groups: 
                   g ParticipantID %in% g 
                   1                   61 
> summary(b$gam) # gam style summary of fitted model

Family: binomial 
Link function: logit 

Formula:
NEC.binary ~ Time + s(M1)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -2.8135     0.7289   -3.86  0.00013 ***
Time         -0.1495     0.0651   -2.30  0.02188 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
               edf Ref.df     F p-value    
s(M1) 4.1    4.1 14913  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.0872   
  Scale est. = 4.7815e-05  n = 636
> anova(b$gam) 

Family: binomial 
Link function: logit 

Formula:
NEC.binary ~ Time + s(M1)

Parametric Terms:
    df    F p-value
Time  1 5.28   0.022

Approximate significance of smooth terms:
               edf Ref.df     F p-value
s(M1) 4.1    4.1 14913  <2e-16
> gam.check(b$gam)

enter image description here

我怀疑我可能搞砸了一些相当基本的东西,因为M1是疾病状态最明显的鉴别者。它很重要(应该如此),但相关性非常低。而且,显然,模型没有收敛(即使我将迭代次数从20-> 50增加)。最后,检查图看起来非常离谱

问题

我是否犯了基本语法错误?在我的模型中是否有一些恶意组件我在看?任何帮助将不胜感激。

进一步的工作

我想在模型和2个协变量(Birthweight和Race)中添加另一种代谢物(M2)。当我将M2添加到模型时,我得到一个非收敛错误:

> b = gamm(D.binary ~ Time  + s(M1) + s(M2) , 
      random = list(ParticipantID = ~ 1 + Time),  niterPQL=20, correlation = corLin(),
      data = NEC_data, family=binomial(link="logit"))

 Maximum number of PQL iterations:  20 
iteration 1
iteration 2
Error in lme.formula(fixed = fixed, random = random, data = data, correlation = correlation,  : 
  nlminb problem, convergence error code = 1
  message = false convergence (8)

关于进入多维空间的任何建议也将受到赞赏。

加成

我也尝试了这个模型的离散疾病分类(对照:0,1疾病:2,3)和泊松噪声。

> b = gamm(NEC ~ DPP  + s(DSLNT_ug.mL)  , 
+      random = list(ParticipantID = ~ 1 + DPP),  niterPQL=20,
+      data = NEC_data, family=poisson) 

 Maximum number of PQL iterations:  20 
iteration 1
iteration 2
...
iteration 19
iteration 20
Error in solve.default(pdMatrix(a, factor = TRUE)) : 
  system is computationally singular: reciprocal condition number = 3.13906e-19

0 个答案:

没有答案