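Error when fitting glmer with a Poisson error structure

Time: 2015-06-14 19:30:44

Tags: r

I hope someone can help me. I am trying to run an analysis of the number of Hymenoptera specimens caught along an elevational gradient. I want to examine the possibility of a unimodal distribution in relation to elevation, as well as a linear one. Hence I am including I(Altitude^2) as an explanatory variable in the analysis.

I am trying to run the following model, which includes a Poisson error structure (since we are dealing with count data) and Date and trap type (Trap) as random effects.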

model7 <- glmer(No.Specimens~Altitude+I(Altitude^2)+(1|Date)+(1|Trap),
       family="poisson",data=Santa.Lucia,na.action=na.omit)

But I keep getting the following error message:

Error: (maxstephalfit) PIRLS step-halvings failed to reduce deviance in pwrssUpdate
In addition: Warning messages:
1: Some predictor variables are on very different scales: consider rescaling 
2: In pwrssUpdate(pp, resp, tolPwrss, GQmat, compDev, fac, verbose) :
  Cholmod warning 'not positive definite' at file:../Cholesky/t_cholmod_rowfac.c, line 431
3: In pwrssUpdate(pp, resp, tolPwrss, GQmat, compDev, fac, verbose) :
  Cholmod warning 'not positive definite' at file:../Cholesky/t_cholmod_rowfac.c, line 431

Clearly I am making some big mistake. Can anyone help me figure out where I am going wrong?

Here is the structure of the data frame:

str(Santa.Lucia)
'data.frame':   97 obs. of  6 variables:
 $ Date        : Factor w/ 8 levels "01-Sep-2014",..: 6 6 6 6 6 6 6 6 6 6 ...
 $ Trap.No     : Factor w/ 85 levels "N1","N10","N11",..: 23 48 51 14 17 20 24 27 30 33 ...
 $ Altitude    : int  1558 1635 1703 1771 1840 1929 1990 2047 2112 2193 ...
 $ Trail       : Factor w/ 3 levels "Cascadas","Limones",..: 1 1 1 1 1 3 3 3 3 3 ...
 $ No.Specimens: int  1 0 2 2 3 4 5 0 1 1 ...
 $ Trap        : Factor w/ 2 levels "Net","Pan": 2 2 2 2 2 2 2 2 2 2 ...

Here is the complete data set (these are only my preliminary analyses):

           Date Trap.No Altitude    Trail No.Specimens Trap
1   28-Aug-2014      W2     1558 Cascadas            1  Pan
2   28-Aug-2014      W5     1635 Cascadas            0  Pan
3   28-Aug-2014      W8     1703 Cascadas            2  Pan
4   28-Aug-2014     W11     1771 Cascadas            2  Pan
5   28-Aug-2014     W14     1840 Cascadas            3  Pan
6   28-Aug-2014     W17     1929    Tower            4  Pan
7   28-Aug-2014     W20     1990    Tower            5  Pan
8   28-Aug-2014     W23     2047    Tower            0  Pan
9   28-Aug-2014     W26     2112    Tower            1  Pan
10  28-Aug-2014     W29     2193    Tower            1  Pan
11  28-Aug-2014     W32     2255    Tower            0  Pan
12  30-Aug-2014      N1     1562 Cascadas            5  Net
13  30-Aug-2014      N2     1635 Cascadas            0  Net
14  30-Aug-2014      N3     1723 Cascadas            2  Net
15  30-Aug-2014      N4     1779 Cascadas            0  Net
16  30-Aug-2014      N5     1842 Cascadas            3  Net
17  30-Aug-2014      N6     1924    Tower            2  Net
18  30-Aug-2014      N7     1979    Tower            2  Net
19  30-Aug-2014      N8     2046    Tower            0  Net
20  30-Aug-2014      N9     2110    Tower            0  Net
21  30-Aug-2014     N10     2185    Tower            0  Net
22  30-Aug-2014     N11     2241    Tower            0  Net
23  31-Aug-2014      N1     1562 Cascadas            1  Net
24  31-Aug-2014      N2     1635 Cascadas            1  Net
25  31-Aug-2014      N3     1723 Cascadas            0  Net
26  31-Aug-2014      N4     1779 Cascadas            0  Net
27  31-Aug-2014      N5     1842 Cascadas            0  Net
28  31-Aug-2014      N6     1924    Tower            0  Net
29  31-Aug-2014      N7     1979    Tower            7  Net
30  31-Aug-2014      N8     2046    Tower            4  Net
31  31-Aug-2014      N9     2110    Tower            6  Net
32  31-Aug-2014     N10     2185    Tower            1  Net
33  31-Aug-2014     N11     2241    Tower            1  Net
34  01-Sep-2014      W1     1539 Cascadas            0  Pan
35  01-Sep-2014      W2     1558 Cascadas            0  Pan
36  01-Sep-2014      W3     1585 Cascadas            2  Pan
37  01-Sep-2014      W4     1604 Cascadas            0  Pan
38  01-Sep-2014      W5     1623 Cascadas            1  Pan
39  01-Sep-2014      W6     1666 Cascadas            4  Pan
40  01-Sep-2014      W7     1699 Cascadas            0  Pan
41  01-Sep-2014      W8     1703 Cascadas            0  Pan
42  01-Sep-2014      W9     1746 Cascadas            1  Pan
43  01-Sep-2014     W10     1762 Cascadas            0  Pan
44  01-Sep-2014     W11     1771 Cascadas            0  Pan
45  01-Sep-2014     W12     1796 Cascadas            1  Pan
46  01-Sep-2014     W13     1825 Cascadas            0  Pan
47  01-Sep-2014     W14     1840    Tower            4  Pan
48  01-Sep-2014     W15     1859    Tower            2  Pan
49  01-Sep-2014     W16     1889    Tower            2  Pan
50  01-Sep-2014     W17     1929    Tower            0  Pan
51  01-Sep-2014     W18     1956    Tower            0  Pan
52  01-Sep-2014     W19     1990    Tower            1  Pan
53  01-Sep-2014     W20     2002    Tower            3  Pan
54  01-Sep-2014     W21     2023    Tower            2  Pan
55  01-Sep-2014     W22     2047    Tower            0  Pan
56  01-Sep-2014     W23     2068    Tower            1  Pan
57  01-Sep-2014     W24     2084    Tower            0  Pan
58  01-Sep-2014     W25     2112    Tower            1  Pan
59  01-Sep-2014     W26     2136    Tower            0  Pan
60  01-Sep-2014     W27     2150    Tower            1  Pan
61  01-Sep-2014     W28     2193    Tower            1  Pan
62  01-Sep-2014     W29     2219    Tower            0  Pan
63  01-Sep-2014     W30     2227    Tower            1  Pan
64  01-Sep-2014     W31     2255    Tower            0  Pan
85   03/06/2015    WT47     1901    Tower            2  Pan
86   03/06/2015    WT48     1938    Tower            2  Pan
87   03/06/2015    WT49     1963    Tower            2  Pan
88   03/06/2015    WT50     1986    Tower            0  Pan
89   03/06/2015    WT51     2012    Tower            9  Pan
90   03/06/2015    WT52     2033    Tower            0  Pan
91   03/06/2015    WT53     2050    Tower            4  Pan
92   03/06/2015    WT54     2081    Tower            2  Pan
93   03/06/2015    WT55     2107    Tower            1  Pan
94   03/06/2015    WT56     2128    Tower            4  Pan
95   03/06/2015    WT57     2155    Tower            0  Pan
96   03/06/2015    WT58     2179    Tower            2  Pan
97   03/06/2015    WT59     2214    Tower            0  Pan
98   03/06/2015    WT60     2233    Tower            0  Pan
99   03/06/2015    WT61     2261    Tower            0  Pan
100  03/06/2015    WT62     2278    Tower            0  Pan
101  03/06/2015    WT63     2300    Tower            0  Pan
102  04/06/2015    WT31     1497 Cascadas            0  Pan
103  04/06/2015    WT32     1544 Cascadas            1  Pan
104  04/06/2015    WT33     1568 Cascadas            1  Pan
105  04/06/2015    WT34     1574 Cascadas            0  Pan
106  04/06/2015    WT35     1608 Cascadas            5  Pan
107  04/06/2015    WT36     1630 Cascadas            3  Pan
108  04/06/2015    WT37     1642 Cascadas            0  Pan
109  04/06/2015    WT38     1672 Cascadas            5  Pan
110  04/06/2015    WT39     1685 Cascadas            6  Pan
111  04/06/2015    WT40     1723 Cascadas            3  Pan
112  04/06/2015    WT41     1744 Cascadas            2  Pan
113  04/06/2015    WT42     1781 Cascadas            1  Pan
114  04/06/2015    WT43     1794 Cascadas            2  Pan
115  04/06/2015    WT44     1833 Cascadas            0  Pan
116  04/06/2015    WT45     1855 Cascadas            4  Pan
117  04/06/2015    WT46     1876 Cascadas            2  Pan           

3 answers:

Answer 0: (score: 2)

The problem is almost certainly due to your passing a character vector to the data argument:

..., data="Santa.Lucia", ...

?glmer says the data argument should be:

data: an optional data frame containing the variables named in
      ‘formula’.  By default the variables are taken from the
      environment from which ‘lmer’ is called. While ‘data’ is
      optional, the package authors _strongly_ recommend its use,
      especially when later applying methods such as ‘update’ and
      ‘drop1’ to the fitted model (_such methods are not guaranteed
      to work properly if ‘data’ is omitted_). If ‘data’ is
      omitted, variables will be taken from the environment of
      ‘formula’ (if specified as a formula) or from the parent
      frame (if specified as a character vector).

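The last part in parentheses, "if specified as a character vector", refers to specifying formula as a character vector, not to specifying data as a character string.

Correct your call to pass data=Santa.Lucia (unquoted) and you should be good to go.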
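
To make the distinction concrete, here is a minimal sketch. It uses base R's glm() as a stand-in for glmer() (an assumption, made only so the example runs without lme4), and the data frame `toy` with columns `y` and `x` is hypothetical. The point is that a quoted name is just a length-one character vector, not the data frame it spells.

```r
# Toy count data (hypothetical; stands in for Santa.Lucia)
toy <- data.frame(y = c(1, 0, 2, 2, 3, 4), x = c(1, 2, 3, 4, 5, 6))

is.data.frame(toy)     # TRUE: the object itself
is.data.frame("toy")   # FALSE: a string that merely spells the object's name

# Passing the object works
fit_ok <- glm(y ~ x, family = poisson, data = toy)

# Passing the quoted name fails; try() captures the error instead of stopping
fit_bad <- try(glm(y ~ x, family = poisson, data = "toy"), silent = TRUE)
inherits(fit_bad, "try-error")
```

The exact error text differs between glm() and glmer(), so it is not reproduced here.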

Answer 1: (score: 1)

You have managed to use two different formats for Date. Here is a fix:

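## Entries longer than 10 characters look like "28-Aug-2014"; the rest like "03/06/2015".
## ifelse() drops the Date class, so Date2 is stored as days since 1970-01-01.
Santa.Lucia$Date2 <- ifelse(nchar(as.character(Santa.Lucia$Date)) > 10,  
                             as.Date(Santa.Lucia$Date, format="%d-%b-%Y"), 
                             as.Date(Santa.Lucia$Date, format="%d/%m/%Y") )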
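
If you would rather keep the result as a proper Date, one alternative (a sketch, not the code used for the output in this answer) is to parse with each format separately and fill in whichever one succeeded. The vector `raw` is a hypothetical stand-in for as.character(Santa.Lucia$Date), and %b assumes a locale with English month abbreviations.

```r
raw <- c("28-Aug-2014", "01-Sep-2014", "03/06/2015", "04/06/2015")

d_mon <- as.Date(raw, format = "%d-%b-%Y")  # NA where this format does not match
d_num <- as.Date(raw, format = "%d/%m/%Y")  # NA where this format does not match

# Start from one parse and fill its gaps from the other
parsed <- d_mon
parsed[is.na(parsed)] <- d_num[is.na(parsed)]

class(parsed)        # "Date"
as.numeric(parsed)   # the day counts that the ifelse() version stores directly
```

Either version yields the same grouping for a random effect; the difference only matters when you print or plot the dates.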

I tried a simpler model:

( model6 <-glmer(No.Specimens~Altitude+(1|Date2)+(1|Trap),family="poisson",data=Santa.Lucia,na.action=na.omit) )
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [
glmerMod]
 Family: poisson  ( log )
Formula: No.Specimens ~ Altitude + (1 | Date2) + (1 | Trap)
   Data: Santa.Lucia
      AIC       BIC    logLik  deviance  df.resid 
 368.6522  378.9510 -180.3261  360.6522        93 
Random effects:
 Groups Name        Std.Dev.
 Date2  (Intercept) 0.2248  
 Trap   (Intercept) 0.0000  
Number of obs: 97, groups:  Date2, 6; Trap, 2
Fixed Effects:
(Intercept)     Altitude  
  1.3696125   -0.0004992  
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.0516296 (tol = 0.001, component 3)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?

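I was actually able to get it to run with my suggested modification without error or warning, but I don't think it is right to use both of these grouping factors, since one predicts the other: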

> table(Santa.Lucia$Date2, Santa.Lucia$Trap)

        Net Pan
  16310   0  11
  16312  11   0
  16313  11   0
  16314   0  31
  16589   0  17
  16590   0  16
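
A quick programmatic version of the same check, as a sketch on a hypothetical miniature of the design (the data frame `toy` and its values are made up for illustration): if every date has exactly one trap type, Trap is fully determined by Date.

```r
# Hypothetical miniature of the design: each sampling date used one trap type
toy <- data.frame(
  Date = factor(c("d1", "d1", "d2", "d2", "d3", "d3")),
  Trap = factor(c("Pan", "Pan", "Net", "Net", "Pan", "Pan"))
)

tab <- table(toy$Date, toy$Trap)

# Number of distinct trap types observed on each date
types_per_date <- rowSums(tab > 0)

# TRUE means Trap is fully determined by Date
all(types_per_date == 1)
```

On the real data the same two lines applied to table(Santa.Lucia$Date2, Santa.Lucia$Trap) would summarise the cross-tabulation shown above.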

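That is why you don't get convergence. The fault lies not with the model but with a pathology in the design and data collection. I doubt you really have enough data to support a mixed model: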

( model5 <-glm(No.Specimens~Altitude,family="poisson",data=Santa.Lucia,na.action=na.omit) )

Call:  glm(formula = No.Specimens ~ Altitude, family = "poisson", data = Santa.Lucia, 
    na.action = na.omit)

Coefficients:
(Intercept)     Altitude  
  1.4218234   -0.0005391  

Degrees of Freedom: 96 Total (i.e. Null);  95 Residual
Null Deviance:      215.3 
Residual Deviance: 213.2    AIC: 368.6

Compare with a quadratic altitude model:

( model5.2 <-glm(No.Specimens~poly(Altitude,2),family="poisson",data=Santa.Lucia,na.action=na.omit) )

Call:  glm(formula = No.Specimens ~ poly(Altitude, 2), family = "poisson", 
    data = Santa.Lucia, na.action = na.omit)

Coefficients:
       (Intercept)  poly(Altitude, 2)1  poly(Altitude, 2)2  
            0.3188             -1.7116             -3.9539  

Degrees of Freedom: 96 Total (i.e. Null);  94 Residual
Null Deviance:      215.3 
Residual Deviance: 194.6    AIC: 352
> anova(model5.2)
Analysis of Deviance Table

Model: poisson, link: log

Response: No.Specimens

Terms added sequentially (first to last)


                  Df Deviance Resid. Df Resid. Dev
NULL                                 96     215.31
poly(Altitude, 2)  2   20.698        94     194.61
> anova(model5.2, model5)
Analysis of Deviance Table

Model 1: No.Specimens ~ poly(Altitude, 2)
Model 2: No.Specimens ~ Altitude
  Resid. Df Resid. Dev Df Deviance
1        94     194.61            
2        95     213.20 -1   -18.59
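
The table above reports the deviance difference but no p-value. As a sketch, the usual likelihood-ratio calculation can be done by hand from the printed numbers, assuming the chi-square approximation is adequate (it may be optimistic if the counts are overdispersed).

```r
dev_linear    <- 213.20   # residual deviance of model5, from the output above
dev_quadratic <- 194.61   # residual deviance of model5.2
df_diff       <- 95 - 94  # difference in residual degrees of freedom

# Likelihood-ratio statistic: drop in deviance from adding the quadratic term
lr_stat <- dev_linear - dev_quadratic

# Upper-tail chi-square probability
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
p_value

# With the fitted objects available, the equivalent call would be
# anova(model5, model5.2, test = "Chisq")
```

The AIC values tell the same story: 368.6 for the linear model against 352 for the quadratic one.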

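Answer 2: (score: 1)

You're almost there. As @BondedDust says, it is not practical to use a two-level factor (Trap) as a random effect; in fact, it does not seem right in principle either (the levels of Trap are not arbitrary/randomly chosen/exchangeable). When I tried a model with quadratic altitude, a fixed effect of Trap, and a random effect of Date, I was warned that I might want to rescale a parameter: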

Some predictor variables are on very different scales: consider rescaling 

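(You can see this warning mixed in with your error messages.) The only continuous (and hence worth rescaling) predictor is Altitude, so I centered and scaled it with scale(). The only downside is that this changes the quantitative interpretation of the coefficients, but the model itself is practically identical. I also added an observation-level random effect to allow for overdispersion.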
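
As a small illustration of those two steps, here is a sketch using the first five altitudes from the data; the variable names `alt`, `ctr`, `spr`, `z` and `obs` are made up for this example.

```r
alt <- c(1558, 1635, 1703, 1771, 1840)   # first five altitudes (m) from the data

sc  <- scale(alt)                  # one-column matrix: (alt - mean(alt)) / sd(alt)
ctr <- attr(sc, "scaled:center")   # the mean that was subtracted
spr <- attr(sc, "scaled:scale")    # the standard deviation that was divided by

z <- as.numeric(sc)                # drop the matrix structure
mean(z)                            # zero, up to rounding error
sd(z)                              # one, up to rounding error

# The transformation is linear, so it can be undone to recover metres
all.equal(z * spr + ctr, alt)

# Observation-level factor: one level per row of the data
obs <- factor(seq_along(alt))
nlevels(obs) == length(alt)
```

Because the rescaling is linear, the fitted curve is unchanged; only the units of the coefficients differ.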

The results seem OK, and agree with the picture.

library(lme4)
Santa.Lucia <- transform(Santa.Lucia,
                         scAlt=scale(Altitude),
                         obs=factor(seq(nrow(Santa.Lucia))))
model7 <- glmer(No.Specimens~scAlt+I(scAlt^2)+Trap+(1|Date)+(1|obs),
                family="poisson",data=Santa.Lucia,na.action=na.omit)

summary(model7)

## Random effects:
##  Groups Name        Variance Std.Dev.
##  obs    (Intercept) 0.64712  0.8044  
##  Date   (Intercept) 0.02029  0.1425  
## Number of obs: 97, groups:  obs, 97; Date, 6
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)  0.53166    0.31556   1.685  0.09202 . 
## scAlt       -0.22867    0.14898  -1.535  0.12480   
## I(scAlt^2)  -0.52840    0.16355  -3.231  0.00123 **
## TrapPan     -0.01853    0.32487  -0.057  0.95451   
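
One way to read the scaled coefficients, as a rough sketch that ignores their uncertainty: on the log scale the fitted curve is a parabola in scAlt, so its turning point sits at -b1/(2*b2). The helper `to_metres` and its arguments are hypothetical names introduced here.

```r
b1 <- -0.22867   # coefficient of scAlt, from the summary above
b2 <- -0.52840   # coefficient of I(scAlt^2), from the summary above

# Vertex of b1*z + b2*z^2; it is a maximum because b2 < 0
peak_sc <- -b1 / (2 * b2)
peak_sc          # in standard-deviation units of altitude

# Back-transform to metres. In practice `center` and `spread` would be
# mean(Santa.Lucia$Altitude) and sd(Santa.Lucia$Altitude).
to_metres <- function(z, center, spread) z * spread + center
```

The peak lies slightly below the mean altitude, about a fifth of a standard deviation, which is consistent with the hump-shaped pattern the quadratic term was meant to capture.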

Test the quadratic term by comparing with a model that lacks it...

model7R <- update(model7, . ~ . - I(scAlt^2))
## convergence warning, but probably OK ...
anova(model7,model7R)

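In principle it might be worth looking at the interaction between the quadratic altitude model and Trap (allowing for different altitude trends by trap type), but the picture suggests it won't do much...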

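library(ggplot2); theme_set(theme_bw())
## after_stat() needs ggplot2 >= 3.3.0; method="gam" uses the mgcv package
ggplot(Santa.Lucia,aes(Altitude,No.Specimens,colour=Trap))+
    stat_sum(aes(size=after_stat(factor(n))))+
    scale_size_discrete(range=c(2,4))+
    geom_line(aes(group=Date),colour="gray",alpha=0.3)+
    ## quadratic quasi-Poisson fit for each trap type
    geom_smooth(method="gam",method.args=list(family="quasipoisson"),
                formula=y~poly(x,2))+
    ## overall quadratic fit, pooled across trap types
    geom_smooth(method="gam",method.args=list(family="quasipoisson"),
                formula=y~poly(x,2),se=FALSE,
                aes(group=1),colour="black")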
