How to specify a hierarchical GAM with a continuous-by-factor interaction and random effects

Date: 2019-11-27 15:57:04

Tags: r lme4 gam

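I am following Pedersen 2019 to build hierarchical GAM models with mgcv::gam. I would like to implement his "GS" and "GI" models for a case with a continuous-by-factor interaction and random effects (a case the paper does not cover).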

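For the "GS" model, Pedersen suggests using a factor-smooth ("fs") basis for the random effects. I am adding a "by" term to the continuous smooth, plus a separate smooth of the continuous term to obtain the "global" smooth. My questions are: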

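  1. Is this legitimate? Or might I be breaking the model structure by doing it this way?
  2. Is the s(log(indep_var), by = fac_var, bs = "tp", m = 1) term centered on zero? That has to be the case in order to use the "global" smooth s(log(indep_var), bs = "tp") for mean predictions without fac_var.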

Thanks in advance for any thoughts.

library(mgcv)
library(lme4)
library(dplyr)

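# Simulated data: one continuous predictor, a 5-level factor, a 10-level grouping factor
fakedata = data.frame(idx = 1:1000) %>%
    mutate(indep_var = runif(1000, min = 0, max = 1),
           fac_var = factor(rep(letters[1:5], 200)),
           rand_eff = factor(sample(LETTERS[11:20], 1000, replace = TRUE)),
           # scale() returns a one-column matrix; as.numeric() keeps dep_var a plain vector
           dep_var = (indep_var + rnorm(1000, sd = 0.1))^3 +
               as.numeric(scale(as.numeric(fac_var), center = .1, scale = .1)),
           # The Gamma family needs a positive response, so negative values are set to 0.1
           dep_var = ifelse(dep_var < 0, 0.1, dep_var))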

# Here is the equivalent of what I would like to model in (transformed) linear space
glmer_mod = glmer(dep_var ~ log(indep_var) * fac_var + (1 + log(indep_var) | rand_eff), family = Gamma(link = "log"), data = fakedata)

# Here is my attempt to do that
hgam = gam(dep_var ~ 
               s(log(indep_var), bs = "tp") +
               s(log(indep_var), by = fac_var, bs = "tp", m = 1) +
               s(log(indep_var), rand_eff, bs = "fs", m = 2),
           data = fakedata, method = "REML",
           family = Gamma(link = "log")
           )

Output:

> summary(glmer_mod)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: Gamma  ( log )
Formula: dep_var ~ log(indep_var) * fac_var + (1 + log(indep_var) | rand_eff)
   Data: fakedata

     AIC      BIC   logLik deviance df.resid 
   786.3    855.0   -379.2    758.3      986 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.3588 -0.4314 -0.1515  0.2785  6.8704 

Random effects:
 Groups   Name           Variance  Std.Dev. Corr
 rand_eff (Intercept)    6.924e-06 0.002631     
          log(indep_var) 4.748e-06 0.002179 0.97
 Residual                2.009e-04 0.014175     
Number of obs: 1000, groups:  rand_eff, 10

Fixed effects:
                         Estimate Std. Error  t value Pr(>|z|)    
(Intercept)              2.246990   0.002155 1042.486  < 2e-16 ***
log(indep_var)           0.019734   0.001668   11.832  < 2e-16 ***
fac_varb                 0.721068   0.001998  360.929  < 2e-16 ***
fac_varc                 1.136721   0.002002  567.918  < 2e-16 ***
fac_vard                 1.431108   0.002022  707.839  < 2e-16 ***
fac_vare                 1.654642   0.001985  833.743  < 2e-16 ***
log(indep_var):fac_varb -0.008291   0.001412   -5.871 4.33e-09 ***
log(indep_var):fac_varc -0.011700   0.001505   -7.774 7.59e-15 ***
log(indep_var):fac_vard -0.011345   0.001546   -7.336 2.20e-13 ***
log(indep_var):fac_vare -0.014750   0.001447  -10.196  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
                  (Intr) lg(n_) fc_vrb fc_vrc fc_vrd fac_vr lg(ndp_vr):fc_vrb lg(ndp_vr):fc_vrc lg(ndp_vr):fc_vrd
log(ndp_vr)        0.874                                                                                         
fac_varb          -0.435 -0.256                                                                                  
fac_varc          -0.444 -0.273  0.475                                                                           
fac_vard          -0.430 -0.253  0.460  0.462                                                                    
fac_vare          -0.441 -0.263  0.470  0.476  0.462                                                             
lg(ndp_vr):fc_vrb -0.277 -0.344  0.712  0.312  0.288  0.298                                                      
lg(ndp_vr):fc_vrc -0.279 -0.353  0.293  0.711  0.280  0.294  0.401                                               
lg(ndp_vr):fc_vrd -0.255 -0.315  0.267  0.273  0.720  0.272  0.352             0.350                             
log(ndp_vr):fc_vr -0.277 -0.346  0.298  0.305  0.287  0.706  0.404             0.395             0.356           
convergence code: 0
Model failed to converge with max|grad| = 0.00762269 (tol = 0.001, component 1)
Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?

> summary(hgam)

Family: Gamma 
Link function: log 

Formula:
dep_var ~ s(log(indep_var), bs = "tp") + s(log(indep_var), 
    by = fac_var, bs = "tp", m = 1) + s(log(indep_var), 
    rand_eff, bs = "fs", m = 2)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.37655    0.01514     223   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                                 edf Ref.df     F  p-value    
s(log(indep_var))          1.0003176  1.001 1.644   0.2001    
s(log(indep_var)):fac_vara 6.9879213  8.000 6.003 9.78e-09 ***
s(log(indep_var)):fac_varb 0.0012384  8.000 0.000   1.0000    
s(log(indep_var)):fac_varc 0.0005525  8.000 0.000   1.0000    
s(log(indep_var)):fac_vard 0.0005005  8.000 0.000   0.7197    
s(log(indep_var)):fac_vare 2.9105641  8.000 0.827   0.0429 *  
s(log(indep_var),rand_eff) 0.0024828 98.000 0.000   0.9488    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  -0.0185   Deviance explained = 3.96%
-REML = 4070.2  Scale est. = 0.22652   n = 1000
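
For question 2, one way to look at the centering empirically is sketched below. It uses hypothetical toy data (`toy`, `x`, `fac`, `y`, all invented for illustration) rather than `fakedata`, and it drops the global and random-effect smooths to isolate the by-factor terms. The mgcv help page `?gam.models` notes that centering constraints are applied to smooths with a factor `by` variable, so the factor usually needs to be included as a parametric term as well. Assuming that behaviour, the sketch evaluates each level's smooth at every observed covariate value and averages it on the link scale, then compares the deviance with and without the `fac` main effect.

```r
library(mgcv)

set.seed(1)  # hypothetical toy data, not the fakedata from the question
n <- 500
toy <- data.frame(
  x   = runif(n, min = 0.05, max = 1),
  fac = factor(rep(letters[1:5], length.out = n))
)
# True linear predictor: a level-specific offset plus a shared slope in log(x)
eta <- 0.5 * as.numeric(toy$fac) + 0.3 * log(toy$x)
toy$y <- rgamma(n, shape = 20, rate = 20 / exp(eta))

# By-factor smooths only, without and with the factor as a parametric term
m_no_main <- gam(y ~ s(log(x), by = fac, bs = "tp"),
                 family = Gamma(link = "log"), data = toy, method = "REML")
m_main    <- gam(y ~ fac + s(log(x), by = fac, bs = "tp"),
                 family = Gamma(link = "log"), data = toy, method = "REML")

# For each level, force fac to that level, evaluate its smooth at every
# observed x, and average the term's contribution on the link scale
centred_mean <- sapply(levels(toy$fac), function(lev) {
  nd <- toy
  nd$fac <- factor(rep(lev, n), levels = levels(toy$fac))
  tm <- predict(m_no_main, newdata = nd, type = "terms")
  mean(tm[, paste0("s(log(x)):fac", lev)])
})
print(centred_mean)

# Compare how well each model can absorb the level offsets
print(c(no_main = deviance(m_no_main), with_main = deviance(m_main)))
```

If the averages come out at numerically zero, the by-smooths cannot carry level offsets on their own. In that case the offsets have nowhere to go in `hgam` as written, which might be related to the low deviance explained above, and adding `fac_var` as a parametric term would be the thing to try. This is a hypothesis to check, not a confirmed diagnosis.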

0 answers:

No answers yet.