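I am following Pedersen 2019 to build hierarchical GAMs with mgcv::gam. I want to implement the paper's "GS" and "GI" models for a case it does not cover: a continuous-by-factor interaction combined with a random effect.
For the "GS" model, Pedersen recommends factor-smooth interactions (bs = "fs") for the random effects. I am adding a "by" term to the continuous smooth, plus a separate smooth of the continuous term to obtain the "global" smooth. My question is: are the s(log(indep_var), by = fac_var, bs = "tp", m = 1) terms zero-centered? That must be the case if the "global" smooth s(log(indep_var), bs = "tp") is to be used for mean predictions without fac_var. Thanks in advance for any thoughts.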
library(mgcv)
library(lme4)
library(dplyr)

# Simulated data: a continuous predictor, a 5-level factor, and a 10-level grouping factor
fakedata = data.frame(idx = 1:1000) %>%
  mutate(indep_var = runif(1000, min = 0, max = 1),
         fac_var = factor(rep(letters[1:5], 200)),
         rand_eff = factor(sample(LETTERS[11:20], 1000, replace = TRUE)),
         dep_var = (indep_var + rnorm(1000, sd = 0.1))^3 + scale(as.numeric(fac_var), center = .1, scale = .1),
         dep_var = ifelse(dep_var < 0, 0.1, dep_var))

# Here is the equivalent of what I would like to model in (transformed) linear space
glmer_mod = glmer(dep_var ~ log(indep_var) * fac_var + (1 + log(indep_var) | rand_eff),
                  family = Gamma(link = "log"), data = fakedata)

# Here is my attempt to do that
hgam = gam(dep_var ~
             s(log(indep_var), bs = "tp") +                       # "global" smooth
             s(log(indep_var), by = fac_var, bs = "tp", m = 1) +  # per-level smooths of fac_var
             s(log(indep_var), rand_eff, bs = "fs", m = 2),       # random-effect factor smooth
           data = fakedata, method = "REML",
           family = Gamma(link = "log")
)
Output:
> summary(glmer_mod)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: Gamma ( log )
Formula: dep_var ~ log(indep_var) * fac_var + (1 + log(indep_var) | rand_eff)
Data: fakedata
AIC BIC logLik deviance df.resid
786.3 855.0 -379.2 758.3 986
Scaled residuals:
Min 1Q Median 3Q Max
-2.3588 -0.4314 -0.1515 0.2785 6.8704
Random effects:
Groups Name Variance Std.Dev. Corr
rand_eff (Intercept) 6.924e-06 0.002631
log(indep_var) 4.748e-06 0.002179 0.97
Residual 2.009e-04 0.014175
Number of obs: 1000, groups: rand_eff, 10
Fixed effects:
Estimate Std. Error t value Pr(>|z|)
(Intercept) 2.246990 0.002155 1042.486 < 2e-16 ***
log(indep_var) 0.019734 0.001668 11.832 < 2e-16 ***
fac_varb 0.721068 0.001998 360.929 < 2e-16 ***
fac_varc 1.136721 0.002002 567.918 < 2e-16 ***
fac_vard 1.431108 0.002022 707.839 < 2e-16 ***
fac_vare 1.654642 0.001985 833.743 < 2e-16 ***
log(indep_var):fac_varb -0.008291 0.001412 -5.871 4.33e-09 ***
log(indep_var):fac_varc -0.011700 0.001505 -7.774 7.59e-15 ***
log(indep_var):fac_vard -0.011345 0.001546 -7.336 2.20e-13 ***
log(indep_var):fac_vare -0.014750 0.001447 -10.196 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) lg(n_) fc_vrb fc_vrc fc_vrd fac_vr lg(ndp_vr):fc_vrb lg(ndp_vr):fc_vrc lg(ndp_vr):fc_vrd
log(ndp_vr) 0.874
fac_varb -0.435 -0.256
fac_varc -0.444 -0.273 0.475
fac_vard -0.430 -0.253 0.460 0.462
fac_vare -0.441 -0.263 0.470 0.476 0.462
lg(ndp_vr):fc_vrb -0.277 -0.344 0.712 0.312 0.288 0.298
lg(ndp_vr):fc_vrc -0.279 -0.353 0.293 0.711 0.280 0.294 0.401
lg(ndp_vr):fc_vrd -0.255 -0.315 0.267 0.273 0.720 0.272 0.352 0.350
log(ndp_vr):fc_vr -0.277 -0.346 0.298 0.305 0.287 0.706 0.404 0.395 0.356
convergence code: 0
Model failed to converge with max|grad| = 0.00762269 (tol = 0.001, component 1)
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
> summary(hgam)
Family: Gamma
Link function: log
Formula:
dep_var ~ s(log(indep_var), bs = "tp") + s(log(indep_var),
by = fac_var, bs = "tp", m = 1) + s(log(indep_var),
rand_eff, bs = "fs", m = 2)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.37655 0.01514 223 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(log(indep_var)) 1.0003176 1.001 1.644 0.2001
s(log(indep_var)):fac_vara 6.9879213 8.000 6.003 9.78e-09 ***
s(log(indep_var)):fac_varb 0.0012384 8.000 0.000 1.0000
s(log(indep_var)):fac_varc 0.0005525 8.000 0.000 1.0000
s(log(indep_var)):fac_vard 0.0005005 8.000 0.000 0.7197
s(log(indep_var)):fac_vare 2.9105641 8.000 0.827 0.0429 *
s(log(indep_var),rand_eff) 0.0024828 98.000 0.000 0.9488
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = -0.0185 Deviance explained = 3.96%
-REML = 4070.2 Scale est. = 0.22652 n = 1000
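One way to probe the zero-centering question empirically is to evaluate each by-factor smooth on its own and average its contribution. The following is a minimal sketch under stated assumptions, not a definitive check: the data frame `toy`, its columns `x`, `fac`, `y`, the Gaussian response, and the seed are all hypothetical and chosen only to keep the example small, and the parametric `fac` term is an addition that is not in the model above. It relies on `predict(..., type = "terms")` labelling by-factor smooth columns as `s(x):fac<level>`.

```r
library(mgcv)

set.seed(1)  # assumption: seed chosen only to make the sketch repeatable
n <- 600
toy <- data.frame(
  x   = runif(n),
  fac = factor(sample(letters[1:3], n, replace = TRUE))
)
# Hypothetical response: level-specific amplitude plus a level-specific offset
toy$y <- sin(2 * pi * toy$x) * as.numeric(toy$fac) + as.numeric(toy$fac) +
  rnorm(n, sd = 0.2)

# Global smooth plus one m = 1 smooth per level of fac.
# The parametric fac term is an assumption of this sketch, added so that
# level means do not have to be absorbed by the smooths.
fit <- gam(y ~ fac + s(x, bs = "tp") + s(x, by = fac, bs = "tp", m = 1),
           data = toy, method = "REML")

# For each level, pretend every row belongs to that level, evaluate the
# matching by-smooth at all observed x values, and average its contribution.
by_means <- sapply(levels(toy$fac), function(lv) {
  nd <- toy
  nd$fac <- factor(lv, levels = levels(toy$fac))
  tm <- predict(fit, newdata = nd, type = "terms")
  mean(tm[, paste0("s(x):fac", lv)])
})

print(by_means)
```

Averages close to zero would suggest that each by-factor smooth is centered and carries only the shape difference for its level, not the level mean. The sketch does not settle whether the centering is computed over all rows or only over the rows of each level, and the same check would need to be repeated on `hgam` itself before drawing conclusions about the model above.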