Question

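I have two sets of responses that follow beta and Poisson distributions, and I am looking at fitting additive mixed models using the betar and quasi families (the count data are overdispersed).

I understand that I can use the gamm function in the mgcv package, which accepts the betar and quasi families, but I think it uses PQL, so the AIC it reports is not useful for comparing models, which is the primary objective of my analysis.

For the count response, I know that QAIC has been used to rank/compare overdispersed mixed models, but I cannot find anything that says it is appropriate for overdispersed GAMMs.

I understand these could be two questions, but they share a common theme (model selection with extended families) and might have different solutions. Below I provide reproducible examples for each case.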

## generate data
library(gamm4)
library(mgcv)
dat <- gamSim(1, n = 400, scale = 2)
dat <- subset(dat, select = c(x0, x1, x2, x3, f))
dat$g <- as.factor(sample(1:20, 400, replace = TRUE))  # random factor
dat$yb <- runif(400)  # yb lies between 0 and 1, hence fitted with the beta family
# add a random intercept for each level of g (g lives in dat, so pass data = dat)
dat$f <- dat$f + drop(model.matrix(~ g - 1, data = dat) %*% rnorm(20) * 2)
dat$yp <- rpois(400, exp(dat$f / 7))  # yp is a count, hence the Poisson family

# beta family example with gamm() (this works, but I am not sure the subsequent model comparisons are valid!)
m1b <- gamm(yb ~ s(x0) + s(x1) + s(x2) + s(x3), family = betar(link = "logit"), data = dat, random = list(g = ~1))
m2b <- gamm(yb ~ s(x1) + s(x2) + s(x3), family = betar(link = "logit"), data = dat, random = list(g = ~1))
m3b <- gamm(yb ~ s(x0) + s(x2) + s(x3), family = betar(link = "logit"), data = dat, random = list(g = ~1))

# AIC to compare models (gamm() returns a list, so AIC is taken from the $lme component)
AIC(m1b$lme, m2b$lme, m3b$lme)

# trying the same with gamm4 (the ideal option) fails, as expected, with the beta family
m <- gamm4(yb ~ s(x0) + s(x1) + s(x2) + s(x3), family = betar(link = "logit"), data = dat, random = ~ (1 | g))

## example with a quasi family: the yp response is meant to be overdispersed count data (it may not be overdispersed in this example)
# example using the gamm function
m1p <- gamm(yp ~ s(x0) + s(x1) + s(x2) + s(x3), family = quasipoisson, data = dat, random = list(g = ~1))
m2p <- gamm(yp ~ s(x1) + s(x2) + s(x3), family = quasipoisson, data = dat, random = list(g = ~1))
m3p <- gamm(yp ~ s(x0) + s(x2) + s(x3), family = quasipoisson, data = dat, random = list(g = ~1))

# AIC to compare models (again taken from the $lme component of each fit)
AIC(m1p$lme, m2p$lme, m3p$lme)

# again, the same example with gamm4 will not work, as it does not accept quasi families
m <- gamm4(yp ~ s(x0) + s(x1) + s(x2) + s(x3), family = quasipoisson, data = dat, random = ~ (1 | g))

Answer 1

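There are a lot of issues here, but I'll try to address them. Basically, you want to fit parametric statistical models with

random effects (nlme, lme4)
distributions from the exponential family ... (MASS::glmmPQL, lme4::glmer)
... with overdispersion ...
... or distributions outside the exponential family, such as the Beta distribution (VGAM, betareg)
additive models/splines (splines) ...
... or penalized regression splines, with automatic tuning of the complexity of the smooth terms
... using true-likelihood models rather than marginal or quasi-likelihood models (e.g. GEE, PQL), so that you can do classical inference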

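Each of the issues listed above adds one or more "difficulty points" to a model-fitting exercise ... usually, once your score gets above +3 or so, you have to find a way to compromise or cut corners on some of the things you want. You have correctly identified gamm and gamm4 as doing some of what you want, but you can't get everything. Some suggestions: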

Overdispersion

One way to handle overdispersion is to use an observation-level random effect, e.g.

dat$obs <- factor(seq(nrow(dat)))
m <- gamm4(yp~s(x0)+s(x1)+s(x2)+s(x3),
           family = poisson,data=dat,random = ~ (1|g)+(1|obs))
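To see why an observation-level random effect can absorb overdispersion, here is a minimal simulation sketch in base R. It assumes the random effect is Gaussian on the log scale (a Poisson-lognormal model); the sample size, mean and standard deviation are illustrative values of mine and are not taken from the question's data.

```r
# Sketch: an observation-level random effect (OLRE) turns a Poisson model
# into a Poisson-lognormal one, whose variance exceeds its mean.
# All parameter values below are illustrative assumptions.
set.seed(101)
n     <- 5000
eta   <- log(5)   # common linear predictor on the log scale
sigma <- 0.5      # assumed SD of the observation-level effect

y_pois <- rpois(n, exp(eta))                         # plain Poisson counts
y_olre <- rpois(n, exp(eta + rnorm(n, sd = sigma)))  # one random draw per observation

# variance-to-mean ratio: close to 1 for Poisson, clearly above 1 with the OLRE
ratio_pois <- var(y_pois) / mean(y_pois)
ratio_olre <- var(y_olre) / mean(y_olre)
c(poisson = ratio_pois, olre = ratio_olre)
```

In the gamm4 fit above, the estimated variance of the (1|obs) term plays the role of sigma^2 here, so a variance estimate near zero would suggest there is little overdispersion to absorb.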

Another option is to adjust for overdispersion yourself, if you think that makes sense, e.g.:

m0 <- gamm4(yp~s(x0)+s(x1)+s(x2)+s(x3),family = poisson,data=dat,random = ~ (1|g))

First compute the overdispersion:

(phi <- sum(residuals(m0$gam,type="pearson")^2/df.residual(m0$gam)))
## [1] 1.003436

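(If we repeat this exercise with m0$mer instead, we get 0.9939696; both results are almost exactly equal to 1 because we generated the data from a Poisson distribution in the first place ...)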
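In symbols, the dispersion estimate is the sum of squared Pearson residuals divided by the residual degrees of freedom. The notation is mine: $y_i$ are the observed counts, $\hat{\mu}_i$ the fitted means, and $\mathrm{df}_{\mathrm{res}}$ is whatever df.residual() reports for the fitted object.

```latex
\hat{\phi} \;=\; \frac{1}{\mathrm{df}_{\mathrm{res}}} \sum_{i=1}^{n} r_i^{2},
\qquad
r_i \;=\; \frac{y_i - \hat{\mu}_i}{\sqrt{\hat{\mu}_i}}
```

For a Poisson model, values of $\hat{\phi}$ near 1 indicate no overdispersion, and values well above 1 indicate that the variance is larger than the Poisson assumption allows.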

(qaic <- -2*logLik(m0$mer)/phi + 2*lme4:::npar.merMod(m0$mer))

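N.B. I'm guessing that it makes sense to construct the likelihood etc. from the separate components of gamm4 in this way; use at your own risk.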
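For comparing several candidate models, the same adjustment can be wrapped in a small helper. The sketch below uses plain glm() fits on simulated data so that it runs with base R only. qaic() is a hypothetical helper of my own, not a function from mgcv, gamm4 or lme4. It also assumes two common conventions that the code above does not use: the dispersion is estimated once from the most complex candidate, and one extra parameter is counted for estimating it (the latter is the same for every model, so it does not change the ranking).

```r
# Sketch of a quasi-AIC comparison on simulated overdispersed counts.
# Data, model formulas and the qaic() helper are illustrative assumptions.
set.seed(1)
n  <- 500
x1 <- runif(n)
x2 <- runif(n)                        # x2 has no real effect on y
mu <- exp(0.5 + 1.5 * x1)
y  <- rnbinom(n, mu = mu, size = 2)   # overdispersed relative to Poisson

full    <- glm(y ~ x1 + x2, family = poisson)
reduced <- glm(y ~ x1,      family = poisson)

# estimate the dispersion once, from the most complex candidate model
phi <- sum(residuals(full, type = "pearson")^2) / df.residual(full)

qaic <- function(fit, phi) {
  ll <- logLik(fit)
  k  <- attr(ll, "df") + 1            # +1 for the estimated dispersion
  -2 * as.numeric(ll) / phi + 2 * k
}

q <- sapply(list(full = full, reduced = reduced), qaic, phi = phi)
q
```

With gamm4 fits, the analogous call would pass the $mer component to logLik(), as in the one-line version above, with the same caveat about whether that likelihood is the right one to use.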

Alternative distributions

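The glmmADMB and glmmTMB packages (not on CRAN, but findable via Google ...) can both handle mixed Beta models. They cannot fit penalized regression splines, but you can use regular splines via splines::ns() or splines::bs() (although you would have to determine the appropriate level of complexity yourself; perhaps you could get it from a preliminary mgcv or gamm fit ...)

library(glmmADMB)
library(splines)
m3b <- glmmadmb(yb~ns(x0,2)+ns(x1,2)+ns(x2,5)+ns(x3,2)+(1|g),
                family="beta",link="logit",data=dat)

The glmmTMB package can in principle do this too, but the package is under development and the current set of results doesn't make sense, so I would probably hold off on it at this point.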
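One simple way to choose the complexity of an unpenalized spline term yourself is to fit a small grid of degrees of freedom and compare the fits by AIC. The sketch below does this with lm() and splines::ns() on simulated data, so it runs with base R only. The data, the candidate grid and the variable names are illustrative assumptions; with a mixed Beta model the same loop would wrap the mixed-model fitting call instead of lm().

```r
# Sketch: choose the df of a natural-spline term by AIC (base R only).
# Simulated data and the candidate grid are illustrative assumptions.
library(splines)
set.seed(42)
n <- 300
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

cand_df   <- 1:8
aic_by_df <- sapply(cand_df, function(k) AIC(lm(y ~ ns(x, df = k))))
best_df   <- cand_df[which.min(aic_by_df)]

data.frame(df = cand_df, AIC = round(aic_by_df, 1))
best_df
```

Picking the df this way is itself a model-selection step, so the AIC values of the final models will be somewhat optimistic.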

Model selection with beta and quasi families using gamm4
