Question

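Fitting the same model with bam using method = "fREML" and method = "REML" gives me close results, but the deviance explained reported by summary.gam differs greatly between the two.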

With "fREML" it is about 3.5% (not good), while with "REML" it is about 50% (not bad). How is this possible? Which one is correct?

Unfortunately, I cannot provide a simple reproducible example.

#######################################
## method = "fREML", discrete = TRUE ##
#######################################

Family: binomial 
Link function: logit 
Formula:
ObsOrRand ~ s(Var1, k = 3) + s(RandomVar, bs = "re")  
Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|) 
(Intercept)  -5.0026     0.2199  -22.75   <2e-16  
Approximate significance of smooth terms:
                  edf Ref.df Chi.sq  p-value 
s(Var1)          1.00  1.001  17.54 2.82e-05 
s(RandomVar)     16.39 19.000 145.03  < 2e-16  
R-sq.(adj) =  0.00349   Deviance explained = 3.57%
fREML = 2.8927e+05  Scale est. = 1         n = 312515

########################################
## method = "fREML", discrete = FALSE ##
########################################

Family: binomial 
Link function: logit 
Formula:
ObsOrRand ~ s(Var1, k = 3) + s(RandomVar, bs = "re")  
Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|) 
(Intercept)  -4.8941     0.2207  -22.18   <2e-16  
Approximate significance of smooth terms:
                  edf Ref.df Chi.sq  p-value 
s(Var1)          1.008  1.016  17.44 3.09e-05 
s(RandomVar)     16.390 19.000 144.86  < 2e-16  
R-sq.(adj) =  0.00349   Deviance explained = 3.57%
fREML = 3.1556e+05  Scale est. = 1         n = 312515

#####################################################
## method = "REML", discrete method not applicable ##
#####################################################

Family: binomial 
Link function: logit 
Formula:
ObsOrRand ~ s(Var1, k = 3) + s(RandomVar, bs = "re")  
Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|) 
(Intercept)  -4.8928     0.2205  -22.19   <2e-16  
Approximate significance of smooth terms:
                  edf Ref.df Chi.sq  p-value 
s(Var1)          1.156  1.278  16.57 8.53e-05 
s(RandomVar)     16.379 19.000 142.60  < 2e-16  
R-sq.(adj) =  0.0035   Deviance explained = 50.8%
-REML = 3.1555e+05  Scale est. = 1         n = 312515

Answer 1

This issue traces back to mgcv_1.8-23. Its changelog reads:

* bam extended family extension had introduced a bug in null deviance 
  computation for Gaussian additive case when using methods other than fREML 
  or GCV.Cp. Fixed.

It now turns out that this fix worked for the Gaussian case but not for non-Gaussian cases.

Let me first provide a reproducible example, since your question does not have one.

set.seed(0)
x <- runif(1000)
## the linear predictor is a 3rd degree polynomial
p <- binomial()$linkinv(0.5 + poly(x, 3) %*% rnorm(3) * 20)
## p is well spread out on (0, 1); check `hist(p)`
y <- rbinom(1000, 1, p)

library(mgcv)
#Loading required package: nlme
#This is mgcv 1.8-24. For overview type 'help("mgcv-package")'.

fREML <- bam(y ~ s(x, bs = 'cr', k = 8), family = binomial(), method = "fREML")
REML <- bam(y ~ s(x, bs = 'cr', k = 8), family = binomial(), method = "REML")
GCV <- bam(y ~ s(x, bs = 'cr', k = 8), family = binomial(), method = "GCV.Cp")

## explained.deviance = (null.deviance - deviance) / null.deviance
## so in this example we get negative explained deviance for "REML" method

unlist(REML[c("null.deviance", "deviance")])
#null.deviance      deviance 
#     181.7107     1107.5241 

unlist(fREML[c("null.deviance", "deviance")])
#null.deviance      deviance 
#     1357.936      1107.524 

unlist(GCV[c("null.deviance", "deviance")])
#null.deviance      deviance 
#     1357.936      1108.108

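The null deviance cannot be smaller than the deviance (just as TSS cannot be smaller than RSS), so the "REML" method of bam fails to return the correct null deviance here.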
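Until a fixed version is installed, the correct figure can be recovered by hand. The following is a minimal base-R sketch, assuming an unweighted 0/1 response with no offset; the helper names `null_dev_bernoulli` and `explained_deviance` are made up for illustration and are not part of mgcv.

```r
## Null deviance of an intercept-only Bernoulli model, computed from y alone.
## Assumes y is a 0/1 vector, with no prior weights and no offset.
null_dev_bernoulli <- function(y) {
  stopifnot(all(y %in% c(0, 1)))
  p0 <- mean(y)                      # fitted probability of the null model
  if (p0 == 0 || p0 == 1) return(0)  # degenerate response: perfect null fit
  -2 * sum(y * log(p0) + (1 - y) * log(1 - p0))
}

## explained deviance = (null deviance - deviance) / null deviance
explained_deviance <- function(null_dev, dev) (null_dev - dev) / null_dev

## Cross-check the helper against glm() on simulated data
set.seed(0)
y <- rbinom(1000, 1, 0.4)
nd     <- null_dev_bernoulli(y)
nd_glm <- glm(y ~ 1, family = binomial())$null.deviance
all.equal(nd, nd_glm)

## Applied to the numbers printed above:
explained_deviance(181.7107, 1107.5241)  # "REML": negative, hence impossible
explained_deviance(1357.936, 1107.524)   # "fREML": roughly 0.18
```

For a fitted model `fit`, something like `explained_deviance(null_dev_bernoulli(y), deviance(fit))` should then give a usable value regardless of what `fit$null.deviance` contains.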

I have located the problem at line 1350 of mgcv_1.8-24/R/bam.r:

object$family <- object$fitted.values <- NULL

which should actually be

object$null.deviance <- object$fitted.values <- NULL

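For methods other than "GCV.Cp" and "fREML", bam relies on gam for estimation after reducing the large n x p model matrix to a p x p matrix (n: number of data; p: number of coefficients). Since this new model matrix has no natural interpretation, many quantities returned by gam should be invalidated (except for the estimated smoothing parameters). Writing family there instead of null.deviance was a typo on Simon's part.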
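To see why quantities computed on the reduced problem are meaningless, here is a small Gaussian least-squares analogy in base R. It is only a sketch of the idea (a QR reduction of an n x p problem to a p x p one), not mgcv's actual code, and all variable names are illustrative.

```r
set.seed(1)
n <- 500; p <- 4
X <- cbind(1, matrix(rnorm(n * (p - 1)), n))    # n x p model matrix
y <- drop(X %*% c(1, 2, -1, 0.5) + rnorm(n))

qrx <- qr(X)
R   <- qr.R(qrx)                     # p x p "reduced model matrix"
f   <- qr.qty(qrx, y)[seq_len(p)]    # first p elements of Q'y: "reduced response"

## The coefficients survive the reduction ...
beta_full <- unname(coef(lm.fit(X, y)))
beta_red  <- backsolve(R, f)
all.equal(beta_full, beta_red)

## ... but a total sum of squares computed from the reduced response
## is not the total sum of squares of the original data.
tss_full <- sum((y - mean(y))^2)
tss_red  <- sum((f - mean(f))^2)
c(full = tss_full, reduced = tss_red)
```

The same logic applies to the null deviance: anything that depends on the original n observations has to be computed from them, not taken from the fit to the reduced problem.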

I built a patched version, and it turns out to fix the bug. I will ask Simon to fix it in the next release.

mgcv_1.8-24: "fREML" or "REML" method of bam() gives wrong explained deviance