Question

I am using the following geoadditive model:

library(gamair)
library(mgcv)

data(mack)    
mack$log.net.area <- log(mack$net.area)

gm2 <- gam(egg.count ~ s(lon,lat,bs="gp",k=100,m=c(2,10,1)) +
                       s(I(b.depth^.5)) +
                       s(c.dist) +
                       s(temp.20m) +
                       offset(log.net.area),
                       data = mack, family = tw, method = "REML")

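How can I use it to predict kriging-style values of egg.count at new locations (lon/lat) where I have no covariate data?

For example, say I want to predict egg.count at these new locations: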

    lon lat
1  -3.00  44
4  -2.75  44
7  -2.50  44
10 -2.25  44
13 -2.00  44
16 -1.75  44

But here I do not know the values of the covariates (b.depth, c.dist, temp.20m, log.net.area).

Answer 1

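predict still requires every variable used in the model to be present in newdata, but you can pass arbitrary values, such as 0, for the covariates you do not have, and then use type = "terms" together with terms = name_of_the_wanted_smooth_term. Use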

sapply(gm2$smooth, "[[", "label")
#[1] "s(lon,lat)"        "s(I(b.depth^0.5))" "s(c.dist)"        
#[4] "s(temp.20m)"

to check which smooth terms are in your model.

## new spatial locations to predict
newdat <- read.table(text = "lon lat
                             1  -3.00  44
                             4  -2.75  44
                             7  -2.50  44
                             10 -2.25  44
                             13 -2.00  44
                             16 -1.75  44")

## "garbage" values, just to pass the variable names checking in `predict.gam`
newdat[c("b.depth", "c.dist", "temp.20m", "log.net.area")] <- 0

## prediction on the link scale
pred_link <- predict(gm2, newdata = newdat, type = "terms", terms = "s(lon,lat)")
#   s(lon,lat)
#1  -1.9881967
#4  -1.9137971
#7  -1.6365945
#10 -1.1247837
#13 -0.7910023
#16 -0.7234683
#attr(,"constant")
#(Intercept) 
#   2.553535 

## simplify to vector
pred_link <- attr(pred_link, "constant") + rowSums(pred_link)
#[1] 0.5653381 0.6397377 0.9169403 1.4287511 1.7625325 1.8300665

## prediction on the response scale
pred_response <- gm2$family$linkinv(pred_link)
#[1] 1.760043 1.895983 2.501625 4.173484 5.827176 6.234301

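If I want a prediction for one specific smooth term, I usually do not use predict.gam. The logic of predict.gam is to first make predictions for all terms, i.e. the same as what you get with type = "terms". Then:

if type = "link", it does a rowSums over all term-wise predictions and adds the intercept (and possibly the offset);
if type = "terms" and neither "terms" nor "exclude" is specified, it returns the result as is;
if type = "terms" and you have specified "terms" and/or "exclude", some post-processing drops the terms you do not want and returns only those you asked for.

So predict.gam always does the computation for all terms, even if you only want a single one.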
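The relationship between the prediction types described above can be checked on a small example. This is only a sketch on simulated data from `mgcv::gamSim` (not the `mack` model), and the names `dat`, `fit`, `newd`, `all_terms`, `link_pred`, `rebuilt` and `one_term` are made up for illustration. It does not inspect the internals of `predict.gam`; it only shows that the outputs are consistent with the logic described.

```r
library(mgcv)

set.seed(42)
## simulated data with four covariates x0..x3 (Gaussian response, no offset)
dat <- gamSim(1, n = 200, dist = "normal", scale = 2, verbose = FALSE)
fit <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

newd <- dat[1:5, c("x0", "x1", "x2", "x3")]

## term-wise predictions; the intercept is stored in attribute "constant"
all_terms <- predict(fit, newd, type = "terms")

## link-scale prediction
link_pred <- predict(fit, newd, type = "link")

## intercept + row sums of all terms should reproduce the link-scale prediction
rebuilt <- attr(all_terms, "constant") + rowSums(all_terms)
all.equal(as.numeric(link_pred), as.numeric(rebuilt))

## asking for one term should return the matching column of the full matrix
one_term <- predict(fit, newd, type = "terms", terms = "s(x1)")
all.equal(as.numeric(one_term[, "s(x1)"]), as.numeric(all_terms[, "s(x1)"]))
```

With an offset in the model, as in gm2, the link-scale prediction would also include the offset, so the first comparison would need the offset added to `rebuilt`.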

Knowing this inefficiency, here is what I would do instead:

sm <- gm2$smooth[[1]]  ## extract smooth construction info for `s(lon,lat)`
Xp <- PredictMat(sm, newdat)  ## predictor matrix
b <- gm2$coefficients[with(sm, first.para:last.para)]  ## coefficients for this term
pred_link <- c(Xp %*% b) + coef(gm2)[["(Intercept)"]]  ## this term + intercept (looked up by name, no partial matching)
#[1] 0.5653381 0.6397377 0.9169403 1.4287511 1.7625325 1.8300665
pred_response <- gm2$family$linkinv(pred_link)
#[1] 1.760043 1.895983 2.501625 4.173484 5.827176 6.234301

As you can see, we get the same result.
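The same equivalence can be reproduced without the `mack` data. The following is a minimal sketch on simulated data from `mgcv::gamSim`; the objects `dat`, `fit`, `newd`, `manual` and `via_predict` are hypothetical names used only here. It compares the `PredictMat` route with `predict(..., type = "terms", terms = ...)` for a single smooth.

```r
library(mgcv)

set.seed(1)
dat <- gamSim(1, n = 200, dist = "normal", scale = 2, verbose = FALSE)
fit <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

## new values for the one covariate we care about
newd <- data.frame(x1 = seq(0.1, 0.9, by = 0.2))

## manual route: only the smooth of interest is evaluated
sm <- fit$smooth[[2]]                          # construction info for s(x1)
Xp <- PredictMat(sm, newd)                     # prediction matrix for s(x1)
b  <- coef(fit)[sm$first.para:sm$last.para]    # coefficients of s(x1)
manual <- drop(Xp %*% b)

## predict.gam route: the other covariates get placeholder values
newd_full <- cbind(newd, x0 = 0, x2 = 0, x3 = 0)
via_predict <- predict(fit, newd_full, type = "terms", terms = "s(x1)")

all.equal(manual, as.numeric(via_predict[, "s(x1)"]))
```

Unlike the predict.gam route, the manual route needs no placeholder values for x0, x2 and x3, because only the prediction matrix of the chosen smooth is built.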

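Won't the result depend on the values assigned to the covariates (here 0)?

Some garbage predictions are made from those garbage values, but predict.gam discards them in the end.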

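Thanks, you are right. I am not quite sure I understand why covariate values have to be supplied at the new locations at all, then.

As far as I can tell, code maintenance is very difficult for a big package like mgcv. The code would need to change significantly to suit every user's needs. Obviously, the predict.gam logic I described here is inefficient when people like you only want a prediction for one particular smooth. In theory, the variable name check on newdata could then ignore the terms the user does not want. But that would require significant changes to predict.gam and could introduce many bugs. Furthermore, you have to submit a changelog to CRAN, and CRAN may not be happy to see such a drastic change.

Simon once shared his feelings: there are tons of people telling me I should write mgcv this way or that way, but I simply can't. So yes, have some sympathy for a package author/maintainer like him.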

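Thanks for the updated answer. However, I don't understand why the predictions do not depend on the covariate values at the new locations.

They would depend on them if you provided covariate values for b.depth, c.dist, temp.20m and log.net.area. But since you do not have them at the new locations, the prediction simply assumes these effects are 0.

OK, thanks, I see now! So would it be correct to say that, in the absence of covariate values at the new locations, I am only predicting the response from the spatial autocorrelation of the residuals?

You are only predicting the spatial field/smooth. In the GAM approach the spatial field is modeled as part of the mean, not of the variance-covariance (as in kriging), so I think your use of "residuals" is not correct here.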
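To make the distinction concrete, the model can be written out on the link scale. This is a sketch that assumes the default log link of the `tw` family; the symbols $\beta_0$, $f$, $f_1$, $f_2$ and $f_3$ are notation introduced here for illustration and do not appear in the thread.

```latex
\log \mathbb{E}[\text{egg.count}_i]
  = \beta_0
  + f(\text{lon}_i, \text{lat}_i)
  + f_1\!\left(\sqrt{\text{b.depth}_i}\right)
  + f_2(\text{c.dist}_i)
  + f_3(\text{temp.20m}_i)
  + \log(\text{net.area}_i)

% quantity computed in the answer at a new location (lon, lat):
\hat{\eta}_{\text{spatial}}(\text{lon}, \text{lat})
  = \hat{\beta}_0 + \hat{f}(\text{lon}, \text{lat})
```

The spatial smooth $f$ sits in the mean alongside the other smooths. Because mgcv centers each smooth with a sum-to-zero constraint, $\hat{f}$ is a deviation around the intercept, and the second line leaves out the contributions of $f_1$, $f_2$, $f_3$ and the offset.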

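Yes, you are right. Just to understand what this code does: would it be correct to say that I am predicting how the response varies over space, but not its actual values at the new locations (since for that I would need the covariate values at those locations)?

Correct. You can try predict.gam with and without terms = "s(lon,lat)" to help you digest the output. See how it changes when you change the garbage values passed for the other covariates.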

## a possible set of garbage values for covariates
newdat[c("b.depth", "c.dist", "temp.20m", "log.net.area")] <- 0

predict(gm2, newdat, type = "terms")
#   s(lon,lat) s(I(b.depth^0.5)) s(c.dist) s(temp.20m)
#1  -1.9881967          -1.05514 0.4739174   -1.466549
#4  -1.9137971          -1.05514 0.4739174   -1.466549
#7  -1.6365945          -1.05514 0.4739174   -1.466549
#10 -1.1247837          -1.05514 0.4739174   -1.466549
#13 -0.7910023          -1.05514 0.4739174   -1.466549
#16 -0.7234683          -1.05514 0.4739174   -1.466549
#attr(,"constant")
#(Intercept) 
#   2.553535 

predict(gm2, newdat, type = "terms", terms = "s(lon,lat)")
#   s(lon,lat)
#1  -1.9881967
#4  -1.9137971
#7  -1.6365945
#10 -1.1247837
#13 -0.7910023
#16 -0.7234683
#attr(,"constant")
#(Intercept) 
#   2.553535

## another possible set of garbage values for covariates
newdat[c("b.depth", "c.dist", "temp.20m", "log.net.area")] <- 1

predict(gm2, newdat, type = "terms")
#   s(lon,lat) s(I(b.depth^0.5))  s(c.dist) s(temp.20m)
#1  -1.9881967        -0.9858522 -0.3749018   -1.269878
#4  -1.9137971        -0.9858522 -0.3749018   -1.269878
#7  -1.6365945        -0.9858522 -0.3749018   -1.269878
#10 -1.1247837        -0.9858522 -0.3749018   -1.269878
#13 -0.7910023        -0.9858522 -0.3749018   -1.269878
#16 -0.7234683        -0.9858522 -0.3749018   -1.269878
#attr(,"constant")
#(Intercept) 
#   2.553535 

predict(gm2, newdat, type = "terms", terms = "s(lon,lat)")
#   s(lon,lat)
#1  -1.9881967
#4  -1.9137971
#7  -1.6365945
#10 -1.1247837
#13 -0.7910023
#16 -0.7234683
#attr(,"constant")
#(Intercept) 
#   2.553535

GAM with "gp" smoother: predict at new locations
