geom_smooth() with facet_grid() vs. mgcv::gam() with by = ""

Posted: 2018-10-27 20:46:43

Tags: r ggplot2 gam mgcv

I am comparing a GAM fitted by ggplot2's geom_smooth() within facet_grid() against a GAM fitted with mgcv::gam() using a `by` factor (visualized with visreg). Data and code:

library(dplyr)
set.seed(1)
dat <- iris %>% mutate(response = sample(rep(c(0, 1), length.out = 150/2), 150, replace = TRUE))

# Just the output from geom_smooth (one GAM fitted per facet)
library(ggplot2)
ggplot(dat, aes(Sepal.Length, response)) +
  geom_point() +
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"),
              method.args = list(family = binomial)) +
  facet_grid(. ~ Species)

# Now fitting the GAM with mgcv::gam, specifying by = Species
library(mgcv)
fit <- gam(response ~ s(Sepal.Length, bs = "cs", by = Species),
           data = dat, family = binomial())

# Comparing the two different outputs
library(visreg)
visreg(fit, "Sepal.Length", by = "Species", scale = "response", gg = TRUE) +
  guides(color = "none") +
  geom_smooth(data = dat, aes(Sepal.Length, response),
              method = "gam", formula = y ~ s(x, bs = "cs"),
              method.args = list(family = binomial), color = "red", fill = "green")

Figure: geom_smooth output shown with a red smooth and green CI; mgcv output with a blue smooth and grey CI.

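Basically, I think what is happening is that the smooths from mgcv::gam are based on some kind of "imputed" data that does not actually exist for each Species level, and that the settings in geom_smooth() avoid this. Does anyone know how to fix this so that the output of geom_smooth and mgcv::gam is the same?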
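A minimal sketch of what the faceted geom_smooth() call amounts to, assuming ggplot2's documented behaviour of fitting the smoother separately within each panel: one independent mgcv::gam() per Species, each with its own intercept. The first `by = Species` model, by contrast, has a single shared intercept, and mgcv centres each by-factor smooth (sum-to-zero constraint), so without a Species main effect the per-level curves cannot shift up or down independently. This is one plausible explanation rather than a confirmed diagnosis; `fits` and `intercepts` are illustrative names.

```r
library(mgcv)

# Same simulated response as in the question (base R instead of dplyr)
set.seed(1)
dat <- iris
dat$response <- sample(rep(c(0, 1), length.out = 150 / 2), 150, replace = TRUE)

# One independent GAM per Species: roughly what geom_smooth() fits
# inside each facet (its own intercept and its own smooth per panel)
fits <- lapply(split(dat, dat$Species), function(d) {
  gam(response ~ s(Sepal.Length, bs = "cs"), family = binomial(), data = d)
})

# Link-scale intercept of each per-Species fit; a model with only one
# shared intercept and centred by-smooths cannot reproduce these offsets
intercepts <- sapply(fits, function(f) coef(f)[["(Intercept)"]])
print(intercepts)
```

Base R replaces dplyr here only to keep the sketch dependent on mgcv alone.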

Edit:

Based on user20650's answer, the code has been updated to:

library(mgcv)
# Species is now also a main effect, so each level gets its own intercept
fit <- gam(response ~ Species + s(Sepal.Length, bs = "cs", by = Species),
           data = dat, family = binomial())
library(visreg)
visreg(fit, "Sepal.Length", by = "Species", scale = "response", gg = TRUE) +
  guides(color = "none") +
  geom_smooth(data = dat, aes(Sepal.Length, response),
              method = "gam", formula = y ~ s(x, bs = "cs"),
              method.args = list(family = binomial), color = "red", fill = "green",
              fullrange = TRUE)

Figure: updated plot.

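As the plot above shows, there are still slight differences between the two methods (mostly in the CI). They become clearer with, for example, set.seed(100):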

library(dplyr)
set.seed(100)
dat <- iris %>% mutate(response = sample(rep(c(0, 1), length.out = 150/2), 150, replace = TRUE))

library(mgcv)
fit <- gam(response ~ Species + s(Sepal.Length, bs = "cs", by = Species),
           data = dat, family = binomial())
library(visreg)
visreg(fit, "Sepal.Length", by = "Species", scale = "response", gg = TRUE) +
  guides(color = "none") +
  geom_smooth(data = dat, aes(Sepal.Length, response),
              method = "gam", formula = y ~ s(x, bs = "cs"),
              method.args = list(family = binomial), color = "red", fill = "green",
              fullrange = TRUE)

Figure: plot for the new seed.
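Since the differences are mostly in the CI, it may help to build the band from the mgcv fit by hand so it can be compared on equal footing. This sketch assumes (to my understanding of ggplot2's prediction method for GLM-type fits, which mgcv's gam objects inherit from) that geom_smooth() computes a 95% interval on the link scale and back-transforms it with the family's inverse link. The level "versicolor", the 80-point grid (geom_smooth's default `n`), `method = "REML"`, and the names `nd`, `p` and `band` are illustrative choices.

```r
library(mgcv)

set.seed(100)
dat <- iris
dat$response <- sample(rep(c(0, 1), length.out = 150 / 2), 150, replace = TRUE)

# Joint by= model from the edit, with the smoothness method set explicitly
fit <- gam(response ~ Species + s(Sepal.Length, bs = "cs", by = Species),
           family = binomial(), data = dat, method = "REML")

# Prediction grid for one level over its own x range (fullrange = FALSE case)
x <- dat$Sepal.Length[dat$Species == "versicolor"]
nd <- data.frame(Sepal.Length = seq(min(x), max(x), length.out = 80),
                 Species = factor("versicolor", levels = levels(dat$Species)))

# 95% band on the link scale, then mapped through the inverse link,
# which keeps it inside [0, 1] on the response scale
p <- predict(fit, newdata = nd, type = "link", se.fit = TRUE)
eta <- as.vector(p$fit)
se <- as.vector(p$se.fit)
z <- qnorm(0.975)
linkinv <- fit$family$linkinv
band <- data.frame(x = nd$Sepal.Length,
                   y = linkinv(eta),
                   ymin = linkinv(eta - z * se),
                   ymax = linkinv(eta + z * se))
print(head(band))
```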

Can anyone explain the difference between the two methods, and how to reproduce the geom_smooth() output with mgcv::gam, or vice versa (for both fullrange = TRUE and fullrange = FALSE)?
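One way to check whether the remaining difference lies in the fitted models themselves, rather than in visreg's or geom_smooth's plotting code, is to fit both versions directly and compare their predictions. This sketch fits the joint model from the edit and one separate model per Species (what each facet of geom_smooth() fits), with the smoothing-parameter method fixed to "REML" in both. That matters because mgcv::gam() defaults to method = "GCV.Cp", while, to my understanding, recent ggplot2 releases (3.3.0 onwards) pass method = "REML" for method = "gam" unless method.args says otherwise. The names `joint`, `separate` and `compare` are hypothetical.

```r
library(mgcv)

set.seed(100)
dat <- iris
dat$response <- sample(rep(c(0, 1), length.out = 150 / 2), 150, replace = TRUE)

# Joint model: Species main effect plus one centred smooth per level
joint <- gam(response ~ Species + s(Sepal.Length, bs = "cs", by = Species),
             family = binomial(), data = dat, method = "REML")

# Separate models, one per Species, as geom_smooth() fits them per facet
separate <- lapply(split(dat, dat$Species), function(d) {
  gam(response ~ s(Sepal.Length, bs = "cs"),
      family = binomial(), data = d, method = "REML")
})

# Largest absolute gap in link-scale fit and standard error on each
# Species' own x range (the fullrange = FALSE case), 80 grid points
compare <- do.call(rbind, lapply(levels(dat$Species), function(sp) {
  x <- dat$Sepal.Length[dat$Species == sp]
  nd <- data.frame(Sepal.Length = seq(min(x), max(x), length.out = 80),
                   Species = factor(sp, levels = levels(dat$Species)))
  pj <- predict(joint, newdata = nd, type = "link", se.fit = TRUE)
  ps <- predict(separate[[sp]], newdata = nd, type = "link", se.fit = TRUE)
  data.frame(Species = sp,
             max_fit_diff = max(abs(as.vector(pj$fit) - as.vector(ps$fit))),
             max_se_diff = max(abs(as.vector(pj$se.fit) - as.vector(ps$se.fit))))
}))
print(compare)
```

Replacing the per-Species range inside `seq()` with range(dat$Sepal.Length) gives the fullrange = TRUE analogue. To make a plot match, the same method can be passed to geom_smooth(), e.g. method.args = list(family = binomial, method = "REML").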

0 answers
