我有3个预测变量:年份(2010-2019),管理(Tabu / Open)和捕鱼方法(矛,钓线,拖钓或净钓)。我已将GAM设置为三个响应变量:工作量(小时),捕获量(重量,kg)和效率(单位工作量捕获量,CPUE)。对于所有这三个模型,管理部门都将Year设置为Gam,并且Management和Method也包含为固定效果(此组合的AIC最低,与我们的问题最相关)。这三个响应变量均存在偏差,需要进行LogX + 1转换。转换之后,权重和CPUE仍然偏斜,尽管可以接受残差直方图和残差与拟合值的关系图,也可以进行过度分散测试。
数据有很多可变性,因此模型只能解释5-10%的偏差。
使用visreg,我已经绘制了覆盖原始数据的模型。但是鉴于它的传播,这很难解释。相反,我随后使用geom_pointrange在模型上绘制了均值/ SE。
对于“小时”图,模型非常适合每种年/管理类型的均值和误差。
对于权重图,该模型预测的平均值远低于每年的平均值/ SE,并且Management =“ Tabu open”的斜率不正确。
对于CPUE,模型形状大致正确,但远高于均值/ SE点范围。
鉴于它可以工作数小时,我猜想问题在于重量和CPUE的数据偏斜,即使模型验证是可以接受的。
为什么重量和CPUE的模型与均值/ SE数据之间的重叠很少,而小时数却没有?
还有一种更好的方法来绘制模型,而不是用1.原始数据或2.均值覆盖吗?
代码如下:
summary(reeffish)
Year Management Method Hours Weight CPUE
Min. :2010 Always open:398 Line :417 Min. : 0.500 Min. : 0.000 Min. : 0.0000
1st Qu.:2011 Tabu open :312 Net : 49 1st Qu.: 3.000 1st Qu.: 1.196 1st Qu.: 0.2735
Median :2013 Spear :217 Median : 4.420 Median : 2.200 Median : 0.5092
Mean :2013 Trolling: 27 Mean : 5.689 Mean : 4.108 Mean : 0.8665
3rd Qu.:2014 3rd Qu.: 6.567 3rd Qu.: 4.234 3rd Qu.: 0.9472
Max. :2019 Max. :67.330 Max. :152.000 Max. :14.7800
#Fit model
Hours.gam1 = gam(log(Hours+1) ~ s(Year, k=5,by=Management) + Management+Method, data= reeffish)
Weight.gam1 = gam(log(Weight+1) ~ s(Year, k=5,by=Management) + Management+Method, data= reeffish)
CPUE.gam1 = gam(log(CPUE+1) ~ s(Year, k=5,by=Management) + Management+Method,data= reeffish)
#Model Validation
#Hours
E3 <- resid(Hours.gam1); F3 <- fitted(Hours.gam1) # get the residuals and fitted values
par(mfrow=c(1,2),mar=c(5,4,3,3)) # this is setting up a multi-panel plot window
hist(E3, breaks=25,main="",xlab="Residuals")
plot(x=F3, y=E3, xlab="Fitted values", ylab="Residuals")
abline(0,0)
#Weight
E3 <- resid(Weight.gam1); F3 <- fitted(Weight.gam1) # get the residuals and fitted values
par(mfrow=c(1,2),mar=c(5,4,3,3)) # this is setting up a multi-panel plot window
hist(E3, breaks=25,main="",xlab="Residuals")
plot(x=F3, y=E3, xlab="Fitted values", ylab="Residuals")
abline(0,0)
#CPUE
E3 <- resid(CPUE.gam1); F3 <- fitted(CPUE.gam1) # get the residuals and fitted values
par(mfrow=c(1,2),mar=c(5,4,3,3)) # this is setting up a multi-panel plot window
hist(E3, breaks=25,main="",xlab="Residuals")
plot(x=F3, y=E3, xlab="Fitted values", ylab="Residuals")
abline(0,0)
Residual plots of Hours, Weight and CPUE
#Model Output
summary(Hours.gam1)
summary(Weight.gam1)
summary(CPUE.gam1)
Family: gaussian
Link function: identity
Formula:
log(Hours + 1) ~ s(Year, k = 5, by = Management) + Management +
Method
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.637127 0.030477 53.717 < 2e-16 ***
ManagementTabu open 0.123398 0.051120 2.414 0.016040 *
MethodNet 0.284026 0.080653 3.522 0.000457 ***
MethodSpear 0.001778 0.047798 0.037 0.970341
MethodTrolling -0.224008 0.103047 -2.174 0.030054 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(Year):ManagementAlways open 1.000 1.000 2.660 0.1033
s(Year):ManagementTabu open 1.908 2.262 3.463 0.0219 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.043 Deviance explained = 5.24%
GCV = 0.27033 Scale est. = 0.2673 n = 704
Family: gaussian
Link function: identity
Formula:
log(Weight + 1) ~ s(Year, k = 5, by = Management) + Management +
Method
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.14805 0.03944 29.107 < 2e-16 ***
ManagementTabu open -0.06480 0.06473 -1.001 0.317
MethodNet 0.57534 0.10475 5.493 5.56e-08 ***
MethodSpear 0.31332 0.06173 5.076 4.96e-07 ***
MethodTrolling -0.09969 0.13294 -0.750 0.454
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(Year):ManagementAlways open 1.742 2.021 1.809 0.161
s(Year):ManagementTabu open 1.770 2.118 0.758 0.406
R-sq.(adj) = 0.0714 Deviance explained = 8.14%
GCV = 0.44714 Scale est. = 0.44173 n = 704
Family: gaussian
Link function: identity
Formula:
log(CPUE + 1) ~ s(Year, k = 5, by = Management) + Management +
Method
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.46682 0.02180 21.418 < 2e-16 ***
ManagementTabu open -0.09232 0.03157 -2.924 0.00357 **
MethodNet 0.25574 0.05789 4.418 1.16e-05 ***
MethodSpear 0.18187 0.03409 5.335 1.29e-07 ***
MethodTrolling 0.04806 0.07345 0.654 0.51306
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(Year):ManagementAlways open 1.82 2.092 1.813 0.154
s(Year):ManagementTabu open 1.00 1.000 0.922 0.337
R-sq.(adj) = 0.0725 Deviance explained = 8.15%
GCV = 0.13622 Scale est. = 0.1347 n = 704
#Plot model
Hours.visreg=visreg(Hours.gam1,"Year", by="Management" , partial=FALSE, overlay=TRUE, trans=exp)
Weight.visreg=visreg(Weight.gam1,"Year", by="Management" , partial=FALSE, overlay=TRUE, trans=exp)
CPUE.visreg=visreg(CPUE.gam1,"Year", by="Management" , partial=FALSE, overlay=TRUE, trans=exp)
visreg(Hours.gam1,"Year", by="Management" , partial=TRUE, overlay=TRUE, trans=exp)
visreg(Weight.gam1,"Year", by="Management" , partial=TRUE, overlay=TRUE, trans=exp)
visreg(CPUE.gam1,"Year", by="Management" , partial=TRUE, overlay=TRUE, trans=exp)
#Plot model with Means overlaid, calculated outside of R
Means.alldata<- read.csv('reeffishMeans.csv', header= T, strip.white= T)
Means.alldata=Means.alldata%>% mutate(Year=as.integer(Year))
#Hours figure
Hours.reeffish.gam= ggplot(data=Means.alldata, aes(x=Year, y=Hours, fill=factor(Management), color=factor(Management)))+
geom_ribbon(data= Hours.visreg$fit, aes(x=Year, ymin=visregLwr, ymax=visregUpr), alpha=0.5, linetype=1, size=0.2, colour=NA) +
geom_line(data=Hours.visreg$fit, aes(x=Year,y=visregFit, group=Management, color=Management), lwd=1) +
geom_pointrange(aes(ymin= Hours-HoursSE, ymax= Hours+HoursSE), width=.5, size=0.5, position=position_dodge(.05))+
scale_colour_manual(values=c('deepskyblue4', 'coral2'))+
scale_fill_manual(values=c('deepskyblue3', 'coral1'))+
scale_y_continuous(name=expression('Fishing Effort (Hours)'), limits=c(2,8), breaks=c(2,4,6,8)) +
scale_x_continuous(name=expression('Year'), limits=c(2009,2020), breaks=c(2010,2012,2014,2016,2018,2020))+
theme_classic()+
theme(axis.title.y= element_text(size=14), axis.text.y= element_text(size=12), axis.text.x= element_text(size=12), axis.title.x= element_text(size=14), legend.text=element_text(size=12), legend.title=element_text(size=14))
#Weight figure
Weight.reeffish.gam= ggplot(data=Means.alldata, aes(x=Year, y=Weight, fill=factor(Management), color=factor(Management)))+
geom_ribbon(data= Weight.visreg$fit, aes(x=Year, ymin=visregLwr, ymax=visregUpr), alpha=0.5, linetype=1, size=0.2, colour=NA) +
geom_line(data=Weight.visreg$fit, aes(x=Year,y=visregFit, group=Management, color=Management), lwd=1) +
geom_pointrange(aes(ymin= Weight-WeightSE, ymax= Weight+WeightSE), width=.5, size=0.5, position=position_dodge(.05))+
scale_colour_manual(values=c('deepskyblue4', 'coral2'))+
scale_fill_manual(values=c('deepskyblue3', 'coral1'))+
scale_y_continuous(name=expression('Catch (Kg)'), limits=c(0,8), breaks=c(2,4,6,8)) +
scale_x_continuous(name=expression('Year'), limits=c(2009,2020), breaks=c(2010,2012,2014,2016,2018,2020))+
theme_classic()+
theme(axis.title.y= element_text(size=14), axis.text.y= element_text(size=12), axis.text.x= element_text(size=12), axis.title.x= element_text(size=14), legend.text=element_text(size=12), legend.title=element_text(size=14))
#CPUE figure
CPUE.reeffish.gam= ggplot(data=Means.alldata, aes(x=Year, y=CPUE, fill=factor(Management), color=factor(Management)))+
geom_ribbon(data= CPUE.visreg$fit, aes(x=Year, ymin=visregLwr, ymax=visregUpr), alpha=0.5, linetype=1, size=0.2, colour=NA) +
geom_line(data=CPUE.visreg$fit, aes(x=Year,y=visregFit, group=Management, color=Management), lwd=1) +
geom_pointrange(aes(ymin= CPUE-CPUESE, ymax= CPUE+CPUESE), width=.5, size=0.5, position=position_dodge(.05))+
scale_colour_manual(values=c('deepskyblue4', 'coral2'))+
scale_fill_manual(values=c('deepskyblue3', 'coral1'))+
scale_y_continuous(name=expression('CPUE'), limits=c(0,2), breaks=c(0,.5,1,1.5,2)) +
scale_x_continuous(name=expression('Year'), limits=c(2009,2020), breaks=c(2010,2012,2014,2016,2018,2020))+
theme_classic()+
theme(axis.title.y= element_text(size=14), axis.text.y= element_text(size=12), axis.text.x= element_text(size=12), axis.title.x= element_text(size=14), legend.text=element_text(size=12), legend.title=element_text(size=14))
Hours.reeffish.gam
Weight.reeffish.gam
CPUE.reeffish.gam
答案 0 :(得分:1)
此后,我发现了问题所在。由于对数据进行了log(x + 1)转换,但仅通过trans = exp进行了反向转换,因此模型预测全部归一。对于CPUE来说,这只是明显的不同,因为它们的单位要小得多。