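Manual interaction plot for a linear regression in R

Date: 2020-10-14 17:54:54

Tags: r plot linear-regression interaction

I am trying to predict the mean abundance of animals seen during different moon phases (a factor), using log-transformed abundance data (better fit) and a few other variables. The best model (lowest AIC) turned out to include an interaction between phase and survey duration, plus cloud cover (both continuous):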

LMoon<-lm(ln~Phase*Duration+Clouds, data=abund)

summary(LMoon)

Call:
lm(formula = ln ~ Phase * Duration + Clouds, data = abund)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.75416 -0.46311  0.09522  0.46591  1.85978 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)       0.382031   0.876865   0.436 0.664125    
Phase2            2.130065   1.226305   1.737 0.085851 .  
Phase3            1.971060   1.818542   1.084 0.281351    
Phase4            0.608043   1.140122   0.533 0.595146    
Phase5            4.786674   1.151850   4.156 7.44e-05 ***
Phase6            0.958706   1.046831   0.916 0.362238    
Phase7            0.254711   3.425214   0.074 0.940888    
Phase8            0.865995   1.043916   0.830 0.409005    
Duration          0.069153   0.035407   1.953 0.053952 .  
Clouds           -0.004259   0.002401  -1.774 0.079494 .  
Phase2:Duration  -0.087843   0.047818  -1.837 0.069545 .  
Phase3:Duration  -0.089908   0.069652  -1.291 0.200109    
Phase4:Duration  -0.005424   0.046675  -0.116 0.907749    
Phase5:Duration  -0.172016   0.049369  -3.484 0.000768 ***
Phase6:Duration  -0.035597   0.041435  -0.859 0.392583    
Phase7:Duration   0.024084   0.176773   0.136 0.891939    
Phase8:Duration  -0.033424   0.042064  -0.795 0.428963    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7721 on 89 degrees of freedom
Multiple R-squared:  0.3368,    Adjusted R-squared:  0.2176 
F-statistic: 2.825 on 16 and 89 DF,  p-value: 0.0009894

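Now, because of the interaction, I need to draw an interaction plot (the CIs are too wide when plotting lsmeans). I tried the different functions mentioned here, but none of them worked. Apparently I need to calculate and plot it manually, which I did like this: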

intercepts <- c(coef(LMoon)["(Intercept)"],
                coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase2"],
                coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase3"],
                coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase4"],
                coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase5"],
                coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase6"],
                coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase7"],
                coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase8"])

lines.df <- data.frame(intercepts = intercepts,
                       slopes = c(coef(LMoon)["Duration"],
                                  coef(LMoon)["Duration"]+coef(LMoon)["Phase2:Duration"],
                                  coef(LMoon)["Duration"]+coef(LMoon)["Phase3:Duration"],
                                  coef(LMoon)["Duration"]+coef(LMoon)["Phase4:Duration"],
                                  coef(LMoon)["Duration"]+coef(LMoon)["Phase5:Duration"],
                                  coef(LMoon)["Duration"]+coef(LMoon)["Phase6:Duration"],
                                  coef(LMoon)["Duration"]+coef(LMoon)["Phase7:Duration"],
                                  coef(LMoon)["Duration"]+coef(LMoon)["Phase8:Duration"]),
                       Phase = levels(abund$Phase))

qplot(x = Duration, y = Sp2, color = Phase, data = abund) + 
  geom_abline(aes(intercept = intercepts, 
                  slope = slopes, 
                  color = Phase), data = lines.df)

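The plot I get is wrong, because the y values are on the original, untransformed scale while the lines come from an lm fitted to log-transformed data.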

[Figure: interaction plot of abundance against duration, by lunar phase]

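To back-transform this, I have been told that I will not actually end up with straight lines. Instead of using abline(), I should create a set of, say, 100 new x values covering the range of the duration data and use the coefficients to calculate the predicted y values. Then I should plot those with lines(), and the result should look like a smooth curve.

This is where I get lost.

So I created a new set of x values for the range of survey durations (min 15 to max 45 minutes): dur2 <- seq(from = 15, to = 45, length.out=100)

Then, once I have those values, I should use the coefficients from the lm to get a predicted y value for each x value. After that, I back-transform the y values to the original scale and add the lines to the plot using the x values and the back-transformed y values.

How exactly do I get the predicted values now? I cannot use any of the pred-type functions; I have tried them all. They just don't work with my model, so doing it manually is the only way, but I don't know how...

I hope someone can help me. I have been trying for weeks now and am desperate and close to giving up.

Cheers!

PS: here is the data: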

> dput(subset(abund, Phase %in% c("Phase1", "Phase2")))

structure(list(Year = integer(0), Date = structure(integer(0), .Label = c("01/08/2009", 
"01/08/2016", "02/07/2019", "02/08/2009", "02/08/2012", "02/08/2016", 
"02/09/2007", "03/08/2007", "03/08/2009", "03/08/2014", "03/08/2015", 
"04/07/2019", "04/08/2009", "04/08/2013", "05/08/2009", "05/08/2014", 
"05/08/2015", "06/07/2008", "06/07/2019", "07/08/2009", "08/07/2010", 
"09/07/2010", "09/08/2015", "10/08/2009", "11/08/2009", "12/08/2009", 
"13/08/2009", "13/08/2014", "14/08/2009", "14/08/2012", "16/07/2006", 
"18/07/2009", "18/08/2015", "19/07/2011", "20/08/2009", "21/07/2011", 
"21/09/2009", "22/07/2011", "22/07/2016", "22/07/2017", "23/07/2007", 
"23/07/2016", "23/07/2017", "24/07/2017", "25/07/2007", "25/07/2010", 
"25/07/2017", "25/08/2016", "26/07/2010", "26/07/2011", "27/07/2006", 
"27/07/2011", "27/07/2012", "28/07/2016", "29/06/2019", "29/07/2005", 
"29/07/2009", "29/07/2010", "29/07/2016", "29/07/2019", "30/07/2005", 
"30/07/2007", "30/07/2016", "30/08/2005", "31/07/2005", "31/07/2009", 
"31/07/2014", "31/07/2016"), class = "factor"), NrSurvey = integer(0), 
    Duration = integer(0), Sp2 = integer(0), Phase = structure(integer(0), .Label = c("1", 
    "2", "3", "4", "5", "6", "7", "8"), class = "factor"), Clouds = integer(0), 
    Visibility = integer(0), ln = numeric(0)), row.names = integer(0), class = "data.frame")

1 answer:

Answer 0 (score: 0)

Use predict to get the predicted values. Don't calculate them by hand.
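To make that concrete, here is a minimal sketch on simulated data showing that predict() returns the same number you would assemble by hand from the coefficients. The data frame `toy` and all of its values are made up for illustration (three phases instead of eight); only the model formula mirrors the one in the question.

```r
set.seed(1)

# Simulated stand-in for `abund` (hypothetical values, 3 phases only)
toy <- data.frame(
  Phase    = factor(rep(1:3, each = 20)),
  Duration = runif(60, 15, 45),
  Clouds   = runif(60, 0, 100)
)
toy$ln <- 0.5 + 0.03 * toy$Duration - 0.004 * toy$Clouds + rnorm(60, sd = 0.5)

fit <- lm(ln ~ Phase * Duration + Clouds, data = toy)

# One new observation: phase 2, 30 minutes, cloud cover of 50
new_obs <- data.frame(
  Phase    = factor("2", levels = levels(toy$Phase)),
  Duration = 30,
  Clouds   = 50
)

# What predict() gives
p_auto <- unname(predict(fit, newdata = new_obs))

# The same prediction assembled by hand from the coefficients
b <- coef(fit)
p_manual <- unname(
  b["(Intercept)"] + b["Phase2"] +
    (b["Duration"] + b["Phase2:Duration"]) * 30 +
    b["Clouds"] * 50
)

# Compare the two (allowing for floating-point rounding)
all.equal(p_auto, p_manual)
```

The hand calculation has to pick the right phase offset and interaction slope for every row, which is exactly the bookkeeping predict() does for you when the columns of newdata match the model formula.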

Use expand.grid() to generate a data frame with all combinations of your dur2 sequence and the other predictors, at the values you want to plot. Like this:

prediction_data = expand.grid(
  Duration = dur2,
  Phase= unique(abund$Phase),
  Clouds = mean(abund$Clouds) # Hold Clouds constant at some value
)

# column names in prediction_data need to match those in the model formula
prediction_data$pred = predict(LMoon, newdata = prediction_data)
prediction_data$pred_orig = exp(prediction_data$pred)

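# plot (needs ggplot2); the raw points use the observed counts in Sp2,
# because abund has no pred_orig column to inherit
library(ggplot2)
ggplot(prediction_data, aes(x = Duration, y = pred_orig, color = Phase)) +
  geom_line() +
  geom_point(data = abund, aes(y = Sp2))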

Something like that should work.
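If you would rather stay in base graphics and draw the curves with lines(), as suggested in the question, the same idea can be sketched as follows. This runs on simulated data: `toy`, its values, and the three-level Phase factor are assumptions for illustration, not the real `abund` data.

```r
set.seed(42)

# Simulated stand-in for `abund` (hypothetical values)
toy <- data.frame(
  Phase    = factor(rep(1:3, each = 30)),
  Duration = round(runif(90, 15, 45)),
  Clouds   = round(runif(90, 0, 100))
)
toy$ln  <- 1 + 0.02 * toy$Duration - 0.004 * toy$Clouds + rnorm(90, sd = 0.4)
toy$Sp2 <- exp(toy$ln)  # pretend abundance on the original scale

fit  <- lm(ln ~ Phase * Duration + Clouds, data = toy)
dur2 <- seq(from = 15, to = 45, length.out = 100)
cols <- seq_along(levels(toy$Phase))  # one colour index per phase

# Raw data on the original scale
plot(Sp2 ~ Duration, data = toy, col = cols[toy$Phase], pch = 16)

# One back-transformed curve per phase, with Clouds held at its mean
curves <- list()
for (i in cols) {
  nd <- data.frame(
    Duration = dur2,
    Phase    = factor(levels(toy$Phase)[i], levels = levels(toy$Phase)),
    Clouds   = mean(toy$Clouds)
  )
  curves[[i]] <- exp(predict(fit, newdata = nd))
  lines(dur2, curves[[i]], col = cols[i], lwd = 2)
}
legend("topleft", legend = levels(toy$Phase), col = cols, lty = 1,
       title = "Phase")
```

The sketch assumes the response was a plain log(); if `ln` was computed as, say, log(Sp2 + 1), the back-transformation would need to undo that as well.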

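Another nice option is to use broom::augment to generate the predictions. With se_fit = TRUE it can also give you a standard error for each predicted point; residuals are only returned for rows that include the observed response.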

library(broom)
prediction_data = augment(LMoon, newdata = prediction_data)
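As a rough sketch of how the augment() output could feed a plot with confidence bands on the original scale: the example below uses simulated data (`toy` and its values are made up) and assumes a broom version whose augment() method for lm objects accepts the `interval` argument and returns `.lower` and `.upper` columns.

```r
library(broom)
library(ggplot2)

set.seed(7)

# Simulated stand-in for `abund` (hypothetical values)
toy <- data.frame(
  Phase    = factor(rep(1:3, each = 30)),
  Duration = round(runif(90, 15, 45)),
  Clouds   = round(runif(90, 0, 100))
)
toy$ln  <- 1 + 0.02 * toy$Duration - 0.004 * toy$Clouds + rnorm(90, sd = 0.4)
toy$Sp2 <- exp(toy$ln)

fit <- lm(ln ~ Phase * Duration + Clouds, data = toy)

grid <- expand.grid(
  Duration = seq(15, 45, length.out = 100),
  Phase    = levels(toy$Phase),
  Clouds   = mean(toy$Clouds)  # hold Clouds constant
)

# Fitted values plus a confidence interval, all on the log scale
aug <- augment(fit, newdata = grid, interval = "confidence")

# Back-transform the fit and both interval limits
aug$fit_orig   <- exp(aug$.fitted)
aug$lower_orig <- exp(aug$.lower)
aug$upper_orig <- exp(aug$.upper)

p <- ggplot(aug, aes(x = Duration, colour = Phase, fill = Phase)) +
  geom_ribbon(aes(ymin = lower_orig, ymax = upper_orig),
              alpha = 0.2, colour = NA) +
  geom_line(aes(y = fit_orig)) +
  geom_point(data = toy, aes(y = Sp2))
print(p)
```

Back-transforming the interval limits rather than the standard error keeps the band positive, at the cost of making it asymmetric around the curve.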