在ggplot中绘制具有伽马分布的模型

时间:2019-03-29 18:19:21

标签: r ggplot2 mixed-models gamma-function

我正在绘制物种中男性和女性的飞行速度和时间之间的关系。我的广义线性混合模型(针对个体ID的随机截距)表明,女性与男性之间存在差异,因此在图中,我想证明这些差异。

到目前为止,我有以下情节:

p <- ggplot() +
  geom_jitter(data = df, aes(time, pace), shape = 1) +
  scale_x_log10(breaks = c(1, 10, 100)) +
  scale_y_log10() +
  labs(x = "Time",
       y = "Flight speed (m/s)") + 
  theme_bw()

enter image description here

但是现在我想添加一些行来显示这种关系。我尝试了两种不同的方法:

1)按物种使用geom_smooth和facet

p + geom_smooth(data = df, aes(time, pace),
              method = "glm", method.args = list(family = "Gamma"),
              se = FALSE, 
              colour = "black", size = 0.8) +
              facet_wrap(~sex)

Warning message:
Computation failed in `stat_smooth()`:
non-positive values not allowed for the 'gamma' family 

2)从我的GLMM中获取斜率并截取值,然后使用abline

p + geom_abline(slope = 0.003, intercept = 0.202) + 
    geom_abline(slope = 0.003, intercept = 0.202-0.103)

enter image description here

这些似乎都没有我想要的。因此,我的问题是,如何以一种对我的模型有意义的方式显示男女的飞行速度关系?

供参考,我的模型是:

glmer(pace ~ time + sex + (1 | ID), 
      data = df, family = Gamma(link = "inverse")))

   Fixed effects:
                  Estimate Std. Error t value Pr(>|z|)    
    (Intercept)  0.2021276  0.0320861   6.300 2.99e-10 ***
    totDayH      0.0028364  0.0005808   4.883 1.04e-06 ***
    sexM        -0.1033563  0.0382595  -2.701   0.0069 ** 

我的数据是:

   pace <- c(7.81, 2.64, 11.65, 4.38, 7.3, 3.85, 4.02, 0.12, 0.73, 3.33, 
    2.29, 3.67, 7.21, 3.14, 1.98, 2.73, 3.07, 9.16, 4.86, 6.27, 6.55, 
    10.46, 1.16, 0.14, 0.86, 4.88, 10.78, 16.73, 6.68, 5.51, 1.88, 
    25.03, 6.78, 5.14, 6.76, 5.3, 8.79, 5.38, 2.01, 4.01, 0.57, 11.65, 
    6.87, 0.57, 1.94, 1.13, 4.73, 9.92, 0.67, 4.13, 4.49, 1.18, 0.84, 
    3.8, 2.12, 2.85, 3.54, 0.21, 0.69, 5.1, 4.49, 0.04, 0.78, 1.53, 
    1.75, 1.77, 4.05, 6.46, 0.31)

   time <- c(0.82, 60.18, 0.88, 36.03, 1.41, 2.41, 2.24, 222.69, 27.72, 
    47.32, 4.05, 45.97, 21.89, 5.49, 28.88, 27.86, 4.94, 0.72, 33.48, 
    8.84, 1.1, 0.72, 144.5, 461.82, 197.33, 2.09, 5.3, 12.29, 0.91, 
    1.24, 68.3, 6.35, 0.85, 2.37, 31.64, 15.14, 15.12, 39.64, 5.99, 
    44.75, 270.02, 17.62, 44.63, 45.03, 12.12, 243.16, 9.03, 7.45, 
    485.29, 78.65, 4.26, 665.22, 59.42, 207.99, 145.93, 6.44, 81.36, 
    34, 8.25, 1.51, 1.72, 142.18, 414.35, 244.14, 5.5, 8.47, 37.95, 
    2.83, 469.54)

    sex <- structure(c(2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
    2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
    1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c("F", "M"), class = "factor")

    ID <- structure(c(3L, 5L, 5L, 9L, 9L, 9L, 14L, 19L, 24L, 24L, 24L, 
    27L, 28L, 28L, 28L, 28L, 28L, 31L, 31L, 31L, 31L, 31L, 32L, 34L, 
    37L, 37L, 37L, 38L, 38L, 38L, 38L, 39L, 46L, 46L, 49L, 51L, 51L, 
    60L, 62L, 62L, 62L, 66L, 94L, 96L, 96L, 96L, 96L, 96L, 97L, 99L, 
    102L, 102L, 102L, 102L, 104L, 105L, 107L, 109L, 109L, 109L, 109L, 
    109L, 112L, 112L, 113L, 113L, 113L, 113L, 113L), .Label = c("NB2014.12", 
    "NB2014.13", "NB2014.14", "NB2014.15", "NB2014.16", "NB2014.42", 
    "NB2014.43", "NB2014.44", "NB2014.45", "NB2014.47", "NB2014.48", 
    "NB2014.49", "NB2014.70", "NB2014.71", "NB2014.72", "NB2014.73", 
    "NB2014.74", "NB2014.75", "NB2014.76", "NB2014.77", "NB2014.78", 
    "NB2014.79", "NB2014.80", "NB2014.81", "NB2015.156", "NB2015.157", 
    "NB2015.158", "NB2015.159", "NB2015.160", "NB2015.312", "NB2015.313", 
    "NB2015.314", "NB2015.315", "NB2015.316", "NB2015.317", "NB2015.318", 
    "NB2015.320", "NB2015.321", "NB2015.322", "NB2015.323", "NB2015.324", 
    "NB2015.325", "NB2015.326", "NB2015.327", "NB2015.328", "NB2015.329", 
    "NB2015.330", "NB2015.331", "NB2015.332", "NB2015.333", "NB2015.334", 
    "NB2015.335", "NB2015.336", "NB2015.337", "NB2015.338", "NB2015.339", 
    "NB2015.340", "NB2015.341", "NB2015.342", "NB2015.343", "NB2015.344", 
    "NB2015.345", "NB2015.346", "NB2015.347", "NB2015.348", "NB2015.349", 
    "NB2015.350", "NB2015.351", "NB2018.10", "NB2018.11", "NB2018.12", 
    "NB2018.13", "NB2018.14", "NB2018.15", "NB2018.16", "NB2018.17", 
    "NB2018.18", "NB2018.19", "NB2018.20", "NB2018.21", "NB2018.22", 
    "NB2018.23", "NB2018.24", "NB2018.25", "NB2018.26", "NB2018.27", 
    "NB2018.28", "NB2018.29", "NB2018.30", "NB2018.31", "NB2018.32", 
    "NB2018.33", "NB2018.34", "NB2018.35", "NB2018.37", "NB2018.38", 
    "NB2018.39", "NB2018.40", "NB2018.41", "NB2018.42", "NB2018.43", 
    "NB2018.44", "NB2018.45", "NB2018.46", "NB2018.47", "NB2018.48", 
    "NB2018.49", "NB2018.5", "NB2018.50", "NB2018.51", "NB2018.52", 
    "NB2018.53", "NB2018.54", "NB2018.55", "NB2018.56", "NB2018.57", 
    "NB2018.58", "NB2018.59", "NB2018.6", "NB2018.60", "NB2018.61", 
    "NB2018.62", "NB2018.63", "NB2018.64", "NB2018.7", "NB2018.8", 
    "NB2018.9"), class = "factor")

2 个答案:

答案 0 :(得分:1)

我发现您可以为glm回归显示一条曲线,该曲线使用对数转换X轴而不是Y轴。

p <- ggplot(data = df, aes(time, pace), shape = 1) +
    geom_jitter()
p2 <- p + geom_smooth( aes(time, pace),
                method = "glm", method.args = list(family = "Gamma"),
                se = FALSE, 
                colour = "black", size = 0.8) +
         facet_wrap(~sex)
png(); print(p2+
                scale_x_log10(breaks = c( 10, 100))) ; dev.off()

enter image description here

(注意:如果要绘制覆盖值的预测结果,则应使用由predict.glm制成的新数据对象及其带有序列输入的newdata,并使用type="response"选项。您的线的斜率和截距错误的原因是,它的数据处于原始尺度时处于转换后的线性预测器尺度上。)

答案 1 :(得分:0)

主要问题是要拟合多元模型,因此在二维空间中绘制时,必须使用拟合值来表示模型预测。换句话说,您得到的系数是与每个变量关联的条件边际效应;它们不代表变量和结果之间的二维关系。这是一个使用拟合值的简单线性模型的示例:

library('data.table')
library('ggplot2')
df <- data.table(ID = ID, time = time, sex = sex, pace = pace)
modelFit <- glm(pace ~ time + sex + ID,
                data = df, family = Gamma(link = "inverse"))
df[, pPace:= predict(modelFit, df, 'response')]
setnames(df, c('pace', 'pPace'), c('Actual', 'Predicted'))
df <- melt(df,
           id.vars = c('ID', 'time', 'sex'),
           value.name = 'pace')

ggplot(data = df, aes(x = time, y = pace, color = variable)) +
  geom_jitter(shape = 1) +
  scale_x_log10(breaks = c(1, 10, 100)) +
  scale_y_log10() +
  labs(x = "Time",
       y = "Flight speed (m/s)") + 
  theme_bw()

enter image description here

您会看到预测并不是一条完美的线,因为它们是由其他变量(特定于ID的截距和性别)的变化部分确定的。