Adding p-value and r to a ggplot [follow-up]

Time: 2016-09-05 17:02:52

Tags: r ggplot2

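This is a follow-up question. When I run the code given below, I get a plot showing R2 and a p-value for each of the two groups, but the p-values are displayed as 0. This is probably because the p-values are very small. I tried increasing the number of digits to 20 (here it is signif(..p.value.., digits = 4)), but that did not help. I would prefer either to report the exact p-value or to use stars, e.g. if (p<0.002) star='**' else if (p>=0.002&p<0.05) star='*' else star=''. I would also like to show the r values on the plot. Please have a look and tell me which part needs to be modified. Looking forward to your suggestions!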

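Update

The code in @eipi10's answer works for adding the p-value, but I am still looking for a way to add the correlation coefficient (r) to ggplots.

Code: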

library(dplyr) 
library(ggplot2)
library(ggpmisc)

df <- diamonds %>%
  dplyr::filter(cut%in%c("Fair","Ideal")) %>%
  dplyr::filter(clarity%in%c("I1" ,  "SI2" , "SI1" , "VS2" , "VS1",  "VVS2")) %>%
  dplyr::mutate(new_price = ifelse(cut == "Fair", 
                                   price* 0.5, 
                                   price * 1.1))

formula <- y ~ x - 1

p <- ggplot(df, aes(x,y, color=factor(cut))) 
p <- p + stat_smooth(method = "lm", formula = y ~ x-1, size = 1, level=0.95) 
p <- p + geom_point(alpha = 0.3) 
p <- p + stat_poly_eq(aes(label = paste(..rr.label..)),
                      label.x.npc = "right", label.y.npc = 0.15, formula = formula, 
                      parse = TRUE, size = 3) + 
          stat_fit_glance(method = 'lm', method.args = list(formula = formula),
                      geom = 'text', aes(label = paste("P-value = ", 
                      signif(..p.value.., digits = 4), sep = "")),label.x.npc = 'right',
                      label.y.npc = 0.35, size = 3)
print(p)


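1 answer:

Answer 0: (score: 4)

This is a large data set, and you can see from the plot that the fit is nearly perfect, which means the p-values of the regressions will be minuscule. Below are the regression models for each of the two levels of cut. To save space, only the key parts of each model summary are shown: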

lapply(unique(df$cut), function(g) {
  summary(lm(y ~ x - 1, df %>% filter(cut==g)))
})
cut=="Ideal"
...
Coefficients:
  Estimate Std. Error t value Pr(>|t|)    
x 1.001715   0.000269    3724   <2e-16 ***
...
Residual standard error: 0.2079 on 18291 degrees of freedom
Multiple R-squared:  0.9987,  Adjusted R-squared:  0.9987 
F-statistic: 1.387e+07 on 1 and 18291 DF,  p-value: < 2.2e-16

cut=="Fair"
...
Coefficients:
   Estimate Std. Error t value Pr(>|t|)    
x 0.9895032  0.0004096    2416   <2e-16 ***
...
Residual standard error: 0.1033 on 1583 degrees of freedom
Multiple R-squared:  0.9997,  Adjusted R-squared:  0.9997 
F-statistic: 5.836e+06 on 1 and 1583 DF,  p-value: < 2.2e-16

Note the huge F-statistics. The p-value for an F-statistic this large is essentially zero.

pf(5.836e06, 1, 1583, lower=FALSE)  
[1] 0

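Any F-statistic above roughly 2,400 (for the given degrees of freedom) gives a p-value below the smallest normalized double R can represent (.Machine$double.xmin, about 2.2e-308). Not far beyond that, the result underflows to exactly 0.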

pf(2400, 1, 1583, lower=FALSE)
[1] 1.716433e-319
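If an order of magnitude is wanted rather than a plain 0, one option is to ask pf() for the logarithm of the tail probability through its log.p argument. The following is only a sketch using the F-statistic and degrees of freedom of the "Fair" model above; the variable names are illustrative and not part of the original code.

```r
# F-statistic and degrees of freedom taken from the "Fair" model summary above.
f_stat <- 5.836e06
df1 <- 1
df2 <- 1583

# On the natural scale the upper-tail probability underflows to exactly 0.
p_plain <- pf(f_stat, df1, df2, lower.tail = FALSE)

# Requesting the log of the probability avoids the underflow.
log_p <- pf(f_stat, df1, df2, lower.tail = FALSE, log.p = TRUE)

# Convert to base 10 so the value can be reported as "p ~ 10^k".
log10_p <- log_p / log(10)
p_label <- sprintf("p ~ 10^%.0f", log10_p)

# For comparison: the smallest normalized positive double, and the
# p-value at F = 2400, which already sits below it.
smallest_normal <- .Machine$double.xmin
p_at_2400 <- pf(2400, df1, df2, lower.tail = FALSE)

print(p_label)
```

A label built this way is an approximation of the order of magnitude only, which is usually all that can sensibly be said about a p-value this small.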

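By default, when R rounds a number it does not return trailing zeros (try round(1.340000, 5) or signif(1.340000, 5)). To print more zeros, you can use sprintf to format the output string. For more details on format strings, see the help for sprintf (?sprintf). The code below formats the p-value in scientific notation; for a decimal number, replace "%1.4e" with "%1.4f":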

p <- ggplot(df, aes(x,y, color=cut)) + 
  stat_smooth(method = "lm", formula = y ~ x-1, size = 1, level=0.95) + 
  geom_point(alpha = 0.3) +
  stat_poly_eq(aes(label = paste(..rr.label..)),
               label.x.npc = "right", label.y.npc = 0.15, formula = formula, 
               parse=TRUE, size = 3) + 
  stat_fit_glance(method = 'lm', method.args = list(formula = formula),
                  geom='text', aes(label=paste0("P-value = ", sprintf("%1.4e", ..p.value..))),
                  label.x.npc = 'right',
                  label.y.npc = 0.4, size = 3)
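The difference between rounding a number and formatting a string can be checked outside the plot. The sketch below uses made-up values; format.pval() from base R is shown as one possible alternative, since it reports very small values as "< eps" rather than as 0. Whether that style suits the plot is a matter of taste.

```r
# signif() returns a number, so trailing zeros vanish once it is pasted.
rounded_label <- paste0("P-value = ", signif(1.340000, 5))

# sprintf() returns a string with a fixed number of decimals.
fixed_label <- sprintf("%1.4f", 1.340000)     # fixed notation
sci_label   <- sprintf("%1.4e", 0.000123456)  # scientific notation

# format.pval() replaces values below its `eps` argument
# (default .Machine$double.eps) with a "<..." string.
pval_label <- format.pval(c(0, 0.0312), digits = 3)

print(c(rounded_label, fixed_label, sci_label, pval_label))
```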


Update: To add starred p-value ranges, one option is to use an ifelse statement with the p-value ranges as conditions:

p <- ggplot(df, aes(x,y, color=cut)) + 
  stat_smooth(method = "lm", formula = y ~ x-1, size = 1, level=0.95) + 
  geom_point(alpha = 0.3) +
  stat_poly_eq(aes(label = paste(..rr.label..)),
               label.x.npc = "right", label.y.npc = 0.15, formula = formula, 
               parse=TRUE, size = 3) + 
  stat_fit_glance(method = 'lm', method.args = list(formula = formula),
                  geom='text', aes(label=ifelse(..p.value..< 0.001, "p<0.001**", 
                                                ifelse(..p.value..>=0.001 & ..p.value..<0.05, "p<0.05*", "p>0.05"))),
                  label.x.npc = 'right',
                  label.y.npc = 0.4, size = 3)
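The nested ifelse() can also be factored into a small vectorized helper. The sketch below uses the cut-offs from the question (p < 0.002 and p < 0.05) rather than those in the code above. Note that p_stars is a hypothetical name introduced here for illustration, not a function from ggplot2 or ggpmisc.

```r
# Hypothetical helper (not part of ggplot2 or ggpmisc): map p-values to stars
# using the cut-offs from the question: p < 0.002 -> "**", p < 0.05 -> "*".
p_stars <- function(p) {
  # findInterval() counts how many break points are <= p, giving 0, 1 or 2,
  # so the intervals are [0, 0.002), [0.002, 0.05) and [0.05, Inf).
  c("**", "*", "")[findInterval(p, c(0.002, 0.05)) + 1]
}

# Made-up p-values covering each interval and both boundaries.
p_values <- c(0, 0.001, 0.002, 0.049, 0.05, 0.5)
star_labels <- paste0("p = ", sprintf("%1.3f", p_values), p_stars(p_values))

print(star_labels)
```

Presumably such a helper could be called on ..p.value.. inside aes(label = ...) in place of the nested ifelse(), although that has not been tested against the plot above.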
