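This is a follow-up question. When I run the code given below, I get a plot with two R2 values and two p-values, but the p-values are shown as 0. This is probably because the p-values are extremely small. I tried increasing the number of digits to 20 (here: signif(..p.value.., digits = 4)), but it did not help. I would rather report the exact p-value or use asterisks, e.g. if (p<0.002) star='**' else if (p>=0.002&p<0.05) star='*' else star=''. I would also like to show the r value in the plot. Please take a look and tell me which part needs to be changed. Thanks in advance!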
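Update
The code from @eipi10's answer for adding the p-value works, but I am still looking for a way to add the correlation coefficient (r) to the ggplot.
Code: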
library(dplyr)
library(ggplot2)
library(ggpmisc)
df <- diamonds %>%
  dplyr::filter(cut %in% c("Fair", "Ideal")) %>%
  dplyr::filter(clarity %in% c("I1", "SI2", "SI1", "VS2", "VS1", "VVS2")) %>%
  dplyr::mutate(new_price = ifelse(cut == "Fair",
                                   price * 0.5,
                                   price * 1.1))
formula <- y ~ x - 1
p <- ggplot(df, aes(x, y, color = factor(cut)))
p <- p + stat_smooth(method = "lm", formula = y ~ x - 1, size = 1, level = 0.95)
p <- p + geom_point(alpha = 0.3)
p <- p + stat_poly_eq(aes(label = paste(..rr.label..)),
                      label.x.npc = "right", label.y.npc = 0.15, formula = formula,
                      parse = TRUE, size = 3) +
  stat_fit_glance(method = 'lm', method.args = list(formula = formula),
                  geom = 'text', aes(label = paste("P-value = ",
                                                   signif(..p.value.., digits = 4), sep = "")),
                  label.x.npc = 'right',
                  label.y.npc = 0.35, size = 3)
print(p)
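Answer 0 (score: 4)
This is a large data set, and you can see from the plot that the fit is nearly perfect, so the p-value of the regression will be vanishingly small. Below are the regression models for each of the two levels of cut. To save space, only the key parts of each model summary are shown: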
lapply(unique(df$cut), function(g) {
summary(lm(y ~ x - 1, df %>% filter(cut==g)))
})
cut=="Ideal"
...
Coefficients:
  Estimate Std. Error t value Pr(>|t|)
x 1.001715   0.000269    3724   <2e-16 ***
...
Residual standard error: 0.2079 on 18291 degrees of freedom
Multiple R-squared: 0.9987, Adjusted R-squared: 0.9987
F-statistic: 1.387e+07 on 1 and 18291 DF, p-value: < 2.2e-16

cut=="Fair"
...
Coefficients:
   Estimate Std. Error t value Pr(>|t|)
x 0.9895032  0.0004096    2416   <2e-16 ***
...
Residual standard error: 0.1033 on 1583 degrees of freedom
Multiple R-squared: 0.9997, Adjusted R-squared: 0.9997
F-statistic: 5.836e+06 on 1 and 1583 DF, p-value: < 2.2e-16
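The same quantities can also be extracted programmatically instead of being read off the printed summaries. The following is a minimal sketch, assuming the built-in mtcars data as a stand-in for df (with am as the grouping variable) so that it runs without extra packages. The name model_stats is chosen here for illustration only.

```r
# Stand-in data: mtcars split by transmission type (am), NOT the diamonds subset above
fits <- lapply(split(mtcars, mtcars$am), function(d) {
  s <- summary(lm(mpg ~ wt - 1, data = d))   # regression through the origin
  f <- s$fstatistic                          # named vector: value, numdf, dendf
  data.frame(
    r.squared = s$r.squared,
    f.value   = unname(f["value"]),
    # upper-tail probability of the F distribution, as in the pf() call used in this answer
    p.value   = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
  )
})

# One row per group, with R-squared, F-statistic and p-value
model_stats <- do.call(rbind, fits)
print(model_stats)
```

With the diamonds subset, the p.value column would be expected to come out as 0 for both groups, for the underflow reason discussed in this answer.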
Note the huge F-statistics. The p-value for an F-statistic this large is effectively zero:
pf(5.836e06, 1, 1583, lower=FALSE)
[1] 0
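Any F-statistic much above 2,400 (for the given degrees of freedom) gives a p-value below the smallest non-zero number R can represent, so it underflows to zero. At 2,400 the p-value is still just representable: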
pf(2400, 1, 1583, lower=FALSE)
[1] 1.716433e-319
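If the order of magnitude of such a p-value is still wanted, one option is to stay on the log scale, where underflow is not an issue. This is a sketch rather than a recommendation: it relies on the log.p argument of pf, and the variable names (log_p, log10_p, p_label) are made up for this example.

```r
# Smallest positive normalized double; subnormal values reach down to about 4.9e-324
print(.Machine$double.xmin)

# Natural log of the upper-tail p-value, computed without leaving the log scale
log_p <- pf(5.836e06, 1, 1583, lower.tail = FALSE, log.p = TRUE)

# Convert to base 10, then split into mantissa and exponent for printing
log10_p  <- log_p / log(10)
exponent <- floor(log10_p)
mantissa <- 10^(log10_p - exponent)

# Build a label such as "p = <mantissa>e<exponent>" by hand,
# because the value itself cannot be stored as a double
p_label <- sprintf("p = %.2fe%d", mantissa, as.integer(exponent))
print(p_label)
```

A p-value this small has no practical meaning beyond "effectively zero", so reporting it as below a threshold is usually the more honest presentation.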
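By default, when R rounds a number it does not return trailing zeros (try round(1.340000, 5) or signif(1.340000,5)). To print more zeros, you can use sprintf to format the output string. The code below formats the p-value in scientific notation. For a decimal number, replace "%1.4e" with "%1.4f". See the help (?sprintf) for more details on format strings: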
p <- ggplot(df, aes(x, y, color = cut)) +
  stat_smooth(method = "lm", formula = y ~ x - 1, size = 1, level = 0.95) +
  geom_point(alpha = 0.3) +
  stat_poly_eq(aes(label = paste(..rr.label..)),
               label.x.npc = "right", label.y.npc = 0.15, formula = formula,
               parse = TRUE, size = 3) +
  stat_fit_glance(method = 'lm', method.args = list(formula = formula),
                  geom = 'text',
                  aes(label = paste0("P-value = ", sprintf("%1.4e", ..p.value..))),
                  label.x.npc = 'right',
                  label.y.npc = 0.4, size = 3)
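Because a p-value that has underflowed to zero would be printed by sprintf("%1.4e", ...) as 0.0000e+00, it may be preferable to report a floor instead. Below is a small sketch of such a formatter. The helper name format_p and the choice of machine epsilon as the cutoff are assumptions made for this example, not part of ggpmisc.

```r
# Hypothetical helper: scientific notation, with a floor for values that underflow
format_p <- function(p, eps = .Machine$double.eps) {
  ifelse(p < eps,
         sprintf("< %.1e", eps),   # report a bound instead of a misleading zero
         sprintf("%1.4e", p))      # same format string as in the plot code above
}

labels <- format_p(c(0, 1.2345e-05, 0.04))
print(labels)
```

The helper could then be used inside the label aesthetic, for example paste0("P-value ", format_p(..p.value..)), assuming it is defined in the global environment before the plot is built.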
Update: To add starred p-value ranges, one option is to use an ifelse statement with the p-value ranges as conditions:
p <- ggplot(df, aes(x, y, color = cut)) +
  stat_smooth(method = "lm", formula = y ~ x - 1, size = 1, level = 0.95) +
  geom_point(alpha = 0.3) +
  stat_poly_eq(aes(label = paste(..rr.label..)),
               label.x.npc = "right", label.y.npc = 0.15, formula = formula,
               parse = TRUE, size = 3) +
  stat_fit_glance(method = 'lm', method.args = list(formula = formula),
                  geom = 'text',
                  aes(label = ifelse(..p.value.. < 0.001, "p<0.001**",
                              ifelse(..p.value.. >= 0.001 & ..p.value.. < 0.05, "p<0.05*", "p>0.05"))),
                  label.x.npc = 'right',
                  label.y.npc = 0.4, size = 3)
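With more than two or three ranges, nested ifelse calls get hard to read. One possible alternative is base R's cut, sketched below with the same cutoffs. The helper name p_stars is hypothetical, and the last label is written as p>=0.05 so that it matches the interval exactly.

```r
# Hypothetical helper: map p-values to star labels using the cutoffs from the code above
p_stars <- function(p) {
  as.character(cut(p,
                   breaks = c(-Inf, 0.001, 0.05, Inf),
                   labels = c("p<0.001**", "p<0.05*", "p>=0.05"),
                   right  = FALSE))   # intervals are closed on the left: [lower, upper)
}

stars <- p_stars(c(0, 0.001, 0.03, 0.05, 0.2))
print(stars)
```

To change the thresholds (for instance to the 0.002 cutoff mentioned in the question), only the breaks and labels vectors need editing.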