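For context: I am looking at several different correlation coefficients. For each correlation I created a bootstrap distribution and used the bootstrap percentile method to build a confidence interval for the coefficient. Because I am looking at multiple correlations, I am using a stricter alpha level, and I will later need to repeat this analysis on different data sets with different alpha corrections. All of this is going fine, but I am struggling to create a graph that shows the custom intervals as error bars.
Question: How can I create a graph in ggplot that shows the median of the data together with error bars at custom percentiles? My data are in a data.frame in which one variable identifies the group (Analysis) and a second variable holds all of the scores within the group. There are 10,000 cases for each level of the "Analysis" variable, for 40,000 rows in total. For brevity, I have provided an indexed printout below.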
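As a minimal sketch of the percentile method described above, the interval limits are simply the alpha/2 and 1 - alpha/2 quantiles of the bootstrap distribution, so a stricter alpha widens the interval. The helper name pctl_ci and the simulated x and y are assumptions for illustration only, not part of the original analysis.

```r
# Hypothetical helper: percentile bootstrap interval for a given alpha
pctl_ci <- function(boot, alpha = 0.05) {
  quantile(boot, probs = c(alpha / 2, 1 - alpha / 2),
           na.rm = TRUE, names = FALSE)
}

# Simulated data standing in for the real variables (assumption)
set.seed(42)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)

# Bootstrap distribution of Pearson's r: resample rows with replacement
boot_r <- replicate(2000, {
  i <- sample(50, replace = TRUE)
  cor(x[i], y[i])
})

ci05 <- pctl_ci(boot_r, alpha = 0.05)  # conventional 95% interval
ci01 <- pctl_ci(boot_r, alpha = 0.01)  # stricter alpha, wider interval
```

With alpha = 0.01 the limits are the .005 and .995 quantiles, which are the values used in the code further down.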
>BootDistOverall[c(1:2,10000:10002,20000:20002,30000:30002),]
Analysis Dist
1 Alpha by Consequences (No Outlier) -0.4286326
2 Alpha by Consequences (No Outlier) -0.4191646
10000 Alpha by Consequences (No Outlier) -0.5248891
10001 Alpha by Past-30-Day Binge Drinking -0.2972018
10002 Alpha by Past-30-Day Binge Drinking -0.3011621
20000 Alpha by Past-30-Day Binge Drinking -0.4145920
20001 Q0 by Consequences 0.3689336
20002 Q0 by Consequences 0.4540535
30000 Q0 by Consequences 0.5772917
30001 Q0 by Past-30-Day Binge Drinking 0.6655952
30002 Q0 by Past-30-Day Binge Drinking 0.4412748
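I have been able to create a violin plot of the data using ggplot (see the code below), but I would really like to show the median of each distribution with percentiles as error bars. I can get a median or a box plot to represent these data, but I need custom percentiles.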
p0 <- ggplot(BootDistOverall, aes(Analysis, Dist)) +
  geom_violin(scale = "area",
              color = "#002344",
              size = 1,
              fill = "#FECB00") +
  ylim(-1, 1) +
  geom_hline(yintercept = 0,
             linetype = "dashed",
             color = "black") +
  xlab("Analysis") +
  ylab("Bootstrapped Pearson's r") +
  coord_flip() +
  theme_bw()
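I need help creating a similar graph, but with median points and error bars that correspond to custom percentiles. I have tried several approaches (geom_errorbar, geom_pointrange) but cannot get them to work. The only way I have managed it is by adding line segments to the plot one at a time, as I would in base R graphics (see the code below), but there has to be a better way. I am new to ggplot, so there may be a simple fix, but I am at a loss.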
#Create percentile points
Uppers = c(
quantile(BootDist2$Dist, .995,na.rm=T),
quantile(BootDist4$Dist, .995,na.rm=T),
quantile(BootDist1$Dist, .995,na.rm=T),
quantile(BootDist3$Dist, .995,na.rm=T))
Lowers = c(
quantile(BootDist2$Dist, .005,na.rm=T),
quantile(BootDist4$Dist, .005,na.rm=T),
quantile(BootDist1$Dist, .005,na.rm=T),
quantile(BootDist3$Dist, .005,na.rm=T))
#Create a point graph
ggplot(BootDistOverall, aes(x=Analysis,y=Dist))+
stat_summary(fun.y = median,
geom = "point",
shape=22,
size=5,
color = "#002344",
fill = "#FECB00")+
theme_bw()+
coord_flip()+
ylim(-1,1)+
geom_hline(yintercept = 0,
linetype = "dashed",
color = "black")+
xlab("Analysis")+
ylab("Bootstrapped Pearson's r")+
#Add error bars with geom_segment
geom_segment(x=1,xend=1,y=Lowers[1],yend=Uppers[1])+
geom_segment(x=2,xend=2,y=Lowers[2],yend=Uppers[2])+
geom_segment(x=3,xend=3,y=Lowers[3],yend=Uppers[3])+
geom_segment(x=4,xend=4,y=Lowers[4],yend=Uppers[4])+
geom_segment(x=.9,xend=1.1,y=Lowers[1],yend=Lowers[1])+
geom_segment(x=.9,xend=1.1,y=Uppers[1],yend=Uppers[1])+
geom_segment(x=1.9,xend=2.1,y=Lowers[2],yend=Lowers[2])+
geom_segment(x=1.9,xend=2.1,y=Uppers[2],yend=Uppers[2])+
geom_segment(x=2.9,xend=3.1,y=Lowers[3],yend=Lowers[3])+
geom_segment(x=2.9,xend=3.1,y=Uppers[3],yend=Uppers[3])+
geom_segment(x=3.9,xend=4.1,y=Lowers[4],yend=Lowers[4])+
geom_segment(x=3.9,xend=4.1,y=Uppers[4],yend=Uppers[4])
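The Uppers and Lowers vectors above are built from four separate objects, but the same limits can be taken from the long data frame in one pass per statistic. The following is only a sketch in base R: boot_long is a simulated stand-in for BootDistOverall, and the group labels and alpha value are assumptions.

```r
# Simulated stand-in for BootDistOverall (hypothetical groups and values)
set.seed(1)
boot_long <- data.frame(
  Analysis = rep(c("A", "B", "C", "D"), each = 500),
  Dist = rnorm(2000, mean = rep(c(-0.4, -0.3, 0.4, 0.5), each = 500), sd = 0.05)
)

alpha <- 0.01  # assumed corrected alpha; .005 and .995 are its limits

# One quantile per Analysis level; extra arguments are passed on to quantile()
Lowers  <- tapply(boot_long$Dist, boot_long$Analysis, quantile,
                  probs = alpha / 2, na.rm = TRUE)
Uppers  <- tapply(boot_long$Dist, boot_long$Analysis, quantile,
                  probs = 1 - alpha / 2, na.rm = TRUE)
Medians <- tapply(boot_long$Dist, boot_long$Analysis, median, na.rm = TRUE)

# One row per group, ready to map to y, ymin, and ymax
limits <- data.frame(
  Analysis = names(Medians),
  median   = as.vector(Medians),
  lower    = as.vector(Lowers),
  upper    = as.vector(Uppers)
)
```

Because alpha is a single variable, repeating the analysis with a different correction only requires changing that one value.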
Answer 0 (score: 1)
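Taking a bit of a leap of faith here, I am assuming that the MAX value within each Analysis group is what you want to plot as the upper limit of the error bar, that the MIN value is the lower limit, and that whatever is left over should be the median. Note that you only provided two rows for Q0 by Past-30-Day Binge Drinking, so this may be a faulty assumption... you will need to modify this to match whatever your data actually represent...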
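...As for how to set up the data for plotting in ggplot(): the working paradigm is one variable per aesthetic. To plot error bars, you need x, y, ymin, and ymax. Once the data are reshaped to match that format, the plotting is straightforward. Here is a working example: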
library(data.table)
library(ggplot2)
d <- structure(list(Analysis = c("Alpha by Consequences (No Outlier)",
"Alpha by Consequences (No Outlier)", "Alpha by Consequences (No Outlier)",
"Alpha by Past-30-Day Binge Drinking", "Alpha by Past-30-Day Binge Drinking",
"Alpha by Past-30-Day Binge Drinking", "Q0 by Consequences",
"Q0 by Consequences", "Q0 by Consequences", "Q0 by Past-30-Day Binge Drinking",
"Q0 by Past-30-Day Binge Drinking"), Dist = c(-0.4286326, -0.4191646,
-0.5248891, -0.2972018, -0.3011621, -0.414592, 0.3689336, 0.4540535,
0.5772917, 0.6655952, 0.4412748), var = c("median", "upper",
"lower", "upper", "median", "lower", "lower", "median", "upper",
"upper", "lower")), row.names = c(NA, -11L), class = c("data.table",
"data.frame"))
# Infer which row is the min, max, and median within each group.
# NOTE: only two rows were given for the last Analysis group, so it has no median.
d[, var := ifelse(Dist == min(Dist), "lower", ifelse(Dist == max(Dist), "upper", "median")), by = Analysis]
# Cast to one row per Analysis, with columns lower, median, and upper
d_wide <- dcast(d, Analysis ~ var, value.var = "Dist")
#plot
ggplot(d_wide, aes(Analysis, median, ymin = lower, ymax = upper)) +
geom_errorbar(width = .4) +
geom_point(colour = "orange", size = 4) +
coord_flip() +
theme_bw()
#> Warning: Removed 1 rows containing missing values (geom_point).
Created on 2019-03-09 by the reprex package (v0.2.1)
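The example above works from three hand-picked rows per group. For the full bootstrap distributions, one possible alternative is to let stat_summary() compute the median and percentile limits directly from the long data, so no reshaping is needed. This is a sketch under assumptions: boot_long is simulated, and median_pctl is a hypothetical helper that returns the y, ymin, and ymax columns that fun.data expects.

```r
library(ggplot2)

# Simulated stand-in for BootDistOverall (hypothetical groups and values)
set.seed(1)
boot_long <- data.frame(
  Analysis = rep(c("Group A", "Group B"), each = 1000),
  Dist     = c(rnorm(1000, -0.4, 0.05), rnorm(1000, 0.5, 0.05))
)

# Hypothetical helper: median plus percentile limits for a given alpha
median_pctl <- function(x, alpha = 0.01) {
  q <- quantile(x, c(alpha / 2, 0.5, 1 - alpha / 2),
                na.rm = TRUE, names = FALSE)
  data.frame(ymin = q[1], y = q[2], ymax = q[3])
}

p <- ggplot(boot_long, aes(Analysis, Dist)) +
  # Error bars from the percentile limits of each group
  stat_summary(fun.data = median_pctl, geom = "errorbar", width = 0.4) +
  # Median point for each group (only the y column is used here)
  stat_summary(fun.data = median_pctl, geom = "point",
               shape = 22, size = 5, color = "#002344", fill = "#FECB00") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  coord_flip() +
  theme_bw()
```

To use a different alpha correction, the value can be passed through fun.args, for example fun.args = list(alpha = 0.0125), in both stat_summary() calls.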