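I have a data frame, and I want to create a dot plot that compares the values of each Cntrl type ($Cntrl) for each variable.
Here is the structure of the data frame.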
> str(Int_mltL)
'data.frame': 123144 obs. of 10 variables:
$ File.name : Factor w/ 5864 levels "6-1_LB-1_P2-H-2_01_4587.mzXML",..: 1 1 1 1 1 1 1 1 1 1 ...
$ variable : Factor w/ 21 levels "peak1","peak2",..: 6 1 7 2 8 3 9 4 10 5 ...
$ value : num 597 203 255 0 130 ...
$ valueL : num 6.39 5.31 5.54 -Inf 4.87 ...
$ SmpType : Factor w/ 32 levels "6","PA14_EM_1",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Cntrl : chr "LB" "LB" "LB" "LB" ...
$ PA14_locus: Factor w/ 4434 levels "EMPTY","PA14_00020",..: 1691 1691 1691 1691 1691 1691 1691 1691 1691 1691 ...
$ Locus : Factor w/ 294 levels "","EMPTY","PA14_00680",..: NA NA NA NA NA NA NA NA NA NA ...
$ GeneName : Factor w/ 12 levels "","pchC","pchD",..: NA NA NA NA NA NA NA NA NA NA ...
$ BMtype : Factor w/ 4 levels "","pch","phz",..: NA NA NA NA NA NA NA NA NA NA ...
Here is my working code, which produces the desired dot plot.
pdf(paste(out_dir,"dotplot.Cntrl.Int.pdf",sep=""))
d_ply(Int_mltL,"variable",.print = TRUE,
function(x){
ggplot(x,aes(x=Cntrl,y=valueL))+geom_point()+
labs(title=paste("intensities for peak",x$variable,"across all files",sep=" "))
})
dev.off()
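This works fine. The problem is that I want to add a horizontal line showing the median on each plot. I have made several attempts, and all of them failed. Here is the latest one.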
pdf(paste(out_dir,"dotplot.Cntrl.Int.pdf",sep=""))
d_ply(Int_mltL,"variable",.print = TRUE,
function(x){
medvL<-median(x$valueL)
ggplot(x,aes(x=Cntrl,y=valueL))+geom_point()+
labs(title=paste("intensities for peak",x$variable,"across all files",sep=" "))+
geom_hline(aes(yintercept=medvL,colour="red"))
})
dev.off()
The error:
Error in eval(expr, envir, enclos) : object 'medvL' not found
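A likely cause: in older ggplot2 releases, the plot environment defaulted to the global environment rather than the calling function's frame. The usual workaround at the time was `ggplot(..., environment = environment())`. As a result, `medvL`, which only exists inside the anonymous function, cannot be found when the plot is built. Newer releases capture the environment where `aes()` is called, so the error may not reproduce there. Either way, `medvL` is a single number, so it does not need to be an aesthetic at all.

The sketch below passes it as a constant outside `aes()`. `chunk` and `plot_peak` are hypothetical names, and the toy data stands in for one `d_ply` chunk of `Int_mltL`. Note also that `colour="red"` *inside* `aes()` maps the string "red" to a legend level instead of drawing a red line.

```r
library(ggplot2)

# Hypothetical stand-in for one d_ply chunk of Int_mltL (only the columns used).
chunk <- data.frame(
  Cntrl  = c("LB", "EM", "LB", "EM"),
  valueL = c(1, 2, 3, 4)
)

plot_peak <- function(x) {
  medvL <- median(x$valueL)  # computed now, in the function's own frame
  ggplot(x, aes(x = Cntrl, y = valueL)) +
    geom_point() +
    # Constants go OUTSIDE aes(): nothing is looked up later, and the line
    # is actually drawn in red instead of getting a "red" legend entry.
    geom_hline(yintercept = medvL, colour = "red")
}

p <- plot_peak(chunk)
```

Because `plot_peak` returns the plot object, it should be usable as the function passed to `d_ply(..., .print = TRUE)` unchanged.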
I also tried:
geom_hline(aes(yintercept=median(x$valueL),colour="red"))
Any help is appreciated. Thanks.
I think it has something to do with how I am using d_ply. The following works:
medAllPeaks<-median(Int_mltL$valueL)
i=1
x<-Int_mltL[Int_mltL$variable==Int_mltL$variable[i],]
p<-ggplot(x,aes(x=Cntrl,y=valueL))+geom_point(alpha=I(0.7), position=position_jitter(width=0.1, height=0))+
labs(title=paste("intensities for peak",x$variable,"across all files",sep=" "))+
geom_hline(aes(yintercept=median(x$valueL), color="red"))+ # median for this variable
geom_hline(aes(yintercept=medAllPeaks,linetype="dashed" )) # median across all
print(p)
However, this does not. On every variable's plot it draws the median line of the first variable.
medAllPeaks<-median(Int_mltL$valueL)
pdf(paste(out_dir,"dotplot.Cntrl.Int.pdf",sep=""))
d_ply(Int_mltL,.(variable),.print = TRUE,
function(x){
ggplot(x,aes(x=Cntrl,y=valueL))+geom_point(alpha=I(0.7), position=position_jitter(width=0.1, height=0))+
labs(title=paste("intensities for peak",x$variable,"across all files",sep=" "))+
geom_hline(aes(yintercept=median(x$valueL), color="red"))+
geom_hline(aes(yintercept=medAllPeaks,linetype="dashed" ))
#print(p)
}
)
dev.off()
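The d_ply symptom fits the same explanation. `aes(yintercept=median(x$valueL))` is only evaluated when each plot is printed. At that point `x` is not a column of the data, so it is apparently looked up globally and finds the `x` left over from the `i=1` test above. That is why every page shows the first peak's median.

The sketch below computes both medians as plain numbers before building each plot. It uses base `split()` + `lapply()` in place of `d_ply`, which is an assumption of this sketch; the toy data and the names `toy`, `all_median` and `plots` are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data standing in for Int_mltL (only the columns used here).
toy <- data.frame(
  variable = rep(c("peak1", "peak2"), each = 4),
  Cntrl    = rep(c("LB", "EM"), times = 4),
  valueL   = c(1, 2, 3, 4, 10, 20, 30, 40)
)

all_median <- median(toy$valueL)  # median across all peaks

# One plot per peak; each plot's median is computed from its own chunk.
plots <- lapply(split(toy, toy$variable), function(x) {
  med <- median(x$valueL)
  ggplot(x, aes(x = Cntrl, y = valueL)) +
    geom_point(alpha = 0.7, position = position_jitter(width = 0.1, height = 0)) +
    labs(title = paste("intensities for peak", x$variable[1], "across all files")) +
    geom_hline(yintercept = med, colour = "red") +           # median for this peak
    geom_hline(yintercept = all_median, linetype = "dashed") # median across all
})

# To write one page per peak:
# pdf("dotplot.Cntrl.Int.pdf"); invisible(lapply(plots, print)); dev.off()
```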
Sample data (the dput output is too large, because it lists so many factor levels, e.g. File.name: Factor w/ 5864 levels):
> rbind(head(Int_mltL),tail(Int_mltL))
File.name variable value valueL SmpType Cntrl PA14_locus
1 6-1_LB-1_P2-H-2_01_4587.mzXML peak6 596.9730 6.391872 6 LB PA14_28410
2 6-1_LB-1_P2-H-2_01_4587.mzXML peak1 202.7060 5.311757 6 LB PA14_28410
3 6-1_LB-1_P2-H-2_01_4587.mzXML peak7 255.1080 5.541687 6 LB PA14_28410
4 6-1_LB-1_P2-H-2_01_4587.mzXML peak2 0.0000 -Inf 6 LB PA14_28410
5 6-1_LB-1_P2-H-2_01_4587.mzXML peak8 130.4480 4.870975 6 LB PA14_28410
6 6-1_LB-1_P2-H-2_01_4587.mzXML peak3 45.3949 3.815400 6 LB PA14_28410
123139 PA14_EM_9-4_H-9_P1-H-9_01_7107.mzXML peak16 1687.8400 7.431205 PA14_EM_9 NA <NA>
123140 PA14_EM_9-4_H-9_P1-H-9_01_7107.mzXML peak17 566.5060 6.339488 PA14_EM_9 NA <NA>
123141 PA14_EM_9-4_H-9_P1-H-9_01_7107.mzXML peak18 343.4430 5.839021 PA14_EM_9 NA <NA>
123142 PA14_EM_9-4_H-9_P1-H-9_01_7107.mzXML peak19 44.9409 3.805348 PA14_EM_9 NA <NA>
123143 PA14_EM_9-4_H-9_P1-H-9_01_7107.mzXML peak20 198.4650 5.290613 PA14_EM_9 NA <NA>
123144 PA14_EM_9-4_H-9_P1-H-9_01_7107.mzXML peak21 4019.0500 8.298801 PA14_EM_9 NA <NA>
Locus GeneName BMtype
1 <NA> <NA> <NA>
2 <NA> <NA> <NA>
3 <NA> <NA> <NA>
4 <NA> <NA> <NA>
5 <NA> <NA> <NA>
6 <NA> <NA> <NA>
123139 <NA> <NA> <NA>
123140 <NA> <NA> <NA>
123141 <NA> <NA> <NA>
123142 <NA> <NA> <NA>
123143 <NA> <NA> <NA>
123144 <NA> <NA> <NA>
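Answer (score: 0)
Without reproducible data I am not sure this works, but the following code may help.
First compute the medians:
f <- valueL ~ Cntrl   # one median of valueL per Cntrl group
medians <- aggregate(f, data=x, FUN=median, na.rm=TRUE)
Then add this to the ggplot call inside your function, adjusting the color, label and size as you like: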
+ geom_text(data=medians, color="red", label="|", size=8)
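Put together, the answer's approach might look like the minimal sketch below. It runs on hypothetical toy data (`peak_data` is an illustrative name). `geom_text` needs no `aes()` of its own, because `medians` has the same column names (`Cntrl`, `valueL`) as the plotted data and inherits `x` and `y` from the main `aes()`. One caveat: the formula interface of `aggregate` drops rows whose `Cntrl` is NA by default (`na.action = na.omit`), and the sample data does contain such rows.

```r
library(ggplot2)

# Hypothetical toy data for one peak (only the columns the answer uses).
peak_data <- data.frame(
  Cntrl  = rep(c("LB", "EM"), times = 4),
  valueL = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# One median per Cntrl group; the result has columns Cntrl and valueL.
medians <- aggregate(valueL ~ Cntrl, data = peak_data, FUN = median, na.rm = TRUE)

p <- ggplot(peak_data, aes(x = Cntrl, y = valueL)) +
  geom_point() +
  # Inherits x = Cntrl, y = valueL; draws a red "|" at each group's median.
  geom_text(data = medians, colour = "red", label = "|", size = 8)
```

Unlike a single `geom_hline`, this marks a separate median for each Cntrl group within the plot.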