How to add a median line for each factor using d_ply and ggplot

Asked: 2015-03-19 12:25:31

Tags: r ggplot2

I have a data frame, and for each variable I want to create a dot plot comparing the values across the control types ($Cntrl).

Here is the structure of the data frame.

> str(Int_mltL)
'data.frame':   123144 obs. of  10 variables:
 $ File.name : Factor w/ 5864 levels "6-1_LB-1_P2-H-2_01_4587.mzXML",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ variable  : Factor w/ 21 levels "peak1","peak2",..: 6 1 7 2 8 3 9 4 10 5 ...
 $ value     : num  597 203 255 0 130 ...
 $ valueL    : num  6.39 5.31 5.54 -Inf 4.87 ...
 $ SmpType   : Factor w/ 32 levels "6","PA14_EM_1",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Cntrl     : chr  "LB" "LB" "LB" "LB" ...
 $ PA14_locus: Factor w/ 4434 levels "EMPTY","PA14_00020",..: 1691 1691 1691 1691 1691 1691 1691 1691 1691 1691 ...
 $ Locus     : Factor w/ 294 levels "","EMPTY","PA14_00680",..: NA NA NA NA NA NA NA NA NA NA ...
 $ GeneName  : Factor w/ 12 levels "","pchC","pchD",..: NA NA NA NA NA NA NA NA NA NA ...
 $ BMtype    : Factor w/ 4 levels "","pch","phz",..: NA NA NA NA NA NA NA NA NA NA ...

Here is my working code, which produces the desired dot plots.

pdf(paste(out_dir,"dotplot.Cntrl.Int.pdf",sep=""))
d_ply(Int_mltL,"variable",.print = TRUE,
  function(x){
    ggplot(x,aes(x=Cntrl,y=valueL))+geom_point()+
      labs(title=paste("intensites for peak",x$variable,"across all files",sep=" "))
  })
dev.off()

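This works well. The problem is that I want to add a horizontal line showing the median for each plot. I have made several attempts, but all of them failed. Here is the latest one.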

 pdf(paste(out_dir,"dotplot.Cntrl.Int.pdf",sep=""))
d_ply(Int_mltL,"variable",.print = TRUE,
  function(x){
    medvL<-median(x$valueL)
    ggplot(x,aes(x=Cntrl,y=valueL))+geom_point()+
      labs(title=paste("intensites for peak",x$variable,"across all files",sep=" "))+
      geom_hline(aes(yintercept=medvL,colour="red"))
  })
dev.off()

It fails with this error:

Error in eval(expr, envir, enclos) : object 'medvL' not found

I also tried:

      geom_hline(aes(yintercept=median(x$valueL),colour="red"))

Any help is appreciated. Thanks.

I think this has to do with how I am using d_ply. The following works.

  medAllPeaks<-median(Int_mltL$valueL)
  i=1
  x<-Int_mltL[Int_mltL$variable==Int_mltL$variable[i],]
  p<-ggplot(x,aes(x=Cntrl,y=valueL))+geom_point(alpha=I(0.7), position=position_jitter(width=0.1, height=0))+
      labs(title=paste("intensites for peak",x$variable,"across all files",sep=" "))+
      geom_hline(aes(yintercept=median(x$valueL), color="red"))+ # median for this variable
      geom_hline(aes(yintercept=medAllPeaks,linetype="dashed" )) # median across all
   print(p)

However, this does not. On every plot, for all variables, it draws the median line of the first variable.

medAllPeaks<-median(Int_mltL$valueL)
pdf(paste(out_dir,"dotplot.Cntrl.Int.pdf",sep=""))
d_ply(Int_mltL,.(variable),.print = TRUE,
    function(x){
    ggplot(x,aes(x=Cntrl,y=valueL))+geom_point(alpha=I(0.7), position=position_jitter(width=0.1, height=0))+
    labs(title=paste("intensites for peak",x$variable,"across all files",sep=" "))+
    geom_hline(aes(yintercept=median(x$valueL), color="red"))+
    geom_hline(aes(yintercept=medAllPeaks,linetype="dashed" ))
    #print(p)
    }
  )
dev.off()

Sample data (the input is too large to post, since it lists so many factor levels, e.g. File.name: Factor w/ 5864 levels):

    > rbind(head(Int_mltL),tail(Int_mltL))
                              File.name variable     value   valueL   SmpType Cntrl PA14_locus
1             6-1_LB-1_P2-H-2_01_4587.mzXML    peak6  596.9730 6.391872         6    LB PA14_28410
2             6-1_LB-1_P2-H-2_01_4587.mzXML    peak1  202.7060 5.311757         6    LB PA14_28410
3             6-1_LB-1_P2-H-2_01_4587.mzXML    peak7  255.1080 5.541687         6    LB PA14_28410
4             6-1_LB-1_P2-H-2_01_4587.mzXML    peak2    0.0000     -Inf         6    LB PA14_28410
5             6-1_LB-1_P2-H-2_01_4587.mzXML    peak8  130.4480 4.870975         6    LB PA14_28410
6             6-1_LB-1_P2-H-2_01_4587.mzXML    peak3   45.3949 3.815400         6    LB PA14_28410
123139 PA14_EM_9-4_H-9_P1-H-9_01_7107.mzXML   peak16 1687.8400 7.431205 PA14_EM_9    NA       <NA>
123140 PA14_EM_9-4_H-9_P1-H-9_01_7107.mzXML   peak17  566.5060 6.339488 PA14_EM_9    NA       <NA>
123141 PA14_EM_9-4_H-9_P1-H-9_01_7107.mzXML   peak18  343.4430 5.839021 PA14_EM_9    NA       <NA>
123142 PA14_EM_9-4_H-9_P1-H-9_01_7107.mzXML   peak19   44.9409 3.805348 PA14_EM_9    NA       <NA>
123143 PA14_EM_9-4_H-9_P1-H-9_01_7107.mzXML   peak20  198.4650 5.290613 PA14_EM_9    NA       <NA>
123144 PA14_EM_9-4_H-9_P1-H-9_01_7107.mzXML   peak21 4019.0500 8.298801 PA14_EM_9    NA       <NA>
   Locus GeneName BMtype
1       <NA>     <NA>   <NA>
2       <NA>     <NA>   <NA>
3       <NA>     <NA>   <NA>
4       <NA>     <NA>   <NA>
5       <NA>     <NA>   <NA>
6       <NA>     <NA>   <NA>
123139  <NA>     <NA>   <NA>
123140  <NA>     <NA>   <NA>
123141  <NA>     <NA>   <NA>
123142  <NA>     <NA>   <NA>
123143  <NA>     <NA>   <NA>
123144  <NA>     <NA>   <NA>

1 Answer:

Answer 0 (score: 0)

Without reproducible data I can't be sure this works, but here is some code that may help.

First, compute the medians:

f <- as.formula(paste("valueL", "~", "Cntrl"))  # median of valueL within each Cntrl level
medians <- aggregate(f, data=x, FUN=median, na.rm=TRUE)
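To show what the aggregated result looks like, here is a small sketch on made-up data. The `toy` data frame and its values are hypothetical, and it assumes the intent is one median of `valueL` per `Cntrl` level. The result keeps the column names `Cntrl` and `valueL`, so a layer that inherits `aes(x=Cntrl, y=valueL)` can use it directly as its `data`.

```r
# Hypothetical toy data with the same column names as the question
toy <- data.frame(
  Cntrl  = c("LB", "LB", "LB", "PA14", "PA14"),
  valueL = c(1, 2, 6, 4, 10)
)

# One row per Cntrl level, holding the median of valueL for that level
medians <- aggregate(valueL ~ Cntrl, data = toy, FUN = median, na.rm = TRUE)
print(medians)
```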

Then, inside your function, add the following to your ggplot call, adjusting the color, label, and size to your liking:

+ geom_text(data=medians, color="red", label="|", size=8)
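If a single horizontal line per plot is all that is needed, an alternative is to compute the median inside the function and pass it to `geom_hline()` as a plain argument rather than inside `aes()`. The error in the question suggests that, in the ggplot2 version used, expressions inside `aes()` were not looked up in the function's local scope, which would also explain why `median(x$valueL)` picked up a leftover global `x`. That explanation is an assumption, not something confirmed here. The sketch below uses a hypothetical `toy` data frame and a hypothetical helper `plot_one()`, and writes the PDF to a temporary directory.

```r
library(plyr)
library(ggplot2)

# Hypothetical toy data standing in for Int_mltL (same column names)
set.seed(1)
toy <- data.frame(
  variable = factor(rep(c("peak1", "peak2"), each = 20)),
  Cntrl    = rep(c("LB", "PA14"), times = 20),
  valueL   = c(rnorm(20, mean = 5), rnorm(20, mean = 8))
)

med_all <- median(toy$valueL)  # median across all variables

# Build the dot plot for one variable's subset of the data
plot_one <- function(x) {
  med_x <- median(x$valueL, na.rm = TRUE)  # median of this subset only
  ggplot(x, aes(x = Cntrl, y = valueL)) +
    geom_point(alpha = 0.7,
               position = position_jitter(width = 0.1, height = 0)) +
    # x$variable[1] gives a single title instead of one per row
    labs(title = paste("intensities for", x$variable[1], "across all files")) +
    # constants are passed outside aes(), so they come from this function
    geom_hline(yintercept = med_x, colour = "red") +
    geom_hline(yintercept = med_all, linetype = "dashed")
}

pdf(file.path(tempdir(), "dotplot.Cntrl.Int.pdf"))
d_ply(toy, "variable", plot_one, .print = TRUE)  # one page per variable
invisible(dev.off())
```

Setting `colour` and `linetype` outside `aes()` also keeps them from being treated as mapped variables, so no legend entry is created for them.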