ggplot2: generate a separate box plot for each column

Asked: 2016-09-18 19:31:05

Tags: r ggplot2

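I am trying to generate many separate plots, each showing the level of a single gene (= column) across cells (= rows). In the code, the cells are also split into two subsets, based on whether each cell has a value > 0 for gene1 (this is handled with dplyr).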

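My attempt below plots the values of all genes/columns at once, in a single figure. Any suggestions on how to change my code so that it generates one plot per gene/column?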

Dataset:

          gene1      gene2      gene3      gene4      gene5
cell_1   0.0000   0.279204  25.995400  46.171700  94.234100
cell_2   0.0000  23.456000  77.339800 194.241000 301.234000
cell_3   2.0000  13.100000  45.309200   0.776565   0.000000
cell_4   0.0000  10.500000 107.508000   3.032500   0.000000
cell_5   3.0000   0.000000   0.266139   0.762981 123.371000

Code:

library(ggplot2)
library(dplyr)
library(tidyr)

#Loop making many single box plots

  df3 <- df2 %>% as.data.frame %>% mutate(Cell= rownames(.), positive = df2$gene1>0) %>% 
    gather(., key= gene, value="value", -Cell,-positive) %>% 
    mutate( absolute= abs(value), logabs= log(absolute+1))

  for (i in unique(df3$gene)) {
    geneplot <- df3 %>% ggplot(., aes(x=gene, y=logabs, fill=positive)) +
          geom_boxplot() +
          xlab("Gene") + ylab("Expression level (TPM log)") + 
          theme_classic(base_size = 14, base_family = "Helvetica") +
          theme(axis.text.y=element_text(size=14)) + 
          theme(axis.title.y=element_text(size=14, face="bold")) + 
          theme(axis.text.x=element_text(size=14)) +
          theme(axis.title.x=element_text(size=14, face="bold")) + 
          scale_fill_brewer(palette="Pastel1")
   print(geneplot)
   ggsave(sprintf("%s.png", df3$gene))

   dev.off()

  }

1 answer:

Answer 0 (score: 2)

library(ggplot2)
library(dplyr)
library(tidyr)

gene1<-c(0.0000, 0.0000, 2.0000, 0.0000, 3.0000)
gene2<-c(0.279204, 23.456000, 13.100000 , 10.500000, 3.0000)
gene3<-c(25.995400, 77.339800, 45.309200, 107.508000, 0.266139)
gene4<-c(46.171700, 194.241000, 0.776565, 3.032500, 0.762981)
gene5<-c(94.234100, 301.234000, 0.000000, 0.000000, 3.0000)
df<-data.frame(gene1, gene2, gene3,gene4,gene5)

df <- df %>% 
    as.data.frame %>% 
    mutate(Cell= rownames(.), positive = df$gene1>0) %>% 
    gather(., key= gene, value="value", -Cell,-positive) %>% 
    mutate( absolute= abs(value), logabs= log(absolute+1))

ggplot(data= df , aes(x=gene, y=logabs, fill=positive))+
    geom_boxplot()+facet_wrap(~ gene)

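If separate image files are needed rather than facets, one option is to subset the long data frame inside the loop and pass each plot to ggsave() explicitly. In the question's loop the data is never subset by i, and ggsave() receives the whole df3$gene column instead of the current gene name. The following is only a minimal sketch, not the answerer's code: the helper name save_gene_plots, the output directory and the plot size are assumptions made for illustration, and the example data uses just the first two genes of the question's dataset.

```r
library(ggplot2)

# Hypothetical helper (not from the original post): draw one box plot per gene
# from a long-format data frame with columns gene, logabs and positive,
# and save each plot to its own PNG file. Returns the file paths.
save_gene_plots <- function(long_df, out_dir = tempdir()) {
  files <- character(0)
  for (g in unique(long_df$gene)) {
    one_gene <- long_df[long_df$gene == g, ]  # keep only the current gene
    p <- ggplot(one_gene, aes(x = gene, y = logabs, fill = positive)) +
      geom_boxplot() +
      xlab("Gene") + ylab("Expression level (TPM log)") +
      scale_fill_brewer(palette = "Pastel1")
    path <- file.path(out_dir, sprintf("%s.png", g))
    # Pass the plot explicitly; ggsave() opens and closes its own device,
    # so neither print() nor dev.off() is needed here.
    ggsave(path, plot = p, width = 4, height = 4)
    files <- c(files, path)
  }
  files
}

# gene1 and gene2 from the question's dataset, already in long format.
gene1 <- c(0, 0, 2, 0, 3)
gene2 <- c(0.279204, 23.456, 13.1, 10.5, 0)
example_df <- data.frame(
  gene     = rep(c("gene1", "gene2"), each = 5),
  logabs   = log(abs(c(gene1, gene2)) + 1),
  positive = rep(gene1 > 0, times = 2)
)
saved <- save_gene_plots(example_df)
```

Each file is named after its gene (gene1.png, gene2.png, and so on), so the plots do not overwrite each other.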

Update

I'm not sure exactly what the poster is asking, but here are a couple of interpretations:

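The actual data contains additional genes with no expression (all values 0, like gene6 below), so using facet_wrap(~ gene) creates an extra, unnecessary panel, as follows: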

gene1<-c(0.0000, 0.0000, 2.0000, 0.0000, 3.0000)
gene2<-c(0.279204, 23.456000, 13.100000 , 10.500000, 3.0000)
gene3<-c(25.995400, 77.339800, 45.309200, 107.508000, 0.266139)
gene4<-c(46.171700, 194.241000, 0.776565, 3.032500, 0.762981)
gene5<-c(94.234100, 301.234000, 0.000000, 0.000000, 3.0000)
gene6<-c(0.0000, 0.0000, 0.0000, 0.0000, 0.0000)
df<-data.frame(gene1, gene2, gene3,gene4,gene5, gene6)

df <- df %>% 
  as.data.frame %>% 
  mutate(Cell= rownames(.), positive = df$gene1>0) %>% 
  gather(., key= gene, value="value", -Cell,-positive) %>% 
  mutate( absolute= abs(value), logabs= log(absolute+1))

ggplot(data= df , aes(x=gene, y=logabs, fill=positive))+
  geom_boxplot()+facet_wrap(~ gene) 


To avoid this, simply run:

df<-filter(df, value>0)

ggplot(data= df , aes(x=gene, y=logabs, fill=positive))+
  geom_boxplot()+facet_wrap(~ gene) 

This gives a plot without the gene6 panel.
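Note that filter(df, value > 0) also drops the individual zero measurements of genes that are otherwise expressed, which changes their boxes. If the intent is only to drop genes that are zero in every cell, a grouped filter may be closer to what is wanted. This is a sketch under that assumption; the names long_df and expressed are made up for the example, and the values are those of gene1 and gene6 above.

```r
library(dplyr)

# Small long-format example using the gene1 and gene6 values from above.
long_df <- data.frame(
  gene  = rep(c("gene1", "gene6"), each = 5),
  value = c(0, 0, 2, 0, 3,  0, 0, 0, 0, 0)
)

# Keep every row of a gene as long as at least one of its values is > 0;
# genes that are zero in all cells (gene6 here) are removed entirely.
expressed <- long_df %>%
  group_by(gene) %>%
  filter(any(value > 0)) %>%
  ungroup()
```

Here gene1 keeps all five rows, including its three zeros, while gene6 disappears from the data and therefore from the facets.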

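If this is not what you were concerned about, my apologies. Perhaps you want to get rid of the x-axis breaks for the genes that have no values in each panel, as @Huub Hoofs pointed out below. To achieve that, as Huub Hoofs suggested, try the following: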

ggplot(data= df , aes(x=gene, y=logabs, fill=positive))+
    geom_boxplot()+facet_grid(~ gene, scales = "free")


OR

ggplot(data= df , aes(x=gene, y=logabs, fill=positive))+
    geom_boxplot(aes(1))+facet_wrap(~ gene)
