Question

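Hello, I am relatively new to R / ggplot2, and I would like some advice on how to create a chart like the following:

Description: a diverging bar chart showing biological functions, with genes of increased expression (yellow) pointing right and genes of decreased expression (purple) pointing left. The length of each bar represents the number of differentially expressed genes, and the colour intensity varies with their p-value.

Note that the x axis must be "positive" in both directions. (In the published literature on gene expression experiments, bars pointing left represent genes with decreased expression, and bars pointing right show genes with increased expression. The purpose of the chart is not to show the "magnitude" of the change, which would produce positive and negative values. Instead, we are plotting the number of genes whose expression changed, so it cannot be negative.)

I have tried ggplot2 but could not fully reproduce the chart shown. Here is the data I am trying to plot: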

> dput(sample)
structure(list(Name = structure(c(15L, 19L, 5L, 11L, 8L, 6L, 
16L, 13L, 17L, 1L, 3L, 2L, 14L, 18L, 7L, 12L, 10L, 9L, 4L, 20L
), .Label = c("Actin synthesis", "Adaptive immunity", "Antigen presentation", 
"Autophagy", "Cell cycle", "Cell division", "Cell polarity", 
"DNA repair", "Eye development", "Lipid metabolism", "Phosphorylation", 
"Protein metabolism", "Protein translation", "Proteolysis", "Replication", 
"Signaling", "Sumoylation", "Trafficking", "Transcription", "Translational initiation"
), class = "factor"), Trend_in_AE = structure(c(2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("Down", "Up"), class = "factor"), Count = c(171L, 
201L, 38L, 63L, 63L, 47L, 22L, 33L, 20L, 16L, 16L, 7L, 10L, 4L, 
13L, 15L, 5L, 7L, 9L, 7L), PValue = c(1.38e-08, 1.22e-06, 1.79e-06, 
2.89e-06, 0.000122, 0.000123, 0.00036, 0.000682, 0.001030253, 
0.001623939, 7.76e-05, 0.000149, 0.000734, 0.001307039, 0.00292414, 
0.003347556, 0.00360096, 0.004006781, 0.007330264, 0.010083734
)), .Names = c("Name", "Trend_in_AE", "Count", "PValue"), class = "data.frame", row.names = c(NA, 
-20L))

Thank you very much for your help and advice; it will be very helpful for my learning.

My own humble attempt was this:

table <- read.delim("file.txt", header = T, sep = "\t")
library(ggplot2)
ggplot(aes(x=Number, y=Names)) + 
  geom_bar(stat="identity",position="identity") + 
  xlab("number of genes") + 
  ylab("Name"))

The result was an error message about aes.

Answer 1

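While this is not exactly what you asked for, the following should get you started. @Genoa, as the saying goes, "there is no free lunch". So, as @dww rightly pointed out, do show "some effort"!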

library(ggplot2)

# create dummy data
df <- data.frame(x = letters, y = runif(26))
# compute the normalized occurrence (z-score) for each letter
df$normalize_occurence <- round((df$y - mean(df$y)) / sd(df$y), 2)
# categorise the occurrence: above the mean is "high", otherwise "low"
df$category <- ifelse(df$normalize_occurence > 0, "high", "low")
# check summary statistics
summary(df)
       x            y           normalize_occurence 
a      : 1   Min.   :0.00394   Min.   :-1.8000000  
b      : 1   1st Qu.:0.31010   1st Qu.:-0.6900000  
c      : 1   Median :0.47881   Median :-0.0800000  
d      : 1   Mean   :0.50126   Mean   : 0.0007692  
e      : 1   3rd Qu.:0.70286   3rd Qu.: 0.7325000  
f      : 1   Max.   :0.93091   Max.   : 1.5600000  
(Other):20                                         
category        
Length:26         
Class :character  
Mode  :character 

ggplot(df,aes(x = x,y = normalize_occurence)) + 
      geom_bar(aes(fill = category),stat = "identity") +
      labs(title= "Diverging Bars")+
      coord_flip()

Answer 2

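@ddw and @Ashish are right - this question asks for a lot. It is also not clear how ggplot "failed" to reproduce the figure, which would help in understanding what you are struggling with.

The key with ggplot is that everything you want to include in the plot should be included in the data. Adding a few variables to your table to help put the bars in the right direction will get you much of what you need. Make the variables that are in fact negative (the "Down" values) negative, and they will be plotted that way: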

# r_sample is assumed to be the data frame from the question (the dput() output)
r_sample$Count2 <- ifelse(r_sample$Trend_in_AE == "Down", -r_sample$Count, r_sample$Count)
r_sample$PValue2 <- ifelse(r_sample$Trend_in_AE == "Down", -r_sample$PValue, r_sample$PValue)
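
As a small self-contained check of the sign-flipping step, here is a sketch on a made-up three-row data frame. The name `toy` and its values are hypothetical and are not the asker's data; only the column names follow the question.

```r
# Toy data with the same column names as the question (values are made up)
toy <- data.frame(
  Name = c("A", "B", "C"),
  Trend_in_AE = c("Up", "Down", "Up"),
  Count = c(10L, 4L, 7L),
  stringsAsFactors = FALSE
)

# Negate the counts of "Down" rows so that their bars point left
toy$Count2 <- ifelse(toy$Trend_in_AE == "Down", -toy$Count, toy$Count)

print(toy$Count2)  # 10 -4 7
```

Only the "Down" row changes sign, so a bar geom drawn from Count2 will extend in opposite directions for the two trends.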

Then reorder "Name" so that it is plotted in the order of the new PValue2 variable:

r_sample$Name <- factor(r_sample$Name, levels = r_sample$Name[order(r_sample$PValue2)], ordered = TRUE)
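
To see what this reordering does, here is a minimal sketch on made-up values (the `toy` data frame is hypothetical). The factor levels, which ggplot2 uses to order a discrete axis, end up sorted by the signed p-value.

```r
# Made-up names and signed p-values, for illustration only
toy <- data.frame(
  Name = c("A", "B", "C"),
  PValue2 = c(0.02, -0.01, 0.001),
  stringsAsFactors = FALSE
)

# Levels follow ascending PValue2: negative ("Down") rows come first
toy$Name <- factor(toy$Name, levels = toy$Name[order(toy$PValue2)])

print(levels(toy$Name))  # "B" "C" "A"
```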

Finally, you will need to left-justify some labels and right-justify others, so make that a variable now:

r_sample$just <- ifelse(r_sample$Trend_in_AE=="Down",0,1)

Then some fairly minimal plotting code gets you very close to what you want:

library(ggplot2)

ggplot(r_sample, aes(x=Name, y=Count2, fill=PValue2)) +
  geom_bar(stat="identity") +
  # give the breaks explicitly so the four labels always line up with them
  scale_y_continuous("Number of Differently Regulated Genes", position="top", limits=c(-100,225), breaks=c(-100,0,100,200), labels=c(100,0,100,200)) +
  scale_x_discrete("", labels=NULL) +
  scale_fill_gradient2(low="blue", mid="light grey", high="yellow", midpoint=0) +
  coord_flip() +
  theme_minimal() +
  # map the justification inside aes() so it stays tied to each row
  geom_text(aes(x=Name, y=0, label=Name, hjust=just))
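
Since the question asks for an axis that reads as positive in both directions, one alternative to typing the labels by hand is to pass a labelling function to the scale. This is only a sketch on made-up data (`toy` and its values are hypothetical); it relies on `scale_y_continuous()` accepting a function for `labels`, here base R's `abs`.

```r
library(ggplot2)

# Made-up signed counts, for illustration only
toy <- data.frame(
  Name = c("A", "B", "C", "D"),
  Count2 = c(12, 5, -3, -8)
)

p <- ggplot(toy, aes(x = Name, y = Count2, fill = Count2 > 0)) +
  geom_col() +  # equivalent to geom_bar(stat = "identity")
  # label each break with its absolute value, so -8 is shown as 8
  scale_y_continuous("Number of genes", labels = abs) +
  coord_flip()

p
```

With this approach the labels follow whatever breaks ggplot2 picks, so they stay correct if the limits change.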

You can browse the theme commands on the ggplot2 help page to work out the rest of the formatting.
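
As a starting point for that remaining formatting, here is a hedged sketch of a few `theme()` settings on a made-up plot. The data frame `toy` and the particular choices are illustrative assumptions, not the look of the published figure.

```r
library(ggplot2)

# Made-up data, for illustration only
toy <- data.frame(Name = c("A", "B", "C"), Count2 = c(9, 4, -6))

p <- ggplot(toy, aes(x = Name, y = Count2)) +
  geom_col() +
  coord_flip() +
  theme_minimal() +
  theme(
    panel.grid.minor = element_blank(),  # drop minor grid lines
    axis.text.y = element_blank(),       # hide category labels on the vertical axis
    legend.position = "bottom"           # place any legend below the panel
  )

p
```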

How to create a diverging bar chart in R

2 answers: