Plotting a bar chart with multiple labels?

Asked: 2019-12-25 08:42:18

Tags: r bar-chart

I am trying to plot a bar chart in R that should look like this:

enter image description here

My data:

GO_term     Category            Number  Percentage  Function
GO:0005623  Cellular Component  6       1.9         cell
GO:0043226  Cellular Component  5       1.6         organelle
GO:0044464  Cellular Component  6       1.9         cell part
GO:0044422  Cellular Component  2       0.6         organelle part
GO:0032991  Cellular Component  3       1           protein-containing complex
GO:0016020  Cellular Component  4       1.3         membrane
GO:0005576  Cellular Component  20      6.4         extracellular region
GO:0044425  Cellular Component  1       0.3         membrane part
GO:0005488  Molecular Function  104     33.2        binding
GO:0003824  Molecular Function  266     85          catalytic activity
GO:0005198  Molecular Function  3       1           structural molecule activity
GO:0045735  Molecular Function  3       1           nutrient reservoir activity
GO:0016209  Molecular Function  12      3.8         antioxidant activity
GO:0008152  Biological Process  189     60.4        metabolic process
GO:0009987  Biological Process  25      8           cellular process
GO:0051179  Biological Process  6       1.9         localization
GO:0050896  Biological Process  10      3.2         response to stimulus
GO:0051704  Biological Process  1       0.3         multi-organism process
GO:0071840  Biological Process  4       1.3         cellular component organization or biogenesis

I tried this in R:

> obs <- read.table("try.csv", sep = ",", header = T)

> barplot(obs$percentage, main = "gene ontology" , ylab = "Percentage",
    names.arg = c("cell", "organelle", "cell part ", "organelle part",
    "protein-containing complex", "membrane", "extracellular region",
    "membrane part", "binding", " catalytic activity", "structural molecule activity",
    "nutrient reservoir activity", "antioxidant activity", "metabolic process",
    "cellular process", "localization", "response to stimulus", " multi-organism process", 
    "cellular component organization or biogenesis"), 
    col = "darkred", las = 2)

This gives me

enter image description here

I tried to rotate the axis labels using:

> barplot(obs$percentage, col = "grey50", main = "gene ontology", ylab = "Number", 
    ylim = c(0,5+max(obs$number)), xlab = "try", names.arg = c("cell", "organelle",
    "cell part ", "organelle part", "protein-containing complex", "membrane",
    "extracellular region", "membrane part", "binding", " catalytic activity",
    "structural molecule activity", "nutrient reservoir activity", "antioxidant activity",
    "metabolic process", "cellular process", "localization", "response to stimulus",
    " multi-organism process", "cellular component organization or biogenesis"),
    theme(axis.text.x = element_text(angle = 45, size = rel (1.5))))

But it gives me this error:

  

Error in width/2 : non-numeric argument to binary operator
In addition: Warning message:
In mean.default(width) : argument is not numeric or logical: returning NA

Then, to make the data look cleaner, I tried to break the y-axis using:

> gap.barplot(obs$percentage, main = "gene ontology", ylab = "Percentage",
    ylim = c(0,5+max(obs$number)), xlab = "try", names.arg = c("cell", "organelle",
    "cell part ", "organelle part", "protein-containing complex", "membrane",
    "extracellular region", "membrane part", "binding", " catalytic activity",
    "structural molecule activity", "nutrient reservoir activity", "antioxidant activity",
    "metabolic process", "cellular process", "localization", "response to stimulus",
    " multi-organism process", "cellular component organization or biogenesis"),
    las = 2,  col = "darkred", gap=c(10, 30), ytics = c(0, 10, 20, 30, 40, 80, 90, 100))

Output:

enter image description here

Output of dput(obs):

structure(list(GO_term = structure(c(5L, 11L, 14L, 12L, 10L, 
8L, 4L, 13L, 3L, 1L, 2L, 15L, 9L, 6L, 7L, 17L, 16L, 18L, 19L), .Label = c("GO:0003824", 
"GO:0005198", "GO:0005488", "GO:0005576", "GO:0005623", "GO:0008152", 
"GO:0009987", "GO:0016020", "GO:0016209", "GO:0032991", "GO:0043226", 
"GO:0044422", "GO:0044425", "GO:0044464", "GO:0045735", "GO:0050896", 
"GO:0051179", "GO:0051704", "GO:0071840"), class = "factor"), 
    Category = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 
    3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Biological Process", 
    "Cellular Component", "Molecular Function"), class = "factor"), 
    Number = c(6L, 5L, 6L, 2L, 3L, 4L, 20L, 1L, 104L, 266L, 3L, 
    3L, 12L, 189L, 25L, 6L, 10L, 1L, 4L), Percentage = c(1.9, 
    1.6, 1.9, 0.6, 1, 1.3, 6.4, 0.3, 33.2, 85, 1, 1, 3.8, 60.4, 
    8, 1.9, 3.2, 0.3, 1.3), Function = structure(1:19, .Label = c("cell", 
    "organelle", "cell part", "organelle part", "protein-containing complex", 
    "membrane", "extracellular region", "membrane part", "binding", 
    "catalytic activity", "structural molecule activity", "nutrient reservoir activity", 
    "antioxidant activity", "metabolic process", "cellular process", 
    "localization", "response to stimulus", "multi-organism process", 
    "cellular component organization or biogenesis"), class = "factor")), row.names = c(NA, 
-19L), class = "data.frame")

I cannot solve the problem, nor can I add the y-axis and the axis break.

Thanks

2 answers:

Answer 0: (score: 2)

Here is a base graphics solution.

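1. To fit all the axis labels, set the plot margins explicitly. You may need to adjust the bottom margin according to the length of the longest label: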

# coefficient converting a label width (in characters) to a margin width (in lines)
tx_width_expansion <- 0.3
# define the plot margins; as.character() is needed when obs$Function is a factor
par(mar = c(tx_width_expansion * max(nchar(as.character(obs$Function))), 4, 4, 5))
barplot(height = obs$Percentage, main = "gene ontology", ylab = "Percentage",
    names.arg = as.character(obs$Function),
    col = "darkred", las = 2, plot = TRUE)

The result is

plot_1
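As an alternative to a fixed characters-to-lines coefficient, the bottom margin can be sized from the measured width of the longest label. The following is only a minimal sketch on a three-label subset of the data; the names `bar_labels`, `bar_heights` and `label_in`, and the 0.3 inch padding, are illustrative assumptions rather than part of the answer above:

```r
# Illustrative subset of the data (assumption: not the full table)
bar_labels  <- c("cell", "organelle",
                 "cellular component organization or biogenesis")
bar_heights <- c(1.9, 1.6, 1.3)

# Width of the longest label in inches at the default text size
label_in <- max(strwidth(bar_labels, units = "inches"))

# Replace only the bottom margin; the 0.3 in padding for ticks is an assumption
mai <- par("mai")
mai[1] <- label_in + 0.3
par(mai = mai)

# las = 2 draws the labels perpendicular to the axis, so they grow downwards
barplot(height = bar_heights, names.arg = bar_labels, col = "darkred",
        las = 2, ylab = "Percentage")
```

Because the margin is measured in inches, it should follow the actual font metrics of the device instead of relying on an average character width.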

2. If you want to rotate the labels, it is slightly more complicated in base graphics than in ggplot2:

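# barplot() returns the x positions of the bar midpoints
x <- barplot(height = obs$Percentage, main = "gene ontology", ylab = "Percentage",
    col = "darkred", las = 2, plot = TRUE)
# draw the labels below the axis, rotated by 45 degrees (xpd = TRUE allows drawing in the margin)
text(cex = 1, x = x - .25, y = -2.25, labels = as.character(obs$Function),
    xpd = TRUE, adj = 1, srt = 45)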

plot_2

3. Finally, if you really do need a second axis, you can add one by drawing a second plot on top of the first:

# draw the next plot on top of the existing one
par(new = TRUE)
barplot(height = 100 * obs$Percentage, main = "gene ontology", ylab = "",
    axes = FALSE, col = "darkred")
mtext("Number of genes", side = 4, line = 3)
axis(4, las = 1)

enter image description here

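Note, however, that a second axis is safe only when the second-axis data are obtained by transforming the first-axis data. Otherwise the result can be misleading. That is why the third code block uses 100 * obs$Percentage rather than 100 * obs$Number.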
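The same idea can be sketched without overplotting: draw the bars once, then label a right-hand axis with the percentage ticks rescaled to gene counts. This assumes that Number is proportional to Percentage in the question's data; the conversion factor `genes_per_percent` and the tick positions are illustrative choices, not taken from the answer:

```r
# Number and Percentage columns from the question's data
obs <- data.frame(
  Number     = c(6, 5, 6, 2, 3, 4, 20, 1, 104, 266, 3, 3, 12,
                 189, 25, 6, 10, 1, 4),
  Percentage = c(1.9, 1.6, 1.9, 0.6, 1, 1.3, 6.4, 0.3, 33.2, 85, 1, 1, 3.8,
                 60.4, 8, 1.9, 3.2, 0.3, 1.3)
)

# Assumption: one percentage point corresponds to a fixed number of genes
genes_per_percent <- sum(obs$Number) / sum(obs$Percentage)

# Leave room on the right for the second axis and its title
par(mar = c(5, 4, 4, 5))
barplot(height = obs$Percentage, ylab = "Percentage", col = "darkred",
        las = 2, ylim = c(0, 100))

# Same tick positions as the left axis, labelled in (approximate) gene counts
pct_ticks <- seq(0, 100, by = 20)
axis(4, at = pct_ticks, labels = round(pct_ticks * genes_per_percent), las = 1)
mtext("Number of genes (approx.)", side = 4, line = 3)
```

Since both axes describe the same bars through a single linear transformation, the two scales cannot drift apart.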

Answer 1: (score: 0)

It is easier for people to help if you provide your data as a data frame like this:

obs <- data.frame(
  GOTerm=c("GO:0005623", "GO:0043226", "GO:0005488"),
  Category=c("Cellular Component", "Cellular Component", "Molecular Function"),
  Number=c(6,5,104),
  Percentage=c(1.9, 1.6, 33.2),
  Function=c("cell", "organelle", "binding")
)

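First, I would re-level the Function column so that you can use it directly in the plot instead of typing each GO term title by hand (which is also error-prone). After re-leveling, the bars are no longer reordered alphabetically: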

obs$Function <- factor(obs$Function, levels=obs$Function)
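To see what the re-leveling changes, compare the level order with and without the `levels` argument. This is a small sketch on the three example values; the variable names `fn`, `alphabetical` and `as_given` are illustrative:

```r
fn <- c("cell", "organelle", "binding")

# Default: levels are sorted, so plots would order the bars alphabetically
alphabetical <- factor(fn)
levels(alphabetical)

# Explicit levels: the order of appearance in the data is preserved
as_given <- factor(fn, levels = fn)
levels(as_given)
```

Plotting functions that treat the column as a factor follow the level order, which is why setting it explicitly keeps the bars in the order of the data.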

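As for the problem itself: you misspelled the column name (obs$percentage instead of obs$Percentage), which causes trouble. Even with that fixed, there are still problems generating the plot. It is probably easiest to use ggplot instead: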

library(ggplot2)
ggplot() +
  geom_bar (data=obs, aes(x=Function, y=Percentage), stat="identity", fill="darkred", color="black") + #base graph
  scale_y_continuous(limits=c(0,100), #limits the y axis
                     expand = c(0,0)) + #starts the y axis at 0
  theme_classic() + #basic theme
  theme( #extended theme options
    axis.title.x=element_blank(), #remove the x-axis title
    axis.text.x=element_text(angle=45, hjust=1) #rotate x-axis text (hjust is for horizontal leveling)
    )

ggplot-barplot
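If the "multiple labels" in the question refer to a second row of labels for the three GO categories, one possible approach is to facet by Category and move the facet strips below the axis. This is only a sketch under that assumption, shown on a small subset of the data; it was not part of the answer above:

```r
library(ggplot2)

# Illustrative subset of the question's data
obs <- data.frame(
  Category   = c("Cellular Component", "Cellular Component",
                 "Molecular Function", "Molecular Function",
                 "Biological Process"),
  Percentage = c(1.9, 1.6, 33.2, 85, 60.4),
  Function   = c("cell", "organelle", "binding", "catalytic activity",
                 "metabolic process")
)

# Keep bars and categories in the order in which they appear in the data
obs$Function <- factor(obs$Function, levels = obs$Function)
obs$Category <- factor(obs$Category, levels = unique(obs$Category))

p <- ggplot(obs, aes(x = Function, y = Percentage)) +
  geom_col(fill = "darkred", color = "black") +
  # one panel per category; free x scale and space keep the bar widths equal
  facet_grid(. ~ Category, scales = "free_x", space = "free_x", switch = "x") +
  theme_classic() +
  theme(
    axis.title.x = element_blank(),
    axis.text.x = element_text(angle = 45, hjust = 1),
    strip.placement = "outside",       # category label below the bar labels
    strip.background = element_blank()
  )
print(p)
```

With `switch = "x"` and `strip.placement = "outside"`, each category name appears as a second label row under its group of bars.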

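For the gap plot, you misspelled the column name here too. Also, I think that in ylim you do not want max(obs$Number) but max(obs$Percentage), since your y-axis is derived from obs$Percentage. For some reason, gap.barplot still seems to double the y-limit, hence the factor of 0.5. The colour has to be supplied once per bar here, which rep("darkred", nrow(obs)) takes care of.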

library(plotrix)
gap.barplot(y=obs$Percentage,
            xaxlab=obs$Function,
            gap=c(10,20),
            col = rep("darkred",nrow(obs)),
            main="gene ontology",
            xlab = "try",
            ylab="Percentage",
            ylim=c(0, (0.5*(5+max(obs$Percentage)))),
            ytics=c(0,10,20,30,40,80,90,100),
            las=2
            )

gap.barplot