Question

I am trying to draw a bar plot in R, and it should look like this:

My data:

GO_term     Category            Number  Percentage  Function
GO:0005623  Cellular Component  6       1.9         cell
GO:0043226  Cellular Component  5       1.6         organelle
GO:0044464  Cellular Component  6       1.9         cell part
GO:0044422  Cellular Component  2       0.6         organelle part
GO:0032991  Cellular Component  3       1           protein-containing complex
GO:0016020  Cellular Component  4       1.3         membrane
GO:0005576  Cellular Component  20      6.4         extracellular region
GO:0044425  Cellular Component  1       0.3         membrane part
GO:0005488  Molecular Function  104     33.2        binding
GO:0003824  Molecular Function  266     85          catalytic activity
GO:0005198  Molecular Function  3       1           structural molecule activity
GO:0045735  Molecular Function  3       1           nutrient reservoir activity
GO:0016209  Molecular Function  12      3.8         antioxidant activity
GO:0008152  Biological Process  189     60.4        metabolic process
GO:0009987  Biological Process  25      8           cellular process
GO:0051179  Biological Process  6       1.9         localization
GO:0050896  Biological Process  10      3.2         response to stimulus
GO:0051704  Biological Process  1       0.3         multi-organism process
GO:0071840  Biological Process  4       1.3         cellular component organization or biogenesis

I tried this in R:

> obs <- read.table("try.csv", sep = ",", header = T)

> barplot(obs$percentage, main = "gene ontology" , ylab = "Percentage",
    names.arg = c("cell", "organelle", "cell part ", "organelle part",
    "protein-containing complex", "membrane", "extracellular region",
    "membrane part", "binding", " catalytic activity", "structural molecule activity",
    "nutrient reservoir activity", "antioxidant activity", "metabolic process",
    "cellular process", "localization", "response to stimulus", " multi-organism process", 
    "cellular component organization or biogenesis"), 
    col = "darkred", las = 2)

This gives me

I tried to rotate the axis labels using:

> barplot(obs$percentage, col = "grey50", main = "gene ontology", ylab = "Number", 
    ylim = c(0,5+max(obs$number)), xlab = "try", names.arg = c("cell", "organelle",
    "cell part ", "organelle part", "protein-containing complex", "membrane",
    "extracellular region", "membrane part", "binding", " catalytic activity",
    "structural molecule activity", "nutrient reservoir activity", "antioxidant activity",
    "metabolic process", "cellular process", "localization", "response to stimulus",
    " multi-organism process", "cellular component organization or biogenesis"),
    theme(axis.text.x = element_text(angle = 45, size = rel (1.5))))

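But it gives me an error:

Error in width/2 : non-numeric argument to binary operator
In addition: Warning message:
In mean.default(width) : argument is not numeric or logical: returning NA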

Then I tried to break the y-axis so that the data look cleaner, using:

> gap.barplot(obs$percentage, main = "gene ontology", ylab = "Percentage",
    ylim = c(0,5+max(obs$number)), xlab = "try", names.arg = c("cell", "organelle",
    "cell part ", "organelle part", "protein-containing complex", "membrane",
    "extracellular region", "membrane part", "binding", " catalytic activity",
    "structural molecule activity", "nutrient reservoir activity", "antioxidant activity",
    "metabolic process", "cellular process", "localization", "response to stimulus",
    " multi-organism process", "cellular component organization or biogenesis"),
    las = 2,  col = "darkred", gap=c(10, 30), ytics = c(0, 10, 20, 30, 40, 80, 90, 100))

Output:

Output of > dput(obs):

structure(list(GO_term = structure(c(5L, 11L, 14L, 12L, 10L, 
8L, 4L, 13L, 3L, 1L, 2L, 15L, 9L, 6L, 7L, 17L, 16L, 18L, 19L), .Label = c("GO:0003824", 
"GO:0005198", "GO:0005488", "GO:0005576", "GO:0005623", "GO:0008152", 
"GO:0009987", "GO:0016020", "GO:0016209", "GO:0032991", "GO:0043226", 
"GO:0044422", "GO:0044425", "GO:0044464", "GO:0045735", "GO:0050896", 
"GO:0051179", "GO:0051704", "GO:0071840"), class = "factor"), 
    Category = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 
    3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Biological Process", 
    "Cellular Component", "Molecular Function"), class = "factor"), 
    Number = c(6L, 5L, 6L, 2L, 3L, 4L, 20L, 1L, 104L, 266L, 3L, 
    3L, 12L, 189L, 25L, 6L, 10L, 1L, 4L), Percentage = c(1.9, 
    1.6, 1.9, 0.6, 1, 1.3, 6.4, 0.3, 33.2, 85, 1, 1, 3.8, 60.4, 
    8, 1.9, 3.2, 0.3, 1.3), Function = structure(1:19, .Label = c("cell", 
    "organelle", "cell part", "organelle part", "protein-containing complex", 
    "membrane", "extracellular region", "membrane part", "binding", 
    "catalytic activity", "structural molecule activity", "nutrient reservoir activity", 
    "antioxidant activity", "metabolic process", "cellular process", 
    "localization", "response to stimulus", "multi-organism process", 
    "cellular component organization or biogenesis"), class = "factor")), row.names = c(NA, 
-19L), class = "data.frame")

I cannot fix the problem, and I also cannot add a y-axis or an axis break.

Thanks

Answer 1

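Here is a base graphics solution.

1. To fit all the axis labels, you can set the plot margins explicitly. The bottom margin may need adjusting according to the length of the longest label: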

# coefficient converting a label length (in characters) to a margin width (in lines)
tx_width_expansion <- 0.3
# define the plot margins; as.character() is needed because obs$Function is a factor
par(mar = c(tx_width_expansion * max(nchar(as.character(obs$Function))), 4, 4, 5))
barplot(height = obs$Percentage, main = "gene ontology", ylab = "Percentage",
    names.arg = as.character(obs$Function),
    col = "darkred", las = 2, plot = TRUE)

The result is:
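As a variation on the margin calculation, the bottom margin can be derived from the rendered width of the longest label rather than from a character count. The following is only a sketch: the vectors `labels` and `values` are illustrative stand-ins for `obs$Function` and `obs$Percentage`, and the 0.3 inch padding is an arbitrary assumption.

```r
# Illustrative stand-ins for obs$Function and obs$Percentage
labels <- c("cell", "binding", "protein-containing complex")
values <- c(1.9, 33.2, 1)

# Rendered width of the longest label, in inches, at the axis text size
label_in <- max(strwidth(labels, units = "inches", cex = par("cex.axis")))

# Bottom margin = longest label plus some padding (0.3 in is an assumption);
# the other three margins keep their current values
old_mai <- par("mai")
par(mai = c(label_in + 0.3, old_mai[2], old_mai[3], old_mai[4]))

barplot(height = values, names.arg = labels, ylab = "Percentage",
        col = "darkred", las = 2)
```

Working in inches via `par(mai = ...)` avoids having to guess a characters-to-lines coefficient, at the cost of depending on the font of the active graphics device.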

2. If you want to rotate the labels, it is slightly more complicated in base graphics than in ggplot2:

# barplot() returns the bar midpoints, which are used to position the labels
x <- barplot(height = obs$Percentage, main = "gene ontology", ylab = "Percentage",
    col = "darkred", las = 2, plot = TRUE)
text(cex = 1, x = x - .25, y = -2.25, labels = obs$Function,
    xpd = TRUE, adj = 1, srt = 45)
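The hard-coded `y = -2.25` works for this particular y-range. A sketch of one way to make the label position independent of the data is to derive it from `par("usr")`, which holds the limits of the plot region. The vectors `values` and `labels` are illustrative stand-ins, and the 3% offset is an assumption to tune by eye.

```r
# Illustrative stand-ins for obs$Percentage and obs$Function
values <- c(1.9, 33.2, 85)
labels <- c("cell", "binding", "catalytic activity")

# barplot() returns the x midpoints of the bars
mids <- barplot(height = values, ylab = "Percentage", col = "darkred", las = 2)

# par("usr") gives the plot-region limits as c(x1, x2, y1, y2)
usr <- par("usr")
y_pos <- usr[3] - 0.03 * (usr[4] - usr[3])  # 3% of the y-range below the bottom edge

# xpd = TRUE allows drawing outside the plot region; adj = 1 right-aligns each label
text(x = mids, y = y_pos, labels = labels, srt = 45, adj = 1, xpd = TRUE)
```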

3. Finally, if you really need a second axis, you can get one by drawing a second plot on top of the first:

# draw the next plot on top of the existing one instead of starting a new page
par(new = TRUE)
barplot(height = 100 * obs$Percentage, main = "gene ontology", ylab = "",
    axes = FALSE, col = "darkred")
mtext("Number of genes", side = 4, line = 3)
axis(4, las = 1)

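Note, however, that a second axis is only safe when the second-axis data are obtained by transforming the first-axis data. Otherwise the result can be misleading. That is why I used 100 * obs$Percentage instead of 100 * obs$Number in the third code block.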
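To illustrate the point about transformations, here is a sketch that draws the bars once and adds a right-hand axis that merely relabels the percentage ticks as gene counts. It rests on an assumption not stated in the question: that Percentage equals 100 * Number / total for a single total, which is estimated from the data here. The vectors are a subset of the question's data.

```r
# Subset of the question's data
number     <- c(6, 20, 104, 266, 189)
percentage <- c(1.9, 6.4, 33.2, 85, 60.4)

# ASSUMPTION: Percentage = 100 * Number / total, with one common total
total <- round(100 * sum(number) / sum(percentage))

par(mar = c(5, 4, 4, 5))  # extra room on the right for the second axis
barplot(height = percentage, ylab = "Percentage", col = "darkred",
        ylim = c(0, 100), las = 1)

# Same tick positions as the percentage scale, relabelled as counts
pct_ticks <- seq(0, 100, by = 20)
axis(4, at = pct_ticks, labels = round(pct_ticks * total / 100), las = 1)
mtext("Number of genes", side = 4, line = 3)
```

Because both axes describe the same bars through a fixed conversion, neither can mislead about the other; no second plot or `par(new = TRUE)` is needed.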

Answer 2

It is easier for people to help if you provide your data as a data frame like this:

obs <- data.frame(
  GO_term=c("GO:0005623", "GO:0043226", "GO:0005488"),
  Category=c("Cellular Component", "Cellular Component", "Molecular Function"),
  Number=c(6,5,104),
  Percentage=c(1.9, 1.6, 33.2),
  Function=c("cell", "organelle", "binding")
)

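First, I would relevel the Function column so that you can use it directly in the plot instead of typing out the individual GO term names by hand (which is also error-prone). After releveling, the order of the bars is no longer changed by alphabetical sorting: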

obs$Function <- factor(obs$Function, levels=obs$Function)
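The effect of the releveling can be checked on a small vector. This is a minimal sketch; `fun` is an illustrative stand-in for `obs$Function`.

```r
fun <- c("cell", "organelle", "binding")

# Default: levels are sorted, so the bars would be drawn as binding, cell, organelle
levels(factor(fun))

# Explicit levels keep the order in which the values appear in the data
levels(factor(fun, levels = fun))

# unique() is a safeguard in case a label occurs more than once,
# since duplicated levels are not allowed
levels(factor(fun, levels = unique(fun)))
```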

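On to the problems: you misspelled the column name (obs$percentage instead of obs$Percentage), which will be a problem. But even with that fixed, barplot still has trouble producing the figure. Switching to ggplot is probably the easiest option: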
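Both problems can be reproduced in isolation. The sketch below is an interpretation rather than something stated in the question: it assumes the `width/2` error arises because the unnamed `theme(...)` argument is matched positionally to `barplot()`'s `width` parameter. A plain list stands in for the theme object so that ggplot2 is not required.

```r
obs <- data.frame(Percentage = c(1.9, 1.6, 33.2))

# `$` on a data frame is case-sensitive: a misspelled name silently yields NULL
is.null(obs$percentage)   # TRUE
is.null(obs$Percentage)   # FALSE

# The second formal argument of barplot.default() is `width`, so an unnamed
# extra argument lands there. The list is a stand-in for theme(...) (assumption).
msg <- tryCatch(
  suppressWarnings(barplot(obs$Percentage, list(angle = 45))),
  error = function(e) conditionMessage(e)
)
msg  # the error raised when barplot() tries to do arithmetic on `width`
```

In short, `theme()` and `element_text()` belong to ggplot2 and have no meaning inside a base `barplot()` call.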

library(ggplot2)
ggplot() +
  geom_bar(data=obs, aes(x=Function, y=Percentage), stat="identity", fill="darkred", color="black") + # base graph
  scale_y_continuous(limits=c(0,100), # limits the y axis
                     expand = c(0,0)) + # starts the y axis at 0
  theme_classic() + # basic theme
  theme( # extended theme options
    axis.title.x=element_blank(), # remove the x-axis title
    axis.text.x=element_text(angle=45, hjust=1) # rotate x-axis text (hjust sets the horizontal justification)
    )

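For the gap plot, you misspelled the column name here as well. Also, I think that in ylim you do not want max(obs$Number) but max(obs$Percentage), since your y-axis is derived from obs$Percentage. For some reason gap.barplot still seems to double the y-limit, hence the factor of one half. The colour vector here must have as many entries as there are bars, which rep("darkred", nrow(obs)) takes care of.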

library(plotrix)
gap.barplot(y=obs$Percentage,
            xaxlab=obs$Function,
            gap=c(10,20),
            col = rep("darkred",nrow(obs)),
            main="gene ontology",
            xlab = "try",
            ylab="Percentage",
            ylim=c(0, (0.5*(5+max(obs$Percentage)))),
            ytics=c(0,10,20,30,40,80,90,100),
            las=2
            )

Plotting a bar chart with multiple labels?
