I am trying to draw a bar plot in R, and it should look like this:
My data:
GO_term Category Number Percentage Function
GO:0005623 Cellular Component 6 1.9 cell
GO:0043226 Cellular Component 5 1.6 organelle
GO:0044464 Cellular Component 6 1.9 cell part
GO:0044422 Cellular Component 2 0.6 organelle part
GO:0032991 Cellular Component 3 1 protein-containing complex
GO:0016020 Cellular Component 4 1.3 membrane
GO:0005576 Cellular Component 20 6.4 extracellular region
GO:0044425 Cellular Component 1 0.3 membrane part
GO:0005488 Molecular Function 104 33.2 binding
GO:0003824 Molecular Function 266 85 catalytic activity
GO:0005198 Molecular Function 3 1 structural molecule activity
GO:0045735 Molecular Function 3 1 nutrient reservoir activity
GO:0016209 Molecular Function 12 3.8 antioxidant activity
GO:0008152 Biological Process 189 60.4 metabolic process
GO:0009987 Biological Process 25 8 cellular process
GO:0051179 Biological Process 6 1.9 localization
GO:0050896 Biological Process 10 3.2 response to stimulus
GO:0051704 Biological Process 1 0.3 multi-organism process
GO:0071840 Biological Process 4 1.3 cellular component organization or biogenesis
I tried this in R:
> obs <- read.table("try.csv", sep = ",", header = T)
> barplot(obs$percentage, main = "gene ontology" , ylab = "Percentage",
names.arg = c("cell", "organelle", "cell part ", "organelle part",
"protein-containing complex", "membrane", "extracellular region",
"membrane part", "binding", " catalytic activity", "structural molecule activity",
"nutrient reservoir activity", "antioxidant activity", "metabolic process",
"cellular process", "localization", "response to stimulus", " multi-organism process",
"cellular component organization or biogenesis"),
col = "darkred", las = 2)
This gave me
I then tried to rotate the axis labels using:
> barplot(obs$percentage, col = "grey50", main = "gene ontology", ylab = "Number",
ylim = c(0,5+max(obs$number)), xlab = "try", names.arg = c("cell", "organelle",
"cell part ", "organelle part", "protein-containing complex", "membrane",
"extracellular region", "membrane part", "binding", " catalytic activity",
"structural molecule activity", "nutrient reservoir activity", "antioxidant activity",
"metabolic process", "cellular process", "localization", "response to stimulus",
" multi-organism process", "cellular component organization or biogenesis"),
theme(axis.text.x = element_text(angle = 45, size = rel (1.5))))
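But it gave me an error:
Error in width/2 : non-numeric argument to binary operator
In addition: Warning message:
In mean.default(width) : argument is not numeric or logical: returning NA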
Then I tried to break the y-axis so that the data would look cleaner, using:
> gap.barplot(obs$percentage, main = "gene ontology", ylab = "Percentage",
ylim = c(0,5+max(obs$number)), xlab = "try", names.arg = c("cell", "organelle",
"cell part ", "organelle part", "protein-containing complex", "membrane",
"extracellular region", "membrane part", "binding", " catalytic activity",
"structural molecule activity", "nutrient reservoir activity", "antioxidant activity",
"metabolic process", "cellular process", "localization", "response to stimulus",
" multi-organism process", "cellular component organization or biogenesis"),
las = 2, col = "darkred", gap=c(10, 30), ytics = c(0, 10, 20, 30, 40, 80, 90, 100))
Output:
> dput(obs)
Output:
structure(list(GO_term = structure(c(5L, 11L, 14L, 12L, 10L,
8L, 4L, 13L, 3L, 1L, 2L, 15L, 9L, 6L, 7L, 17L, 16L, 18L, 19L), .Label = c("GO:0003824",
"GO:0005198", "GO:0005488", "GO:0005576", "GO:0005623", "GO:0008152",
"GO:0009987", "GO:0016020", "GO:0016209", "GO:0032991", "GO:0043226",
"GO:0044422", "GO:0044425", "GO:0044464", "GO:0045735", "GO:0050896",
"GO:0051179", "GO:0051704", "GO:0071840"), class = "factor"),
Category = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Biological Process",
"Cellular Component", "Molecular Function"), class = "factor"),
Number = c(6L, 5L, 6L, 2L, 3L, 4L, 20L, 1L, 104L, 266L, 3L,
3L, 12L, 189L, 25L, 6L, 10L, 1L, 4L), Percentage = c(1.9,
1.6, 1.9, 0.6, 1, 1.3, 6.4, 0.3, 33.2, 85, 1, 1, 3.8, 60.4,
8, 1.9, 3.2, 0.3, 1.3), Function = structure(1:19, .Label = c("cell",
"organelle", "cell part", "organelle part", "protein-containing complex",
"membrane", "extracellular region", "membrane part", "binding",
"catalytic activity", "structural molecule activity", "nutrient reservoir activity",
"antioxidant activity", "metabolic process", "cellular process",
"localization", "response to stimulus", "multi-organism process",
"cellular component organization or biogenesis"), class = "factor")), row.names = c(NA,
-19L), class = "data.frame")
I am unable to solve the problem, and I am also unable to add a y-axis and a broken axis.
Thanks
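Answer 0 (score: 2)
Here is a base graphics solution.
1. To fit all of the axis labels, you can set the plot margins explicitly. It may be necessary to adjust the bottom margin according to the longest label: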
# a coefficient to convert a label width (in characters) to a margin width (in lines)
tx_width_expansion <- 0.3
# define the plot margins; as.character() is needed because nchar() rejects factors
par(mar = c(tx_width_expansion * max(nchar(as.character(obs$Function))), 4, 4, 5))
barplot(height = obs$Percentage, main = "gene ontology", ylab = "Percentage",
        names.arg = as.character(obs$Function),
        col = "darkred", las = 2)
The result is
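The margin calculation above can be wrapped in a small helper. This is only a sketch: `bottom_margin_lines` and its arguments `lines_per_char` and `min_lines` are hypothetical names, and the 0.3 coefficient is the same rough guess used above rather than a measured value. The helper converts the labels with as.character() first, because nchar() does not accept factors.

```r
# Hypothetical helper: estimate a bottom margin (in lines) from the longest label.
bottom_margin_lines <- function(labels, lines_per_char = 0.3, min_lines = 5) {
  longest <- max(nchar(as.character(labels)))  # as.character() also handles factors
  max(min_lines, lines_per_char * longest)     # never go below the minimum margin
}

labels <- factor(c("cell", "binding", "cellular component organization or biogenesis"))
margin <- bottom_margin_lines(labels)
par(mar = c(margin, 4, 4, 5))  # order: bottom, left, top, right
```

The coefficient will probably need tuning for other fonts or label sizes.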
2. If you want to rotate the labels, this is slightly more involved in base graphics than in ggplot2:
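# barplot() returns the bar midpoints, which are used to position the labels
x <- barplot(height = obs$Percentage, main = "gene ontology", ylab = "Percentage",
             col = "darkred", las = 2)
# draw the labels below the axis, rotated by 45 degrees (xpd = TRUE allows drawing in the margin)
text(cex = 1, x = x - .25, y = -2.25, labels = obs$Function,
     xpd = TRUE, adj = 1, srt = 45)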
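The hard-coded y = -2.25 above only suits this particular y range. As a sketch of one alternative, the label position can be derived from the plot region reported by par("usr"). The vectors `pct` and `labs`, the name `label_y` and the 1% offset are assumptions made for this illustration.

```r
# Small stand-in data so the example is self-contained.
pct  <- c(1.9, 33.2, 85)
labs <- c("cell", "binding", "catalytic activity")

op   <- par(mar = c(9, 4, 4, 2))                 # extra room at the bottom for the labels
mids <- barplot(pct, ylab = "Percentage", col = "darkred", las = 2)

usr     <- par("usr")                            # plot region: c(x1, x2, y1, y2)
label_y <- usr[3] - 0.01 * (usr[4] - usr[3])     # just below the bottom of the plot region
text(x = mids, y = label_y, labels = labs,
     srt = 45, adj = 1, xpd = TRUE)              # rotated, right-aligned, drawn in the margin
par(op)                                          # restore the previous margins
```

Because the offset is a fraction of the plotted range, the labels should stay in roughly the same place when the data change.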
3. Finally, if you really need a second axis, you can add one by drawing a second plot on top of the first:
# overlay a second plot on the current one
par(new = TRUE)
barplot(height = 100 * obs$Percentage, main = "gene ontology", ylab = "",
        axes = FALSE, col = "darkred")
# label and draw the right-hand axis
mtext("Number of genes", side = 4, line = 3)
axis(4, las = 1)
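Note, however, that a second axis is only safe when the second-axis data are obtained by transforming the first-axis data. Otherwise the result can be misleading. That is why I used 100 * obs$Percentage rather than 100 * obs$Number in the third code block.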
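Since a second axis is only safe for transformed data, another option is to skip the overlaid plot and draw only a rescaled right-hand axis. The sketch below assumes that Number is proportional to Percentage (in the posted data the ratio is about 3.13 genes per percentage point). The names `genes_per_percent` and `count_ticks` are hypothetical, and only four rows of the data are used.

```r
# Four rows from the posted data, enough to estimate the conversion factor.
number <- c(6, 104, 266, 189)
pct    <- c(1.9, 33.2, 85, 60.4)
genes_per_percent <- sum(number) / sum(pct)      # assumed constant ratio

op <- par(mar = c(5, 4, 4, 5))                   # room on the right for the second axis
barplot(pct, ylab = "Percentage", col = "darkred", las = 1, ylim = c(0, 100))

# Choose round gene counts, then place them at the matching percentage positions.
count_ticks <- pretty(c(0, 100 * genes_per_percent))
count_ticks <- count_ticks[count_ticks / genes_per_percent <= 100]
axis(4, at = count_ticks / genes_per_percent, labels = count_ticks, las = 1)
mtext("Number of genes", side = 4, line = 3)
par(op)
```

Both axes then describe the same bars, so neither scale can drift out of sync with the other.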
Answer 1 (score: 0)
It is easier for people to help if you provide the data as a data frame like this:
obs <- data.frame(
  GO_term = c("GO:0005623", "GO:0043226", "GO:0005488"),
  Category = c("Cellular Component", "Cellular Component", "Molecular Function"),
  Number = c(6, 5, 104),
  Percentage = c(1.9, 1.6, 33.2),
  Function = c("cell", "organelle", "binding")
)
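First, I would re-level the Function column so that you can use it directly in the plot instead of typing the individual GO term titles by hand (which is also error-prone). After re-leveling, the order of the bars no longer changes because of alphabetical sorting: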
obs$Function <- factor(obs$Function, levels=obs$Function)
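To see what the re-leveling changes, the level order can be compared with and without the levels argument. This is a small illustration; the vector `fun` and the two result names are made up for the example.

```r
fun <- c("cell", "organelle", "binding")

default_levels <- levels(factor(fun))                # sorted alphabetically
data_levels    <- levels(factor(fun, levels = fun))  # kept in the order of the data

print(default_levels)
print(data_levels)
```

Plotting functions order the bars by level, so the second form keeps them in the order of the data frame. The levels must be unique, so a column with repeated labels would need unique() around the levels argument.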
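The problem to solve: you misspelled the column name (obs$percentage instead of obs$Percentage). Column names in R are case-sensitive, so this is a problem. But even with that fixed, there are still issues in generating the plot. Switching to ggplot is probably the easiest option: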
library(ggplot2)
ggplot() +
  geom_bar(data = obs, aes(x = Function, y = Percentage),
           stat = "identity", fill = "darkred", color = "black") +  # base graph
  scale_y_continuous(limits = c(0, 100),                # limit the y-axis
                     expand = c(0, 0)) +                # start the y-axis at 0
  theme_classic() +                                     # basic theme
  theme(                                                # extended theme options
    axis.title.x = element_blank(),                     # remove the x-axis title
    axis.text.x = element_text(angle = 45, hjust = 1)   # rotate x-axis text (hjust sets the horizontal justification)
  )
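Two points about the base barplot() calls in the question can be checked in isolation. The sketch below uses a three-row stand-in for obs, and list(angle = 45) stands in for the ggplot2 theme object so that ggplot2 is not required. It assumes that an unnamed second argument to barplot() is matched to width, which is consistent with the "Error in width/2" message reported in the question.

```r
obs <- data.frame(Percentage = c(1.9, 1.6, 33.2))

# Column names are case-sensitive: the lowercase spelling finds nothing.
lower_case_column <- obs$percentage
print(is.null(lower_case_column))

# An unnamed extra argument is matched to the next formal argument of barplot(),
# which is width; a list (like a theme object) cannot be divided by 2.
result <- tryCatch(
  suppressWarnings(barplot(obs$Percentage, list(angle = 45))),
  error = function(e) e
)
print(inherits(result, "error"))
```

theme() belongs to ggplot2 and has no effect on base graphics, so the label rotation has to be done either with text() in base graphics or inside a ggplot call.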
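For the gap plot, you misspelled the column name here as well. In addition, I think that in ylim you do not want max(obs$Number) but max(obs$Percentage), because your y-axis is derived from obs$Percentage. For some reason, gap.barplot still seems to double the y-limit, hence the factor of one half. The col vector must have as many entries as there are bars, which can be achieved with rep("darkred", nrow(obs)).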
library(plotrix)
gap.barplot(y=obs$Percentage,
xaxlab=obs$Function,
gap=c(10,20),
col = rep("darkred",nrow(obs)),
main="gene ontology",
xlab = "try",
ylab="Percentage",
ylim=c(0, (0.5*(5+max(obs$Percentage)))),
ytics=c(0,10,20,30,40,80,90,100),
las=2
)