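I have a data frame of relative bacterial abundances for 152 samples (rows). I want to plot a stacked bar chart of the overall abundance of each bacterial phylum across all samples (e.g. Actinobacteria vs. Bacteroidetes vs. Firmicutes, etc.). I would like it color-coded by phylum, with a matching legend. Can someone suggest how to do this? My problem is that I am not sure how to get the column totals to use for plotting in R. Thanks.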
row.names Actinobacteria Bacteroidetes Firmicutes Fusobacteria Proteobacteria Verrucomicrobia Other
1 sample1 0.0084246282 0.41627099 0.55475503 0.000000e+00 7.245180e-04 5.391762e-05 1.977092e-02
2 sample2 0.0168571327 0.13298800 0.80289437 3.560112e-05 4.272135e-03 4.238314e-02 5.696180e-04
3 sample3 0.0020299288 0.53813817 0.42367947 3.311006e-02 7.978327e-04 3.534702e-05 2.209189e-03
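Answer 0 (score: 1)
It was not clear to me whether the sample names are the row names of your data frame, so I simply re-created the data frame with the sample names stored in a variable (column), just like the bacteria names: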
Sample Actinobacteria Bacteroidetes Firmicutes Fusobacteria Proteobacteria
1 sample1 0.008424628 0.4162710 0.5547550 0.000000e+00 0.0007245180
2 sample2 0.016857133 0.1329880 0.8028944 3.560112e-05 0.0042721350
3 sample3 0.002029929 0.5381382 0.4236795 3.311006e-02 0.0007978327
Verrucomicrobia Other
1 5.391762e-05 0.019770920
2 4.238314e-02 0.000569618
3 3.534702e-05 0.002209189
To reproduce this dataset, you can run the following command:
df <- structure(list(Sample = structure(1:3, .Label = c("sample1",
"sample2", "sample3"), class = "factor"), Actinobacteria = c(0.0084246282,
0.0168571327, 0.0020299288), Bacteroidetes = c(0.41627099, 0.132988,
0.53813817), Firmicutes = c(0.55475503, 0.80289437, 0.42367947
), Fusobacteria = c(0, 3.560112e-05, 0.03311006), Proteobacteria = c(0.000724518,
0.004272135, 0.0007978327), Verrucomicrobia = c(5.391762e-05,
0.04238314, 3.534702e-05), Other = c(0.01977092, 0.000569618,
0.002209189)), .Names = c("Sample", "Actinobacteria", "Bacteroidetes",
"Firmicutes", "Fusobacteria", "Proteobacteria", "Verrucomicrobia",
"Other"), class = "data.frame", row.names = c("1", "2", "3"))
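If your sample names really are stored as row names rather than in a column, they can be moved into a column first. The following is only a minimal sketch: `df_rownames` and `df_fixed` are hypothetical names used for illustration, and only two phyla are included to keep it short.

```r
# Hypothetical starting point: sample names are row names, not a column
df_rownames <- data.frame(
  Actinobacteria = c(0.0084246282, 0.0168571327, 0.0020299288),
  Bacteroidetes  = c(0.41627099, 0.132988, 0.53813817),
  row.names      = c("sample1", "sample2", "sample3")
)

# Copy the row names into a "Sample" column and reset the row names to 1..n
df_fixed <- data.frame(
  Sample = rownames(df_rownames),
  df_rownames,
  row.names = NULL
)
```

After this step `df_fixed` has the same layout as the data frame above, with `Sample` as an ordinary column.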
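As @zx8754 suggested, this data frame needs to be reshaped, i.e. converted from wide format to long format. For details, see the linked examples.
If the data frame above is named df, the following commands reshape it into long format: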
library(reshape2)
df_long <- melt(df, id.vars = "Sample", variable.name = "Phyla")
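If you would rather not depend on reshape2, roughly the same long table can be built in base R. This is a sketch under the assumption that the first column is `Sample` and every other column is a phylum; `df_long_base` is a hypothetical name, and the data frame is repeated here only so the snippet runs on its own.

```r
# Same three samples as above, re-created so this snippet is self-contained
df <- data.frame(
  Sample = c("sample1", "sample2", "sample3"),
  Actinobacteria = c(0.0084246282, 0.0168571327, 0.0020299288),
  Bacteroidetes = c(0.41627099, 0.132988, 0.53813817),
  Firmicutes = c(0.55475503, 0.80289437, 0.42367947),
  Fusobacteria = c(0, 3.560112e-05, 0.03311006),
  Proteobacteria = c(0.000724518, 0.004272135, 0.0007978327),
  Verrucomicrobia = c(5.391762e-05, 0.04238314, 3.534702e-05),
  Other = c(0.01977092, 0.000569618, 0.002209189)
)

phyla <- names(df)[-1]  # every column except Sample

# One row per (Sample, Phyla) pair; columns are stacked one after another
df_long_base <- data.frame(
  Sample = rep(df$Sample, times = length(phyla)),
  Phyla  = factor(rep(phyla, each = nrow(df)), levels = phyla),
  value  = unlist(df[-1], use.names = FALSE)
)
```

Keeping `Phyla` as a factor with the original column order means the legend follows the same order as the columns.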
From here we can plot with ggplot:
library(ggplot2)
ggplot(df_long, aes(x = Sample, y = value, fill = Phyla)) +
  geom_bar(stat = "identity")
This gives a stacked bar chart with one bar per sample, filled by phylum and accompanied by a legend.
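The question also asks about column totals for a single overall bar. One possible sketch, assuming that "overall abundance" means the mean relative abundance of each phylum across samples (use the raw `colSums()` result instead if you want totals), is shown here. `totals`, `overall` and `p` are hypothetical names, and the data frame is repeated so the snippet is self-contained.

```r
library(ggplot2)

# Same three samples as above, re-created so this snippet is self-contained
df <- data.frame(
  Sample = c("sample1", "sample2", "sample3"),
  Actinobacteria = c(0.0084246282, 0.0168571327, 0.0020299288),
  Bacteroidetes = c(0.41627099, 0.132988, 0.53813817),
  Firmicutes = c(0.55475503, 0.80289437, 0.42367947),
  Fusobacteria = c(0, 3.560112e-05, 0.03311006),
  Proteobacteria = c(0.000724518, 0.004272135, 0.0007978327),
  Verrucomicrobia = c(5.391762e-05, 0.04238314, 3.534702e-05),
  Other = c(0.01977092, 0.000569618, 0.002209189)
)

# Column totals over all samples (Sample column dropped)
totals <- colSums(df[-1])

# Assumption: divide by the number of samples to get a mean relative abundance
overall <- data.frame(
  Phyla     = factor(names(totals), levels = names(totals)),
  Abundance = totals / nrow(df)
)

# A single stacked bar for all samples, filled by phylum
p <- ggplot(overall, aes(x = "All samples", y = Abundance, fill = Phyla)) +
  geom_col()
print(p)
```

Because each sample's relative abundances sum to about 1, the averaged bar also has a height of about 1.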