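This question has been asked a million times on Stack Overflow, but I can't seem to find a solution that fits my particular problem.
I have a data frame that includes a species column and a genome_name column: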
species                  genome_name
Acinetobacter baumannii  Acinetobacter baumanii BIDMC 56
Acinetobacter baumannii  Acinetobacter baumannii 1032359
Klebsiella pneumoniae    Klebsiella pneumoniae CHS 30
etc...
Using this code, I made a bar plot of species in which each bar's height is the number of genome_names:
library(ggplot2)
ggplot(PATRIC_genomes_AMR_2_ris_subset,aes(x=species,fill=genome_name)) +
geom_bar(colour="black") + scale_colour_continuous(guide = FALSE) +
labs(title="Number of unique strains") +
labs(x = "Species",y="#Strains") + theme(legend.position="none") +
theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))
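I want to order this bar plot by increasing y value (the number of genome_names). I blindly tried to do this by turning my data into a factor and got this error: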
Error in `[<-.data.frame`(`*tmp*`, del, value = NULL) :
missing values are not allowed in subscripted assignments of data frames
Answer 0 (score: 1)
Reorder the factor levels before plotting:
df $ species&lt; - reorder(df $ species,df $ ge nome_name)
Edit: I hadn't looked at the data closely enough. This plots the number of unique strains per species, sorted by that number:
library(dplyr)
library(ggplot2)
df %>%
group_by(species) %>%
summarise(unique_strains = length(unique(genome_name))) %>%
mutate(species = reorder(species, unique_strains)) %>%
ggplot(aes(species, unique_strains)) + geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
xlab(NULL) +
scale_y_log10()
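For readers without dplyr, the same per-species count of unique strains can be sketched in base R with tapply(). The toy data below is hypothetical; note the repeated genome name under "A. baumannii", which counts only once, just as length(unique(genome_name)) does in the dplyr pipeline.

```r
# Hypothetical toy data; "A. baumannii" lists the same genome twice
toy_df <- data.frame(
  species = c("A. baumannii", "A. baumannii", "K. pneumoniae", "K. pneumoniae",
              "E. coli", "E. coli", "E. coli"),
  genome_name = c("g1", "g1", "g2", "g3", "g4", "g5", "g6")
)
# Count distinct genome names per species (a repeated name counts once)
unique_strains <- tapply(toy_df$genome_name, toy_df$species,
                         function(g) length(unique(g)))
# Sort increasing, matching reorder()'s default (smallest first)
unique_strains <- sort(unique_strains)
print(unique_strains)
```

The names of the sorted result are exactly the level order you would pass to factor(species, levels = ...) before plotting.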
Answer 1 (score: 1)
library(ggplot2)
PATRIC_genomes_AMR_2_ris_subset <- read.csv("genomes_subset.csv", header = TRUE)
PATRIC_genomes_AMR_2_ris_subset <- dplyr::sample_n(PATRIC_genomes_AMR_2_ris_subset, 300)
PATRIC_genomes_AMR_2_ris_subset <- PATRIC_genomes_AMR_2_ris_subset[order(PATRIC_genomes_AMR_2_ris_subset$species),]
# Order by genome_name
# Assign back to genome_name (not a new column) so the fill aesthetic uses the new level order
PATRIC_genomes_AMR_2_ris_subset <- within(PATRIC_genomes_AMR_2_ris_subset,
genome_name <- factor(genome_name,
levels=names(sort(table(genome_name),
decreasing=TRUE))))
ggplot(PATRIC_genomes_AMR_2_ris_subset,aes(x=species,fill=genome_name)) +
geom_bar(colour="black") + scale_colour_continuous(guide = FALSE) +
labs(title="Number of unique strains") +
labs(x = "Species",y="#Strains") + theme(legend.position="none") +
theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))
# Order by species
PATRIC_genomes_AMR_2_ris_subset <- within(PATRIC_genomes_AMR_2_ris_subset,
species <- factor(species,
levels=names(sort(table(species),
decreasing=TRUE))))
ggplot(PATRIC_genomes_AMR_2_ris_subset,aes(x=species,fill=genome_name)) +
geom_bar(colour="black") + scale_colour_continuous(guide = FALSE) +
labs(title="Number of unique strains") +
labs(x = "Species",y="#Strains") + theme(legend.position="none") +
theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))
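Note that sorting the rows, as the order() line near the top of this answer does, does not by itself change the bar order: ggplot2 places discrete x values by factor level, and factor() on a character column sorts its levels alphabetically whatever the row order. A small base-R check on hypothetical data:

```r
# Hypothetical toy data in arbitrary row order
toy_df <- data.frame(species = c("K. pneumoniae", "E. coli",
                                 "A. baumannii", "E. coli"))
# Same rows, shuffled
shuffled <- toy_df[c(4, 1, 3, 2), , drop = FALSE]
# factor() sorts levels alphabetically in both cases, so row order is ignored
print(levels(factor(toy_df$species)))
print(levels(factor(shuffled$species)))
```

This is why it is the factor(..., levels = ...) step, not the row sort, that actually reorders the bars.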
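This is nearly the same as this, but you mention ordering it with the fill value genome_name, which is a bit different, and we also need to look at how the ordering affects run time, so it isn't a duplicate.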
Answer 2 (score: 0)
To order the bar plot, make species a factor whose levels are sorted by number of occurrences.
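A minimal sketch of that idiom on a hypothetical character vector: table() counts occurrences, sort() orders the counts, and names() turns the sorted table into a level vector. Negating the counts puts the most frequent species first; a plain sort() gives the increasing order the question asks for.

```r
# Hypothetical species column
species <- c("K. pneumoniae", "E. coli", "E. coli",
             "A. baumannii", "E. coli", "K. pneumoniae")
# Negating the counts makes sort() put the most frequent species first
most_first <- factor(species, names(sort(-table(species))))
# Plain sort() gives increasing counts (fewest first)
fewest_first <- factor(species, names(sort(table(species))))
print(levels(most_first))
print(levels(fewest_first))
```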
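Plotting takes a long time because you are actually drawing a bar for every species/genome_name pair (12,339 bars, to be exact) and stacking those bars by species. If you only want black bars, dropping the fill aesthetic lets ggplot aggregate much faster, since it then draws only one bar per species: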
# download data
df <- gsheet::gsheet2tbl('https://docs.google.com/spreadsheets/d/16oHo85Pb8PVX2VqxlqEHizn10H3jVdjRC-kDrELcOfs/edit#gid=1638547987')
ggplot(df, aes(x = factor(species, names(sort(-table(species)))))) +
geom_bar(colour = "black") +
labs(title = "Number of unique strains") +
labs(x = "Species", y = "#Strains") +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
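To see where the bar count comes from, here is a base-R sketch on hypothetical data (the genome names "CHS 31" and "CHS 32" are made up). With fill = genome_name, geom_bar draws one stacked segment per distinct species/genome_name pair; without fill, it draws one bar per species.

```r
# Hypothetical toy data
toy_df <- data.frame(
  species = c("A. baumannii", "A. baumannii", "K. pneumoniae",
              "K. pneumoniae", "K. pneumoniae"),
  genome_name = c("BIDMC 56", "1032359", "CHS 30", "CHS 31", "CHS 32")
)
# With fill = genome_name: one segment per distinct (species, genome_name) pair
n_stacked <- nrow(unique(toy_df[c("species", "genome_name")]))
# Without fill: one bar per species
n_plain <- length(unique(toy_df$species))
cat("stacked segments:", n_stacked, "\nplain bars:", n_plain, "\n")
```

On the real data the same calculation would give the 12,339 segments mentioned above versus one bar per species.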
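If you use the same approach but plot with the fill aesthetic, you'll only get black bars: the colour = "black" setting in geom_bar draws a black outline around every stacked bar, and the bars are so thin that the outlines hide the fill colour. One way around this is simply to drop colour = "black":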
ggplot(df, aes(x = factor(species, names(sort(-table(species)))), fill = genome_name)) +
geom_bar() +
labs(title = "Number of unique strains") +
labs(x = "Species", y = "#Strains") +
theme(legend.position = "none",
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
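If you really want a black outline on each stacked bar, set the outline width small enough that it doesn't cover the fill. In ggplot2 3.4.0 and later this is the linewidth argument; older versions use size:
ggplot(df, aes(x = factor(species, names(sort(-table(species)))), fill = genome_name)) +
geom_bar(colour = "black", linewidth = 0.01) +  # use size = 0.01 on ggplot2 < 3.4.0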
labs(title = "Number of unique strains") +
labs(x = "Species", y = "#Strains") +
theme(legend.position = "none",
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))