Ordering a bar plot in R by fill values

Time: 2016-08-19 16:36:22

Tags: r ggplot2

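This question has been asked a million times on Stack Overflow, but I can't seem to find a solution that fits my specific problem.

I have a data frame with a species column and a genome_name column: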

species                  genome_name
Acinetobacter baumannii  Acinetobacter baumanii BIDMC 56 
Acinetobacter baumannii  Acinetobacter baumannii 1032359
Klebsiella pneumoniae    Klebsiella pneumoniae CHS 30
etc...

With this code I created a bar plot of species, where each bar's height is the number of genome_name entries:

library(ggplot2)
ggplot(PATRIC_genomes_AMR_2_ris_subset,aes(x=species,fill=genome_name)) + 
  geom_bar(colour="black") + scale_colour_continuous(guide = FALSE) + 
  labs(title="Number of unique strains") +
  labs(x = "Species",y="#Strains") + theme(legend.position="none") + 
  theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5)) 

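I want to order this bar plot by increasing y value (the number of genome_name entries). I blindly tried to do this by converting my data to a factor, which produced this error: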

Error in `[<-.data.frame`(`*tmp*`, del, value = NULL) : 
missing values are not allowed in subscripted assignments of data frames

3 answers:

Answer 0: (score: 1)

Reorder the factor levels before plotting:

df$species <- reorder(df$species, df$genome_name)

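Edit: I had not looked at the data closely enough. The following plots the number of unique strains per species, ordered by that count.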

library(dplyr)
library(ggplot2)

df %>%
  group_by(species) %>%
  summarise(unique_strains = length(unique(genome_name))) %>%
  mutate(species = reorder(species, unique_strains)) %>%
  ggplot(aes(species, unique_strains)) + geom_bar(stat = "identity") + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) + 
  xlab(NULL) +
  scale_y_log10()
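To illustrate why the count is computed before reordering: `reorder()` needs a numeric value per species, and the number of unique strains can differ from the number of rows. The sketch below uses only base R and a made-up toy data frame (the species rows and strain labels are hypothetical, not taken from the real data set).

```r
# Toy data (hypothetical): "A. baumannii" has 4 rows but only 2 unique strains.
df <- data.frame(
  species     = c(rep("A. baumannii", 4), rep("K. pneumoniae", 3)),
  genome_name = c("a1", "a1", "a1", "a2", "b1", "b2", "b3"),
  stringsAsFactors = FALSE
)

# Number of unique genome names per species.
unique_strains <- tapply(df$genome_name, df$species,
                         function(x) length(unique(x)))

summary_df <- data.frame(
  species        = names(unique_strains),
  unique_strains = as.vector(unique_strains)
)

# reorder() sorts the factor levels by the numeric summary (ascending).
summary_df$species <- reorder(summary_df$species, summary_df$unique_strains)

levels(summary_df$species)      # ordered by unique strains
names(sort(table(df$species)))  # ordered by row count, which differs here
```

With this toy data the two orderings disagree, so it matters which quantity the bars are meant to show.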

Answer 1: (score: 1)

library(ggplot2)
PATRIC_genomes_AMR_2_ris_subset <- read.csv("genomes_subset.csv", header = T)
PATRIC_genomes_AMR_2_ris_subset <- dplyr::sample_n(PATRIC_genomes_AMR_2_ris_subset, 300)

PATRIC_genomes_AMR_2_ris_subset <- PATRIC_genomes_AMR_2_ris_subset[order(PATRIC_genomes_AMR_2_ris_subset$species),]


# Order by genome_name
PATRIC_genomes_AMR_2_ris_subset <- within(PATRIC_genomes_AMR_2_ris_subset, 
                   Position     <- factor(genome_name, 
                                      levels=names(sort(table(genome_name), 
                                                        decreasing=TRUE))))


ggplot(PATRIC_genomes_AMR_2_ris_subset,aes(x=species,fill=genome_name)) + 
  geom_bar(colour="black") + scale_colour_continuous(guide = FALSE) + 
  labs(title="Number of unique strains") +
  labs(x = "Species",y="#Strains") + theme(legend.position="none") + 
  theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5)) 

# Order by species
PATRIC_genomes_AMR_2_ris_subset <- within(PATRIC_genomes_AMR_2_ris_subset, 
                                          species <- factor(species, 
                                                         levels=names(sort(table(species), 
                                                         decreasing=TRUE))))

ggplot(PATRIC_genomes_AMR_2_ris_subset,aes(x=species,fill=genome_name)) + 
  geom_bar(colour="black") + scale_colour_continuous(guide = FALSE) + 
  labs(title="Number of unique strains") +
  labs(x = "Species",y="#Strains") + theme(legend.position="none") + 
  theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5)) 


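This is almost the same as another question, but you mention ordering by the fill value genome_name, which is slightly different, and we also need to look at how the ordering affects run time, so it is not a duplicate.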

Answer 2: (score: 0)

To order the bars, make species a factor whose levels are sorted by number of occurrences.
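As a minimal sketch of that idea in base R, on a made-up toy data frame (the values are hypothetical): the question asks for increasing order, so the counts are sorted ascending here, whereas the plots in this answer use `sort(-table(species))` to get decreasing order.

```r
# Toy data standing in for the real table (hypothetical values).
df <- data.frame(
  species     = c("A. baumannii", "A. baumannii", "K. pneumoniae",
                  "E. coli", "E. coli", "E. coli"),
  genome_name = c("Ab 1", "Ab 2", "Kp 1", "Ec 1", "Ec 2", "Ec 3"),
  stringsAsFactors = FALSE
)

# Count rows per species and sort the counts in ascending order.
counts <- sort(table(df$species))

# Use the sorted names as factor levels; ggplot2 draws a discrete axis
# in level order, so the bars would appear from fewest to most strains.
df$species <- factor(df$species, levels = names(counts))

levels(df$species)
```

Passing `decreasing = TRUE` to `sort()` would give the tallest bar first instead.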

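Plotting takes a long time because you are actually drawing one bar for every species/genome_name pair (12,339 bars, to be exact) and stacking those bars by species. If you only want black bars, ggplot can aggregate much faster when you drop the fill aesthetic, because it then draws only one bar per species: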

# download data
df <- gsheet::gsheet2tbl('https://docs.google.com/spreadsheets/d/16oHo85Pb8PVX2VqxlqEHizn10H3jVdjRC-kDrELcOfs/edit#gid=1638547987')

ggplot(df, aes(x = factor(species, names(sort(-table(species)))))) + 
    geom_bar(colour = "black") + 
    labs(title = "Number of unique strains") +
    labs(x = "Species", y = "#Strains") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) 

[Image: plot with black bars]
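A quick way to see how many rectangles each version draws is to count distinct combinations directly. This is only a sketch on made-up toy data (the strain labels are hypothetical); on the real data the first number would be the count of species/genome_name pairs mentioned above.

```r
# Toy data (hypothetical): 2 species, 5 distinct species/genome_name pairs.
df <- data.frame(
  species     = c(rep("A. baumannii", 4), rep("K. pneumoniae", 3)),
  genome_name = c("a1", "a1", "a1", "a2", "b1", "b2", "b3"),
  stringsAsFactors = FALSE
)

# With fill = genome_name, geom_bar() counts rows per (species, genome_name)
# group, so one stacked rectangle is drawn per distinct pair.
n_rects_with_fill <- nrow(unique(df[, c("species", "genome_name")]))

# Without fill there is a single group per x value: one rectangle per species.
n_rects_without_fill <- length(unique(df$species))

n_rects_with_fill
n_rects_without_fill
```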

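If you plot with the fill aesthetic using the same approach, you will get only black bars: the colour setting in geom_bar draws a black stroke around every stacked bar, and the bars are so small that the strokes cover the fill colour. One way to avoid this is simply to remove colour = "black":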

ggplot(df, aes(x = factor(species, names(sort(-table(species)))), fill = genome_name)) + 
    geom_bar() + 
    labs(title = "Number of unique strains") +
    labs(x = "Species", y = "#Strains") + 
    theme(legend.position = "none",
          axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) 

[Image: plot with colored bars]

If you really want a black stroke around each stacked bar, you need to set size small enough that the stroke does not cover the fill:

ggplot(df, aes(x = factor(species, names(sort(-table(species)))), fill = genome_name)) + 
    geom_bar(colour = "black", size = 0.01) + 
    labs(title = "Number of unique strains") +
    labs(x = "Species", y = "#Strains") + 
    theme(legend.position = "none",
          axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) 

[Image: plot with colored bars with black stroke]
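A note for newer installations: from ggplot2 3.4.0 onward, the outline width of bars is controlled by `linewidth`, and using `size` for this purpose triggers a deprecation message. The sketch below assumes ggplot2 3.4.0 or later and uses made-up toy data (hypothetical strain labels) rather than the real data set.

```r
library(ggplot2)

# Toy data (hypothetical values).
df <- data.frame(
  species     = c(rep("A. baumannii", 4), rep("K. pneumoniae", 3)),
  genome_name = c("a1", "a2", "a3", "a4", "b1", "b2", "b3"),
  stringsAsFactors = FALSE
)

# Same idea as above, with linewidth instead of size for the thin stroke.
p <- ggplot(df, aes(x = factor(species, names(sort(-table(species)))),
                    fill = genome_name)) +
  geom_bar(colour = "black", linewidth = 0.01) +
  labs(title = "Number of unique strains", x = "Species", y = "#Strains") +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

p
```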