我有一些代码以堆叠的条形图显示出门的数量和门内的属。我编辑了代码,使所有NA元素都出现在每个小节的顶部,而更丰富的元素出现在底部,但是,这抛弃了我的调色板,该调色板根据门类组以及在该组中按字母顺序分配颜色。例如,细菌杆菌门被分配为蓝色,门内的每个属按字母顺序被分配为蓝色阴影。
我相信我可以更改levs变量以按字母顺序对元素进行排序,并按门进行分组,但是我还没有找到一种方法来实现。但是,目前,levs变量会按元素的丰度对元素进行排序。
#makes color pallete
ColourPalleteMulti <- function(df, group, subgroup){
# Find how many colour categories to create and the number of colours in each
categories <- aggregate(as.formula(paste(subgroup, group, sep="~" )), df, function(x) length(unique(x)))
category.start <- (scales::hue_pal(l = 100)(nrow(categories))) # Set the top of the colour pallete
category.end <- (scales::hue_pal(l = 40)(nrow(categories))) # set the bottom
# Build Colour pallette
colours <- unlist(lapply(1:nrow(categories),
function(i){
colorRampPalette(colors = c(category.start[i], category.end[i]))(categories[i,2])}))
return(colours)
}
library(tidyverse)
library("phyloseq"); packageVersion("phyloseq")
library(ggplot2)
library(scales)
library(RColorBrewer)
data("GlobalPatterns")
#filter phyloseq data
TopNOTUs <- names(sort(taxa_sums(GlobalPatterns), TRUE)[1:100])
gp.ch <- prune_species(TopNOTUs, GlobalPatterns)
#create dataframe
mdf = psmelt(gp.ch)
mdf$group <- paste0(mdf$Phylum, "-", mdf$Genus, sep = "")
#factor by abundance
levs <- names(sort(tapply(mdf$Abundance, mdf$Genus, sum)))
#load colors
colours <- ColourPalleteMulti(mdf, "Phylum", "Genus")
#put NA at the top
mdf %>%
mutate(Genus = fct_explicit_na(Genus, "NA"),
Genus = factor(Genus, levels = c("NA", levs))) %>%
#graph
ggplot(aes(Phylum)) +
geom_bar(aes(fill = Genus), colour = "grey", position = "stack") +
scale_fill_manual("Genus", values=c("#FFFFFF",colours)) +
ggtitle("Phylum and Genus Frequency") +
ylab("Frequency") +
theme(plot.title = element_text(hjust = 0.5))
运行此代码将显示一个条形图,其颜色位于奇数个位置。理想情况下,图形中的每个条形图都是原色,而每个堆栈都是该颜色的不同阴影。正确创建了调色板,但是由于上述问题,颜色分配不正确。任何帮助表示赞赏!
答案 0 :(得分:0)
欢迎来到stackoverflow。您正在这里做一些棘手的事情!我认为很难在函数中执行此操作,最大的障碍是将NA放在首位。只需使用tidyverse
管道,我就可以将它们放在一起。
这是您的基础设置,为没有phyloseq
的人们做了一些准备
# how to install if needed
#source('http://bioconductor.org/biocLite.R')
#biocLite('phyloseq')
library(tidyverse)
library(phyloseq)
library(scales)
library(RColorBrewer)
data("GlobalPatterns")
# filter phyloseq data
TopNOTUs <- names(sort(taxa_sums(GlobalPatterns), TRUE)[1:100])
gp.ch <- prune_species(TopNOTUs, GlobalPatterns)
# create dataframe
mdf <- psmelt(gp.ch)
首先,我将记录折叠为计数n
prep <-
mdf %>%
mutate(Genus = fct_explicit_na(Genus, "NA")) %>%
# summarizes data
count(Phylum, Genus) %>% # returns n as a count
mutate(
group = paste(Phylum, Genus, sep = "-"),
Phylum = fct_reorder(Phylum, n, sum),
has_genus = Genus != "NA"
) %>%
# this step helps with the factor ordering
arrange(Phylum, has_genus, n) %>%
mutate(group = fct_inorder(group)) %>%
# I then find some totals & an rank based on the value of n
group_by(Phylum) %>%
mutate(
ord = row_number(),
total = n()
) %>%
ungroup()
# Phylum Genus n group has_genus ord total
# <fct> <fct> <int> <chr> <lgl> <int> <int>
# Tenericutes NA 52 Tenericutes-NA FALSE 1 2
# Tenericutes Clostridium 26 Tenericutes-Clostridium TRUE 2 2
# Actinobacteria NA 130 Actinobacteria-NA FALSE 1 3
# Actinobacteria Rothia 26 Actinobacteria-Rothia TRUE 2 3
# Actinobacteria Bifidobacter~ 78 Actinobacteria-Bifidobact~ TRUE 3 3
然后我使用因子值填充hcl()
函数(类似于您的hue_pal()
df <-
prep %>%
mutate(
group = fct_inorder(group), # ordering in the stack
hue = as.integer(Phylum)*25,
light_base = 1-(ord)/(total+2),
light = floor(light_base * 100)
) %>%
# if the genus is missing, use white, otherwise create a hexcode
mutate(hex = ifelse(!has_genus, "#ffffff", hcl(h = hue, l = light)))
然后是情节
ggplot(df, aes(Phylum, n)) +
geom_col(aes(fill = group), colour = "grey") +
scale_fill_manual(values = df$hex, breaks = (df$group)) +
ggtitle("Phylum and Genus Frequency") +
ylab("Frequency") +
theme(plot.title = element_text(hjust = 0.5))
对于第二个问题,请保留上述所有prep
和df
的代码,然后将它们与原始mdf
表连接起来。 df
表的目的仅是生成颜色,而prep
是辅助表。 genus
和hex
之间应为1:1。在sample
中包含prep
列将返回780行,而不是30行,并且不再是1:1。这就是为什么您没有获得想要的结果的原因。 (我认为是ord
列被抛出了)。因此,使用上面的内容,然后添加它。我加入了set.seed()
和sample_frac()
,以使更改更加明显。我还旋转它以提高可读性。
set.seed(1234)
final_df <-
mdf %>%
sample_frac(0.9) %>%
mutate(
Genus = fct_explicit_na(Genus, "NA"),
# these 2 lines will sort in descending order by Proteobacteria
rank = as.integer(Phylum == "Proteobacteria" & Genus != "NA"), # T/F == 1/0
Sample = fct_reorder(Sample, rank, mean)
) %>%
count(Phylum, Genus, Sample, rank) %>%
left_join(df %>% select(-n))
ggplot(final_df, aes(Sample, n)) +
geom_col(aes(fill = group), position="fill") +#
scale_fill_manual("Genus", values = df$hex, breaks = (df$group)) +
ggtitle("Phylum and Genus Frequency") +
ylab("Frequency") +
scale_y_continuous(labels = percent, expand = expand_scale(0)) +
coord_flip() +
theme(plot.title = element_text(hjust = 0.5))