Stacked bar chart in ggplot2: ordering samples by specific variables, from highest to lowest

Posted: 2019-07-26 17:01:41

Tags: r ggplot2

I'm trying to plot a dataset of more than 2,000 samples as a stacked bar chart, with each sample (identified by "SampleID") on the x-axis and the six measurements (Measurement1-6) stacked on the y-axis. I want the samples ordered by the measurement variables in the order Measurement4, 1, 5, 2, 3, and 6, and from the highest to the lowest measured value. Below is a 15-sample subset as an example of what I'm working with; I call it the "dummy_set" data frame:

       SampleID Measurement1 Measurement2 Measurement3 Measurement4 Measurement5 Measurement6
    1         A         0.05         0.00         0.95         0.00          0.0         0.00
    2         B         0.00         0.00         0.43         0.56          0.0         0.01
    3         C         0.64         0.36         0.00         0.00          0.0         0.00
    4         D         0.00         0.82         0.18         0.00          0.0         0.00
    5         E         0.00         0.60         0.00         0.40          0.0         0.00
    6         F         0.80         0.00         0.00         0.20          0.0         0.00
    7         G         0.00         0.00         0.00         1.00          0.0         0.00
    8         H         0.00         0.00         0.00         1.00          0.0         0.00
    9         I         0.00         0.00         1.00         0.00          0.0         0.00
    10        J         0.00         0.00         1.00         0.00          0.0         0.00
    11        K         0.25         0.00         0.00         0.45          0.3         0.00
    12        L         0.10         0.00         0.00         0.10          0.8         0.00
    13        M         0.19         0.10         0.00         0.70          0.0         0.01
    14        N         0.90         0.00         0.00         0.10          0.0         0.00
    15        O         0.00         0.10         0.40         0.00          0.5         0.00

Here is what I've done so far for the basics:

  1. Melt the dataset: melt_dummy_set <- melt(dummy_set, id.var = "SampleID")

    The melted dataset looks like this:

    head(melt_dummy_set)
      SampleID     variable value
    1        A Measurement1  0.05
    2        B Measurement1  0.00
    3        C Measurement1  0.64
    4        D Measurement1  0.00
    5        E Measurement1  0.00
    6        F Measurement1  0.80

  2. Plot the melted dataset using ggplot() and geom_bar():

    ggplot(melt_dummy_set, aes(x = SampleID, y = value, fill = variable)) +
      geom_bar(stat = "identity")

[Figure: Original stacked bar chart]

As you can see, the samples are plotted in the order in which they were originally listed (A-O). However, I want them plotted in this order: G, H, M, B, K, N, F, C, L, O, D, E, I, J, and A.

Based on other, similar Stack Overflow questions, I gather that I need to relevel the factor in the desired order. Here is what I have tried so far:

#Attempt 1
reordered_melt_dummy_set <- transform(melt_dummy_set, variable = reorder(variable, -value))
ggplot(reordered_melt_dummy_set, aes(x = SampleID, y = value, fill = variable)) +
  geom_bar(stat = "identity")

[Figure: Attempt 1 results]

#Attempt 2
copy_melt_dummy_set <- melt_dummy_set
copy_melt_dummy_set$variable <- factor(copy_melt_dummy_set$variable, levels = c("Measurement4", "Measurement5", "Measurement1", "Measurement2", "Measurement3", "Measurement6"))
ggplot(copy_melt_dummy_set, aes(x = SampleID, y = value, fill = variable)) +
  geom_bar(stat = "identity")

[Figure: Attempt 2 results]

My third attempt produced several errors (each shown after "##", immediately below the line of code that caused it):

#Attempt 3
copy2_melt_dummy_set <- melt_dummy_set

copy2_melt_dummy_set$SampleID <- factor(copy2_melt_dummy_set$SampleID, levels = copy2_melt_dummy_set[order(-copy2_melt_dummy_set$value), "variable"])
##Error in `levels<-`(`*tmp*`, value = as.character(levels)) : factor level [2] is duplicated

copy2_melt_dummy_set$variable <- factor(copy2_melt_dummy_set$variable, levels = copy2_melt_dummy_set[order(copy2_melt_dummy_set$value), "variable"])
## Error in `levels<-`(`*tmp*`, value = as.character(levels)) : factor level [2] is duplicated

copy2_melt_dummy_set$SampleID <- factor(copy2_melt_dummy_set$SampleID, levels = copy2_melt_dummy_set[order(-copy2_melt_dummy_set$variable), "SampleID"])
## Error in `levels<-`(`*tmp*`, value = as.character(levels)) : factor level [16] is duplicated
## In addition: Warning message: In Ops.factor(copy2_melt_dummy_set$variable) : ‘-’ not meaningful for factors

copy2_melt_dummy_set$SampleID <- factor(copy2_melt_dummy_set$SampleID, levels = copy2_melt_dummy_set[order(-copy2_melt_dummy_set$value), "SampleID"])
## Error in `levels<-`(`*tmp*`, value = as.character(levels)) : factor level [16] is duplicated

copy2_melt_dummy_set$SampleID <- factor(copy2_melt_dummy_set$SampleID, levels = copy2_melt_dummy_set[order(-copy2_melt_dummy_set$value), "value"])
## Error in `levels<-`(`*tmp*`, value = as.character(levels)) : factor level [2] is duplicated
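A note on the "factor level is duplicated" errors above: factor() requires unique levels, but each of these calls passes a whole column of the melted data frame, in which every SampleID occurs once per measurement. The base R sketch below uses a small made-up data frame (long, ids_by_value, bad, and ok are illustrative names, not from the question) to reproduce the error and to show that wrapping the levels in unique() avoids it.

```r
# Toy melted data: each sample appears once per measurement (values are made up).
long <- data.frame(
  SampleID = rep(c("A", "B", "C"), times = 2),
  variable = rep(c("Measurement1", "Measurement2"), each = 3),
  value    = c(0.2, 0.9, 0.5, 0.8, 0.1, 0.5),
  stringsAsFactors = FALSE
)

# Sample IDs sorted by value, highest first; every ID shows up twice.
ids_by_value <- long$SampleID[order(-long$value)]

# Passing the repeated IDs as levels fails, as in Attempt 3;
# tryCatch() stores the error message instead of stopping the script.
bad <- tryCatch(
  factor(long$SampleID, levels = ids_by_value),
  error = function(e) conditionMessage(e)
)

# Keeping only the first occurrence of each ID gives valid levels.
ok <- factor(long$SampleID, levels = unique(ids_by_value))
levels(ok)
```

Note that unique() only removes the error. It ranks each sample by its single largest value across all measurements, which is not necessarily the multi-key order asked for here.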

#Attempt 4
copy3_melt_dummy_set <- melt_dummy_set[order(melt_dummy_set$variable, -melt_dummy_set$value), ]
head(copy3_melt_dummy_set)
ggplot(copy3_melt_dummy_set, aes(x = SampleID, y = value, fill = variable)) +
  geom_bar(stat = "identity")

[Figure: Attempt 4 results]

#Attempt 5
ggplot(melt_dummy_set[order(melt_dummy_set$variable, -melt_dummy_set$value), ], aes(x = SampleID, y = value, fill = variable)) +
  geom_bar(stat = "identity")

[Figure: Attempt 5 results]

#Attempt 6
new_melt_dummy_set <- within(melt_dummy_set,
                             variable <- factor(variable, levels = names(sort(table(variable), decreasing = TRUE))))
ggplot(new_melt_dummy_set, aes(x = SampleID, y = value, fill = variable)) +
  geom_bar(stat = "identity")

[Figure: Attempt 6 results]

[Figure: Attempt 7 results]

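In every case, I could not get the actual samples on the x-axis to reorder. I feel there is probably a simple fix, but I can't figure out what I'm doing wrong. Any suggestions?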
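One likely reason none of these attempts moved the bars: ggplot2 positions a discrete x variable by its factor levels (alphabetical when the column is character), not by row order. Attempts 1, 2, and 6 change the levels of variable, which controls the fill and stacking order, and Attempts 4 and 5 only sort rows. A minimal base R sketch with made-up data (long, sorted_rows, and wanted are illustrative names):

```r
# Toy melted data (values are made up).
long <- data.frame(
  SampleID = c("A", "B", "C", "A", "B", "C"),
  variable = rep(c("Measurement1", "Measurement2"), each = 3),
  value    = c(0.2, 0.9, 0.5, 0.8, 0.1, 0.5),
  stringsAsFactors = FALSE
)

# Sorting the rows (as in Attempts 4 and 5) does not change the implied levels.
sorted_rows <- long[order(long$variable, -long$value), ]
levels(factor(sorted_rows$SampleID))   # still alphabetical

# Setting the levels of SampleID itself is what controls the axis order.
wanted <- c("B", "C", "A")             # hypothetical desired order
long$SampleID <- factor(long$SampleID, levels = wanted)
levels(long$SampleID)

# Optional: build the plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = SampleID, y = value, fill = variable)) +
    ggplot2::geom_bar(stat = "identity")
}
```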

1 Answer:

Answer 0: (score: 0)

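Is this what you're going for? I think you were close. Rather than reordering the levels of the Measurement variable column, you need to turn SampleID into a factor whose levels are ordered by the values in the Measurement columns. That order is computed in the sample_order step: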

library(tidyverse)

dummy_set <- tibble(
  SampleID = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O"),
  Measurement1 = c(0.05, 0, 0.64, 0, 0, 0.8, 0, 0, 0, 0, 0.25, 0.1, 0.19, 0.9, 0),
  Measurement2 = c(0, 0, 0.36, 0.82, 0.6, 0, 0, 0, 0, 0, 0, 0, 0.1, 0, 0.1),
  Measurement3 = c(0.95, 0.43, 0, 0.18, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0.4),
  Measurement4 = c(0, 0.56, 0, 0, 0.4, 0.2, 1, 1, 0, 0, 0.45, 0.1, 0.7, 0.1, 0),
  Measurement5 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.3, 0.8, 0, 0, 0.5),
  Measurement6 = c(0, 0.01, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.01, 0, 0)
)

sample_order <- dummy_set %>%
  arrange(desc(Measurement4), desc(Measurement1), desc(Measurement5), desc(Measurement2), desc(Measurement3), desc(Measurement6)) %>%
  pull(SampleID)

melt_dummy_set <- dummy_set %>%
  gather(variable, value, -SampleID)

reordered_melt_dummy_set <- melt_dummy_set %>%
  mutate(SampleID = factor(SampleID, levels = sample_order))

plot_ordered <- ggplot(reordered_melt_dummy_set, aes(x = SampleID, y = value, fill = variable)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(expand = c(0,0)) +
  theme(axis.ticks.x = element_blank(), panel.grid = element_blank(), axis.line = element_line(color = "black"), panel.border = element_blank(), panel.background = element_blank())

plot_ordered
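It may be worth checking the resulting order against the one requested in the question. arrange() with several keys sorts lexicographically (Measurement4 first, with later columns only breaking ties), which for this data gives G, H, M, B, K, E, F, N, L, C, A, O, D, I, J rather than the requested G, H, M, B, K, N, F, C, L, O, D, E, I, J, A. The requested order is consistent with a different rule: group the samples by their largest measurement (groups in the order 4, 1, 5, 2, 3, 6), then sort within each group by that largest value. That rule is an inference from the 15-sample example, not something the question states. A base R sketch under that assumption, where priority, m, dominant, and peak are illustrative names:

```r
dummy_set <- data.frame(
  SampleID = LETTERS[1:15],
  Measurement1 = c(0.05, 0, 0.64, 0, 0, 0.8, 0, 0, 0, 0, 0.25, 0.1, 0.19, 0.9, 0),
  Measurement2 = c(0, 0, 0.36, 0.82, 0.6, 0, 0, 0, 0, 0, 0, 0, 0.1, 0, 0.1),
  Measurement3 = c(0.95, 0.43, 0, 0.18, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0.4),
  Measurement4 = c(0, 0.56, 0, 0, 0.4, 0.2, 1, 1, 0, 0, 0.45, 0.1, 0.7, 0.1, 0),
  Measurement5 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.3, 0.8, 0, 0, 0.5),
  Measurement6 = c(0, 0.01, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.01, 0, 0),
  stringsAsFactors = FALSE
)

# Measurement columns in the priority order given in the question.
priority <- c("Measurement4", "Measurement1", "Measurement5",
              "Measurement2", "Measurement3", "Measurement6")
m <- as.matrix(dummy_set[, priority])

# Lexicographic sort, equivalent to the arrange(desc(...)) call above.
lexicographic <- dummy_set$SampleID[
  order(-m[, 1], -m[, 2], -m[, 3], -m[, 4], -m[, 5], -m[, 6])
]

# Assumed rule: group by the dominant (largest) measurement, then sort by it.
dominant <- max.col(m, ties.method = "first")      # column position within priority
peak     <- m[cbind(seq_len(nrow(m)), dominant)]   # largest value of each sample
dominant_first <- dummy_set$SampleID[order(dominant, -peak)]

dominant_first

# Use the chosen order as the factor levels before melting and plotting.
dummy_set$SampleID <- factor(dummy_set$SampleID, levels = dominant_first)
```

Either vector can be passed as levels in the mutate() step above; the plotting code stays the same. With 2,000+ samples it would be sensible to confirm that the dominant-measurement rule really matches the intended order before relying on it.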

Created on 2019-07-26 by the reprex package (v0.3.0)