Question

I have a data frame like this:

nthreads ab_1 ab_2 ab_3 ab_4 ...
1        0    0    0    0    ...
2        1    0    12   1    ...
4        2    1    22   1    ...
8        10   2    103  8    ...

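Each ab_X represents a different reason that triggers an abort in my code. I would like to summarize all abort reasons in a bar chart showing nthreads vs. aborts, with the different ab_X stacked within each bar.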

I can do

ggplot(data, aes(x=factor(nthreads), y=ab_1+ab_2+ab_3+ab_4)) +
  geom_bar(stat="identity")

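but that only gives me the total number of aborts. I know there is a fill aesthetic, but I can't get it to work with continuous variables.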

Answer 1

You have to melt the data frame first:

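library(data.table)
library(ggplot2)
# data.table's melt() expects a data.table, so convert the data frame first
dt_melt <- melt(as.data.table(data), id.vars = 'nthreads')
# factor(nthreads) spaces the bars evenly, as in the question's own plot
ggplot(dt_melt, aes(x = factor(nthreads), y = value, fill = variable)) +
    geom_bar(stat = 'identity')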
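
To see what melting does to the shape of the data, here is a minimal sketch that rebuilds only the four rows and four ab_X columns shown in the question (the columns elided with "..." are left out, so this is an assumption about the data, not the asker's full data frame):

```r
library(data.table)

# Sample data copied from the question; elided columns ("...") are omitted
data <- data.frame(
  nthreads = c(1, 2, 4, 8),
  ab_1 = c(0, 1, 2, 10),
  ab_2 = c(0, 0, 1, 2),
  ab_3 = c(0, 12, 22, 103),
  ab_4 = c(0, 1, 1, 8)
)

# Wide to long: every ab_X column becomes rows of (variable, value)
dt_melt <- melt(as.data.table(data), id.vars = "nthreads")

# One row per (nthreads, abort reason) pair
print(dim(dt_melt))
print(head(dt_melt, 4))
```

The long table has one column holding the abort reason (variable) and one holding the count (value), which is what lets fill map to a discrete variable.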

Answer 2

It gives you the total number of aborts because you are adding them together :)

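You first need to get the data from wide to long format, i.e. create one column for the abort reason and a second column for its value. You can use tidyr::gather for that. I also find geom_col more convenient than geom_bar: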

library(tidyr)
library(ggplot2)
data %>% 
  gather(abort, value, -nthreads) %>% 
  ggplot(aes(factor(nthreads), value)) + 
    geom_col(aes(fill = abort)) + 
    labs(x = "nthreads", y = "count")

Note that the range of values makes some of the bars hard to see, so you may want to think about scales, or possibly even facets.
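
As a rough sketch of the facet idea, assuming the same four rows and four ab_X columns shown in the question, each abort reason could get its own panel with a free y axis so that small counts stay visible. This trades the stacked bar for separate panels, so it is one possible option rather than a drop-in replacement:

```r
library(tidyr)
library(ggplot2)

# Sample data copied from the question; elided columns ("...") are omitted
data <- data.frame(
  nthreads = c(1, 2, 4, 8),
  ab_1 = c(0, 1, 2, 10),
  ab_2 = c(0, 0, 1, 2),
  ab_3 = c(0, 12, 22, 103),
  ab_4 = c(0, 1, 1, 8)
)

# Same wide-to-long step as above
long <- gather(data, abort, value, -nthreads)

# One panel per abort reason; scales = "free_y" gives each panel its own y range
p <- ggplot(long, aes(factor(nthreads), value)) +
  geom_col(aes(fill = abort), show.legend = FALSE) +
  facet_wrap(~ abort, scales = "free_y") +
  labs(x = "nthreads", y = "count")
print(p)
```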

Stacked bar chart with 4 variables in ggplot2
