Question

I have been trying for hours to make a bar plot of "Yes/No" counts.

My dataset looks like this:

> stack
         Site Plot Treatment Meters Retrieved
2   Southern    18   Control  -5.00         y
3   Southern    18   Control   9.55         y
4   Southern    18   Control   4.70         y
5   Southern    27   Control  -5.00         y
6   Southern    27   Control  20.00         n
9   Southern    18   Control  -0.10         y
17  Southern    18   Control  20.00         y
23  Southern    31   Control 100.00         y
53  Southern    25        Mu   3.55         n
54  Southern    20        Mu   5.90         y
55  Southern    25        Mu  -0.10         y
56  Southern    29        Mu   9.55         y
58  Southern    25        Mu   4.70         y
60  Southern    20        Mu   2.90         y
61  Southern    24        Mu   5.90         n
62  Southern    24        Mu   3.55         y
63  Southern    20        Mu   3.55         y
65  Southern    24        Mu   0.55         y
66  Southern    29        Mu   8.90         y
68  Southern    25        Mu   8.90         y
69  Southern    29        Mu   0.55         y
70  Southern    24        Mu   1.70         y
72  Southern    29        Mu  -5.00         y
76  Southern    29        Mu   1.70         y
77  Southern    25        Mu   9.55         y
78  Southern    25        Mu  13.20         y
79  Southern    29        Mu   3.55         y
80  Southern    25        Mu  15.00         y
81  Southern    25        Mu  -5.00         n
84  Southern    24        Mu   8.90         y
85  Southern    20        Mu   6.55         y
86  Southern    29        Mu   2.90         y
92  Southern    24        Mu  -0.10         y
93  Southern    20        Mu 100.00         y

I want to get the counts of y (yes) and n (no) for the variable Retrieved, grouped by Treatment and Meters.

So it should look something like this:

 Treatment Meters        Yes   No
     Control  -5.00         2   0
     Control   9.55         1   2
     Control   4.70         1   1
     Control  20.00         0   2
         Mu   3.55         4   0
         Mu   5.90         0   1
         Mu  -0.10         2   2
         Mu   9.55         1   0

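From this data I want to make a stacked bar plot with x = Meters and y = count, and with Treatment as a facet grid or something similar.

Here is my code, but it doesn't work: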

plot_data <- stack %>% 
  count(Retrieved, Treatment, Meters) %>% 
  group_by(Treatment, Meters) %>% 
  mutate(count= n)

plot_data

ggplot(plot_data, aes(x = Meters, y = count, fill = Treatment)) + 
  geom_col(position = "fill") + 
  geom_label(aes(label = count(count)), position = "fill", color = "white", vjust = 1, show.legend = FALSE) +
  scale_y_continuous(labels = count)

Can you tell me what I'm doing wrong?

Answer 1

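This is a case for geom_bar, and you don't even need group_by or count. (From the documentation: "geom_bar() makes the height of the bar proportional to the number of cases in each group".)

This should do what you want: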

ggplot(stack, aes(x = Meters, fill = Treatment)) +
  geom_bar(position = "stack")

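However, the bars will be very narrow, because Meters is continuous and covers a large range. You can fix this by converting it to a factor. One way is to run this first: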

# stack is the data frame from the question; %>% and mutate() come from dplyr
library(dplyr)
stack <- stack %>%
  mutate(Meters = as.factor(Meters))

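If the goal is the plot described in the question (y/n counts of Retrieved stacked per Meters value, with one panel per Treatment), a minimal sketch could look like the following. It assumes ggplot2 is installed and uses a small hand-typed subset of the question's rows; the names `stack_small` and `p` are made up for illustration and are not part of the original answer.

```r
library(ggplot2)

# A hand-typed subset of the question's rows (for illustration only).
stack_small <- data.frame(
  Treatment = c("Control", "Control", "Control", "Mu", "Mu", "Mu", "Mu"),
  Meters    = c(-5.00, -5.00, 20.00, 3.55, 3.55, 5.90, 5.90),
  Retrieved = c("y", "y", "n", "n", "y", "y", "n")
)

# geom_bar() counts the rows itself. factor(Meters) gives evenly spaced bars,
# fill = Retrieved stacks the y/n counts, and facet_wrap() draws one panel
# per Treatment.
p <- ggplot(stack_small, aes(x = factor(Meters), fill = Retrieved)) +
  geom_bar(position = "stack") +
  facet_wrap(~ Treatment) +
  labs(x = "Meters", y = "count")

p
```

Mapping fill to Retrieved instead of Treatment is a choice made here to match the question's description; swap the two if you would rather compare treatments within one panel.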

If you also want the counts in the format you mentioned (in addition to creating the plot), you can do this:

library(dplyr)
library(tidyr)  # for spread()

# stack is the data frame from the question
stack %>%
  count(Treatment, Meters, Retrieved) %>%
  spread(Retrieved, n, fill = 0) %>%
  rename(Yes = y, No = n)

count does the group_by for you, so I didn't need that part of your code. Then spread creates separate columns for y and n. Finally, I renamed those columns to Yes and No.
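spread still works, but the tidyr documentation marks it as superseded by pivot_wider. A minimal sketch of the same count table with pivot_wider follows. It assumes dplyr and a reasonably recent tidyr are installed (a scalar `values_fill` is not accepted by the earliest releases that shipped pivot_wider), and it uses a small hand-typed subset of the question's rows; `stack_small` and `counts` are names made up for illustration.

```r
library(dplyr)
library(tidyr)

# A hand-typed subset of the question's rows (for illustration only).
stack_small <- data.frame(
  Treatment = c("Control", "Control", "Control", "Mu", "Mu", "Mu", "Mu"),
  Meters    = c(-5.00, -5.00, 20.00, 3.55, 3.55, 5.90, 5.90),
  Retrieved = c("y", "y", "n", "n", "y", "y", "n")
)

counts <- stack_small %>%
  # One row per Treatment/Meters/Retrieved combination, count in column n.
  count(Treatment, Meters, Retrieved) %>%
  # Turn the y/n values of Retrieved into columns; missing combinations get 0.
  pivot_wider(names_from = Retrieved, values_from = n, values_fill = 0) %>%
  rename(Yes = y, No = n)

counts
```

Note that the final rename assumes both y and n occur at least once in Retrieved; if one of them is absent, that column is never created and rename will fail.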
