Splitting data into groups in R

Asked: 2015-03-31 11:52:09

Tags: r ggplot2 dplyr

My data frame looks like this:

plant   distance
one 0
one 1
one 2
one 3
one 4
one 5
one 6
one 7
one 8
one 9
one 9.9
two 0
two 1
two 2
two 3
two 4
two 5
two 6
two 7
two 8
two 9
two 9.5

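I want to split the distance for each level of plant into groups by a fixed interval (for example, interval = 3) and compute the percentage of rows that fall into each group. Finally, I want to plot the percentage of each group for each level.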

My code:

library(ggplot2)
library(dplyr)
library(scales)  # provides percent(), used for the y-axis labels

dat <- data %>%
  # Breaks are 0, 3, 6, 9; labels = FALSE returns integer bin codes
  mutate(group = factor(cut(distance, seq(0, max(distance), 3), labels = FALSE))) %>%
  group_by(plant, group) %>%
  summarise(percentage = n()) %>%
  mutate(percentage = percentage / sum(percentage))
p <- ggplot(dat, aes(x = plant, y = percentage, fill = group)) +
  geom_bar(stat = "identity", position = "stack") +
  scale_y_continuous(labels = percent)
p

But in my plot, group 4 is missing.

I found that dat is wrong: group 4 is NA.

A possible cause is that group 4 is narrower than interval = 3. How can I fix this? Thanks in advance!

1 Answer:

Answer 0: (score: 0)

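I have solved the problem. The cause is that cut(distance, seq(0, max(distance), 3), F) does not cover the minimum and maximum values: the breaks are 0, 3, 6, 9 and the intervals are open on the left, so 0 and every distance above 9 become NA.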
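A minimal sketch of why the NAs appear, using a few of the distances from the question. The variable names `breaks`, `codes`, and `codes_lowest` are only for illustration.

```r
# A few distances taken from the question's data
distance <- c(0, 1, 3, 9, 9.5, 9.9)

# seq() stops at the last multiple of 3 that does not exceed max(distance)
breaks <- seq(0, max(distance), 3)
print(breaks)
# 0 3 6 9

# Default intervals are (0,3], (3,6], (6,9]: 0 and values above 9 match none
codes <- cut(distance, breaks, labels = FALSE)
print(codes)
# NA 1 1 3 NA NA

# include.lowest = TRUE turns the first interval into [0,3],
# but values above the last break are still NA
codes_lowest <- cut(distance, breaks, labels = FALSE, include.lowest = TRUE)
print(codes_lowest)
# 1 1 1 3 NA NA
```

So include.lowest = TRUE alone is not enough; the breaks also have to extend past max(distance).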

Here is my solution:

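dat <- my_data %>%
  # Breaks start at min(distance) and step by 3, extending past max(distance)
  mutate(group = factor(cut(distance,
                            seq(from = min(distance), by = 3, length.out = n() / 3 + 1),
                            include.lowest = TRUE))) %>%
  # Count rows per plant and group, then convert to a share within each plant
  count(plant, group) %>%
  group_by(plant) %>%
  mutate(percentage = n / sum(n))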
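
As an alternative sketch (a variation, not the code above), the breaks could be built as multiples of the interval up to the first multiple that reaches max(distance), so that the number of breaks does not depend on the row count. The data frame is rebuilt from the table in the question; the names `interval` and `breaks` are my own, chosen for illustration.

```r
library(dplyr)
library(ggplot2)

# Data frame rebuilt from the table in the question
my_data <- data.frame(
  plant = rep(c("one", "two"), each = 11),
  distance = c(0:9, 9.9, 0:9, 9.5)
)

interval <- 3
# Round the upper limit up to the next multiple of the interval: 0, 3, 6, 9, 12
upper <- ceiling(max(my_data$distance) / interval) * interval
breaks <- seq(0, upper, by = interval)

dat <- my_data %>%
  # include.lowest = TRUE keeps distance == 0 in the first group
  mutate(group = cut(distance, breaks, include.lowest = TRUE)) %>%
  count(plant, group) %>%
  group_by(plant) %>%
  mutate(percentage = n / sum(n)) %>%
  ungroup()

# Stacked bars: one bar per plant, one segment per distance group
p <- ggplot(dat, aes(x = plant, y = percentage, fill = group)) +
  geom_bar(stat = "identity", position = "stack") +
  scale_y_continuous(labels = scales::percent)
p
```

With this data, each plant ends up with four groups and no NA, and the last group, (9,12], holds the single distance above 9.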