Group and count observations per year from a dataset containing interval data

Time: 2017-07-12 13:52:27

Tags: r dplyr

I have data on the activity of many different writers, including the start.date and end.date of each writing period. Ultimately I want to create a joyplot from this data. Here is a sample of the data:

library("tidyverse")
writing_period_data <- tribble(
  ~start.date, ~end.date, ~writer, ~topic,
  12, 18, "a", sample(letters[10:20],1),
  14, 20, "b", sample(letters[10:20],1),
  17, 22, "c", sample(letters[10:20],1),
  15, 30, "a", sample(letters[10:20],1)
)

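For the time span of interest, the plot needs the number of active writing periods per year and writer, that is, a data structure like this: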

desired_output <- tribble(
  ~year, ~count, ~writer,
  12, 1, "a",
  13, 1, "a",
  14, 1, "a",
  14, 1, "b",
  15, 2, "a",
  15, 1, "b",
  16, 2, "a",
  16, 1, "b",
  17, 2, "a",
  17, 1, "b",
  17, 1, "c",
  18, 2, "a",
  18, 1, "b",
  18, 1, "c",
  19, 1, "a",
  19, 1, "b",
  19, 1, "c",
  20, 1, "a",
  20, 1, "b",
  20, 1, "c",
  21, 1, "a",
  21, 1, "c",
  22, 1, "a",
  22, 1, "c",
  23, 1, "a",
  24, 1, "a"
)

The chart (image not preserved here) was drawn with:

desired_output %>% ggplot(aes(x = year, y = count, fill = writer)) + geom_col()

How can I generate desired_output from writing_period_data?

1 answer:

Answer 0: (score: 2)

A solution using the tidyverse. dt is the final output.

library(tidyverse)

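dt <- writing_period_data %>%
  mutate(year = map2(start.date, end.date, `:`)) %>%  # list-column: every year covered by the period
  unnest(cols = year) %>%                             # one row per period and year
  count(year, writer) %>%                             # number of periods per year and writer
  select(year, count = n, writer)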
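
To see what each step of the pipeline contributes, here is a minimal sketch that runs the same steps one at a time on the intervals from the question. It assumes a current tidyr, where `unnest()` expects the column to be named via `cols`, and it leaves out the random `topic` column, which plays no part in the count. The intermediate names `expanded` and `long` are introduced only for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Same intervals as in the question; the random topic column is omitted.
writing_period_data <- tribble(
  ~start.date, ~end.date, ~writer,
  12, 18, "a",
  14, 20, "b",
  17, 22, "c",
  15, 30, "a"
)

# Step 1: add a list-column holding every year in each period.
expanded <- writing_period_data %>%
  mutate(year = map2(start.date, end.date, `:`))

# Step 2: expand the list-column to one row per period and year.
long <- expanded %>%
  unnest(cols = year)

# Step 3: count how many periods cover each year, per writer.
dt <- long %>%
  count(year, writer, name = "count")
```

With these inputs, writer "a" has two periods (12 to 18 and 15 to 30), so the count for "a" is 2 for years 15 through 18 and 1 elsewhere. Because the second period runs to 30, dt contains rows up to year 30, beyond the last year listed in the question's hand-written desired_output.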