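I'm new to R and am trying to get a particular output.
My df looks like this. I want to count the transitions from stage 3 to stage 4, from stage 3 to stage 5, and from stage 3 to stage 6. I'm only interested in moves from S3 to one of the stages 4, 5, or 6: rows whose starting stage is S1 or anything else are not needed, and rows that go from S3 to S3 within a given week are not needed either.
I would like to get the expected result.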
Answer 0 (score: 0)
Here is a tidyverse option:
library(dplyr)
library(tidyr)
set.seed(42)
# Sample data: one row per transition, with the week it occurred in.
# tibble() keeps strings as character vectors, so no stringsAsFactors argument is needed.
dat <- tibble(
  week = as.Date("2018-01-01") + cumsum(sample(c(0L, 7L), size = 10, replace = TRUE)),
  begins = c(rep("S3", 9), "S1"),
  ends = sample(c("S4", "S5", "S6"), size = 10, replace = TRUE)
)
dat
# # A tibble: 10 x 3
#    week       begins ends
#    <date>     <chr>  <chr>
#  1 2018-01-08 S3     S5
#  2 2018-01-15 S3     S6
#  3 2018-01-15 S3     S6
#  4 2018-01-22 S3     S4
#  5 2018-01-29 S3     S5
#  6 2018-02-05 S3     S6
#  7 2018-02-12 S3     S6
#  8 2018-02-12 S3     S4
#  9 2018-02-19 S3     S5
# 10 2018-02-26 S1     S5
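dat %>%
  # keep only transitions that start in S3
  filter(begins == "S3") %>%
  # count rows per week and end stage; tally() leaves the result grouped by week
  group_by(week, ends) %>%
  tally() %>%
  # one column per end stage, with 0 where a week has no such transition
  pivot_wider(names_from = "ends", values_from = "n", values_fill = list(n = 0))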
# # A tibble: 7 x 4
# # Groups:   week [7]
#   week          S5    S6    S4
#   <date>     <int> <int> <int>
# 1 2018-01-08     1     0     0
# 2 2018-01-15     0     2     0
# 3 2018-01-22     0     0     1
# 4 2018-01-29     1     0     0
# 5 2018-02-05     0     1     0
# 6 2018-02-12     0     1     1
# 7 2018-02-19     1     0     0
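The question also asks to ignore S3 to S3 rows, which the sample data above never contains. A minimal sketch of how that could be handled, assuming the same column names (week, begins, ends); the `transitions` data and the `targets` vector are hypothetical and only there for illustration:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: includes an S3 -> S3 row and an S1 row, both of which should be dropped
transitions <- tibble(
  week   = as.Date(c("2018-01-08", "2018-01-08", "2018-01-15", "2018-01-15", "2018-01-15")),
  begins = c("S3", "S3", "S3", "S3", "S1"),
  ends   = c("S4", "S3", "S5", "S5", "S6")
)

# End stages of interest (assumed from the question)
targets <- c("S4", "S5", "S6")

result <- transitions %>%
  # keep only S3 -> S4/S5/S6; this drops S3 -> S3 and anything not starting in S3
  filter(begins == "S3", ends %in% targets) %>%
  # count() groups, tallies, and returns an ungrouped result
  count(week, ends) %>%
  # one column per end stage, 0 where a week has no such transition
  pivot_wider(names_from = ends, values_from = n, values_fill = list(n = 0))

result
```

Here count(week, ends) stands in for group_by() followed by tally(), so the result is not grouped. Only end stages that actually occur after filtering become columns, so a stage with no transitions at all will be missing rather than filled with zeros.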