Summing a variable over multiple lagged periods

Posted: 2017-09-11 09:24:59

Tags: r dplyr

I am looking for a way to combine the following two dplyr operations into one, so that my code is shorter and more elegant:

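Overall, if condition x is met, I want to create a sum of counter across y, and then assign this sum to each of the next 4 quarters (quarter), excluding the current quarter. Some of these windows overlap, and the overlapping amounts need to be added up. I cannot use dplyr's lag(), because after summarise() my output no longer contains every quarter. That is why I had to take a "detour" and split the dplyr operation into two parts. I am now looking for an elegant way to do everything in one operation and avoid the intermediate step.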
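To make the carry-forward rule concrete, here is a small base R sketch of the arithmetic for company A from the example below. It assumes the two events are quarter 12 (summed counter 1) and quarter 13 (summed counter 2), which is what the first dplyr step produces for that company; the variable names are illustrative only.

```r
# Company A: x == 1 in quarters 12 and 13, with summed counter 1 and 2
event_quarter <- c(12, 13)
event_sum     <- c(1, 2)

# Each event contributes its sum to quarters q+1 .. q+4, never to q itself
target_quarter <- rep(event_quarter, each = 4) + rep(1:4, times = length(event_quarter))
amount         <- rep(event_sum, each = 4)

# Quarters reached by both events (14, 15, 16) get the two amounts added up
spread <- tapply(amount, target_quarter, sum)
spread
# quarter 13: 1, 14: 3, 15: 3, 16: 3, 17: 2
```

These are the same values the final pipeline reports for company A.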

# Reproducible example
library(dplyr)

compid <- c(replicate(10, "A"), replicate(10, "B"))
quarter <- c(11:20, 11:20)
x <- c(0,1,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0)
counter <- c(0,1,2,0,1,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0)

dat <- data.frame(compid, quarter, x, counter)


#First, I create the sum count

dat %>%
  group_by(compid, quarter) %>%
  filter(x == 1) %>%
  summarise(sumcount = sum(counter)) %>%
  ungroup() -> temp

# Then comes the part I did not know how to do in dplyr. I want to eliminate this intermediate step.

temp1 <- temp
temp1$quarter <- temp1$quarter + 1

temp2 <- temp
temp2$quarter <- temp2$quarter + 2

temp3 <- temp
temp3$quarter <- temp3$quarter + 3

temp4 <- temp
temp4$quarter <- temp4$quarter + 4

temp <- rbind(temp1, temp2, temp3, temp4)

#Lastly, I went back to dplyr to consolidate and refine the data

temp %>%
  group_by(compid, quarter) %>%
  summarise(sumcount = sum(sumcount)) %>%
  right_join(dat, by = c("compid", "quarter")) %>%
  mutate(sumcount = ifelse(is.na(sumcount), 0, sumcount))
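Even without restructuring the whole pipeline, the four hand-written copies temp1 ... temp4 can be generated in a loop. The sketch below assumes dplyr (>= 1.0.0 for the `.groups` argument) and types in `temp` by hand with the values the first step yields for `dat`; it only illustrates replacing the copy-and-rbind step.

```r
library(dplyr)

# `temp` as produced by the first step for the example data:
# one row per compid/quarter where x == 1, with the summed counter
temp <- data.frame(compid   = c("A", "A", "B", "B"),
                   quarter  = c(12, 13, 11, 12),
                   sumcount = c(1, 2, 1, 1),
                   stringsAsFactors = FALSE)

# Build the shifted copies (quarter + 1 ... quarter + 4) in one call
# instead of temp1 ... temp4, and stack them
shifted <- bind_rows(lapply(1:4, function(k) mutate(temp, quarter = quarter + k)))

# Overlapping target quarters are summed, as in the question's last step
result <- shifted %>%
  group_by(compid, quarter) %>%
  summarise(sumcount = sum(sumcount), .groups = "drop")
result
```

Changing `1:4` is then enough to change how many quarters the sum is carried forward.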

1 answer:

Answer 0 (score: 1)

Here is a tidy way to replace the intermediate step:

dat %>%

  # this chunk is unchanged
  group_by(compid, quarter) %>%
  filter(x == 1) %>%
  summarise(sumcount = sum(counter)) %>%
  ungroup() %>%

  # this replaces the creation of temp datasets
  mutate(Q1 = quarter + 1,
         Q2 = quarter + 2,
         Q3 = quarter + 3,
         Q4 = quarter + 4) %>%
  select(-quarter) %>%
  tidyr::gather(key, quarter, -compid, -sumcount) %>%
  select(compid, quarter, sumcount) %>%

  # this chunk is unchanged
  group_by(compid, quarter) %>%
  summarise(sumcount = sum(sumcount)) %>%
  right_join(dat, by = c("compid", "quarter")) %>%
  mutate(sumcount = ifelse(is.na(sumcount), 0, sumcount))

# A tibble: 20 x 5
# Groups:   compid [2]
   compid quarter sumcount     x counter
   <fctr>   <dbl>    <dbl> <dbl>   <dbl>
 1      A      11        0     0       0
 2      A      12        0     1       1
 3      A      13        1     1       2
 4      A      14        3     0       0
 5      A      15        3     0       1
 6      A      16        3     0       0
 7      A      17        2     0       0
 8      A      18        0     0       0
 9      A      19        0     0       0
10      A      20        0     0       0
11      B      11        0     1       1
12      B      12        1     1       1
13      B      13        2     0       0
14      B      14        2     0       1
15      B      15        2     0       0
16      B      16        1     0       0
17      B      17        0     0       0
18      B      18        0     0       0
19      B      19        0     0       0
20      B      20        0     0       0
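In current tidyr, gather() is superseded. One possible variant of the middle chunk (the four mutate() columns plus gather()) uses tidyr::uncount() with a constant weight, which repeats each row 4 times and numbers the copies in a new column. This is a sketch assuming dplyr >= 1.0.0 and tidyr >= 0.8; the column name `lag` is my own choice, and arrange() is added only to make the row order deterministic.

```r
library(dplyr)
library(tidyr)

dat <- data.frame(compid  = rep(c("A", "B"), each = 10),
                  quarter = c(11:20, 11:20),
                  x       = c(0,1,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0),
                  counter = c(0,1,2,0,1,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0),
                  stringsAsFactors = FALSE)

result <- dat %>%
  # same first step as in the question
  filter(x == 1) %>%
  group_by(compid, quarter) %>%
  summarise(sumcount = sum(counter), .groups = "drop") %>%
  # 4 copies of each row, numbered 1..4 in `lag`; replaces Q1..Q4 + gather()
  uncount(4, .id = "lag") %>%
  mutate(quarter = quarter + lag) %>%
  select(-lag) %>%
  # same last step as in the question
  group_by(compid, quarter) %>%
  summarise(sumcount = sum(sumcount), .groups = "drop") %>%
  right_join(dat, by = c("compid", "quarter")) %>%
  mutate(sumcount = coalesce(sumcount, 0)) %>%
  arrange(compid, quarter)
result
```

The sumcount column should match the table above.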

By the way, I don't see any condition on y, only on x, but I assume that doesn't matter for the question asked here.
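A different technique, not the one used in the answer, avoids the intermediate long table and the join entirely: for every row, sum counter over the rows of the same company where x == 1 and quarter lies in the preceding 4 quarters. Because the window is defined on quarter values rather than row positions, no rows need to be dropped, which is the problem that ruled out lag(). The helper `sum_prev_quarters()` below is hypothetical and written for this sketch; it does O(n^2) work per company, which is fine for moderately sized groups.

```r
library(dplyr)

# Hypothetical helper: for each quarter q, sum `counter` over rows with x == 1
# whose quarter lies in q - k .. q - 1 (the current quarter is excluded;
# overlapping windows add up automatically)
sum_prev_quarters <- function(quarter, x, counter, k = 4) {
  vapply(quarter, function(q) {
    sum(counter[x == 1 & quarter >= q - k & quarter <= q - 1])
  }, numeric(1))
}

dat <- data.frame(compid  = rep(c("A", "B"), each = 10),
                  quarter = c(11:20, 11:20),
                  x       = c(0,1,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0),
                  counter = c(0,1,2,0,1,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0),
                  stringsAsFactors = FALSE)

# One grouped mutate() per company, no temporary tables
result <- dat %>%
  group_by(compid) %>%
  mutate(sumcount = sum_prev_quarters(quarter, x, counter)) %>%
  ungroup()
result
```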