Question

Following up on an earlier question of mine (Count with conditions in R dataframe), I have the following table:

  Week   SKU   Discount(%)   Duration  LastDiscount
     1     111       5            2           0
     2     111       5            2           0
     3     111       0            0           0
     4     111      10            2           0
     5     111      11            2           2
     1     222       0            0           0
     2     222      10            3           0
     3     222      15            3           0
     4     222      20            3           0

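I want the LastDiscount count to appear on the first row of each discount run for a given SKU, where the same SKU is discounted in different weeks. For example, SKU 111 has a discount that ends in week 2, and its next discount starts in week 4, which is 2 weeks after the last discount. The problem is that I want this result to appear on week 4, the first week of the new discount campaign.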

Something like this:

  Week   SKU   Discount(%)   Duration  LastDiscount
     1     111       5            2           0
     2     111       5            2           0
     3     111       0            0           0
     4     111      10            2           2
     5     111      11            2           0
     1     222       0            0           0
     2     222      10            3           0
     3     222      15            3           0
     4     222      20            3           0

This is the code I have at the moment:

df1 %>%
  group_by(SKU) %>% 
  mutate(Duration = with(rle(Discount > 0), rep(lengths*values, 
        lengths)),
         temp = with(rle(Discount > 0), sum(values != 0)), 
         LastDiscount = if(temp[1] > 1) c(rep(0, n()-1), temp[1]) else 0) %>%
  select(-temp)

Answer 1

Here is an option using data.table. If the OP is only looking for a dplyr solution, I will delete it:

#calculate duration of discount and also the start and end of discount period
DT[, c("Duration", "disc_seq") := {
        dur <- sum(`Discount(%)` > 0L)
        disc_seq <- rep("", .N)
        if (dur > 0) {
            disc_seq[1L] <- "S"
            disc_seq[length(disc_seq)] <- "E"
        }
        .(dur, disc_seq)
    }, 
    .(SKU, rleid(`Discount(%)` > 0L))]
DT[]
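
The grouping above relies on `rleid` assigning one id per run of consecutive equal values, so each stretch of discounted (or undiscounted) weeks becomes its own group. A minimal sketch of that behaviour, using a standalone vector (the name `discount` is only for illustration and is not part of the answer's data):

```r
library(data.table)

# Discounts for SKU 111 in weeks 1 to 5, copied from the question's table
discount <- c(5, 5, 0, 10, 11)

# TRUE/FALSE runs: (TRUE, TRUE), (FALSE), (TRUE, TRUE)
run_id <- rleid(discount > 0)
run_id
# [1] 1 1 2 3 3
```

Within each run id, the first row is the one marked "S" and the last row is the one marked "E".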

#use a non-equi join to find the end of previous discount period to update LastDiscount column of the start of current discount period
DT[, LastDiscount := 0L]
DT[disc_seq=="S", LastDiscount := {
        ld <- DT[disc_seq=="E"][.SD, on=.(SKU, Week<Week), by=.EACHI, i.Week - x.Week]$V1
        replace(ld, is.na(ld), 0L)
    }]
DT[]

Output:

   Week SKU Discount(%) Duration disc_seq LastDiscount
1:    1 111           5        2        S            0
2:    2 111           5        2        E            0
3:    3 111           0        0                     0
4:    4 111          10        2        S            2
5:    5 111          11        2        E            0
6:    1 222           0        0                     0
7:    2 222          10        3        S            0
8:    3 222          15        3                     0
9:    4 222          20        3        E            0

Data:

library(data.table)
DT <- fread("Week   SKU   Discount(%)
1     111       5
2     111       5
3     111       0
4     111      10
5     111      11
1     222       0
2     222      10
3     222      15
4     222      20")

Answer 2

Is LastDiscount always one row below where it should be? If so, you could do this:

library(dplyr)
df %>% 
  mutate(LastDiscount2=lead(LastDiscount))
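
A slightly more defensive variant of the same shift, sketched under the assumption that the value really is always exactly one row late (the question does not confirm this): grouping by SKU keeps `lead` from pulling a value across SKU boundaries, and `default = 0` avoids an `NA` on the last row of each group. The data frame below is rebuilt from the first table in the question.

```r
library(dplyr)

# LastDiscount as currently produced: the 2 sits on week 5 of SKU 111
df <- data.frame(
  Week         = c(1, 2, 3, 4, 5, 1, 2, 3, 4),
  SKU          = c(111, 111, 111, 111, 111, 222, 222, 222, 222),
  LastDiscount = c(0, 0, 0, 0, 2, 0, 0, 0, 0)
)

shifted <- df %>%
  group_by(SKU) %>%
  # move each value up one row within its own SKU; pad the last row with 0
  mutate(LastDiscount2 = lead(LastDiscount, default = 0)) %>%
  ungroup()

shifted
```

If a discount run can last more than two weeks, the offset is no longer a single row and this shortcut would not place the value on the first week of the run.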
