[R]: Flag overlapping time periods

Time: 2018-10-03 19:17:17

Tags: r

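The data frame below is a promotion calendar for products: the week each promotion starts (StartWk), the products being promoted, and how many weeks each promotion lasts (Duration).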

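I need a function that creates a flag (by PromoID and StartWk) indicating whether any Product-WeekNum combination is repeated, where WeekNum runs from StartWk to StartWk + Duration - 1. So the first row covers weeks 5 and 6, and so on. In short, if any Product-WeekNum combination occurs more than once, the corresponding PromoID-StartWk combinations should be flagged. The WeekNum values are shown as R comments in the data below.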
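A minimal base-R sketch of the week expansion described above, assuming Duration is a positive whole number of weeks. `expand_weeks()` and the small `promos` data frame are hypothetical; the helper repeats each row Duration times and numbers the copies starting at StartWk.

```r
# Hypothetical helper: one row per week a promotion runs.
# Assumes every Duration is a positive integer (number of weeks).
expand_weeks <- function(df) {
  # Row index repeated once per week the promotion runs
  reps <- rep(seq_len(nrow(df)), times = df$Duration)
  out <- df[reps, c("PromoID", "Product", "StartWk"), drop = FALSE]
  # sequence(Duration) yields 1..Duration for each row, so WeekNum
  # runs from StartWk to StartWk + Duration - 1
  out$WeekNum <- df$StartWk[reps] + sequence(df$Duration) - 1L
  rownames(out) <- NULL
  out
}

promos <- data.frame(PromoID = c("A", "B"), Product = c("Flavored", "Original"),
                     StartWk = c(5L, 30L), Duration = c(2L, 3L),
                     stringsAsFactors = FALSE)
expand_weeks(promos)$WeekNum  # 5 6 30 31 32
```

A zero-row input produces a zero-row result, because `rep()` and `sequence()` both return empty vectors when Duration is empty.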

If there are no such instances, the function should return an empty data frame that still has the output columns.

Strongly preferred: passing an empty data frame to the function should also return an empty data frame with the output columns.
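One way to pin down the "empty data frame with the output columns" requirement is a zero-row template with explicit column types. The helper name `empty_flags()` is hypothetical, and making Flag numeric (to match the `<dbl>` Flag column in the answer's output) is an assumption.

```r
# Hypothetical helper: zero-row result with the expected output columns
empty_flags <- function() {
  data.frame(PromoID = character(0), StartWk = integer(0), Flag = numeric(0),
             stringsAsFactors = FALSE)
}

res <- empty_flags()
nrow(res)   # 0
names(res)  # "PromoID" "StartWk" "Flag"
```

Returning such a template whenever nothing overlaps keeps the column names and types the same whether or not any flags are found.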

If it helps: a given PromoID always has the same set of products and the same duration.
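If a solution relies on that guarantee, it can be checked up front. A minimal base-R sketch; `check_promo_assumptions()` is a hypothetical name, and "same set of products" is interpreted here as the same de-duplicated products at every StartWk of a PromoID.

```r
# Hypothetical check of the stated assumption: each PromoID has one
# Duration and the same set of products at every StartWk.
check_promo_assumptions <- function(df) {
  # One distinct Duration per PromoID
  same_duration <- all(tapply(df$Duration, df$PromoID,
                              function(d) length(unique(d)) == 1L))
  # Sorted, de-duplicated product set for each PromoID-StartWk pair
  key <- paste(df$PromoID, df$StartWk, sep = "\t")
  sets <- vapply(split(df$Product, key),
                 function(p) paste(sort(unique(as.character(p))), collapse = "\t"),
                 character(1))
  # PromoID for each key (split() uses the same key order in both calls)
  promo <- vapply(split(df$PromoID, key),
                  function(p) as.character(p[1]), character(1))
  same_products <- all(tapply(sets, promo, function(s) length(unique(s)) == 1L))
  same_duration && same_products
}
```

On the question's `df` this should return `TRUE`: every PromoID keeps one Duration, and PromoID A promotes Flavored and Original at both StartWk 5 and 21.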

df <- structure(list(PromoID = c("A", "A", "A", "A", "B", "B", "C", 
"C", "D", "A", "A", "E", "E"), Product = c("Flavored", "Original", 
"Flavored", "Original", "Flavored", "Original", "Flavored", "Original", 
"Flavored", "Flavored", "Original", "Energy", "Energy"), StartWk = c(5L, 
5L, 21L, 21L, 30L, 30L, 6L, 6L, 5L, 5L, 5L, 49L, 49L), Duration = c(2L, 
2L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 2L, 1L, 1L)), .Names = c("PromoID", 
"Product", "StartWk", "Duration"), class = "data.frame", row.names = c(NA, 
-13L))

   PromoID  Product StartWk Duration
1        A Flavored       5        2 # WeekNum 5, 6
2        A Original       5        2 # WeekNum 5, 6
3        A Flavored      21        2 # WeekNum 21, 22
4        A Original      21        2 # WeekNum 21, 22
5        B Flavored      30        3 # WeekNum 30, 31, 32
6        B Original      30        3 # WeekNum 30, 31, 32
7        C Flavored       6        1 # WeekNum 6
8        C Original       6        1 # WeekNum 6
9        D Flavored       5        2 # WeekNum 5, 6
10       A Flavored       5        2 # WeekNum 5, 6
11       A Original       5        2 # WeekNum 5, 6
12       E   Energy      49        1 # WeekNum 49
13       E   Energy      49        1 # WeekNum 49

Expected output:

  PromoID StartWk Flag
1       A       5    1
2       C       6    1
3       D       5    1
4       E      49    1

1 answer:

Answer 0 (score: 2)

library(dplyr)  # tidyr must also be installed for tidyr::uncount()

df %>%
  # Make one row for each week of each promotion
  tidyr::uncount(weights = Duration, .id = "wk_no") %>%
  # Show which week each row represents
  mutate(CurWk = StartWk + wk_no - 1) %>%
  # How many promos are there for each product in each week?
  add_count(Product, CurWk) %>%
  # Keep only overlapping promos
  filter(n > 1) %>%
  # To match the requested output shape, keep one row per PromoID-StartWk
  group_by(PromoID, StartWk) %>%
  summarize(Flag = 1)

Output

# A tibble: 4 x 3
# Groups:   PromoID [?]
  PromoID StartWk  Flag
  <chr>     <int> <dbl>
1 A             5     1
2 C             6     1
3 D             5     1
4 E            49     1
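For readers without dplyr/tidyr, a base-R sketch that mirrors the pipeline above step by step; it is an alternative, not the answerer's code. `flag_overlaps_base()` is a hypothetical name, `ave()` stands in for `add_count()`, and the final sort by PromoID and StartWk is an assumption chosen to match the expected output.

```r
# Hypothetical base-R equivalent of the dplyr pipeline (no packages needed)
flag_overlaps_base <- function(df) {
  # uncount(weights = Duration, .id = "wk_no"): one row per promo week
  reps <- rep(seq_len(nrow(df)), times = df$Duration)
  long <- df[reps, c("PromoID", "Product", "StartWk"), drop = FALSE]
  long$wk_no <- sequence(df$Duration)
  # mutate(CurWk = StartWk + wk_no - 1)
  long$CurWk <- long$StartWk + long$wk_no - 1L
  # add_count(Product, CurWk): number of rows sharing this product and week
  long$n <- ave(rep(1L, nrow(long)), long$Product, long$CurWk, FUN = length)
  # filter(n > 1), then keep one row per PromoID-StartWk
  hits <- unique(long[long$n > 1, c("PromoID", "StartWk"), drop = FALSE])
  hits <- hits[order(hits$PromoID, hits$StartWk), , drop = FALSE]
  hits$Flag <- rep(1, nrow(hits))
  rownames(hits) <- NULL
  hits
}
```

On the question's `df`, `flag_overlaps_base(df)` should return the same four rows (A 5, C 6, D 5, E 49), each with Flag = 1, as an ungrouped data frame. A zero-row input such as `df[0, ]` returns a zero-row data frame with the columns PromoID, StartWk and Flag.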