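The data frame below shows a promotion calendar for products: the week in which a promotion starts, the products on promotion, and the duration of the promotion in weeks.
I need a function that creates a flag (by PromoID and StartWk) indicating whether any Product-WeekNum combination is duplicated, where WeekNum runs from (StartWk) to (StartWk + Duration - 1). So the WeekNum values for the first row are weeks 5 and 6 (and so on). Basically, if any Product-WeekNum combination is duplicated, the corresponding PromoID-StartWk combination is flagged. The WeekNum values are shown as R comments.
If there is no such instance, the function should output an empty data frame with the output fields.
Highly desired: an empty data frame passed to the function should result in an empty data frame with the output fields.
If it helps, any given PromoID will always have the same set of products and the same duration across all instances.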
df <- structure(list(PromoID = c("A", "A", "A", "A", "B", "B", "C",
"C", "D", "A", "A", "E", "E"), Product = c("Flavored", "Original",
"Flavored", "Original", "Flavored", "Original", "Flavored", "Original",
"Flavored", "Flavored", "Original", "Energy", "Energy"), StartWk = c(5L,
5L, 21L, 21L, 30L, 30L, 6L, 6L, 5L, 5L, 5L, 49L, 49L), Duration = c(2L,
2L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 2L, 1L, 1L)), .Names = c("PromoID",
"Product", "StartWk", "Duration"), class = "data.frame", row.names = c(NA,
-13L))
   PromoID  Product StartWk Duration
1        A Flavored       5        2  # WeekNum 5, 6
2        A Original       5        2  # WeekNum 5, 6
3        A Flavored      21        2  # WeekNum 21, 22
4        A Original      21        2  # WeekNum 21, 22
5        B Flavored      30        3  # WeekNum 30, 31, 32
6        B Original      30        3  # WeekNum 30, 31, 32
7        C Flavored       6        1  # WeekNum 6
8        C Original       6        1  # WeekNum 6
9        D Flavored       5        2  # WeekNum 5, 6
10       A Flavored       5        2  # WeekNum 5, 6
11       A Original       5        2  # WeekNum 5, 6
12       E   Energy      49        1  # WeekNum 49
13       E   Energy      49        1  # WeekNum 49
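To make the WeekNum comments concrete, here is a minimal base R sketch that expands a few rows into the weeks they cover. The names `promo`, `week_list` and `week_labels` are illustrative only, and the three rows are copied from rows 1, 5 and 7 of the table above.

```r
# Small illustrative subset of the promo calendar (rows 1, 5 and 7 above)
promo <- data.frame(
  PromoID  = c("A", "B", "C"),
  Product  = c("Flavored", "Flavored", "Flavored"),
  StartWk  = c(5L, 30L, 6L),
  Duration = c(2L, 3L, 1L),
  stringsAsFactors = FALSE
)

# Weeks covered by each row: StartWk, ..., StartWk + Duration - 1
week_list <- Map(
  function(start, dur) seq(start, length.out = dur),
  promo$StartWk, promo$Duration
)

# Collapse each set of weeks into a label like the comments in the table
week_labels <- vapply(week_list, paste, character(1), collapse = ", ")
print(week_labels)
```

Each label lists `Duration` consecutive weeks starting at `StartWk`, which is why the last week is `StartWk + Duration - 1`.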
Expected output:
  PromoID StartWk Flag
1       A       5    1
2       C       6    1
3       D       5    1
4       E      49    1
Answer 0 (score: 2)
library(dplyr)  # %>%, mutate, add_count, filter, group_by, summarize
library(tidyr)  # uncount

df %>%
  # Make a row for each week of each promotion
  tidyr::uncount(weights = Duration, .id = "wk_no") %>%
  # Show which week is represented by each row
  mutate(CurWk = StartWk + wk_no - 1) %>%
  # How many promos are there for each product each week?
  add_count(Product, CurWk) %>%
  # Only include overlapping promos
  filter(n > 1) %>%
  # To shape into the requested output form, only show one row per overlap
  group_by(PromoID, StartWk) %>%
  summarize(Flag = 1)
Output
# A tibble: 4 x 3
# Groups:   PromoID [?]
  PromoID StartWk  Flag
  <chr>     <int> <dbl>
1 A             5     1
2 C             6     1
3 D             5     1
4 E            49     1
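Since the question asks for a function that also copes with an empty input, here is one possible sketch of the same idea in base R, used here in place of tidyr/dplyr so that it has no package dependencies. The function name `flag_overlaps` is hypothetical, and the sketch assumes that an "empty" data frame still carries the four columns (for example `df[0, ]`); it is not the answer's own method.

```r
# Hypothetical helper: flag PromoID-StartWk pairs whose Product-week
# combinations occur more than once. Assumes columns PromoID, Product,
# StartWk and Duration exist, even when the data frame has zero rows.
flag_overlaps <- function(df) {
  # Repeat each row once per promoted week
  idx <- rep(seq_len(nrow(df)), df$Duration)
  long <- df[idx, c("PromoID", "Product", "StartWk"), drop = FALSE]
  # sequence() yields 1..Duration per row, so CurWk runs from StartWk
  # to StartWk + Duration - 1
  long$CurWk <- long$StartWk + sequence(df$Duration) - 1L
  # Mark every Product-week pair that occurs more than once
  pairs <- long[c("Product", "CurWk")]
  dup <- duplicated(pairs) | duplicated(pairs, fromLast = TRUE)
  # One row per flagged PromoID-StartWk combination
  out <- unique(long[dup, c("PromoID", "StartWk"), drop = FALSE])
  out$Flag <- rep(1, nrow(out))
  rownames(out) <- NULL
  out
}

# Same data as in the question, rebuilt so the block runs on its own
df <- data.frame(
  PromoID  = c("A", "A", "A", "A", "B", "B", "C", "C", "D", "A", "A", "E", "E"),
  Product  = c("Flavored", "Original", "Flavored", "Original", "Flavored",
               "Original", "Flavored", "Original", "Flavored", "Flavored",
               "Original", "Energy", "Energy"),
  StartWk  = c(5L, 5L, 21L, 21L, 30L, 30L, 6L, 6L, 5L, 5L, 5L, 49L, 49L),
  Duration = c(2L, 2L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 2L, 1L, 1L),
  stringsAsFactors = FALSE
)

res <- flag_overlaps(df)
empty_res <- flag_overlaps(df[0, ])
print(res)
print(empty_res)
```

Because every step works on vectors that may have length zero, a zero-row input falls through to a zero-row result with the PromoID, StartWk and Flag columns, without a special case.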