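I am trying to find the start and end dates of each product's sale periods from a data column that holds a sale dummy variable. Here is a proxy for the kind of data I am working with:
The result I am looking for is:
The actual dataset I am working with is much larger than this, and it does not necessarily cover only 2010-01 to 2011-12.
Thanks!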
Answer 0 (score: 0)
Assuming each product is on sale only once:
library(tidyverse)

# Example data: one product, on sale from 2010-04 through 2010-07
df <- data.frame(product = 'Product A',
                 month = seq(as.Date('2010-01-01'),
                             as.Date('2010-10-01'),
                             by = 'month'),
                 onSale = c(rep(0, 3), rep(1, 4), rep(0, 3)))

# First and last on-sale month per product
df %>%
  group_by(product) %>%
  summarise(saleStart = min(month[onSale == 1]),
            saleEnd   = max(month[onSale == 1]))
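The single-period assumption can be checked before relying on the summary above. The following is only a sketch: `count_sale_periods` and `nPeriods` are hypothetical names introduced here, and it assumes `onSale` contains only 0 and 1 and that rows are ordered by month within each product.

```r
library(dplyr)

# Hypothetical helper: number of 0 -> 1 switches in a 0/1 vector.
# A leading 0 is prepended so a series that starts on sale is counted too.
count_sale_periods <- function(on_sale) {
  sum(diff(c(0, on_sale)) == 1)
}

df <- data.frame(product = 'Product A',
                 month = seq(as.Date('2010-01-01'), as.Date('2010-10-01'), by = 'month'),
                 onSale = c(rep(0, 3), rep(1, 4), rep(0, 3)))

period_counts <- df %>%
  group_by(product) %>%
  summarise(nPeriods = count_sale_periods(onSale))

# Products listed here break the single-period assumption
filter(period_counts, nPeriods > 1)
```

If the last call returns any rows, taking the overall first and last on-sale month would merge separate sale periods for those products.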
Edit: for products with more than one sale period:
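# Example data: three separate sale periods
df <- data.frame(product = 'Product A',
                 month = seq(as.Date('2010-01-01'),
                             as.Date('2011-09-01'),
                             by = 'month'),
                 onSale = c(rep(0, 3), rep(1, 4), rep(0, 3), rep(1, 4), rep(0, 3), rep(1, 4)))

# Assumes rows are ordered by month within each product
df %>%
  group_by(product) %>%
  # diff == 1 marks a switch from 0 to 1, i.e. the first month of a sale period;
  # putting onSale[1] in front also counts a series that starts on sale
  mutate(diff = c(onSale[1], diff(onSale)),
         period = cumsum(diff == 1)) %>%
  filter(onSale == 1) %>%
  group_by(product, period) %>%
  summarise(monthStart = min(month), monthEnd = max(month), .groups = 'drop')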
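As an alternative without any packages, the start and end of every sale period can be read off a run-length encoding of the dummy column. This is a minimal sketch rather than part of the answer above: `sale_periods` is a hypothetical helper, and it assumes `month` is sorted ascending and `onSale` holds only 0 and 1 for a single product.

```r
# Hypothetical helper: one row per run of onSale == 1
sale_periods <- function(month, onSale) {
  runs   <- rle(onSale)
  ends   <- cumsum(runs$lengths)       # last row index of each run
  starts <- ends - runs$lengths + 1    # first row index of each run
  keep   <- runs$values == 1           # keep only the on-sale runs
  data.frame(saleStart = month[starts[keep]],
             saleEnd   = month[ends[keep]])
}

month  <- seq(as.Date('2010-01-01'), as.Date('2011-09-01'), by = 'month')
onSale <- c(rep(0, 3), rep(1, 4), rep(0, 3), rep(1, 4), rep(0, 3), rep(1, 4))

periods <- sale_periods(month, onSale)
periods
```

With several products, the helper would be applied to each product's rows separately, for example after `split(df, df$product)`.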