My data is basically a purchase list containing the product, dates, and customer ID. Sample data can be created as follows:
custId=c('A','A','B','C','A','D','E','F','B','C','F')
ProductPurchase=c('Milk','Tea','Milk','Eggs','Coffee','sugar','Chicken','milk','Apple','sugar','eggs')
BuyDate=c('1-03-2014','4-05-2017','15-02-2015','23-04-2014','12-04-2017','23-5-2016','13-5-2012','5-05-2014','2-03-2017','03-03-2017','21-06-2017')
ExpiryDate=c('1-03-2017','4-05-2022','15-02-2017','12-05-2015','12-04-2022','12-7-2018','23-06-2015','15-06-2017','3-03-2020','2-05-2019','21-07-2019')
DummyD=data.frame(custId,ProductPurchase,BuyDate,ExpiryDate)
  custId ProductPurchase    BuyDate ExpiryDate
1      A            Milk  1-03-2014  1-03-2017
2      A             Tea  4-05-2017  4-05-2022
3      B            Milk 15-02-2015 15-02-2017
4      C            Eggs 23-04-2014 12-05-2015
5      A          Coffee 12-04-2017 12-04-2022
6      D           sugar  23-5-2016  12-7-2018
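I want to retrieve the customers who bought milk and then made another purchase (of any product) within +/- 60 days of the milk's expiry date (that is, up to 60 days before or up to 60 days after expiry).
For example, for the data above, the output should look like: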
CustID  BoughtWithin60Days  ProductExpiry  ProductBought  Expiry Date  BuyD
A       yes                 Milk           Coffee         1-03-2017    12-04-2017
B       yes                 Milk           Apple          15-02-2017   2-03-2017
F       yes                 Milk           Eggs           15-06-2017   21-06-2017
Answer 0 (score: 0)
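This is more of a merge problem than a reshape problem.
Here is a possible solution using "data.table".
Start with some cleanup. You need proper dates, and you need to make sure that the values in your "ProductPurchase" column are consistent enough to be matched (the sample data mixes "Milk" and "milk").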
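library(data.table)
setDT(DummyD)  # convert the data.frame to a data.table by reference
# Lower-case the product names and parse both date columns (day-month-year)
DummyD[, c("ProductPurchase", "BuyDate", "ExpiryDate") :=
         list(tolower(ProductPurchase),
              as.Date(BuyDate, format = "%d-%m-%Y"),
              as.Date(ExpiryDate, format = "%d-%m-%Y"))][]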
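To see why both cleanup steps matter, here is a small base R sketch. The values are taken from the sample data; the variable names `raw_dates`, `parsed`, and `products` are only illustrative and are not part of the answer's code.

```r
# Day-month-year strings from the sample data, with and without zero padding
raw_dates <- c("1-03-2014", "23-5-2016", "03-03-2017")
parsed <- as.Date(raw_dates, format = "%d-%m-%Y")
print(parsed)

# Product names in the sample data are not consistently capitalised
products <- c("Milk", "milk", "Eggs", "eggs")
print(products == "milk")           # an exact comparison misses "Milk"
print(tolower(products) == "milk")  # lower-casing first catches both spellings
```

Without the `format` argument, `as.Date()` would not read these strings as day-month-year, and without `tolower()` customer F's "milk" and customer A's "Milk" would be treated as different products.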
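Create a subset of the rows where the product purchased was "milk". Add two columns, "Min" and "Max", holding the dates 60 days before and 60 days after the expiry date.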
milk <- DummyD[ProductPurchase == "milk"][
, c("Min", "Max") := list(ExpiryDate - 60, ExpiryDate + 60)]
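The window logic can be checked in isolation with plain Date arithmetic. This is a minimal sketch: the expiry date is customer A's milk from the sample data, the first purchase date is a made-up one that falls before expiry, and the names `expiry`, `buy`, `in_window`, and `in_window_alt` are illustrative.

```r
expiry <- as.Date("2017-03-01")
# One hypothetical purchase before expiry, then A's coffee and tea purchases
buy <- as.Date(c("2017-01-30", "2017-04-12", "2017-05-04"))

# Adding or subtracting a number from a Date shifts it by that many days
window_min <- expiry - 60
window_max <- expiry + 60
in_window <- buy >= window_min & buy <= window_max
print(in_window)

# Equivalent check: absolute distance from the expiry date, in days
in_window_alt <- abs(as.numeric(buy - expiry)) <= 60
print(in_window_alt)
```

Both forms treat the 60-day boundary as inclusive; whether the boundary days should count is an assumption to adjust to your own requirement.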
Create a subset of all the other products purchased.
others <- DummyD[ProductPurchase != "milk"]
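Merge the two subsets on the "custId" column. Then add an indicator column that says whether the second product was bought within 60 days of the milk's expiry, by checking whether its purchase date (BuyDate.y) falls between the "Min" and "Max" values computed earlier.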
# Flag purchases whose date lies inside the [Min, Max] window (bounds inclusive)
out <- merge(milk, others, by = "custId")[, within60 := BuyDate.y >= Min & BuyDate.y <= Max][]
out
#    custId ProductPurchase.x  BuyDate.x ExpiryDate.x        Min        Max
# 1:      A              milk 2014-03-01   2017-03-01 2016-12-31 2017-04-30
# 2:      A              milk 2014-03-01   2017-03-01 2016-12-31 2017-04-30
# 3:      B              milk 2015-02-15   2017-02-15 2016-12-17 2017-04-16
# 4:      F              milk 2014-05-05   2017-06-15 2017-04-16 2017-08-14
#    ProductPurchase.y  BuyDate.y ExpiryDate.y within60
# 1:               tea 2017-05-04   2022-05-04    FALSE
# 2:            coffee 2017-04-12   2022-04-12     TRUE
# 3:             apple 2017-03-02   2020-03-03     TRUE
# 4:              eggs 2017-06-21   2019-07-21     TRUE
If you only want to return the rows where the value is "TRUE", you can use:
out[(within60)]
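To get closer to the layout requested in the question, the matching rows can be selected and renamed in one step. The following is a self-contained sketch that rebuilds the sample data and repeats the steps above. The output column names follow the question's example, `result` is an illustrative name, and the window test is written here as an inclusive distance of at most 60 days from expiry, which is an assumption about how the boundary should be treated.

```r
library(data.table)

# Sample data from the question
DummyD <- data.table(
  custId = c('A','A','B','C','A','D','E','F','B','C','F'),
  ProductPurchase = c('Milk','Tea','Milk','Eggs','Coffee','sugar','Chicken',
                      'milk','Apple','sugar','eggs'),
  BuyDate = c('1-03-2014','4-05-2017','15-02-2015','23-04-2014','12-04-2017',
              '23-5-2016','13-5-2012','5-05-2014','2-03-2017','03-03-2017',
              '21-06-2017'),
  ExpiryDate = c('1-03-2017','4-05-2022','15-02-2017','12-05-2015','12-04-2022',
                 '12-7-2018','23-06-2015','15-06-2017','3-03-2020','2-05-2019',
                 '21-07-2019')
)

# Cleanup: consistent product names and real Date columns
DummyD[, ProductPurchase := tolower(ProductPurchase)]
DummyD[, BuyDate := as.Date(BuyDate, format = "%d-%m-%Y")]
DummyD[, ExpiryDate := as.Date(ExpiryDate, format = "%d-%m-%Y")]

# Pair every milk purchase with the same customer's other purchases
milk <- DummyD[ProductPurchase == "milk"]
others <- DummyD[ProductPurchase != "milk"]
out <- merge(milk, others, by = "custId")

# TRUE when the other purchase is at most 60 days before or after milk expiry
out[, within60 := abs(as.numeric(BuyDate.y - ExpiryDate.x)) <= 60]

# Keep the matches and rename columns to mirror the question's example
result <- out[(within60), .(
  CustID = custId,
  BoughtWithin60Days = "yes",
  ProductExpiry = ProductPurchase.x,
  ProductBought = ProductPurchase.y,
  ExpiryDate = ExpiryDate.x,
  BuyDate = BuyDate.y
)]
print(result)
```

Note that the product names come out lower-cased because of the cleanup step, and the dates print in ISO format rather than in the day-month-year form used in the question.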