Get counts within a time range

Time: 2017-05-31 06:34:35

Tags: sql r plyr reshape

My data is essentially a purchase list containing a product, dates, and a customer ID. Sample data can be created as follows:

custId=c('A','A','B','C','A','D','E','F','B','C','F')
ProductPurchase=c('Milk','Tea','Milk','Eggs','Coffee','sugar','Chicken','milk','Apple','sugar','eggs')
BuyDate=c('1-03-2014','4-05-2017','15-02-2015','23-04-2014','12-04-2017','23-5-2016','13-5-2012','5-05-2014','2-03-2017','03-03-2017','21-06-2017')
ExpiryDate=c('1-03-2017','4-05-2022','15-02-2017','12-05-2015','12-04-2022','12-7-2018','23-06-2015','15-06-2017','3-03-2020','2-05-2019','21-07-2019')
DummyD=data.frame(custId,ProductPurchase,BuyDate,ExpiryDate)

Data output (first six rows)

  custId ProductPurchase    BuyDate ExpiryDate
1      A            Milk  1-03-2014  1-03-2017
2      A             Tea  4-05-2017  4-05-2022
3      B            Milk 15-02-2015 15-02-2017
4      C            Eggs 23-04-2014 12-05-2015
5      A          Coffee 12-04-2017 12-04-2022
6      D           sugar  23-5-2016  12-7-2018

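I want to retrieve the customers who bought milk and then made another purchase (of any product) within +/- 60 days of the milk's expiry date (that is, anywhere from 60 days before expiry to 60 days after it).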

For example, for the sample data above, the output should look like this:

CustID  BoughtWithin60Days  ProductExpiry  ProductBought  Expiry Date  BuyD
A       yes                 Milk           Coffee         1-03-2017    12-04-2017
B       yes                 Milk           Apple          15-02-2017   2-03-2017
F       yes                 Milk           Eggs           15-06-2017   21-06-2017

1 Answer:

Answer 0: (score: 0)

This is more of a merge problem than a reshape problem.

Here is a possible solution using "data.table".

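Start with some cleanup. You need proper dates, and you need to make sure the "ProductPurchase" column is consistent enough to filter and merge on (the sample data mixes "Milk" and "milk").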

library(data.table)
setDT(DummyD)
DummyD[, c("ProductPurchase", "BuyDate", "ExpiryDate") := 
         list(tolower(ProductPurchase),
              as.Date(BuyDate, format = "%d-%m-%Y"),
              as.Date(ExpiryDate, format = "%d-%m-%Y"))][]
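As a quick sanity check on the cleanup step, here is a minimal sketch using a few values copied from the sample data. The names `raw_products` and `raw_dates` are made up for this illustration and are not part of the answer's code.

```r
# Illustrative values copied from the sample data; the variable names are hypothetical.
raw_products <- c("Milk", "Tea", "Milk", "milk")
raw_dates    <- c("1-03-2014", "23-5-2016", "15-02-2015")

# Without lower-casing, only the exact string "milk" matches.
sum(raw_products == "milk")            # 1
sum(tolower(raw_products) == "milk")   # 3

# Parse day-month-year strings; a value that does not fit the format becomes NA,
# so anyNA() is a cheap way to spot rows that failed to parse.
parsed <- as.Date(raw_dates, format = "%d-%m-%Y")
anyNA(parsed)                          # FALSE
```

Running the same `anyNA()` check on the real "BuyDate" and "ExpiryDate" columns after conversion should flag any entries that do not follow the day-month-year pattern.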

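Create a subset of the rows where the product purchased is "milk". Add two columns, "Min" and "Max", holding the dates 60 days before and 60 days after the expiry date.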

milk <- DummyD[ProductPurchase == "milk"][
  , c("Min", "Max") := list(ExpiryDate - 60, ExpiryDate + 60)]
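The window arithmetic relies on the fact that adding a number to a `Date` shifts it by that many days. A small sketch for customer A's milk, using only base R and values from the sample data (the names `expiry`, `window`, and `coffee` are illustrative):

```r
# Expiry date of customer A's milk, taken from the sample data.
expiry <- as.Date("1-03-2017", format = "%d-%m-%Y")

# Adding or subtracting a number from a Date moves it by that many days.
window <- expiry + c(-60, 60)
format(window)   # "2016-12-31" "2017-04-30"

# Customer A bought coffee on 12-04-2017, which falls inside the window.
coffee <- as.Date("12-04-2017", format = "%d-%m-%Y")
coffee >= window[1] & coffee <= window[2]   # TRUE
```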

Create a subset of all the other purchases.

others <- DummyD[ProductPurchase != "milk"]

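Merge the two subsets on the "custId" column. Then add an indicator column saying whether the second product was bought within the 60-day window, by checking its purchase date (BuyDate.y) against the "Min" and "Max" values calculated earlier.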

out <- merge(milk, others, "custId")[, within60 := BuyDate.y >= Min & BuyDate.y <= Max][]
out
#    custId ProductPurchase.x  BuyDate.x ExpiryDate.x        Min        Max
# 1:      A              milk 2014-03-01   2017-03-01 2016-12-31 2017-04-30
# 2:      A              milk 2014-03-01   2017-03-01 2016-12-31 2017-04-30
# 3:      B              milk 2015-02-15   2017-02-15 2016-12-17 2017-04-16
# 4:      F              milk 2014-05-05   2017-06-15 2017-04-16 2017-08-14
#    ProductPurchase.y  BuyDate.y ExpiryDate.y within60
# 1:               tea 2017-05-04   2022-05-04    FALSE
# 2:            coffee 2017-04-12   2022-04-12     TRUE
# 3:             apple 2017-03-02   2020-03-03     TRUE
# 4:              eggs 2017-06-21   2019-07-21     TRUE

If you only want to return the "TRUE" rows, you can use:

out[(within60)]
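To get closer to the layout asked for in the question, the filtered rows can be trimmed and renamed. The following is a self-contained sketch under a few assumptions: it rebuilds the sample data from scratch, treats the window as inclusive at both ends, and uses the column names from the question's example (written as `ExpiryDate` without the space). The name `result` is made up here.

```r
library(data.table)

# Rebuild the sample data with lower-cased products and proper Date columns.
DummyD <- data.table(
  custId = c('A','A','B','C','A','D','E','F','B','C','F'),
  ProductPurchase = tolower(c('Milk','Tea','Milk','Eggs','Coffee','sugar',
                              'Chicken','milk','Apple','sugar','eggs')),
  BuyDate = as.Date(c('1-03-2014','4-05-2017','15-02-2015','23-04-2014',
                      '12-04-2017','23-5-2016','13-5-2012','5-05-2014',
                      '2-03-2017','03-03-2017','21-06-2017'),
                    format = "%d-%m-%Y"),
  ExpiryDate = as.Date(c('1-03-2017','4-05-2022','15-02-2017','12-05-2015',
                         '12-04-2022','12-7-2018','23-06-2015','15-06-2017',
                         '3-03-2020','2-05-2019','21-07-2019'),
                       format = "%d-%m-%Y")
)

# Split into milk purchases and everything else, then pair them per customer.
milk   <- DummyD[ProductPurchase == "milk"]
others <- DummyD[ProductPurchase != "milk"]
out    <- merge(milk, others, by = "custId")

# Assumption: "within 60 days" is inclusive, measured from the milk's expiry date.
out[, within60 := abs(as.numeric(BuyDate.y - ExpiryDate.x)) <= 60]

# Keep only the matching rows and rename columns to mirror the requested layout.
result <- out[(within60),
              .(CustID = custId,
                BoughtWithin60Days = "yes",
                ProductExpiry = ProductPurchase.x,
                ProductBought = ProductPurchase.y,
                ExpiryDate = ExpiryDate.x,
                BuyD = BuyDate.y)]
result
```

With the sample data this leaves one row each for customers A, B, and F. A customer with several qualifying purchases would appear once per purchase, so add a grouping step if you need one row per customer.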