Calculate the time since the last purchase of the same product

Time: 2019-04-25 05:36:12

Tags: r dataframe dplyr lag

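Problem statement:

You are given the following data:

  • a list of customer_id values
  • a list of products
  • the purchase time
  • the total number of purchases of the same product to date

To find:

  • time_from_last_purchase (in days) within the same product

Expected output (last column):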

customer_id product purchase_time  total_to_date  time_from_last_purchase
1              A      2014-11-24         1            0
1              A      2018-02-21         2            1185
1              E      2014-01-08         1            0
2              J      2016-04-18         1            0
3              F      2017-06-12         1            0 
3              G      2017-06-23         1            0 
4              F      2017-09-27         1            0
4              F      2018-01-08         2            103
4              F      2018-02-08         3            31
4              F      2018-02-09         4            1 
4              F      2018-04-10         5            60

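My approach:

  • Doing this by hand: whenever a customer buys a particular product for the first time, time_from_last_purchase is 0.
  • From a customer's second purchase of the same product onward, time_from_last_purchase equals the current purchase_time minus the purchase_time of the previous purchase.

I am very new to R, so any help is greatly appreciated. Thanks!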
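
For reference, the two rules above can be checked on a single customer/product pair in base R. This is only a minimal sketch: it uses customer 4's purchases of product F from the table, and the names purchases and gaps are made up for illustration.

```r
# Purchase dates for customer 4, product F (copied from the expected output)
purchases <- as.Date(c("2017-09-27", "2018-01-08", "2018-02-08",
                       "2018-02-09", "2018-04-10"))

# Rule 1: the first purchase gets 0.
# Rule 2: every later purchase gets the gap, in days, to the previous one.
gaps <- c(0, as.numeric(diff(purchases)))
gaps
```

The result should match the last five rows of the expected time_from_last_purchase column.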

2 answers:

Answer 0: (score: 0)

With dplyr, you can try:

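library(dplyr)

df %>%
 group_by(customer_id, product) %>%
 # convert to Date, then subtract the previous purchase date within each group;
 # default = first(purchase_time) makes the first row of a group come out as 0
 mutate(purchase_time = as.Date(purchase_time, format = "%Y-%m-%d"),
        res = purchase_time - lag(purchase_time, default = first(purchase_time)))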

   customer_id product purchase_time total_to_date res      
         <int> <chr>   <date>                <int> <time>   
 1           1 A       2014-11-24                1    0 days
 2           1 A       2018-02-21                2 1185 days
 3           1 E       2014-01-08                1    0 days
 4           2 J       2016-04-18                1    0 days
 5           3 F       2017-06-12                1    0 days
 6           3 G       2017-06-23                1    0 days
 7           4 F       2017-09-27                1    0 days
 8           4 F       2018-01-08                2  103 days
 9           4 F       2018-02-08                3   31 days
10           4 F       2018-02-09                4    1 days
11           4 F       2018-04-10                5   60 days

Or, if you need the result as a numeric variable:

df %>%
 group_by(customer_id, product) %>%
 mutate(purchase_time = as.Date(purchase_time, format = "%Y-%m-%d"),
        res = as.numeric(purchase_time - lag(purchase_time, default = first(purchase_time))))

   customer_id product purchase_time total_to_date   res
         <int> <chr>   <date>                <int> <dbl>
 1           1 A       2014-11-24                1     0
 2           1 A       2018-02-21                2  1185
 3           1 E       2014-01-08                1     0
 4           2 J       2016-04-18                1     0
 5           3 F       2017-06-12                1     0
 6           3 G       2017-06-23                1     0
 7           4 F       2017-09-27                1     0
 8           4 F       2018-01-08                2   103
 9           4 F       2018-02-08                3    31
10           4 F       2018-02-09                4     1
11           4 F       2018-04-10                5    60
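
Both snippets assume that df already exists and that its rows are in chronological order within each customer/product group, since lag() works on row order. Below is a self-contained sketch that rebuilds the sample data from the question and sorts it first. The arrange() step and the name res are additions for illustration, not part of the snippets above.

```r
library(dplyr)

# Sample data reconstructed from the question's table
df <- data.frame(
  customer_id   = c(1, 1, 1, 2, 3, 3, 4, 4, 4, 4, 4),
  product       = c("A", "A", "E", "J", "F", "G", "F", "F", "F", "F", "F"),
  purchase_time = c("2014-11-24", "2018-02-21", "2014-01-08", "2016-04-18",
                    "2017-06-12", "2017-06-23", "2017-09-27", "2018-01-08",
                    "2018-02-08", "2018-02-09", "2018-04-10"),
  stringsAsFactors = FALSE
)

res <- df %>%
  mutate(purchase_time = as.Date(purchase_time)) %>%
  # lag() depends on row order, so sort within each group before grouping
  arrange(customer_id, product, purchase_time) %>%
  group_by(customer_id, product) %>%
  mutate(time_from_last_purchase = as.numeric(
    purchase_time - lag(purchase_time, default = first(purchase_time))
  )) %>%
  ungroup()

res
```

The final ungroup() is optional; it just returns a plain tibble so that later steps are not silently applied per group.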

Answer 1: (score: 0)

Another approach, using diff:

library(dplyr)

df %>%
  mutate(purchase_time = as.Date(purchase_time)) %>%
  group_by(customer_id, product) %>%
  mutate(diff = c(0, diff(purchase_time)))


#  customer_id product purchase_time total_to_date time_from_last_purchase  diff
#         <int> <fct>   <date>                <int>                   <int> <dbl>
# 1           1 A       2014-11-24                1                       0     0
# 2           1 A       2018-02-21                2                    1185  1185
# 3           1 E       2014-01-08                1                       0     0
# 4           2 J       2016-04-18                1                       0     0
# 5           3 F       2017-06-12                1                       0     0
# 6           3 G       2017-06-23                1                       0     0
# 7           4 F       2017-09-27                1                       0     0
# 8           4 F       2018-01-08                2                     103   103
# 9           4 F       2018-02-08                3                      31    31
#10           4 F       2018-02-09                4                       1     1
#11           4 F       2018-04-10                5                      60    60

Similarly, we can use ave from base R:

df$diff <- with(df, ave(as.numeric(as.Date(purchase_time)), customer_id, product, 
                    FUN = function(x) c(0, diff(x))))

If purchase_time is already of class Date, you can skip the as.Date part in both approaches.
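
If it is unclear whether purchase_time has already been converted, the class can be checked before converting. A minimal base R sketch, assuming a small three-row data frame taken from the question's table (customer 4, product F):

```r
# Illustrative subset of the question's data
df <- data.frame(
  customer_id   = c(4, 4, 4),
  product       = c("F", "F", "F"),
  purchase_time = c("2017-09-27", "2018-01-08", "2018-02-08"),
  stringsAsFactors = FALSE
)

# Convert only when the column is not already a Date
if (!inherits(df$purchase_time, "Date")) {
  df$purchase_time <- as.Date(df$purchase_time)
}

# as.numeric() on a Date gives days since 1970-01-01,
# so diff() on those numbers yields gaps in days
df$diff <- ave(as.numeric(df$purchase_time), df$customer_id, df$product,
               FUN = function(x) c(0, diff(x)))
df
```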