Problem statement:
You are given the following data:
To find:
Expected output (last column):
customer_id product purchase_time total_to_date time_from_last_purchase
1 A 2014-11-24 1 0
1 A 2018-02-21 2 1185
1 E 2014-01-08 1 0
2 J 2016-04-18 1 0
3 F 2017-06-12 1 0
3 G 2017-06-23 1 0
4 F 2017-09-27 1 0
4 F 2018-01-08 2 103
4 F 2018-02-08 3 31
4 F 2018-02-09 4 1
4 F 2018-04-10 5 60
My approach:
I am very new to R, so any help is greatly appreciated. Thank you!
Answer 0: (score: 0)
Using dplyr, you can try:
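library(dplyr)

df %>%
  group_by(customer_id, product) %>%
  # Days since the previous purchase in the group; the first row is compared with itself (0)
  mutate(purchase_time = as.Date(purchase_time, format = "%Y-%m-%d"),
         res = purchase_time - lag(purchase_time, default = first(purchase_time)))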
customer_id product purchase_time total_to_date res
<int> <chr> <date> <int> <time>
1 1 A 2014-11-24 1 0 days
2 1 A 2018-02-21 2 1185 days
3 1 E 2014-01-08 1 0 days
4 2 J 2016-04-18 1 0 days
5 3 F 2017-06-12 1 0 days
6 3 G 2017-06-23 1 0 days
7 4 F 2017-09-27 1 0 days
8 4 F 2018-01-08 2 103 days
9 4 F 2018-02-08 3 31 days
10 4 F 2018-02-09 4 1 days
11 4 F 2018-04-10 5 60 days
Or, if you need the result as a numeric variable:
df %>%
  group_by(customer_id, product) %>%
  mutate(purchase_time = as.Date(purchase_time, format = "%Y-%m-%d"),
         res = as.numeric(purchase_time - lag(purchase_time, default = first(purchase_time))))
customer_id product purchase_time total_to_date res
<int> <chr> <date> <int> <dbl>
1 1 A 2014-11-24 1 0
2 1 A 2018-02-21 2 1185
3 1 E 2014-01-08 1 0
4 2 J 2016-04-18 1 0
5 3 F 2017-06-12 1 0
6 3 G 2017-06-23 1 0
7 4 F 2017-09-27 1 0
8 4 F 2018-01-08 2 103
9 4 F 2018-02-08 3 31
10 4 F 2018-02-09 4 1
11 4 F 2018-04-10 5 60
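To try this end to end, here is a minimal sketch that rebuilds the sample data from the expected-output table in the question and reruns the numeric variant. The construction of `df` is an assumption, since the original input was not shown; only the column names and values are taken from the table.

```r
library(dplyr)

# Sample data rebuilt from the expected-output table in the question
# (assumption: the original input had these columns, with dates as strings).
df <- data.frame(
  customer_id   = c(1, 1, 1, 2, 3, 3, 4, 4, 4, 4, 4),
  product       = c("A", "A", "E", "J", "F", "G", "F", "F", "F", "F", "F"),
  purchase_time = c("2014-11-24", "2018-02-21", "2014-01-08", "2016-04-18",
                    "2017-06-12", "2017-06-23", "2017-09-27", "2018-01-08",
                    "2018-02-08", "2018-02-09", "2018-04-10"),
  total_to_date = c(1, 2, 1, 1, 1, 1, 1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

res <- df %>%
  mutate(purchase_time = as.Date(purchase_time, format = "%Y-%m-%d")) %>%
  group_by(customer_id, product) %>%
  # Difference in days from the previous purchase within each group
  mutate(res = as.numeric(purchase_time -
                            lag(purchase_time, default = first(purchase_time)))) %>%
  ungroup()

res$res
```

The `ungroup()` at the end is optional; it just avoids carrying the grouping into later steps.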
Answer 1: (score: 0)
Using diff
library(dplyr)

df %>%
  mutate(purchase_time = as.Date(purchase_time)) %>%
  group_by(customer_id, product) %>%
  # The first purchase in each group has no predecessor, so prepend 0
  mutate(diff = c(0, diff(purchase_time)))
# customer_id product purchase_time total_to_date time_from_last_purchase diff
# <int> <fct> <date> <int> <int> <dbl>
# 1 1 A 2014-11-24 1 0 0
# 2 1 A 2018-02-21 2 1185 1185
# 3 1 E 2014-01-08 1 0 0
# 4 2 J 2016-04-18 1 0 0
# 5 3 F 2017-06-12 1 0 0
# 6 3 G 2017-06-23 1 0 0
# 7 4 F 2017-09-27 1 0 0
# 8 4 F 2018-01-08 2 103 103
# 9 4 F 2018-02-08 3 31 31
#10 4 F 2018-02-09 4 1 1
#11 4 F 2018-04-10 5 60 60
Similarly, we can use base R ave
df$diff <- with(df, ave(as.numeric(as.Date(purchase_time)), customer_id, product,
                        FUN = function(x) c(0, diff(x))))
If your purchase_time is already of class Date, you can skip the as.Date part in both approaches.
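One caveat: `lag()` and `diff()` both work on row order, so the approaches above assume the rows are already in chronological order within each customer/product group, as they are in the sample data. If that is not guaranteed, sorting first is a cheap safeguard. A minimal sketch, using a small made-up data frame (`df_unsorted` is hypothetical and not from the question):

```r
library(dplyr)

# Hypothetical data: one customer/product, rows deliberately out of order
df_unsorted <- data.frame(
  customer_id   = c(4, 4, 4),
  product       = c("F", "F", "F"),
  purchase_time = c("2018-02-08", "2017-09-27", "2018-01-08"),
  stringsAsFactors = FALSE
)

out <- df_unsorted %>%
  mutate(purchase_time = as.Date(purchase_time)) %>%
  arrange(customer_id, product, purchase_time) %>%  # sort before differencing
  group_by(customer_id, product) %>%
  mutate(diff = c(0, diff(purchase_time))) %>%
  ungroup()

out
```

Without the `arrange()` step, the differences would follow the original row order and could come out negative.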