I have some data that looks like this:
   CustomerID InvoiceDate
   <fctr>     <dttm>
 1 13313      2011-01-04 10:00:00
 2 18097      2011-01-04 10:22:00
 3 16656      2011-01-04 10:23:00
 4 16875      2011-01-04 10:37:00
 5 13094      2011-01-04 10:37:00
 6 17315      2011-01-04 10:38:00
 7 16255      2011-01-04 11:30:00
 8 14606      2011-01-04 11:34:00
 9 13319      2011-01-04 11:40:00
10 16282      2011-01-04 11:42:00
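It tells me when a person made a transaction. I would like to know the time between transactions for each customer, preferably in days. I do this as follows: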
d <- data %>%
  arrange(CustomerID, InvoiceDate) %>%
  group_by(CustomerID) %>%
  mutate(delta.t = InvoiceDate - lag(InvoiceDate), #calculating the difference
         delta.day = as.numeric(delta.t, unit = 'days')) %>%
  na.omit() %>%
  arrange(CustomerID) %>%
  inner_join(Ntrans) %>% #Existing data.frame telling me the number of transactions per customer
  filter(N>=10) %>% #only want people with more than 10 transactions
  select(-N)
However, the result does not make sense (see below):
   CustomerID InvoiceDate         delta.t    delta.day
   <fctr>     <dttm>              <time>     <dbl>
 1 12415      2011-01-10 09:58:00  5686 days  5686
 2 12415      2011-02-15 09:52:00 51834 days 51834
 3 12415      2011-03-03 10:59:00 23107 days 23107
 4 12415      2011-04-01 14:28:00 41969 days 41969
 5 12415      2011-05-17 15:42:00 66314 days 66314
 6 12415      2011-05-20 14:13:00  4231 days  4231
 7 12415      2011-06-15 13:37:00 37404 days 37404
 8 12415      2011-07-13 15:30:00 40433 days 40433
 9 12415      2011-07-13 15:31:00     1 days     1
10 12415      2011-07-19 10:51:00  8360 days  8360
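The differences in days are way off. What I want is something close to a SQL rolling window function partitioned by CustomerID. How can I achieve this?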
Answer 0 (score: 0)
If you just want to convert the difference to days, you can use the lubridate package.
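The numbers in the question's output look like minutes that have been labelled as days: 2011-01-10 09:58 to 2011-02-15 09:52 is 36 days minus 6 minutes, i.e. 51834 minutes, and the 15:30 to 15:31 pair shows up as "1 days". That would be consistent with `-` choosing a unit automatically per group and the labels getting mixed up when the groups are combined, although this is an assumption about the dplyr version in use. Below is a minimal sketch that sidesteps the issue by fixing the unit explicitly with base R's `difftime()`. The sample tibble is made up for illustration, and the `Ntrans` join and filter from the question are left out.

```r
library(dplyr)

# Hypothetical sample data standing in for the question's `data`
data <- tibble(
  CustomerID  = factor(c("12415", "12415", "12415", "13313", "13313")),
  InvoiceDate = as.POSIXct(
    c("2011-01-10 09:58:00", "2011-02-15 09:52:00", "2011-03-03 10:59:00",
      "2011-01-04 10:00:00", "2011-01-04 10:30:00"),
    tz = "UTC"
  )
)

d <- data %>%
  arrange(CustomerID, InvoiceDate) %>%
  group_by(CustomerID) %>%
  mutate(
    # Ask for days explicitly instead of letting `-` pick a unit,
    # and convert to a plain number inside the same expression
    delta.day = as.numeric(
      difftime(InvoiceDate, lag(InvoiceDate), units = "days")
    )
  ) %>%
  ungroup() %>%
  # Drop each customer's first transaction, which has no predecessor
  filter(!is.na(delta.day))

print(d)
```

With lubridate loaded, `time_length(InvoiceDate - lag(InvoiceDate), unit = "day")` should serve the same purpose, since it also returns a plain number in the requested unit.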