Difference in days between two dates in the same column in R

Asked: 2019-01-17 11:47:36

Tags: r datetime

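I have a data frame with an ID column and a date column. Within each group, I want to compute the difference in days between one date and the next.

I tried this with the dplyr package, but the result appears to be wrong.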

hist_trnx1 %>% group_by(card_id) %>% mutate(gap=round(c(NA,diff(purchase_date)), 1))

I would like to get a result like the following:

   Card_ID         date                  Diff   
1: C_ID_4e6213e9bc 2017-06-25 15:33:07   NA
2: C_ID_4e6213e9bc 2017-07-15 12:10:45   20
3: C_ID_4e6213e9bc 2017-08-09 22:04:29   34 
4: C_ID_4e6213e9bB 2017-03-10 10:06:26   NA #( Because of group change) 
5: C_ID_4e6213e9bB 2017-04-10 01:14:19   30 
6: C_ID_4e6213e9bD 2018-02-24 08:45:05   NA #( Because of group change )
7: C_ID_4e6213e9bD 2018-03-23 08:45:05   29

Data

structure(list(card_id = c("C_ID_4e6213e9bc", "C_ID_4e6213e9bc", 
"C_ID_4e6213e9bc", "C_ID_4e6213e9bc", "C_ID_4e6213e9bc", "C_ID_4e6213e9bc"
), purchase_date = structure(c(1498404787, 1500120645, 1502316269, 
1504346786, 1489108459, 1519461905), tzone = "UTC", class = c("POSIXct", 
"POSIXt"))), .Names = c("card_id", "purchase_date"), class = c("data.table", 
"data.frame"), row.names = c(NA, -6L))

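1 answer:

Answer 0: (score: 1)

I'm not sure this is the prettiest way to do it, and someone may offer a cleaner solution, but this should work (part of the solution comes from "subtract value from previous row by group").

First, I import your data: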

df <- structure(list(card_id = c("C_ID_4e6213e9bc", "C_ID_4e6213e9bc", "C_ID_4e6213e9bB", "C_ID_4e6213e9bB", 
                                  "C_ID_4e6213e9bD", "C_ID_4e6213e9bD" ), 
                      purchase_date = structure(c(1498404787, 1500120645, 1502316269, 1504346786, 1489108459, 1519461905), 
                                                tzone = "UTC", class = c("POSIXct", "POSIXt"))), 
                 .Names = c("card_id", "purchase_date"), class = c("data.table", "data.frame"), 
                 row.names = c(NA, -6L))

Then this works when I run it:

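library(dplyr)

df <- df %>%
  group_by(card_id) %>%
  arrange(purchase_date) %>%
  # work in seconds since the epoch so the unit never depends on difftime's automatic choice;
  # lag() without a default yields NA for the first row of each group
  mutate(diff = as.numeric(purchase_date) - lag(as.numeric(purchase_date))) %>%
  # 86400 seconds per day
  mutate(diff = round(diff / 86400, digits = 2))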

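arrange puts the rows in chronological order, so each date is subtracted from the one that follows it; the lag function then picks the value from the previous row; and finally the division by 86400 converts the difference from seconds to days.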
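As a cross-check that needs no packages, the same sort / previous-row / divide steps can be sketched in base R. This is only a minimal sketch: the sample data frame below is made up for illustration (the IDs "A" and "B" and the timestamps are not from the question), and `diff_days` is a hypothetical column name.

# Hypothetical sample data mirroring the question's two columns
df <- data.frame(
  card_id = c("A", "A", "A", "B", "B"),
  purchase_date = as.POSIXct(
    c("2017-06-25 12:00:00", "2017-07-15 12:00:00", "2017-08-09 00:00:00",
      "2017-03-10 06:00:00", "2017-04-10 06:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Sort by group, then chronologically within each group
df <- df[order(df$card_id, df$purchase_date), ]

# Seconds since the epoch, so the unit is explicit rather than auto-selected
secs <- as.numeric(df$purchase_date)

# Within each card_id, subtract the previous timestamp; the first row of a group gets NA
gap_secs <- ave(secs, df$card_id, FUN = function(x) c(NA, diff(x)))

# Convert seconds to days (86400 seconds per day)
df$diff_days <- round(gap_secs / 86400, digits = 2)

print(df)
# diff_days is NA, 20, 24.5, NA, 31

Converting to numeric seconds first sidesteps the fact that subtracting POSIXct values returns a difftime whose unit (secs, mins, hours or days) is chosen automatically from the data, which makes a fixed division by 86400 unreliable.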

Hope this helps =)