Calculating time differences between records per ID - should I use lag or diff?

Asked: 2015-05-05 14:54:35

Tags: r dplyr

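I am trying to find the time difference between consecutive records for each Consumer.Identity. Each Consumer.Identity is independent, so the difference between visits should be calculated only within each unique ID, never across IDs.

Sample data: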

tail(cs1[,c('Other', 'Consumer.Identity', 'timestamp')], 20)

      Other                    Consumer.Identity           timestamp
8830       672 ff98e12f-24fc-4072-adba-fd15c9481a84 2014-05-29 19:15:00
8838       672 ff98e12f-24fc-4072-adba-fd15c9481a84 2014-05-29 19:45:00
8788       674 ff98e12f-24fc-4072-adba-fd15c9481a84 2014-05-30 13:26:00
12102      665 ff98e12f-24fc-4072-adba-fd15c9481a84 2014-06-06 18:29:00
11749      663 ff98e12f-24fc-4072-adba-fd15c9481a84 2014-06-09 08:15:00
11761      663 ff98e12f-24fc-4072-adba-fd15c9481a84 2014-06-09 08:48:00
11696      663 ff98e12f-24fc-4072-adba-fd15c9481a84 2014-06-09 14:12:00
11819      663 ff98e12f-24fc-4072-adba-fd15c9481a84 2014-06-10 08:23:00
11912      663 ff98e12f-24fc-4072-adba-fd15c9481a84 2014-06-10 16:13:00
13188      673 ff98e12f-24fc-4072-adba-fd15c9481a84 2014-06-13 18:24:00
14235      667 ff98e12f-24fc-4072-adba-fd15c9481a84 2014-06-16 15:24:00
14812      673 ff98e12f-24fc-4072-adba-fd15c9481a84 2014-06-18 16:03:00
20523      650 ff98e12f-24fc-4072-adba-fd15c9481a84 2014-06-26 10:27:00
17856      657 ffa6dab4-361a-4ef0-8e23-53cd6084d01e 2015-01-07 22:59:00
18051      657 ffa6dab4-361a-4ef0-8e23-53cd6084d01e 2015-01-08 08:53:00
25860      657 ffab2368-3b2e-4ee3-9352-5c6520cf81b1 2014-07-30 15:27:00
17163      673 ffab2368-3b2e-4ee3-9352-5c6520cf81b1 2015-01-06 18:21:00
53407      670 ffc3af0b-f3ee-4ca7-a1db-4a9a1f1cf58d 2014-09-15 17:41:00
76334      667 fff9593f-3038-4986-9792-0960fdd87a1b 2014-08-13 17:01:00
41457      667 fff9593f-3038-4986-9792-0960fdd87a1b 2014-08-18 16:48:00

Below is my code. I want to create a separate field called gap. I am also unsure whether I should use lag() or diff().
library(dplyr)

cs1 <- cs1 %>%
  arrange(Consumer.Identity, timestamp) %>%   # sort visits within each ID
  group_by(Consumer.Identity) %>%
  # lag() already returns NA for the first row of each group, so a second
  # group_by() + ifelse() pass is redundant (ifelse() would also drop the difftime class)
  mutate(gap = timestamp - lag(timestamp)) %>%
  ungroup()
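
On the lag() versus diff() question, a minimal sketch on made-up data may help. It assumes timestamp is stored as POSIXct; the data frame cs_demo and the columns gap_lag and gap_diff are hypothetical names used only for illustration. lag() returns a vector of the same length as its input with NA in the first position, so it fits directly inside mutate(). diff() returns one element fewer than its input, so it has to be padded with a leading NA to fit.

```r
library(dplyr)

# Hypothetical sample that mirrors the question's columns
cs_demo <- data.frame(
  Consumer.Identity = c("a", "a", "a", "b", "b", "c"),
  timestamp = as.POSIXct(
    c("2014-05-29 19:15:00", "2014-05-29 19:45:00", "2014-05-30 13:26:00",
      "2015-01-07 22:59:00", "2015-01-08 08:53:00", "2014-09-15 17:41:00"),
    tz = "UTC"
  )
)

gaps <- cs_demo %>%
  arrange(Consumer.Identity, timestamp) %>%
  group_by(Consumer.Identity) %>%
  mutate(
    # lag(): same length as the group, first row of each group is NA
    gap_lag = as.numeric(difftime(timestamp, lag(timestamp), units = "mins")),
    # diff(): one element shorter than the group, so pad with a leading NA
    gap_diff = c(NA, as.numeric(diff(timestamp), units = "mins"))
  ) %>%
  ungroup()

print(gaps)
```

Under these assumptions both columns hold the same gaps in minutes, and an ID with a single visit gets NA in both. Fixing the units explicitly avoids the automatic unit choice that plain subtraction of date-times makes.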

0 answers:

No answers yet.