Calculating time differences by ID

Time: 2015-05-21 16:04:00

Tags: r datediff

I have data like this:

Incident.ID.. = c(rep("INCFI0000029582",4), rep("INCFI0000029587",4))
date = c("2014-09-25 08:39:45", "2014-09-25 08:39:48", "2014-09-25 08:40:44", "2014-10-10 23:04:00", "2014-09-25 08:33:32", "2014-09-25 08:34:41", "2014-09-25 08:35:24", "2014-10-10 23:04:00")
df = data.frame(Incident.ID..,date, stringsAsFactors = FALSE)

df

     Incident.ID..                date
1  INCFI0000029582 2014-09-25 08:39:45
2  INCFI0000029582 2014-09-25 08:39:48
3  INCFI0000029582 2014-09-25 08:40:44
4  INCFI0000029582 2014-10-10 23:04:00
5  INCFI0000029587 2014-09-25 08:33:32
6  INCFI0000029587 2014-09-25 08:34:41
7  INCFI0000029587 2014-09-25 08:35:24
8  INCFI0000029587 2014-10-10 23:04:00

I use this function to calculate the time difference in seconds between consecutive rows:

padded.diff = function(x) c(0L, diff(x)) 

df2=within(df, {
  date        = strptime(date, format="%Y-%m-%d %H:%M:%S")
  date.diff   = padded.diff(as.numeric(date)) 
})

df2

     Incident.ID..                date date.diff
1  INCFI0000029582 2014-09-25 08:39:45         0
2  INCFI0000029582 2014-09-25 08:39:48         3
3  INCFI0000029582 2014-09-25 08:40:44        56
4  INCFI0000029582 2014-10-10 23:04:00   1347796
5  INCFI0000029587 2014-09-25 08:33:32  -1348228
6  INCFI0000029587 2014-09-25 08:34:41        69
7  INCFI0000029587 2014-09-25 08:35:24        43
8  INCFI0000029587 2014-10-10 23:04:00   1348116

But how can I calculate the differences so that they restart at zero for each "Incident.ID.."? The desired output is:

     Incident.ID..                date date.diff
1  INCFI0000029582 2014-09-25 08:39:45         0
2  INCFI0000029582 2014-09-25 08:39:48         3
3  INCFI0000029582 2014-09-25 08:40:44        56
4  INCFI0000029582 2014-10-10 23:04:00   1347796
5  INCFI0000029587 2014-09-25 08:33:32         0
6  INCFI0000029587 2014-09-25 08:34:41        69
7  INCFI0000029587 2014-09-25 08:35:24        43
8  INCFI0000029587 2014-10-10 23:04:00   1348116

2 answers:

Answer 0: (score: 5)

With base R, you can simply wrap the calculation in ave, which applies the function within each group:

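# with(df, ...) looks the columns up in df rather than relying on global vectors
df$date.diff = with(df, ave(as.numeric(as.POSIXct(date)), Incident.ID.., FUN = padded.diff))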
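
For reference, here is a self-contained sketch of the ave approach run against the question's data. The tz = "UTC" argument is an assumption added here so that the result does not depend on the local time zone; the question does not say which zone the timestamps are in.

```r
# Rebuild the data frame from the question.
df <- data.frame(
  Incident.ID.. = c(rep("INCFI0000029582", 4), rep("INCFI0000029587", 4)),
  date = c("2014-09-25 08:39:45", "2014-09-25 08:39:48",
           "2014-09-25 08:40:44", "2014-10-10 23:04:00",
           "2014-09-25 08:33:32", "2014-09-25 08:34:41",
           "2014-09-25 08:35:24", "2014-10-10 23:04:00"),
  stringsAsFactors = FALSE
)

# Same helper as in the question: a leading 0 followed by consecutive differences.
padded.diff <- function(x) c(0L, diff(x))

# Convert to seconds since the epoch first (tz = "UTC" is an assumption),
# then let ave() apply padded.diff separately within each incident ID.
secs <- as.numeric(as.POSIXct(df$date, tz = "UTC"))
df$date.diff <- ave(secs, df$Incident.ID.., FUN = padded.diff)

df
```

Because ave returns a vector in the original row order, the result can be assigned straight back as a column without sorting or merging.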

Or using data.table (per @akrun's comment):

library(data.table) 
# as.numeric() keeps the result in seconds regardless of difftime's automatic units
setDT(df)[, date.diff := padded.diff(as.numeric(as.POSIXct(date))), by = Incident.ID..]
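
One caveat worth keeping in mind: calling diff() directly on POSIXct values returns a difftime whose units are picked automatically from the smallest gap, and c(0, ...) then drops those units. With the question's data the smallest gap in each group is under a minute, so the values come out in seconds, but that is not guaranteed in general. A small sketch with made-up timestamps (not from the question) shows the difference:

```r
# Hypothetical timestamps whose gaps are all at least one hour.
x <- as.POSIXct(c("2014-09-25 08:00:00",
                  "2014-09-25 10:00:00",
                  "2014-09-25 13:00:00"), tz = "UTC")

d <- diff(x)
units(d)                                # units chosen automatically by difftime

# c() with a plain number first drops the difftime class, so the units are lost.
lossy <- c(0, d)

# Converting to numeric seconds before diff() always yields seconds.
in_secs <- c(0, diff(as.numeric(x)))

lossy
in_secs
```

Converting with as.numeric() before diff(), or using as.numeric(d, units = "secs") afterwards, avoids depending on the automatic unit selection.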

Answer 1: (score: 4)

Here is an example using dplyr and lubridate:

library(dplyr)
library(lubridate)
df %>%
    group_by(Incident.ID..) %>%
    mutate(diff = c(0, diff(as.numeric(ymd_hms(date)))))  # as.numeric() keeps the result in seconds

Source: local data frame [8 x 3]
Groups: Incident.ID..

    Incident.ID..                date    diff
1 INCFI0000029582 2014-09-25 08:39:45       0
2 INCFI0000029582 2014-09-25 08:39:48       3
3 INCFI0000029582 2014-09-25 08:40:44      56
4 INCFI0000029582 2014-10-10 23:04:00 1347796
5 INCFI0000029587 2014-09-25 08:33:32       0
6 INCFI0000029587 2014-09-25 08:34:41      69
7 INCFI0000029587 2014-09-25 08:35:24      43
8 INCFI0000029587 2014-10-10 23:04:00 1348116