Find the time difference by Id in R

Time: 2015-11-27 15:07:06

Tags: r

I have the following data (df):

Id        Timestamp                 Event
 1    2015-11-06 06:11:43           mail subscribed
 1    2015-11-06 06:15:43           Invoice created
 1    2015-11-06 09:15:43           phone call
 2    2015-11-07 08:15:43           New subscription
 2    2015-11-07 08:20:43           Added to customer list.

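I am looking for the following (the time difference within each Id):

For example, Id = 1 has three different events at different times. For each Id, I want to compute the time difference between each event and the one before it.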

Id        Timestamp                 Event                   Time Difference(Mins)
 1    2015-11-06 06:11:43           mail subscribed           0.0
 1    2015-11-06 06:15:43           Invoice created           4.0
 1    2015-11-06 09:15:43           phone call                180.0
 2    2015-11-07 08:15:43           New subscription          0.0
 2    2015-11-07 08:20:43           Added to customer list    5.0

I tried the following code:

 diff = function(x) as.numeric(x - lag(x) )
 or diff = function (x) as.numeric(0L,diff(x))
 setDT(df)[, diff2 := diff(timestamp), by = Id]

But this code gives irregular results. Any help?

2 Answers:

Answer 0: (score: 4)

Try ave. No packages are needed.

transform(df, Diff = ave(as.numeric(Timestamp), Id, FUN = function(x) c(0, diff(x))/60))

which gives:

  Id           Timestamp                            Event Diff
1  1 2015-11-06 06:11:43                  mail subscribed    0
2  1 2015-11-06 06:15:43                  Invoice created    4
3  1 2015-11-06 09:15:43                       phone call  180
4  2 2015-11-07 08:15:43                 New subscription    0
5  2 2015-11-07 08:20:43           Added to customer list    5

Note: the following was used to create the input data.frame, df:

Lines <- "Id,        Timestamp,                 Event
 1,    2015-11-06 06:11:43,           mail subscribed
 1,    2015-11-06 06:15:43,          Invoice created
 1,    2015-11-06 09:15:43,          phone call
 2,    2015-11-07 08:15:43,          New subscription
 2,    2015-11-07 08:20:43,          Added to customer list"

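# strip.white = TRUE drops the padding around each comma-separated field
df <- read.csv(text = Lines, strip.white = TRUE)
# Convert the timestamps to POSIXct so that they can be subtracted
df$Timestamp <- as.POSIXct(df$Timestamp)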

Updated according to the comments.
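
The ave call above relies on the rows already being in time order within each Id, because diff() compares neighbouring rows. A minimal sketch of the same idea with an explicit sort first is given here. It rebuilds the sample data inline, and both the df_sorted name and the tz = "UTC" setting are assumptions made only for this illustration:

```r
# Rebuild the sample data from the question (base R only).
# tz = "UTC" is an assumption so that the result does not depend on local DST rules.
df <- data.frame(
  Id = c(1, 1, 1, 2, 2),
  Timestamp = as.POSIXct(
    c("2015-11-06 06:11:43", "2015-11-06 06:15:43", "2015-11-06 09:15:43",
      "2015-11-07 08:15:43", "2015-11-07 08:20:43"),
    tz = "UTC"
  ),
  Event = c("mail subscribed", "Invoice created", "phone call",
            "New subscription", "Added to customer list")
)

# Sort by Id, then by Timestamp, so that diff() compares consecutive events.
df_sorted <- df[order(df$Id, df$Timestamp), ]

# Minutes since the previous event within each Id; 0 for the first event.
# as.numeric() turns POSIXct into seconds, hence the division by 60.
df_sorted$Diff <- ave(
  as.numeric(df_sorted$Timestamp), df_sorted$Id,
  FUN = function(x) c(0, diff(x)) / 60
)

print(df_sorted)
```

If the data is already sorted, as in the question, the order() step changes nothing and the result matches the output shown earlier in this answer.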

Answer 1: (score: 4)

You can do this with the data.table package:

library(data.table)
setDT(df)[, Diff := difftime(Timestamp, Timestamp[1], units="mins"), by=Id]

df
#   Id           Timestamp                   Event     Diff
#1:  1 2015-11-06 06:11:43         mail subscribed   0 mins
#2:  1 2015-11-06 06:15:43         Invoice created   4 mins
#3:  1 2015-11-06 09:15:43              phone call 184 mins
#4:  2 2015-11-07 08:15:43        New subscription   0 mins
#5:  2 2015-11-07 08:20:43 Added to customer list.   5 mins

Edit

Following @Jaap's comment, if what you need is the difference between consecutive events, you can do this:

df[, Diff2 := difftime(Timestamp, shift(Timestamp, 1L), units = "mins"), by = Id
   ][is.na(Diff2), Diff2:=0]

df
#   Id           Timestamp                   Event     Diff    Diff2
#1:  1 2015-11-06 06:11:43         mail subscribed   0 mins   0 mins
#2:  1 2015-11-06 06:15:43         Invoice created   4 mins   4 mins
#3:  1 2015-11-06 09:15:43              phone call 184 mins 180 mins
#4:  2 2015-11-07 08:15:43        New subscription   0 mins   0 mins
#5:  2 2015-11-07 08:20:43 Added to customer list.   5 mins   5 mins
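
The two columns answer different questions: Diff is the time elapsed since the first event of each Id, while Diff2 is the time elapsed since the previous event. The running total of Diff2 within an Id should therefore reproduce Diff. A minimal base R sketch of that check follows. The vector names ts, id, since_first and consecutive are hypothetical and not part of the answer, and tz = "UTC" is an assumption:

```r
# Sample timestamps and Ids from the question.
# tz = "UTC" is an assumption to keep the arithmetic free of DST effects.
ts <- as.POSIXct(
  c("2015-11-06 06:11:43", "2015-11-06 06:15:43", "2015-11-06 09:15:43",
    "2015-11-07 08:15:43", "2015-11-07 08:20:43"),
  tz = "UTC"
)
id <- c(1, 1, 1, 2, 2)

# Minutes since the first event of each Id (corresponds to Diff).
since_first <- ave(as.numeric(ts), id, FUN = function(x) (x - x[1]) / 60)

# Minutes since the previous event of each Id (corresponds to Diff2).
consecutive <- ave(as.numeric(ts), id, FUN = function(x) c(0, diff(x)) / 60)

# The per-Id running sum of the consecutive gaps equals the time since the first event.
running_total <- ave(consecutive, id, FUN = cumsum)
stopifnot(isTRUE(all.equal(running_total, since_first)))

print(data.frame(id, since_first, consecutive))
```

For the sample data the two columns differ only in the third row, where the phone call is 184 minutes after the first event but 180 minutes after the invoice.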