How do I use difftime to calculate the difference in days from the previous row's value?

Time: 2017-10-29 15:26:16

Tags: r

The code below creates the data:


idCol <- c('1','1','1','2','2','3','3')
rowNumIdCol <- c('1','2','3','4','5','6','7')
stepCol <- c('step1')
step1Col <- c('30-12-2010:11.02', '31-12-2010:10.06', '01-01-2011:15.12','01-03-2017:09.00', '01-05-2017:09.00', '01-06-2017:09.00', '01-07-2017:09.00')
mydata <- data.frame(idCol , rowNumIdCol , step1Col)
colnames(mydata) <- c('id' , 'rowNumId' , 'step1')

I am trying to calculate the difference in days between consecutive rows using:

library(dplyr)
library(lubridate)
mydata %>% 
  group_by(id) %>% 
  mutate(DaysSpent = as.numeric(difftime(dmy_hm(step1)[row_number], 
                                         dmy_hm(step1)[row_number()+1], units = 'days')))

But it returns an error:

Error in mutate_impl(.data, dots) : 
  Evaluation error: invalid subscript type 'closure'.

To calculate the cumulative difference in days from the first row of each id, I can use:

mydata %>% 
  group_by(id) %>% 
  mutate(DaysSpent = as.numeric(difftime(dmy_hm(step1), 
                                         dmy_hm(step1)[1], units = 'days')))

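How do I calculate the difference in days from the previous row?

I think I need to access both the current and the previous row inside mutate?

Update: the number of rows per id is variable.

3 Answers:

Answer 0: (score: 2)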

With data.table, this can be done using shift:

library(data.table)
library(lubridate)  # provides dmy_hm()

setDT(mydata)[, DaysSpent := difftime(dmy_hm(step1), dmy_hm(shift(step1, type = "lag")), units = "days"), by = id]

#   id rowNumId            step1       DaysSpent
#1:  1        1 30-12-2010:11.02         NA days
#2:  1        2 31-12-2010:10.06  0.9611111 days
#3:  1        3 01-01-2011:15.12  1.2125000 days
#4:  2        4 01-03-2017:09.00         NA days
#5:  2        5 01-05-2017:09.00 61.0000000 days
#6:  3        6 01-06-2017:09.00         NA days
#7:  3        7 01-07-2017:09.00 30.0000000 days
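
The values in this output can be cross-checked without data.table or lubridate. The following is a minimal base-R sketch, not the answerer's method: it assumes the timestamps are in UTC (lubridate's default for dmy_hm) and spells out the format string explicitly. The variable names ts and days_spent are introduced here for illustration only.

```r
# Base-R cross-check of the per-id day differences (no packages required)
id    <- c("1", "1", "1", "2", "2", "3", "3")
step1 <- c("30-12-2010:11.02", "31-12-2010:10.06", "01-01-2011:15.12",
           "01-03-2017:09.00", "01-05-2017:09.00", "01-06-2017:09.00",
           "01-07-2017:09.00")

# Parse day-month-year:hour.minute; UTC is an assumption
ts <- as.POSIXct(step1, format = "%d-%m-%Y:%H.%M", tz = "UTC")

# Within each id: seconds elapsed since the previous row, converted to days.
# The first row of every id has no previous row, so it gets NA.
days_spent <- ave(as.numeric(ts), id,
                  FUN = function(s) c(NA, diff(s)) / 86400)

print(round(days_spent, 7))
```

The printed vector should line up with the DaysSpent column above, with NA on the first row of each id.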

Answer 1: (score: 2)

I think lag() is better suited to this task:

library(dplyr)
library(lubridate)
mydata %>% 
  group_by(id) %>% 
  mutate(
    DaysSpent = as.numeric(difftime(
      dmy_hm(step1), lag(dmy_hm(step1)), units = 'days'
    ))
  )
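
This works because lag() shifts a vector down by one position and pads the front with NA, so subtracting it from the original gives "current minus previous". A rough base-R illustration follows, using a hypothetical helper lag1 (not part of dplyr) and made-up numeric values. It is only meant for plain numeric vectors, since c() would drop the POSIXct class.

```r
# Hypothetical stand-in for dplyr::lag() with its default shift of one position
lag1 <- function(x) c(NA, head(x, -1))

x <- c(5, 8, 15)        # made-up values

shifted <- lag1(x)      # previous element at each position, NA at the start
delta   <- x - shifted  # current minus previous

print(shifted)
print(delta)
```

Inside group_by(id), dplyr applies the shift separately within each id, which is why the first row of every group ends up as NA.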

Also consider converting the step1 column to POSIXct up front:

mydata %>% 
  group_by(id) %>% 
  mutate(
    step1 = dmy_hm(step1),
    DaysSpent = as.numeric(difftime(
      step1, lag(step1), units = 'days'
    ))
  )

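Answer 2: (score: 1)

I'm not sure what result you are looking for, but if I add () after the first row_number, I don't get the error.

Also, throw in an arrange() if the order matters.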

library(dplyr)
library(lubridate)
mydata %>% 
  group_by(id) %>% 
  # arrange(step1) %>%
  mutate(DaysSpent = as.numeric(
    difftime(dmy_hm(step1)[row_number()+1], ## this is where I added ()
             dmy_hm(step1)[row_number()], units = 'days')))
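
To see why the parentheses matter: without them, row_number is the function object itself (a closure), and R cannot subset a vector by a function. The sketch below reproduces this in base R with made-up values; idx_fun is a hypothetical stand-in for row_number() within a single group. It also shows that indexing with position + 1 yields "next minus current", with NA on the last row, rather than a difference from the previous row.

```r
x <- c(10, 20, 30)                   # made-up values for one group
idx_fun <- function() seq_along(x)   # hypothetical stand-in for row_number

# Indexing with the function itself (no parentheses) raises the error
err <- tryCatch(x[idx_fun], error = function(e) conditionMessage(e))
print(err)

# Calling the function returns the integer positions 1, 2, 3
i <- idx_fun()

# Position i + 1 runs past the end on the last row, which gives NA there
next_minus_current <- x[i + 1] - x[i]
print(next_minus_current)
```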