The following code creates the data:
idCol <- c('1','1','1','2','2','3','3')
rowNumIdCol <- c('1','2','3','4','5','6','7')
stepCol <- c('step1')
step1Col <- c('30-12-2010:11.02', '31-12-2010:10.06', '01-01-2011:15.12','01-03-2017:09.00', '01-05-2017:09.00', '01-06-2017:09.00', '01-07-2017:09.00')
mydata <- data.frame(idCol , rowNumIdCol , step1Col)
colnames(mydata) <- c('id' , 'rowNumId' , 'step1')
I am trying to compute the difference in days between consecutive rows using:
library(dplyr)
library(lubridate)
mydata %>%
group_by(id) %>%
mutate(DaysSpent = as.numeric(difftime(dmy_hm(step1)[row_number],
dmy_hm(step1)[row_number()+1], units = 'days')))
But this returns the error:
Error in mutate_impl(.data, dots) :
Evaluation error: invalid subscript type 'closure'.
To compute the cumulative difference in days from the first row of each group, this works:
mydata %>%
group_by(id) %>%
mutate(DaysSpent = as.numeric(difftime(dmy_hm(step1),
dmy_hm(step1)[1], units = 'days')))
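How can I compute the difference in days from the previous row instead?
I think I need to access both the current and the previous row inside mutate?
Update: the number of rows per id varies.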
Answer 0: (score: 2)
With data.table, you can use shift:
library(data.table)
library(lubridate)  # provides dmy_hm()
setDT(mydata)[, DaysSpent := difftime(dmy_hm(step1), dmy_hm(shift(step1, type = "lag")), units = "days"), by = id]
# id rowNumId step1 DaysSpent
#1: 1 1 30-12-2010:11.02 NA days
#2: 1 2 31-12-2010:10.06 0.9611111 days
#3: 1 3 01-01-2011:15.12 1.2125000 days
#4: 2 4 01-03-2017:09.00 NA days
#5: 2 5 01-05-2017:09.00 61.0000000 days
#6: 3 6 01-06-2017:09.00 NA days
#7: 3 7 01-07-2017:09.00 30.0000000 days
Answer 1: (score: 2)
I think lag() is better suited to this task:
library(dplyr)
library(lubridate)
mydata %>%
group_by(id) %>%
mutate(
DaysSpent = as.numeric(difftime(
dmy_hm(step1), lag(dmy_hm(step1)), units = 'days'
))
)
Also consider converting the step1 column to POSIXct up front:
mydata %>%
group_by(id) %>%
mutate(
step1 = dmy_hm(step1),
DaysSpent = as.numeric(difftime(
step1, lag(step1), units = 'days'
))
)
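For readers who want to see what the lagged difference does without any packages, here is a minimal base R sketch. It is not part of the answer above: it swaps dmy_hm() for as.POSIXct() with an explicit format string, and it assumes the timestamps can be treated as UTC.

```r
# Rebuild the question's data (only the columns needed here).
mydata <- data.frame(
  id = c('1', '1', '1', '2', '2', '3', '3'),
  step1 = c('30-12-2010:11.02', '31-12-2010:10.06', '01-01-2011:15.12',
            '01-03-2017:09.00', '01-05-2017:09.00', '01-06-2017:09.00',
            '01-07-2017:09.00'),
  stringsAsFactors = FALSE
)

# Assumption: timestamps are UTC and follow day-month-year:hour.minute.
ts <- as.POSIXct(mydata$step1, format = "%d-%m-%Y:%H.%M", tz = "UTC")
secs <- as.numeric(ts)  # seconds since the epoch

# ave() applies the function within each id group. c(NA, diff(x)) is the
# difference to the previous row, with NA for the first row of a group,
# so it also works when a group has a single row.
mydata$DaysSpent <- ave(secs, mydata$id,
                        FUN = function(x) c(NA, diff(x))) / 86400

print(mydata)
```

Because the grouping is handled by ave(), the number of rows per id can vary freely.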
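Answer 2: (score: 1)
I'm not sure what result you are looking for, but the code runs once I add () after the first row_number. Note that indexing with row_number()+1 gives the difference to the next row, so the last row of each group is NA.
Also, throw in arrange() if the order matters.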
library(dplyr)
library(lubridate)
mydata %>%
group_by(id) %>%
# arrange(step1) %>%
mutate(DaysSpent = as.numeric(
difftime(dmy_hm(step1)[row_number()+1], ## this is where I added ()
dmy_hm(step1)[row_number()], units = 'days')))
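The "invalid subscript type 'closure'" message can be reproduced in plain R, which may make the fix easier to see. In the sketch below, idx is a hypothetical stand-in for row_number: passing the function itself as a subscript fails, while calling it works. The last line shows why indexing one past the end gives NA instead of an error.

```r
x <- c(10, 20, 30)

# Hypothetical stand-in for row_number: a function that returns a position.
idx <- function() 2

# Subscripting with the function object (no parentheses) raises an error;
# tryCatch captures its message so the script keeps running.
msg <- tryCatch(x[idx], error = function(e) conditionMessage(e))
print(msg)

# Calling the function first yields a number, which is a valid subscript.
print(x[idx()])

# Indexing past the end of a vector returns NA rather than failing.
print(x[length(x) + 1])
```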