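I'm trying to sort my dataset and add a new column that gives the sequential time between events, based on two separate columns.
I have the following code, which should get me there, but I'm having trouble debugging it. Has anyone run into this before, or can anyone spot the problem in my code?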
A sample dataset is below, followed by the code I'm trying to use:
UNITNUMBER <- c(1,1,1,1,2,2,3,3,3,4,4,4,4,4)
ORDERID <- c(5555,5558,5565,5278,5283,3287,3004,4678,2345,2189,1784,5743,4623,4541)
BREAKDOWN <- c(0,1,0,1,1,1,1,0,0,0,0,1,1,0)
RO_OPENED <- as.Date(c('2016-11-18','2016-11-28','2016-9-15','2017-4-2','2016-12-22','2017-3-8','2016-4-25','2016-2-3','2017-6-7','2016-7-5','2016-4-9','2017-10-27','2017-4-20','2017-5-10'))
test = data.frame(UNITNUMBER,ORDERID,BREAKDOWN,RO_OPENED)
library(dplyr)
library(data.table)

test <- test %>% data.table(key = c("UNITNUMBER", "RO_OPENED"))
test <- test[, c("UNITNUMBER", "RO_OPENED", "TDIFF", "UNIQUEGROUP") :=
               list(UNITNUMBER, RO_OPENED, seq(.N), .GRP),
             by = list(ORDERID)][
             , numSeq := seq(min(RO_OPENED), max(RO_OPENED)),
             by = list(UNIQUEGROUP)][
             , runningTotal := ifelse(RO_OPENED == numSeq, seq(.N), 1L),
             by = list(UNITNUMBER, UNIQUEGROUP)]
The error I get is:
Error in seq.Date(min(RO_OPENED), max(RO_OPENED)) :
exactly two of 'to', 'by' and 'length.out' / 'along.with' must be specified
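The error is raised by seq.Date(), the method seq() dispatches to for Date objects. As the message says, once `from` is given, exactly two of `to`, `by` and `length.out` must be supplied, so in the R version that produced this error `seq(min(RO_OPENED), max(RO_OPENED))` (only `from` and `to`) is not enough. A minimal illustration using two dates from the sample data; it only addresses this particular error and does not claim the rest of the chain then gives the expected output:

```r
first <- as.Date("2016-11-18")
last  <- as.Date("2016-11-28")

# Supplying `by` gives seq() a step size for Date objects
days <- seq(first, last, by = "day")
length(days)  # 11 dates, both endpoints included

# Alternatively, fix the number of elements and let seq() work out the step
three <- seq(first, last, length.out = 3)
three
```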
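I expect the result to be two new columns: a UNIQUEGROUP identifier, and the time difference in days between BREAKDOWNs for each UNITNUMBER and ORDERID. TDIFF should be NA for rows that are not breakdowns, 0 for a unit's first breakdown, and otherwise the number of days since that unit's previous breakdown, as shown below: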
UNITNUMBER  ORDERID  BREAKDOWN  RO_OPENED   TDIFF
1           5565     0          9/15/2016   NA
1           5555     0          11/18/2016  NA
1           5558     1          11/28/2016  0
1           5278     1          4/2/2017    125
2           5283     1          12/22/2016  0
2           3287     1          3/8/2017    76
3           4678     0          2/3/2016    NA
3           3004     1          4/25/2016   0
3           2345     0          6/7/2017    NA
4           1784     0          4/9/2016    NA
4           2189     0          7/5/2016    NA
4           4623     1          4/20/2017   0
4           4541     0          5/10/2017   NA
4           5743     1          10/27/2017  190
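To make the TDIFF rule in the expected output concrete, here is a minimal base-R sketch (no dplyr or data.table; using ave() is my own choice, not something from the question or the answers). It sorts by unit and date, leaves TDIFF as NA for non-breakdown rows, and for breakdown rows stores 0 for each unit's first breakdown and otherwise the days since the previous breakdown of the same unit:

```r
# Sample data from the question
UNITNUMBER <- c(1,1,1,1,2,2,3,3,3,4,4,4,4,4)
ORDERID <- c(5555,5558,5565,5278,5283,3287,3004,4678,2345,2189,1784,5743,4623,4541)
BREAKDOWN <- c(0,1,0,1,1,1,1,0,0,0,0,1,1,0)
RO_OPENED <- as.Date(c('2016-11-18','2016-11-28','2016-9-15','2017-4-2','2016-12-22','2017-3-8',
                       '2016-4-25','2016-2-3','2017-6-7','2016-7-5','2016-4-9','2017-10-27',
                       '2017-4-20','2017-5-10'))
test <- data.frame(UNITNUMBER, ORDERID, BREAKDOWN, RO_OPENED)

# Sort by unit, then by date within each unit
test <- test[order(test$UNITNUMBER, test$RO_OPENED), ]

# Non-breakdown rows keep NA
test$TDIFF <- NA_real_

# Breakdown rows: 0 for each unit's first breakdown, then the number of
# days since the previous breakdown of the same unit
is_bd <- test$BREAKDOWN == 1
test$TDIFF[is_bd] <- ave(as.numeric(test$RO_OPENED[is_bd]),
                         test$UNITNUMBER[is_bd],
                         FUN = function(d) c(0, diff(d)))

test
```

The printed TDIFF column matches the expected table row for row.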
Answer 0 (score: 1)
This should do what you need:
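library(dplyr)

test %>%
  arrange(UNITNUMBER, RO_OPENED) %>%
  group_by(UNITNUMBER, BREAKDOWN) %>%
  # days since the previous row of the same unit and BREAKDOWN value;
  # the first row of each group gets 0
  mutate(TDIFF = coalesce(as.numeric(RO_OPENED - lag(RO_OPENED)), 0),
         # only breakdown rows keep a value
         TDIFF = ifelse(BREAKDOWN == 0, NA, TDIFF)) %>%
  ungroup()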
Answer 1 (score: 0)
Here is a data.table approach:
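library(data.table)

setDT(test)                            # convert to a data.table by reference
setorder(test, UNITNUMBER, RO_OPENED)  # sort by unit, then by date
# Breakdown rows only: 0 for each unit's first breakdown, then the number of
# days since the previous one. When TDIFF is created here, rows not selected
# by BREAKDOWN == 1 are filled with NA.
test[BREAKDOWN == 1,
     TDIFF := c(0, diff(RO_OPENED)),
     by = UNITNUMBER]
test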