Computing the time between events based on a condition variable

Time: 2019-10-27 19:47:44

Tags: r dplyr data.table

I am trying to sort my dataset and add a new column that gives the sequential time between events, based on two separate columns.

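I have the following code, which should get me there, but I am having trouble debugging it. Has anyone run into this problem, or can anyone spot what is wrong with my code?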

The sample dataset and the code I am working with are below:

library(data.table)
library(dplyr)

UNITNUMBER <- c(1,1,1,1,2,2,3,3,3,4,4,4,4,4)
ORDERID <- c(5555,5558,5565,5278,5283,3287,3004,4678,2345,2189,1784,5743,4623,4541)
BREAKDOWN <- c(0,1,0,1,1,1,1,0,0,0,0,1,1,0)
RO_OPENED <- as.Date(c('2016-11-18','2016-11-28','2016-9-15','2017-4-2','2016-12-22','2017-3-8','2016-4-25','2016-2-3','2017-6-7','2016-7-5','2016-4-9','2017-10-27','2017-4-20','2017-5-10'))

test = data.frame(UNITNUMBER,ORDERID,BREAKDOWN,RO_OPENED)

test <-  test %>% data.table(key = c("UNITNUMBER","RO_OPENED"))


test <-  test[, c("UNITNUMBER", "RO_OPENED",
                             "TDIFF", "UNIQUEGROUP") :=
                           list(UNITNUMBER, RO_OPENED,
                                seq(.N), .GRP),
                         by = list(ORDERID)][, numSeq := seq(min(RO_OPENED), max(RO_OPENED)),
                                             by = list(UNIQUEGROUP)][, runningTotal := ifelse(RO_OPENED == numSeq,
                                                                                        seq(.N), 1L), 
                                                               by = list(UNITNUMBER, UNIQUEGROUP)]

The error I receive is:

Error in seq.Date(min(RO_OPENED), max(RO_OPENED)) : 
  exactly two of 'to', 'by' and 'length.out' / 'along.with' must be specified

I expect the result to have 2 new columns: a UNIQUEGROUP identifier, and the time difference in days between BREAKDOWNS for each UNITNUMBER and ORDERID, as shown below:

UNIT OrderID BD    Date      TDIFF
1    5565    0    9/15/2016    NA
1    5555    0    11/18/2016   NA
1    5558    1    11/28/2016   0
1    5278    1    4/2/2017     125
2    5283    1    12/22/2016   0
2    3287    1    3/8/2017     76
3    4678    0    2/3/2016     NA
3    3004    1    4/25/2016    0
3    2345    0    6/7/2017     NA
4    1784    0    4/9/2016     NA
4    2189    0    7/5/2016     NA
4    4623    1    4/20/2017    0
4    4541    0    5/10/2017    NA
4    5743    1    10/27/2017   190

2 answers:

Answer 0: (score: 1)

This should do the job:

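library(dplyr)

test %>% 
  arrange(UNITNUMBER, RO_OPENED) %>% 
  group_by(UNITNUMBER, BREAKDOWN) %>% 
  # days since the previous row with the same BREAKDOWN value in this unit;
  # as.numeric() drops the difftime class so coalesce() compares plain doubles
  mutate(TDIFF = coalesce(as.numeric(RO_OPENED - lag(RO_OPENED)), 0),
         # only breakdown rows keep a value
         TDIFF = ifelse(BREAKDOWN == 0, NA, TDIFF)) %>% 
  ungroup()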
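
On the error itself: in the R version that produced the message in the question, seq() on Date objects requires a step (by) or a length in addition to the start and end dates. Separately, := expects a result of length 1 or one value per row in the group, so a full date sequence would not fit into a column. A minimal sketch in base R, using two of the dates from the question (the variable names are only illustrative):

```r
from <- as.Date("2016-11-18")
to   <- as.Date("2016-11-28")

# Giving an explicit step makes the call unambiguous.
days <- seq(from, to, by = "day")
length(days)   # 11 dates, both endpoints included

# If only the elapsed time is needed, no sequence is required:
# the difference of the two dates is enough.
elapsed <- as.numeric(difftime(to, from, units = "days"))
elapsed        # 10
```

Since the goal is a time difference rather than a list of dates, the subtraction is probably all that is needed here.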

Answer 1: (score: 0)

Here is a data.table approach:

library(data.table)

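setDT(test)                            # convert to data.table by reference
setorder(test, UNITNUMBER, RO_OPENED)  # sort by unit, then by date

# For breakdown rows only: days since the previous breakdown of the same unit,
# with 0 for the first one. Rows with BREAKDOWN == 0 stay NA when TDIFF is a new column.
test[BREAKDOWN == 1,
     TDIFF := c(0, as.numeric(diff(RO_OPENED))),
     by = UNITNUMBER]

test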
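
As a cross-check without any packages, the same idea can be sketched in base R. This assumes the sample data from the question (ORDERID is left out for brevity), and the helper names is_bd and days are only illustrative:

```r
test <- data.frame(
  UNITNUMBER = c(1,1,1,1,2,2,3,3,3,4,4,4,4,4),
  BREAKDOWN  = c(0,1,0,1,1,1,1,0,0,0,0,1,1,0),
  RO_OPENED  = as.Date(c("2016-11-18","2016-11-28","2016-09-15","2017-04-02",
                         "2016-12-22","2017-03-08","2016-04-25","2016-02-03",
                         "2017-06-07","2016-07-05","2016-04-09","2017-10-27",
                         "2017-04-20","2017-05-10"))
)

# Sort by unit, then by date.
test <- test[order(test$UNITNUMBER, test$RO_OPENED), ]

# Start with NA everywhere; only breakdown rows get a value.
test$TDIFF <- NA_real_
is_bd <- test$BREAKDOWN == 1
days  <- as.numeric(test$RO_OPENED)   # days since 1970-01-01

# Within each unit: 0 for the first breakdown, then days since the previous one.
test$TDIFF[is_bd] <- ave(days[is_bd], test$UNITNUMBER[is_bd],
                         FUN = function(d) c(0, diff(d)))

test
```

On this data the TDIFF column should agree with the expected output in the question.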