R dplyr: row-based conditional split/apply/combine

Time: 2018-10-28 19:56:14

Tags: r dplyr

This is the dplyr version of this question.

I have the following data.table:

library(dplyr)       # %>%
library(lubridate)   # ymd_hms()
library(data.table)  # data.table()

initial.date <- as.POSIXct('2018-10-27 10:00:00', tz = 'GMT')
last.date <- as.POSIXct('2018-10-28 17:00:00', tz = 'GMT')
PriorityDateTime <- seq.POSIXt(from = initial.date, to = last.date, by = '30 sec')
TradePrice <- seq(from = 1, to = length(PriorityDateTime), by = 1)
ndf <- data.frame(PriorityDateTime, TradePrice)
ndf$InstrumentSymbol <- rep_len(x = c('asset1', 'asset2'), length.out = length(ndf$PriorityDateTime))
ndf$id <- seq_along(ndf$InstrumentSymbol)
ndf$datetime <- ymd_hms(ndf$PriorityDateTime)
res <- ndf %>% data.table()

It looks like this:

    > res
         PriorityDateTime TradePrice InstrumentSymbol   id            datetime
   1: 2018-10-27 10:00:00          1           asset1    1 2018-10-27 10:00:00
   2: 2018-10-27 10:00:30          2           asset2    2 2018-10-27 10:00:30
   3: 2018-10-27 10:01:00          3           asset1    3 2018-10-27 10:01:00
   4: 2018-10-27 10:01:30          4           asset2    4 2018-10-27 10:01:30
   5: 2018-10-27 10:02:00          5           asset1    5 2018-10-27 10:02:00

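What is the most elegant and fastest way to do the following with dplyr?

  1. Split: for each row, find the other rows whose datetime lies within 60 seconds before or after it and that have the same InstrumentSymbol.
  2. Apply: among these nearby rows, find the one whose TradePrice is closest to this row's TradePrice[i], and get that other row's index in the original data.frame together with its TradePrice.
  3. Combine: add the results back to the original data.frame as new columns, for example as minpricewithin60 and index.minpricewithin60.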
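The three steps can be sketched for a single row as follows. This is only an illustration under stated assumptions: `toy` is a hypothetical five-row stand-in for `res`, `i` is an arbitrary row picked for the demonstration, and "within 60 seconds" is read as a time difference of at most 60 seconds.

```r
library(dplyr)

# Hypothetical toy data in the same shape as `res` (five rows only)
toy <- data.frame(
  id = 1:5,
  datetime = as.POSIXct("2018-10-27 10:00:00", tz = "GMT") + 30 * (0:4),
  TradePrice = c(1, 2, 3, 4, 5),
  InstrumentSymbol = rep_len(c("asset1", "asset2"), 5)
)

i <- 3  # the row being examined (assumption: any row index works the same way)

# 1. Split: other rows of the same instrument within 60 seconds of row i
close_rows <- toy %>%
  filter(
    id != toy$id[i],
    InstrumentSymbol == toy$InstrumentSymbol[i],
    abs(as.numeric(difftime(datetime, toy$datetime[i], units = "secs"))) <= 60
  )

# 2. Apply: keep the row(s) whose TradePrice is closest to TradePrice[i]
closest <- close_rows %>%
  mutate(diff_price = abs(TradePrice - toy$TradePrice[i])) %>%
  filter(diff_price == min(diff_price))

# 3. Combine: these values would become the two new columns for row i
new_cols <- closest %>%
  select(minpricewithin60 = TradePrice, index.minpricewithin60 = id)
print(new_cols)
```

For row 3 the two neighbours (ids 1 and 5) are equally close in price, so the sketch returns both; a tie-breaking rule would have to be chosen if exactly one match per row is required.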

Example result:

    > res
          PriorityDateTime TradePrice InstrumentSymbol id            datetime minpricewithin60 index.minpricewithin60
    1: 2018-10-27 10:00:00          1           asset1  1 2018-10-27 10:00:00                2                      2
    2: 2018-10-27 10:00:30          2           asset2  2 2018-10-27 10:00:30                4                      4
    3: 2018-10-27 10:01:00          3           asset1  3 2018-10-27 10:01:00                1                      1
    4: 2018-10-27 10:01:30          4           asset2  4 2018-10-27 10:01:30                2                      2
    5: 2018-10-27 10:02:00          5           asset1  5 2018-10-27 10:02:00                3                      3

I guess my question could be phrased as how to hold one row fixed while comparing it against the others. I have some potential solutions, but so far all of them are very slow.

1 Answer:

Answer 0: (score: 1)

A solution using the dplyr package and the lapply function:

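result_df <- do.call(rbind, lapply(seq_len(nrow(res)), function(row_id) {

  # Candidates: same instrument, not the row itself, at most 60 seconds away
  temp <- res %>%
    filter(InstrumentSymbol == res$InstrumentSymbol[row_id]) %>%
    mutate(time_diff = abs(difftime(res$datetime[row_id], datetime, units = "secs")),
           diff_price = abs(TradePrice - res$TradePrice[row_id])) %>%
    filter(id != res$id[row_id], time_diff <= 60) %>%
    # Keep the closest price; ties yield several rows
    filter(diff_price == min(diff_price)) %>%
    select(TradePrice, id) %>%
    rename(minpricewithin60 = TradePrice, index.minpricewithin60 = id)

  # No neighbour found: return a single row of NAs instead
  if (nrow(temp) == 0) temp <- data.frame(minpricewithin60 = NA_real_, index.minpricewithin60 = NA_integer_)

  # Repeat the current row once per match and attach the new columns
  bind_cols(res %>% slice(rep(row_id, nrow(temp))), temp)
}))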

head(result_df)

     PriorityDateTime TradePrice InstrumentSymbol id            datetime minpricewithin60 index.minpricewithin60
1 2018-10-27 10:00:00          1           asset1  1 2018-10-27 10:00:00                3                      3
2 2018-10-27 10:00:30          2           asset2  2 2018-10-27 10:00:30                4                      4
3 2018-10-27 10:01:00          3           asset1  3 2018-10-27 10:01:00                1                      1
4 2018-10-27 10:01:00          3           asset1  3 2018-10-27 10:01:00                5                      5
5 2018-10-27 10:01:30          4           asset2  4 2018-10-27 10:01:30                2                      2
6 2018-10-27 10:01:30          4           asset2  4 2018-10-27 10:01:30                6                      6
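
Note that rows with tied matches (ids 3 and 4 above) appear once per match. Since the question mentions speed, one possible alternative to the per-row lapply is a self-join within each instrument. The following is only a sketch under assumptions, not a measured improvement: `toy` is a hypothetical five-row stand-in for `res`, the `.other` suffix is an arbitrary naming choice, and the join builds every same-instrument pair, so memory use grows with the square of the rows per instrument. Recent dplyr versions may warn about a many-to-many join here; in dplyr 1.1.0 and later that can be silenced with `relationship = "many-to-many"`.

```r
library(dplyr)

# Hypothetical toy data in the same shape as `res` (five rows only)
toy <- data.frame(
  id = 1:5,
  datetime = as.POSIXct("2018-10-27 10:00:00", tz = "GMT") + 30 * (0:4),
  TradePrice = c(1, 2, 3, 4, 5),
  InstrumentSymbol = rep_len(c("asset1", "asset2"), 5)
)

pairs <- toy %>%
  # Pair every row with every row of the same instrument
  inner_join(toy, by = "InstrumentSymbol", suffix = c("", ".other")) %>%
  # Drop self-pairs and pairs more than 60 seconds apart
  filter(
    id != id.other,
    abs(as.numeric(difftime(datetime, datetime.other, units = "secs"))) <= 60
  ) %>%
  # Per original row, keep the partner(s) with the closest price
  group_by(id) %>%
  filter(abs(TradePrice - TradePrice.other) == min(abs(TradePrice - TradePrice.other))) %>%
  ungroup() %>%
  select(id, minpricewithin60 = TradePrice.other, index.minpricewithin60 = id.other)

# Rows without any partner keep NA in the new columns
result <- toy %>% left_join(pairs, by = "id")
print(result)
```

As in the lapply version, ties produce one output row per match, and `time_diff <= 60` is mirrored by the `<= 60` filter.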