This is the dplyr version of this question.
I have the following data.table:
library(dplyr)       # %>%
library(lubridate)   # ymd_hms()
library(data.table)  # data.table()

initial.date <- as.POSIXct('2018-10-27 10:00:00', tz = 'GMT')
last.date <- as.POSIXct('2018-10-28 17:00:00', tz = 'GMT')
# One observation every 30 seconds, alternating between two instruments
PriorityDateTime <- seq.POSIXt(from = initial.date, to = last.date, by = '30 sec')
TradePrice <- seq(from = 1, to = length(PriorityDateTime), by = 1)
ndf <- data.frame(PriorityDateTime, TradePrice)
ndf$InstrumentSymbol <- rep_len(x = c('asset1', 'asset2'), length.out = length(ndf$PriorityDateTime))
ndf$id <- seq_along(ndf$InstrumentSymbol)
ndf$datetime <- ymd_hms(ndf$PriorityDateTime)
res <- ndf %>% data.table()
It looks like this:
> res
PriorityDateTime TradePrice InstrumentSymbol id datetime
1: 2018-10-27 10:00:00 1 asset1 1 2018-10-27 10:00:00
2: 2018-10-27 10:00:30 2 asset2 2 2018-10-27 10:00:30
3: 2018-10-27 10:01:00 3 asset1 3 2018-10-27 10:01:00
4: 2018-10-27 10:01:30 4 asset2 4 2018-10-27 10:01:30
5: 2018-10-27 10:02:00 5 asset1 5 2018-10-27 10:02:00
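Using dplyr, what is the most elegant and fastest way to do the following for each row:
find another row whose datetime is within at most 60 seconds (a time difference of less than 60 seconds), which has the same InstrumentSymbol, and whose TradePrice is closest to this row's TradePrice[i];
get the index of that row in the original data.frame, and its TradePrice, as new columns in the data.table, e.g. minpricewithin60 and index.minpricewithin60.
Example result:
> res
PriorityDateTime TradePrice InstrumentSymbol id datetime minpricewithin60 index.minpricewithin60
1: 2018-10-27 10:00:00 1 asset1 1 2018-10-27 10:00:00 2 2
2: 2018-10-27 10:00:30 2 asset2 2 2018-10-27 10:00:30 4 4
3: 2018-10-27 10:01:00 3 asset1 3 2018-10-27 10:01:00 1 1
4: 2018-10-27 10:01:30 4 asset2 4 2018-10-27 10:01:30 2 2
5: 2018-10-27 10:02:00 5 asset1 5 2018-10-27 10:02:00 3 3
I guess my question can be asked as "how to fix a row in dplyr in a way similar to lapply".
I have potential solutions using lapply, but so far everything is very slow.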
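To make the matching rule concrete, here is a minimal base R sketch for a single row. It assumes that "closest" means the smallest absolute difference in TradePrice and that the window includes exactly 60 seconds. The toy data and the helper closest_within() are hypothetical and only for illustration.

```r
# Hypothetical toy data: column names follow `res`, values are made up.
toy <- data.frame(
  id = 1:4,
  InstrumentSymbol = c("asset1", "asset1", "asset1", "asset2"),
  datetime = as.POSIXct("2018-10-27 10:00:00", tz = "GMT") + c(0, 30, 300, 10),
  TradePrice = c(10, 14, 11, 10)
)

# closest_within() is a hypothetical helper, not part of any package.
# It returns the row position of the best match for row i, or NA if none.
closest_within <- function(df, i, window = 60) {
  gap <- abs(as.numeric(difftime(df$datetime, df$datetime[i], units = "secs")))
  candidates <- which(df$InstrumentSymbol == df$InstrumentSymbol[i] &
                        gap <= window &
                        df$id != df$id[i])
  if (length(candidates) == 0) return(NA_integer_)
  price_gap <- abs(df$TradePrice[candidates] - df$TradePrice[i])
  candidates[which.min(price_gap)]  # on ties, the first candidate wins
}

closest_within(toy, 1)  # row 2: row 3 has a closer price but is 300 s away
closest_within(toy, 3)  # NA: no asset1 row within 60 s of row 3
```

Applying such a helper row by row is the slow lapply-style pattern the question wants to avoid; the sketch only pins down what each row should receive.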
Answer 0 (score: 1)
A solution using the dplyr package and the lapply function:
result_df <- do.call(rbind, lapply(seq_len(nrow(res)), function(row_id) {
  # Candidates: same instrument, within 60 seconds, excluding the row itself
  temp <- res %>% filter(InstrumentSymbol == res$InstrumentSymbol[row_id]) %>%
    mutate(time_diff = abs(difftime(res$datetime[row_id], datetime, units = "secs")),
           diff_price = abs(TradePrice - res$TradePrice[row_id])) %>%
    filter(id != res$id[row_id], time_diff <= 60) %>%
    # Keep the candidate(s) with the closest price; ties yield several rows
    filter(diff_price == min(diff_price)) %>% select(TradePrice, id) %>%
    rename(minpricewithin60 = TradePrice, index.minpricewithin60 = id)
  # No candidate: keep the row, with NA in both new columns
  if (nrow(temp) == 0) temp[1, ] <- c(NA, NA)
  # Repeat the current row once per match and attach the new columns
  return(bind_cols(res %>% slice(rep(row_id, nrow(temp))), temp))
}))
head(result_df)
PriorityDateTime TradePrice InstrumentSymbol id datetime minpricewithin60 index.minpricewithin60
1 2018-10-27 10:00:00 1 asset1 1 2018-10-27 10:00:00 3 3
2 2018-10-27 10:00:30 2 asset2 2 2018-10-27 10:00:30 4 4
3 2018-10-27 10:01:00 3 asset1 3 2018-10-27 10:01:00 1 1
4 2018-10-27 10:01:00 3 asset1 3 2018-10-27 10:01:00 5 5
5 2018-10-27 10:01:30 4 asset2 4 2018-10-27 10:01:30 2 2
6 2018-10-27 10:01:30 4 asset2 4 2018-10-27 10:01:30 6 6
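As the output shows, ties on price difference produce more than one row per id. The same logic can also be sketched without a per-row loop, as a self-join in dplyr. This is an illustrative alternative under stated assumptions, not the answer's method: the toy data and the intermediate names partners and partner_time are hypothetical, and recent dplyr versions may warn about a many-to-many join here. The joined table grows with the square of the number of rows per instrument, so it trades memory for fewer R-level iterations.

```r
library(dplyr)

# Hypothetical toy data in the same shape as `res` (first six rows).
toy <- data.frame(
  id = 1:6,
  InstrumentSymbol = rep(c("asset1", "asset2"), length.out = 6),
  datetime = as.POSIXct("2018-10-27 10:00:00", tz = "GMT") + 30 * (0:5),
  TradePrice = 1:6
)

# Candidate partner rows, renamed so the join keeps both sides apart.
partners <- toy %>%
  select(InstrumentSymbol,
         index.minpricewithin60 = id,
         partner_time = datetime,
         minpricewithin60 = TradePrice)

closest <- toy %>%
  # Pair every row with every row of the same instrument
  inner_join(partners, by = "InstrumentSymbol") %>%
  # Drop self-pairs and pairs more than 60 seconds apart
  filter(index.minpricewithin60 != id,
         abs(as.numeric(difftime(datetime, partner_time, units = "secs"))) <= 60) %>%
  # Per original row, keep the partner(s) with the smallest price difference
  group_by(id) %>%
  filter(abs(minpricewithin60 - TradePrice) ==
           min(abs(minpricewithin60 - TradePrice))) %>%
  ungroup() %>%
  select(id, minpricewithin60, index.minpricewithin60)

# Rows without any partner keep NA in the two new columns.
result <- toy %>% left_join(closest, by = "id")
result
```

On this toy data, ids 3 and 4 each appear twice in result because both neighbours are equally close in price, which mirrors the duplicated rows in the output above.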