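The sample files downloaded from Bloomberg are shown below as (a) and (b). The resulting file should look like (c). Please help with some R code, or Excel or VBA code. PS: If two timestamps are identical, the highest price should be taken, with ties then broken by the larger size.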
(a) TCS IN Equity
01-04-2015 09:00:00 BID 2515 1
01-04-2015 09:00:04 BID 2553.95 133
01-04-2015 09:00:04 BID 2553.95 168
01-04-2015 09:00:06 BID 2515 1
01-04-2015 09:00:14 BID 2520 5
01-04-2015 09:00:24 BID 2525 3
(b) TCS IN Equity
01-04-2015 09:00:00 ASK 2594 5
01-04-2015 09:00:04 ASK 2565 1
01-04-2015 09:00:05 ASK 2594 5
01-04-2015 09:00:14 ASK 2570 10
01-04-2015 09:05:28 ASK 2560 5
(c)
TCS IN Equity BID BID_SIZ OFR OFR_SIZ
01-04-2015 09:00:00 2515 1 2594 5
01-04-2015 09:00:04 2553.95 168 2565 1
01-04-2015 09:00:14 2520 5 2570 10
Answer 0 (score: 0)
**Data**
A <- structure(list(
  date = structure(c(1L, 1L, 1L, 1L, 1L, 1L),
                   .Label = "01-04-2015", class = "factor"),
  time = structure(c(1L, 2L, 2L, 3L, 4L, 5L),
                   .Label = c("09:00:00", "09:00:04", "09:00:06",
                              "09:00:14", "09:00:24"),
                   class = "factor"),
  type = structure(c(1L, 1L, 1L, 1L, 1L, 1L),
                   .Label = "BID", class = "factor"),
  BID = c(2515, 2553.95, 2553.95, 2515, 2520, 2525),
  BID_SIZ = c(1L, 133L, 168L, 1L, 5L, 3L)),
  .Names = c("date", "time", "type", "BID", "BID_SIZ"),
  class = "data.frame", row.names = c(NA, -6L))
B <- structure(list(
  date = structure(c(1L, 1L, 1L, 1L, 1L),
                   .Label = "01-04-2015", class = "factor"),
  time = structure(1:5,
                   .Label = c("09:00:00", "09:00:04", "09:00:05",
                              "09:00:14", "09:05:28"),
                   class = "factor"),
  type = structure(c(1L, 1L, 1L, 1L, 1L),
                   .Label = "ASK", class = "factor"),
  OFR = c(2594L, 2565L, 2594L, 2570L, 2560L),
  OFR_SIZ = c(5L, 1L, 5L, 10L, 5L)),
  .Names = c("date", "time", "type", "OFR", "OFR_SIZ"),
  class = "data.frame", row.names = c(NA, -5L))
**Code**
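library(dplyr)

# Keep only the timestamps present in both the BID (A) and ASK (B) data
inner_join(A, B, by = c("date", "time")) %>%
  group_by(date, time) %>%
  # Sort so the highest BID, then the largest BID_SIZ, comes last
  arrange(BID, BID_SIZ) %>%
  # Take the last row of each date/time group
  # (in dplyr >= 1.0: summarise(across(everything(), last)))
  summarise_each(funs(last)) %>%
  select(-type.x, -type.y)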
# Source: local data frame [3 x 6]
# Groups: date
#
# date time BID BID_SIZ OFR OFR_SIZ
# 1 01-04-2015 09:00:00 2515.00 1 2594 5
# 2 01-04-2015 09:00:04 2553.95 168 2565 1
# 3 01-04-2015 09:00:14 2520.00 5 2570 10
**Explanation**
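First, inner_join joins the two data sets on date and time, which keeps only the timestamps present in both. Then, within each date/time combination, the rows are sorted by BID and BID_SIZ and the last row is selected, i.e. the highest bid and, among equal bids, the largest size.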
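The same idea can be sketched without the deprecated summarise_each/funs helpers, by resolving duplicate timestamps on the bid side first and joining afterwards. This is only a minimal sketch under some assumptions: the names best_bid and result are made up for illustration, the data frames are rebuilt here as plain character/numeric columns (without the type column) so the block runs on its own, and only the bid side is deduplicated because the sample ask data has no repeated timestamps.

```r
library(dplyr)

# Toy copies of samples (a) and (b); stringsAsFactors = FALSE keeps the
# date/time columns as plain character vectors on any R version
A <- data.frame(
  date = "01-04-2015",
  time = c("09:00:00", "09:00:04", "09:00:04",
           "09:00:06", "09:00:14", "09:00:24"),
  BID = c(2515, 2553.95, 2553.95, 2515, 2520, 2525),
  BID_SIZ = c(1L, 133L, 168L, 1L, 5L, 3L),
  stringsAsFactors = FALSE
)
B <- data.frame(
  date = "01-04-2015",
  time = c("09:00:00", "09:00:04", "09:00:05", "09:00:14", "09:05:28"),
  OFR = c(2594, 2565, 2594, 2570, 2560),
  OFR_SIZ = c(5L, 1L, 5L, 10L, 5L),
  stringsAsFactors = FALSE
)

# For each timestamp, put the highest BID (then the largest BID_SIZ) first
# and keep only that first row (hypothetical helper name: best_bid)
best_bid <- A %>%
  arrange(date, time, desc(BID), desc(BID_SIZ)) %>%
  distinct(date, time, .keep_all = TRUE)

# Keep only the timestamps that also have an ASK quote
result <- inner_join(best_bid, B, by = c("date", "time"))
print(result)
```

On the sample data this should leave the three timestamps 09:00:00, 09:00:04 and 09:00:14, with the 09:00:04 bid resolved to size 168. If the ask file could also contain repeated timestamps, the same arrange/distinct step would need to be applied to B before joining, using whatever tie-breaking rule is wanted for asks.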