Suppose I have the following xts object containing bid and ask data:
Time Bid Ask
00:01 10 11
00:02 10 11
00:03 11 12
00:04 12 13
00:05 10 11
00:06 10 11
00:07 10 11
00:08 9 12
00:09 2 10
00:10 4 5
I want to get the following output:
Time Bid Ask
00:01 10 11
00:03 11 12
00:04 12 13
00:05 10 11
00:08 9 12
00:09 2 10
00:10 4 5
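A row may only be removed if it is identical to the row immediately before it. If either the bid or the ask changes, the row is kept, so simply removing duplicates won't work (the 00:05 row repeats 00:01 but must stay).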
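A quick base-R check of why removing duplicates is not enough: `duplicated()` compares each row with every earlier row, not only with the previous one. This sketch types the updated 10-row data into a plain matrix `m` (a hypothetical stand-in for the xts object; the row logic is the same).

```r
# Updated 10-row data from the question as a plain matrix (hypothetical stand-in for xts)
m <- cbind(Bid = c(10, 10, 11, 12, 10, 10, 10, 9, 2, 4),
           Ask = c(11, 11, 12, 13, 11, 11, 11, 12, 10, 5))

# duplicated() flags a row if it matches ANY earlier row
kept <- which(!duplicated(m))
kept                      # 1 3 4 8 9 10

# rows wanted according to the expected output in the question
wanted <- c(1, 3, 4, 5, 8, 9, 10)
setdiff(wanted, kept)     # 5: the 00:05 row is lost because it repeats 00:01
```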
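This should be fairly simple, since I've done it before, but I can't remember how and can't find it.
Update: I added a few extra entries to the initial data and to the expected output.
Joshua's solution is brilliant, but it depends on a function such as rowSums giving different results for different rows, and it breaks on 9 12 (9 + 12 = 21, the same as 10 + 11). I used the rowProds function from the matrixStats package instead, but then my last two rows obviously fail (2 * 10 = 4 * 5 = 20). Also, in my example the value in the first column is always smaller than the value in the second column. That makes sense, but it is not required, so the function should also work when Ask is smaller than Bid, a case where both rowSums and rowProds can fail (10 11 and 11 10 have the same sum and product). Is there a better row function that always gives a different result if anything differs, such as a row hash?
Answer: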
You can do this by using rle() on the sum of each row.
library(xts)
# the question's original 7-row example, indexed 1..7 seconds after the epoch (local time zone)
x <- xts(cbind(Bid = c(10, 10, 11, 12, 10, 10, 10),
               Ask = c(11, 11, 12, 13, 11, 11, 11)),
         order.by = .POSIXct(1:7))
r <- rle(rowSums(x))
If you want the last observation in each group, you can just use cumsum(r$lengths) as the row index when subsetting.
R> x[cumsum(r$lengths),]
Bid Ask
1969-12-31 18:00:02 10 11
1969-12-31 18:00:03 11 12
1969-12-31 18:00:04 12 13
1969-12-31 18:00:07 10 11
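To make the indexing concrete, here is the same computation on the row sums of the 7-row example, typed in by hand as a plain vector `s` (a hypothetical name used instead of the xts object):

```r
# Row sums (Bid + Ask) of the original 7-row example
s <- c(10 + 11, 10 + 11, 11 + 12, 12 + 13, 10 + 11, 10 + 11, 10 + 11)
r <- rle(s)
r$lengths           # 2 1 1 3: length of each run of equal sums
r$values            # 21 23 25 21
cumsum(r$lengths)   # 2 3 4 7: the last row of each run
```

Rows 2, 3, 4 and 7 are exactly the timestamps in the output above.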
Since you want the first observation of each group, you need to prepend the r$lengths vector with a 1 (you always want the first observation) and then remove the last element of r$lengths. Then call cumsum() on the result.
R> x[cumsum(c(1, head(r$lengths, -1))),]
Bid Ask
1969-12-31 18:00:01 10 11
1969-12-31 18:00:03 11 12
1969-12-31 18:00:04 12 13
1969-12-31 18:00:05 10 11
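The rowSums() limitation raised in the question is easy to reproduce on the updated 10-row data. This sketch uses `apply(m, 1, prod)` as a base-R stand-in for `matrixStats::rowProds`. The string key at the end is an alternative suggestion, not part of the answer. `m`, `sums`, `prods` and `key` are hypothetical names.

```r
# Updated 10-row data from the question as a plain matrix
m <- cbind(Bid = c(10, 10, 11, 12, 10, 10, 10, 9, 2, 4),
           Ask = c(11, 11, 12, 13, 11, 11, 11, 12, 10, 5))

sums  <- rowSums(m)          # 21 21 23 25 21 21 21 21 12 9
prods <- apply(m, 1, prod)   # 110 110 132 156 110 110 110 108 20 20

rle(sums)$lengths    # 2 1 1 4 1 1: row 8 (9 12) merges into the 10 11 run
rle(prods)$lengths   # 2 1 1 3 1 2: rows 9 and 10 (2 10, 4 5) merge

# Swapping Bid and Ask leaves both the sum and the product unchanged
sum(c(10, 11)) == sum(c(11, 10))     # TRUE
prod(c(10, 11)) == prod(c(11, 10))   # TRUE

# Alternative (not from the answer): a string key per row never merges distinct pairs
key <- paste(m[, "Bid"], m[, "Ask"], sep = "/")
cumsum(c(1, head(rle(key)$lengths, -1)))   # 1 3 4 5 8 9 10
```

The string key relies on as.character() formatting, which keeps 15 significant digits, so it suits ordinary price data rather than arbitrary doubles.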
Good catch on the limitation of rowSums(). A robust solution is to diff() the bids and asks and select the rows where either difference is nonzero.
d <- diff(x) != 0 # rows with price changes
d[1,] <- TRUE # always select first observation
g <- cumsum(d$Bid | d$Ask) # groups of repeats
r <- rle(as.numeric(g)) # run length encoding on groups
# now use the solution above
x[cumsum(c(1, head(r$lengths, -1))),]
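The same diff() idea can be checked in base R on the question's updated 10-row data. With a plain matrix, diff() returns n - 1 rows rather than an xts object with a leading NA row, so this sketch prepends TRUE for the first observation instead of assigning d[1,] <- TRUE. `m`, `changed` and `idx` are hypothetical names.

```r
# Updated 10-row data from the question as a plain matrix (base R stands in for xts)
m <- cbind(Bid = c(10, 10, 11, 12, 10, 10, 10, 9, 2, 4),
           Ask = c(11, 11, 12, 13, 11, 11, 11, 12, 10, 5))

d <- diff(m) != 0                            # (n - 1) x 2: did Bid / Ask change from the previous row?
changed <- c(TRUE, d[, "Bid"] | d[, "Ask"])  # always keep the first observation
g <- cumsum(changed)                         # group id for each run of repeated rows
r <- rle(g)                                  # run lengths of the groups
idx <- cumsum(c(1, head(r$lengths, -1)))     # first row of each group
idx                                          # 1 3 4 5 8 9 10
m[idx, ]

# The change mask already marks those rows directly
which(changed)                               # 1 3 4 5 8 9 10
```

Because the test is exact equality on consecutive rows, it does not matter whether Ask is below Bid, and pairs such as 9 12 versus 10 11 are kept apart.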