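This is a very straightforward question.
I did search all the related posts on Stack Overflow and Google, but couldn't find an answer. For reference, see "Find which interval row in a data frame that each element of a vector belongs in" and "Split a vector into chunks in R".
Data: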
Time Price Volume Amount Flag
1: 2016-01-04 09:05:06 105.0 9500 993700 1
2: 2016-01-04 09:20:00 104.1 23500 2446350 0
3: 2016-01-04 09:30:00 104.1 18500 1924550 1
4: 2016-01-04 09:30:01 103.9 12500 1300550 0
5: 2016-01-04 09:30:02 104.1 16118 1675233 1
6: 2016-01-04 09:30:05 104.0 13000 1352200 0
7: 2016-01-04 09:30:06 104.1 2500 260100 1
8: 2016-01-04 09:30:07 104.1 1500 156150 1
9: 2016-01-04 09:30:08 104.3 500 52150 1
10: 2016-01-04 09:30:10 104.0 1000 104000 0
11: 2016-01-04 09:30:11 103.9 1000 103900 0
12: 2016-01-04 09:30:15 104.0 3500 364450 1
13: 2016-01-04 09:30:17 104.3 2000 208450 1
14: 2016-01-04 09:30:19 104.3 1500 156450 1
15: 2016-01-04 09:30:20 104.4 500 52200 1
16: 2016-01-04 09:30:21 104.4 1500 156600 1
17: 2016-01-04 09:30:22 104.4 1000 104400 1
18: 2016-01-04 09:30:24 104.4 1500 156600 1
19: 2016-01-04 09:30:25 104.0 2000 208000 0
20: 2016-01-04 09:30:27 104.1 3500 364350 1
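Similar to preparing a histogram or a hist object, I want to build the distribution of Volume over different levels of Price.
Specifically:
The range of Price is divided into N bins (say, N = 5), and Volume is summed within each bin.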
I tried the cut_number function in the ggplot2 package and several other functions, such as split. I think findInterval might help, and the code should look something like this:
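library(data.table)
library(ggplot2)  # provides cut_number()
# Does not work: findInterval() expects a sorted numeric vector of breakpoints,
# but cut_number() returns a factor with one bin label per row
dt[, sum(Volume), by = findInterval(Price, cut_number(Price, 5))]
# I think the key lies in the `by` part, i.e. something like:
# dt[, sum(Volume), by = <some function of Price>]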
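findInterval() does work in the `by` part once it gets what it expects: a sorted numeric vector of breakpoints. A minimal sketch, assuming N counts equal-width bins and using only the Price and Volume columns of the reproducible data below; the names N, breaks and res are illustrative, not from the question:

```r
library(data.table)

# Price/Volume columns from the reproducible data in this question
dt <- data.table(
  Price  = c(105, 104.1, 104.1, 103.9, 104.1, 104, 104.1, 103, 102.9, 102.9),
  Volume = c(9500L, 23500L, 18500L, 12500L, 16118L, 13000L, 2500L, 4000L, 2000L, 1000L)
)

N <- 5  # assumption: N is the number of equal-width bins

# N + 1 equally spaced, sorted numeric breakpoints from min(Price) to max(Price)
breaks <- seq(min(dt$Price), max(dt$Price), length.out = N + 1)

# rightmost.closed = TRUE puts max(Price) into bin N instead of bin N + 1
res <- dt[, .(Volume = sum(Volume)),
          keyby = .(bin = findInterval(Price, breaks, rightmost.closed = TRUE))]
print(res)
```

On this data bins 2 and 4 hold no rows, so they do not appear in res: grouping with by/keyby only returns groups that actually occur.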
Reproducible data:
dt <- data.table(structure(list(
  Time = structure(c(1451898306, 1451899200, 1451899800, 1451899801, 1451899802,
                     1451899805, 1451899806, 1451923195, 1451923196, 1451923200),
                   class = c("POSIXct", "POSIXt"), tzone = "GMT"),
  Price = c(105, 104.1, 104.1, 103.9, 104.1, 104, 104.1, 103, 102.9, 102.9),
  Volume = c(9500L, 23500L, 18500L, 12500L, 16118L, 13000L, 2500L, 4000L, 2000L, 1000L),
  Amount = c(993700L, 2446350L, 1924550L, 1300550L, 1675233L, 1352200L, 260100L,
             412000L, 206016L, 102880L),
  Flag = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L)),
  .Names = c("Time", "Price", "Volume", "Amount", "Flag"),
  class = c("data.table", "data.frame"), row.names = c(NA, -10L)))
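For comparison, ggplot2's cut_number() aims for N bins with roughly equal numbers of observations by cutting at quantiles. The base-R sketch below imitates that idea on the reproducible data; it is not ggplot2's own implementation. With only 10 rows, N = 5 fails here: the 60% and 80% sample quantiles are both 104.1, so cut() would stop with "'breaks' are not unique". The sketch therefore uses a hypothetical N = 2:

```r
# Price/Volume from the reproducible data above
Price  <- c(105, 104.1, 104.1, 103.9, 104.1, 104, 104.1, 103, 102.9, 102.9)
Volume <- c(9500L, 23500L, 18500L, 12500L, 16118L, 13000L, 2500L, 4000L, 2000L, 1000L)

N <- 2  # hypothetical bin count; N = 5 gives duplicate breaks on this data

# N + 1 sample quantiles (0%, 50%, 100% for N = 2) used as break points
brk <- quantile(Price, probs = seq(0, 1, length.out = N + 1), names = FALSE)

# include.lowest = TRUE keeps min(Price) in the first bin
bins <- cut(Price, breaks = brk, include.lowest = TRUE)

# Total Volume in each (roughly) equal-count bin
vol_by_bin <- tapply(Volume, bins, sum)
print(vol_by_bin)
```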
Desired output (for illustration only):
Price Range Sum
102.3 - 102.5 300000
.
. (N bins in total, hence N rows)
.
105.0 - 105.3 500000
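To get exactly N rows as in the illustration above, empty bins included, one option is to cut Price at explicit equal-width breaks (cut() then keeps all N levels, even unused ones) and tabulate with tapply(), whose default argument fills the cells of empty levels. A base-R sketch under those assumptions; the two-decimal label format is an arbitrary choice:

```r
# Price/Volume from the reproducible data in the question
Price  <- c(105, 104.1, 104.1, 103.9, 104.1, 104, 104.1, 103, 102.9, 102.9)
Volume <- c(9500L, 23500L, 18500L, 12500L, 16118L, 13000L, 2500L, 4000L, 2000L, 1000L)

N <- 5  # number of equal-width bins

# N + 1 equally spaced break points spanning the whole Price range
breaks <- seq(min(Price), max(Price), length.out = N + 1)

# With explicit breaks the factor always has N levels, even for empty bins;
# include.lowest = TRUE keeps min(Price) in the first bin
bins <- cut(Price, breaks = breaks, include.lowest = TRUE)

# Sum Volume per level; empty levels get the default value 0
Sum <- tapply(Volume, bins, sum, default = 0L)

out <- data.frame(
  PriceRange = sprintf("%.2f - %.2f", head(breaks, -1), tail(breaks, -1)),
  Sum        = as.vector(Sum)
)
print(out)
```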
I also tried several other combinations, and all of them failed.
Any suggestions are welcome! Many thanks.
Answer 0 (score: 1)
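This assumes N refers to the number of items per bin rather than the number of bins (output rows). There may be a shorter way that doesn't create an index column, but here is one where you first assign each row to a group and then summarise:
setorder(dt, Price)  # sort rows by Price, in place
# GROUP is 1 for the 5 lowest prices, 2 for the next 5, and so on
dt[, GROUP := ceiling(seq_along(Price) / 5)][,
   list(PriceRange = paste(range(Price), collapse = " - "),
        Volume = sum(Volume)),
   by = "GROUP"]
Edit after the OP's comment:
If you want bands of equal width instead, let cut() split the Price range into 5 intervals. Each band is shown by its interval label, and only bands that contain at least one row appear:
dt[, sum(Volume), by = cut(Price, 5)]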
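The GROUP approach fixes 5 rows per bin. If N should instead be the number of bins, each with (nearly) the same number of rows, the row position can be scaled by N / .N. A data.table sketch with a hypothetical N = 3; note that rows with the same Price can land in different bins (here the four 104.1 rows are split between groups 2 and 3):

```r
library(data.table)

# Price/Volume from the reproducible data in the question
dt <- data.table(
  Price  = c(105, 104.1, 104.1, 103.9, 104.1, 104, 104.1, 103, 102.9, 102.9),
  Volume = c(9500L, 23500L, 18500L, 12500L, 16118L, 13000L, 2500L, 4000L, 2000L, 1000L)
)

N <- 3  # hypothetical number of bins

setorder(dt, Price)  # sort by Price, in place
# Map row positions 1..nrow onto bins 1..N with (nearly) equal row counts
dt[, GROUP := ceiling(seq_len(.N) * N / .N)]

res <- dt[, .(PriceRange = paste(range(Price), collapse = " - "),
              Volume = sum(Volume)),
          by = GROUP]
print(res)
```

Whether splitting ties is acceptable depends on the use; a break-based split such as cut() never separates equal prices, at the cost of uneven bin sizes.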
HTH