I have time-series transaction history stored in Google BigQuery.
# Transaction history schema
exchange_id INTEGER REQUIRED
from_id INTEGER REQUIRED
to_id INTEGER REQUIRED
price FLOAT REQUIRED
size FLOAT REQUIRED
ts TIMESTAMP REQUIRED
is_sell BOOLEAN NULLABLE
_PARTITIONTIME TIMESTAMP NULLABLE
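exchange_id - the platform where the trade took place
from_id - base symbol
to_id - quote symbol
price - trade price
size - trade quantity
I need to aggregate this into OHLC data over 30-second intervals, grouped by interval and by exchange_id, from_id, to_id. How can I do this in BigQuery?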
# Required OHLC aggregated data schema
ts TIMESTAMP REQUIRED
exchange_id INTEGER REQUIRED
from_id INTEGER REQUIRED
to_id INTEGER REQUIRED
open FLOAT REQUIRED
high FLOAT REQUIRED
low FLOAT REQUIRED
close FLOAT REQUIRED
volume FLOAT REQUIRED
_PARTITIONTIME TIMESTAMP NULLABLE
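open - first price in the interval
high - highest price in the interval
low - lowest price in the interval
close - last price in the interval
volume - sum of all trade sizes in the current interval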
The most promising ideas so far are this one:
SELECT
TIMESTAMP_SECONDS(
UNIX_SECONDS(ts) -
60 * 1000000
) AS time,
exchange_id,
from_id,
to_id,
MIN(price) as low,
MAX(price) as high,
SUM(size) as volume
FROM
`table`
GROUP BY
time, exchange_id, from_id, to_id
ORDER BY
time
and this one:
SELECT
exchange_id,from_id,to_id,
MAX(price) OVER (PARTITION BY exchange_id,from_id,to_id ORDER BY ts RANGE BETWEEN 60 * 1000000 PRECEDING AND CURRENT ROW) as high,
MIN(price) OVER (PARTITION BY exchange_id,from_id,to_id ORDER BY ts RANGE BETWEEN 60 * 1000000 PRECEDING AND CURRENT ROW) as low,
SUM(size) OVER (PARTITION BY exchange_id,from_id,to_id ORDER BY ts RANGE BETWEEN 60 * 1000000 PRECEDING AND CURRENT ROW) as volume,
FROM [table];
# returns:
1 1 4445 3808 9.0E-8 9.0E-8 300000.0
2 1 4445 3808 9.0E-8 9.0E-8 300000.0
3 1 4445 3808 9.0E-8 9.0E-8 300000.0
...
14 1 4445 3808 9.0E-8 9.0E-8 865939.3721800799
15 1 4445 3808 9.0E-8 9.0E-8 865939.3721800799
16 1 4445 3808 9.0E-8 9.0E-8 865939.3721800799
But this does not do what I need. It seems I am missing something important about sliding windows in BigQuery.
Answer 0 (score: 3)
The following is for BigQuery Standard SQL:
#standardsql
SELECT
exchange_id,
from_id,
to_id,
TIMESTAMP_SECONDS(DIV(UNIX_SECONDS(ts), 30) * 30) time,
ARRAY_AGG(price ORDER BY ts LIMIT 1)[SAFE_OFFSET(0)] open,
MAX(price) high,
MIN(price) low,
ARRAY_AGG(price ORDER BY ts DESC LIMIT 1)[SAFE_OFFSET(0)] close,
SUM(size) volume
FROM `yourproject.yourdataset.yourtable`
GROUP BY 1, 2, 3, 4
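To see what the bucketing expression does, here is a minimal sketch that runs without any table. The `trades` rows are made up for illustration; only the column names follow the schema in the question. `DIV(UNIX_SECONDS(ts), 30) * 30` rounds each timestamp down to the start of its 30-second interval, so every trade in the same interval gets the same `time` value and lands in the same group. Subtracting a constant from the timestamp, as in the question's first attempt, only shifts it and never collapses several trades onto one value.

```sql
#standardsql
WITH trades AS (
  -- Hypothetical sample rows, not real data.
  SELECT 1 AS exchange_id, 4445 AS from_id, 3808 AS to_id,
         TIMESTAMP '2018-01-01 00:00:01+00' AS ts, 10.0 AS price, 1.0 AS size
  UNION ALL SELECT 1, 4445, 3808, TIMESTAMP '2018-01-01 00:00:12+00', 12.0, 2.0
  UNION ALL SELECT 1, 4445, 3808, TIMESTAMP '2018-01-01 00:00:29+00', 11.0, 0.5
  UNION ALL SELECT 1, 4445, 3808, TIMESTAMP '2018-01-01 00:00:31+00', 9.0, 3.0
)
SELECT
  exchange_id,
  from_id,
  to_id,
  -- Round ts down to the start of its 30-second interval.
  TIMESTAMP_SECONDS(DIV(UNIX_SECONDS(ts), 30) * 30) AS time,
  -- Earliest price in the interval.
  ARRAY_AGG(price ORDER BY ts LIMIT 1)[SAFE_OFFSET(0)] AS open,
  MAX(price) AS high,
  MIN(price) AS low,
  -- Latest price in the interval.
  ARRAY_AGG(price ORDER BY ts DESC LIMIT 1)[SAFE_OFFSET(0)] AS close,
  SUM(size) AS volume
FROM trades
GROUP BY 1, 2, 3, 4
ORDER BY 4
```

With these sample rows the first three trades fall into the interval starting at 00:00:00 (open 10.0, high 12.0, low 10.0, close 11.0, volume 3.5) and the last trade falls into the interval starting at 00:00:30 (all four prices 9.0, volume 3.0). Intervals with no trades produce no row at all.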
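Answer 1 (score: 0)
I found a nice way to aggregate on predefined date_parts (docs). This is useful when you need to aggregate by Monday-based weeks or by month.
DATETIME_TRUNC supports the following arguments: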
MICROSECOND
MILLISECOND
SECOND
MINUTE
HOUR
DAY
WEEK
WEEK(<WEEKDAY>)
MONTH
QUARTER
YEAR
You can use it for aggregation like this:
#standardsql
SELECT
TIMESTAMP(DATETIME_TRUNC(DATETIME(timestamp), DAY)) as timestamp,
ARRAY_AGG(open ORDER BY timestamp LIMIT 1)[SAFE_OFFSET(0)] open,
MAX(high) high,
MIN(low) low,
ARRAY_AGG(close ORDER BY timestamp DESC LIMIT 1)[SAFE_OFFSET(0)] close,
SUM(volume) volume
FROM `hcmc-project.test_bitfinex.BTC_USD__1h`
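-- Group and order by select-list position 1 (the truncated timestamp),
-- so the name cannot be confused with the source column `timestamp`.
GROUP BY 1
ORDER BY 1 ASC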
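For the Monday-based week case mentioned above, a sketch along the same lines could use the `WEEK(MONDAY)` date part. The `candles` rows, the column name `ts`, and the alias `week_start` are assumptions made for this example and do not come from the original table.

```sql
#standardsql
WITH candles AS (
  -- Hypothetical 1-hour candles, not real market data.
  SELECT TIMESTAMP '2018-01-01 10:00:00+00' AS ts,
         100.0 AS open, 110.0 AS high, 95.0 AS low, 105.0 AS close, 2.0 AS volume
  UNION ALL SELECT TIMESTAMP '2018-01-03 10:00:00+00', 105.0, 120.0, 101.0, 118.0, 1.5
  UNION ALL SELECT TIMESTAMP '2018-01-08 10:00:00+00', 118.0, 119.0, 90.0, 92.0, 4.0
)
SELECT
  -- Truncate to the Monday that starts the week containing ts (UTC).
  TIMESTAMP(DATETIME_TRUNC(DATETIME(ts), WEEK(MONDAY))) AS week_start,
  -- Open of the earliest candle in the week.
  ARRAY_AGG(open ORDER BY ts LIMIT 1)[SAFE_OFFSET(0)] AS open,
  MAX(high) AS high,
  MIN(low) AS low,
  -- Close of the latest candle in the week.
  ARRAY_AGG(close ORDER BY ts DESC LIMIT 1)[SAFE_OFFSET(0)] AS close,
  SUM(volume) AS volume
FROM candles
GROUP BY week_start
ORDER BY week_start
```

Since 2018-01-01 and 2018-01-08 are both Mondays, the first two candles roll up into the week starting 2018-01-01 (open 100.0, high 120.0, low 95.0, close 118.0, volume 3.5) and the third forms its own week starting 2018-01-08. Giving the truncated column a name that differs from the source column keeps the GROUP BY unambiguous.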