Aggregating OHLC data over time intervals in Google BigQuery

Date: 2018-01-04 20:57:14

Tags: google-bigquery time-series finance

I have a time-series transaction history stored in Google BigQuery.

# Transaction history schema

exchange_id INTEGER REQUIRED  
from_id INTEGER REQUIRED    
to_id   INTEGER REQUIRED    
price   FLOAT   REQUIRED    
size    FLOAT   REQUIRED    
ts  TIMESTAMP   REQUIRED    
is_sell BOOLEAN NULLABLE    
_PARTITIONTIME  TIMESTAMP   NULLABLE    

exchange_id - the platform where the trade took place
from_id - base symbol
to_id - quote symbol
price - trade price
size - trade quantity

I need to aggregate OHLC data over 30-second intervals, grouped by interval, exchange_id, from_id, and to_id. How can I do this in BigQuery?

# Required OHLC aggregated data schema

ts  TIMESTAMP   REQUIRED 
exchange_id INTEGER REQUIRED  
from_id INTEGER REQUIRED    
to_id   INTEGER REQUIRED    
open   FLOAT   REQUIRED    
high   FLOAT   REQUIRED    
low   FLOAT   REQUIRED    
close   FLOAT   REQUIRED    
volume    FLOAT   REQUIRED 
_PARTITIONTIME  TIMESTAMP   NULLABLE       

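open - first price in the interval
high - highest price in the interval
low - lowest price in the interval
close - last price in the interval
volume - sum of all trade sizes in the current interval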

The most promising ideas so far are:

SELECT 
    TIMESTAMP_SECONDS(
      UNIX_SECONDS(ts) -
      60 * 1000000
    ) AS time,
    exchange_id,
    from_id,
    to_id,
    MIN(price) as low,
    MAX(price) as high,
    SUM(size) as volume
FROM 
    `table`
GROUP BY
  time, exchange_id, from_id, to_id
ORDER BY
  time

And this one:

SELECT 
  exchange_id,from_id,to_id,
  MAX(price) OVER (PARTITION BY exchange_id,from_id,to_id ORDER BY ts RANGE BETWEEN 60 * 1000000 PRECEDING AND CURRENT ROW) as high,
  MIN(price) OVER (PARTITION BY exchange_id,from_id,to_id ORDER BY ts RANGE BETWEEN 60 * 1000000 PRECEDING AND CURRENT ROW) as low,
  SUM(size) OVER (PARTITION BY exchange_id,from_id,to_id ORDER BY ts RANGE BETWEEN 60 * 1000000 PRECEDING AND CURRENT ROW) as volume,
FROM   [table];


# returns:
1   1   4445    3808    9.0E-8  9.0E-8  300000.0     
2   1   4445    3808    9.0E-8  9.0E-8  300000.0     
3   1   4445    3808    9.0E-8  9.0E-8  300000.0     
...
14  1   4445    3808    9.0E-8  9.0E-8  865939.3721800799    
15  1   4445    3808    9.0E-8  9.0E-8  865939.3721800799    
16  1   4445    3808    9.0E-8  9.0E-8  865939.3721800799    

But neither of these does what I need. It seems I am missing something important about sliding windows in BigQuery.

2 answers:

Answer 0: (score: 3)

Below is for BigQuery Standard SQL

  
#standardsql
SELECT 
  exchange_id, 
  from_id, 
  to_id,
  TIMESTAMP_SECONDS(DIV(UNIX_SECONDS(ts), 30) * 30) time,
  ARRAY_AGG(price ORDER BY ts LIMIT 1)[SAFE_OFFSET(0)] open,
  MAX(price) high,
  MIN(price) low,
  ARRAY_AGG(price ORDER BY ts DESC LIMIT 1)[SAFE_OFFSET(0)] close,
  SUM(size) volume
FROM `yourproject.yourdataset.yourtable`
GROUP BY 1, 2, 3, 4
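
To check the bucketing logic without running a query, here is a minimal Python sketch of the same idea: floor each Unix timestamp to a multiple of 30 seconds, then take the first, highest, lowest, and last price plus the summed size per group. The `Trade` tuple, the `ohlc` helper, and the sample rows are hypothetical and exist only for illustration; this is not BigQuery code, and it assumes non-negative integer timestamps in seconds.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical trade record mirroring the columns of the question's table.
Trade = namedtuple("Trade", "exchange_id from_id to_id price size ts")

def ohlc(trades, interval=30):
    """Aggregate trades into candles of `interval` seconds (ts in Unix seconds)."""
    def key(t):
        # Same arithmetic as DIV(UNIX_SECONDS(ts), 30) * 30 for non-negative ts.
        return (t.exchange_id, t.from_id, t.to_id, t.ts // interval * interval)

    # Sort by group key, then by ts, so the first/last trade of each group
    # supplies the open/close price.
    ordered = sorted(trades, key=lambda t: (key(t), t.ts))
    candles = []
    for (exchange_id, from_id, to_id, bucket), grp in groupby(ordered, key=key):
        grp = list(grp)
        prices = [t.price for t in grp]
        candles.append({
            "ts": bucket,
            "exchange_id": exchange_id,
            "from_id": from_id,
            "to_id": to_id,
            "open": prices[0],
            "high": max(prices),
            "low": min(prices),
            "close": prices[-1],
            "volume": sum(t.size for t in grp),
        })
    return candles

# Made-up sample: three trades in the bucket starting at 0, one in the bucket at 30.
trades = [
    Trade(1, 4445, 3808, 10.0, 1.0, 0),
    Trade(1, 4445, 3808, 12.0, 2.0, 10),
    Trade(1, 4445, 3808, 9.0, 0.5, 29),
    Trade(1, 4445, 3808, 11.0, 1.5, 30),
]
candles = ohlc(trades)
```

The key point is that the interval is derived by integer division, so every trade inside the same 30-second span maps to one group. A sliding `RANGE BETWEEN ... PRECEDING` window, by contrast, produces one output row per input row.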

Answer 1: (score: 0)

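I found a nice way to aggregate over predefined date parts (see the docs). This is handy when you need to aggregate by week (for example, weeks starting on Monday) or by month.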

DATETIME_TRUNC supports the following date parts:

MICROSECOND
MILLISECOND
SECOND
MINUTE
HOUR
DAY
WEEK
WEEK(<WEEKDAY>)
MONTH
QUARTER
YEAR
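
As a rough illustration of what truncation to a date part means, here is a small Python analogue covering four of the parts listed above. The `trunc` helper is hypothetical and is not part of BigQuery or any library; it only sketches the semantics. Note that in BigQuery a plain `WEEK` is equivalent to `WEEK(SUNDAY)`, so Monday-based weeks need `WEEK(MONDAY)`.

```python
from datetime import datetime, timedelta

def trunc(dt, part):
    """Hypothetical analogue of DATETIME_TRUNC for a few date parts."""
    if part == "HOUR":
        return dt.replace(minute=0, second=0, microsecond=0)
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    if part == "DAY":
        return midnight
    if part == "WEEK(MONDAY)":
        # datetime.weekday() is 0 on Monday, so this steps back to that Monday.
        return midnight - timedelta(days=dt.weekday())
    if part == "MONTH":
        return midnight.replace(day=1)
    raise ValueError(f"unsupported date part: {part}")

# The question's posting time, a Thursday.
sample = datetime(2018, 1, 4, 20, 57, 14)
by_hour = trunc(sample, "HOUR")
by_day = trunc(sample, "DAY")
by_week = trunc(sample, "WEEK(MONDAY)")
by_month = trunc(sample, "MONTH")
```

Every row whose timestamp truncates to the same value falls into the same group, which is what makes the truncated value usable as a grouping key.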

You can use it for aggregation like this:

#standardsql

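SELECT 
  -- Truncate each hourly candle's timestamp to the start of its day.
  -- The alias differs from the source column so GROUP BY is unambiguous.
  TIMESTAMP(DATETIME_TRUNC(DATETIME(timestamp), DAY)) AS day_start,
  -- Open of the earliest candle in the day.
  ARRAY_AGG(open ORDER BY timestamp LIMIT 1)[SAFE_OFFSET(0)] open,
  MAX(high) high,
  MIN(low) low,
  -- Close of the latest candle in the day.
  ARRAY_AGG(close ORDER BY timestamp DESC LIMIT 1)[SAFE_OFFSET(0)] close,
  SUM(volume) volume
FROM `hcmc-project.test_bitfinex.BTC_USD__1h`
GROUP BY day_start
ORDER BY day_start ASC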
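
The query above rolls existing candles up into coarser ones rather than aggregating raw trades. A minimal Python sketch of that roll-up rule follows: open comes from the earliest candle, close from the latest, high and low are the extremes, and volume is summed. The `rollup_daily` helper and the sample hourly candles are hypothetical, made up for illustration only.

```python
from datetime import datetime
from itertools import groupby

def rollup_daily(candles):
    """Merge finer candles (dicts with ts/open/high/low/close/volume) into daily ones."""
    # Sort by time so the first/last candle of each day gives open/close.
    ordered = sorted(candles, key=lambda c: c["ts"])
    daily = []
    for day, grp in groupby(ordered, key=lambda c: c["ts"].date()):
        grp = list(grp)
        daily.append({
            "day": day,
            "open": grp[0]["open"],
            "high": max(c["high"] for c in grp),
            "low": min(c["low"] for c in grp),
            "close": grp[-1]["close"],
            "volume": sum(c["volume"] for c in grp),
        })
    return daily

# Made-up hourly candles: two on 2018-01-04 and one on 2018-01-05.
hourly = [
    {"ts": datetime(2018, 1, 4, 0), "open": 100.0, "high": 110.0, "low": 95.0, "close": 105.0, "volume": 2.0},
    {"ts": datetime(2018, 1, 4, 1), "open": 105.0, "high": 120.0, "low": 101.0, "close": 118.0, "volume": 3.0},
    {"ts": datetime(2018, 1, 5, 0), "open": 118.0, "high": 119.0, "low": 90.0, "close": 92.0, "volume": 1.0},
]
daily = rollup_daily(hourly)
```

Taking `MAX(high)` and `MIN(low)` rather than the extremes of open or close matters here, because a day's high can occur inside an hour that neither opened nor closed at that price.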