BigQuery: ORDER BY on a date in an analytic function does not work?

Time: 2018-04-19 14:27:50

Tags: sql google-bigquery

Here is the sample data:

WITH dummy_data AS (
  SELECT DATE '2017-01-01' AS ref_month, 18 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-02-01' AS ref_month, 20 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-03-01' AS ref_month, 22 AS value, 1 AS id
  -- UNION ALL SELECT DATE '2017-04-01' as ref_month, 28 as value, 1 as id
  UNION ALL SELECT DATE '2017-05-01' AS ref_month, 30 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-06-01' AS ref_month, 37 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-07-01' AS ref_month, 42 AS value, 1 AS id
  -- UNION ALL SELECT DATE '2017-08-01' as ref_month, 55 as value, 1 as id
  -- UNION ALL SELECT DATE '2017-09-01' as ref_month, 49 as value, 1 as id
  UNION ALL SELECT DATE '2017-10-01' AS ref_month, 51 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-11-01' AS ref_month, 57 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-12-01' AS ref_month, 56 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-01-01' AS ref_month, 18 AS value, 2 AS id
  UNION ALL SELECT DATE '2017-02-01' AS ref_month, 20 AS value, 2 AS id
  UNION ALL SELECT DATE '2017-03-01' AS ref_month, 22 AS value, 2 AS id
  UNION ALL SELECT DATE '2017-04-01' AS ref_month, 28 AS value, 2 AS id
  -- UNION ALL SELECT DATE '2017-05-01' as ref_month, 30 as value, 2 as id
  -- UNION ALL SELECT DATE '2017-06-01' as ref_month, 37 as value, 2 as id
  UNION ALL SELECT DATE '2017-07-01' AS ref_month, 42 AS value, 2 AS id
  UNION ALL SELECT DATE '2017-08-01' AS ref_month, 55 AS value, 2 AS id
--   UNION ALL SELECT DATE '2017-09-01' AS ref_month, 49 AS value, 2 AS id
  -- UNION ALL SELECT DATE '2017-10-01' as ref_month, 51 as value, 2 as id
  UNION ALL SELECT DATE '2017-11-01' AS ref_month, 57 AS value, 2 AS id
  UNION ALL SELECT DATE '2017-12-01' AS ref_month, 56 AS value, 2 AS id
)

I am trying to run this simple query:

select
id 
,value
, ref_month
, ARRAY_AGG(value) OVER w1 as agg_last_3_values
from dummy_data
window w1 as (partition by id order by ref_month RANGE BETWEEN 2 PRECEDING AND CURRENT ROW)

Why do I get the following error?

ORDER BY key must be numeric in a RANGE-based window with OFFSET PRECEDING or OFFSET FOLLOWING boundaries, but has type DATE

I don't understand why it shouldn't work with dates... any suggestions?

2 Answers:

Answer 0: (score: 3)

Use ROWS instead of RANGE:

select
id
, value
, ref_month
, ARRAY_AGG(value) OVER w1 as agg_last_3_values
from dummy_data
window w1 as (partition by id order by ref_month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)

RANGE is tricky because it has to handle ties, so the window includes every row that has the same order key value. This often results in hard-to-debug errors, but it is occasionally useful.

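I'm not familiar with the limitations on the ORDER BY key of a RANGE window in other SQL dialects. However, it seems that BigQuery assumes the RANGE key is numeric.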

Answer 1: (score: 2)

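When RANGE is used, the key in ORDER BY must be numeric.
It looks like you are trying to adapt the query from "BIGQUERY moving average with missing values", but note the calculated field month_pos that is used there.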

You can use the following to address this:

DATE_DIFF(ref_month, '2016-01-01', MONTH) month_pos   

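Also, I wanted to point out that the choice of RANGE vs. ROWS is quite important here: with RANGE, the window function is applied to a set of rows determined not by row position but by the value of the month.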
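To make the difference between the two answers concrete, here is a minimal sketch that puts both frames side by side. It assumes BigQuery Standard SQL and uses a shortened copy of the question's sample data; the CTE name with_pos, the window names by_row and by_month, and the output column names are made up for illustration, and the anchor date '2016-01-01' is arbitrary (any fixed date would do).

```sql
WITH dummy_data AS (
  SELECT DATE '2017-02-01' AS ref_month, 20 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-03-01', 22, 1
  -- 2017-04-01 is deliberately missing, as in the question
  UNION ALL SELECT DATE '2017-05-01', 30, 1
  UNION ALL SELECT DATE '2017-06-01', 37, 1
),
with_pos AS (
  SELECT
    *,
    -- Numeric month index: whole months elapsed since the anchor date.
    DATE_DIFF(ref_month, DATE '2016-01-01', MONTH) AS month_pos
  FROM dummy_data
)
SELECT
  id,
  ref_month,
  value,
  -- Frame defined by row position: the current row and the two rows before it.
  ARRAY_AGG(value) OVER by_row AS agg_last_3_rows,
  -- Frame defined by key value: rows whose month_pos is within 2 of the current one.
  ARRAY_AGG(value) OVER by_month AS agg_last_3_months
FROM with_pos
WINDOW
  by_row AS (PARTITION BY id ORDER BY ref_month
             ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),
  by_month AS (PARTITION BY id ORDER BY month_pos
               RANGE BETWEEN 2 PRECEDING AND CURRENT ROW)
ORDER BY id, ref_month
```

For the 2017-05-01 row, the row-based frame covers February, March and May (values 20, 22, 30), whereas the month-based frame covers only March and May (22, 30), because April is missing and February lies more than two months back. Which of the two is correct depends on whether "last 3" is meant as the last three available rows or the last three calendar months.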