BigQuery Standard SQL: pivoting and summing over non-overlapping windows

Time: 2017-06-20 16:20:41

Tags: google-bigquery standard-sql

I am trying to query a table that uses a basic repeated field to store data like this:

+---+----------+------------+
| i | data.key | data.value |
+---+----------+------------+
| 0 | a        |          1 |
|   | b        |          2 |
| 1 | a        |          3 |
|   | b        |          4 |
| 2 | a        |          5 |
|   | b        |          6 |
| 3 | a        |          7 |
|   | b        |          8 |
+---+----------+------------+

I am trying to figure out how to write a query that produces this result:

+---+----+----+
| i | a  | b  |
+---+----+----+
| 1 |  4 |  6 |
| 3 | 12 | 14 |
+---+----+----+

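Here each row represents a non-overlapping sum (i.e. the row i=1 is the sum of rows i=0 and i=1, and the row i=3 is the sum of rows i=2 and i=3), and the data has been pivoted so that each data.key value is now a column.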

Question 1:

I did my best to convert this answer to standard SQL and ended up with:

SELECT
  i,
  (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'a') AS `a`,
  (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'b') AS `b`
FROM
  `dataset.testing.dummy`

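This works, but I wonder whether there is a better way to do it, especially since it makes for a particularly verbose query once analytic functions are involved: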

SELECT
  i,
  SUM(a) OVER (ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS `a`,
  SUM(b) OVER (ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS `b`
FROM (
  SELECT
    i,
    (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'a') as `a`,
    (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'b') as `b`
  FROM
    `dataset.testing.dummy`)
ORDER BY
  i;

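Question 2:

How do I write the ROWS/RANGE clause so that the resulting windows do not overlap? The previous query gives me a rolling sum of the data, which is not what I want: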

+---+----+----+
| i | a  | b  |
+---+----+----+
| 0 |  1 |  2 |
| 1 |  4 |  6 |
| 2 |  8 | 10 |
| 3 | 12 | 14 |
+---+----+----+

A rolling sum produces a result for every row, whereas I am trying to reduce the number of rows returned.

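1 Answer:

Answer 0: (score: 1)

Using a temporary SQL function and a named window helps with the verbosity. I did have to use another subselect to apply the filter on i afterward, though. Here is a self-contained example: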

#standardSQL
CREATE TEMP FUNCTION SumKey(
    data ARRAY<STRUCT<key STRING, value INT64>>,
    target_key STRING) AS (
  (SELECT SUM(value) FROM UNNEST(data) WHERE key = target_key) 
);

WITH Input AS (
  SELECT
    0 AS i,
    ARRAY<STRUCT<key STRING, value INT64>>[('a', 1), ('b', 2)] AS data UNION ALL
  SELECT 1, ARRAY<STRUCT<key STRING, value INT64>>[('a', 3), ('b', 4)] UNION ALL
  SELECT 2, ARRAY<STRUCT<key STRING, value INT64>>[('a', 5), ('b', 6)] UNION ALL
  SELECT 3, ARRAY<STRUCT<key STRING, value INT64>>[('a', 7), ('b', 8)]
)
SELECT * FROM (
  SELECT
    i,
    SUM(a) OVER W AS a,
    SUM(b) OVER W AS b
  FROM (
    SELECT
      i,
      SumKey(data, 'a') AS a,
      SumKey(data, 'b') AS b
    FROM Input
  )
  WINDOW W AS (ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
)
WHERE MOD(i, 2) = 1
ORDER BY i;

This results in:

+---+----+----+
| i | a  | b  |
+---+----+----+
| 1 |  4 |  6 |
| 3 | 12 | 14 |
+---+----+----+
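
Because the window frame covers exactly two rows and every other row is then discarded, the same result can probably be reached without analytic functions by grouping on a bucket number. The following is only a sketch under stated assumptions, not part of the original answer: it assumes i is a gap-free integer sequence starting at 0, so that DIV(i, 2) puts rows 0 and 1 in one bucket and rows 2 and 3 in the next. The alias names t and d are made up for illustration, and the pivot is done with conditional aggregation over the flattened array instead of the SumKey function.

```sql
#standardSQL
-- Sketch only: assumes i is a gap-free integer sequence starting at 0,
-- so DIV(i, 2) maps rows 0-1 to bucket 0, rows 2-3 to bucket 1, and so on.
WITH Input AS (
  SELECT
    0 AS i,
    ARRAY<STRUCT<key STRING, value INT64>>[('a', 1), ('b', 2)] AS data UNION ALL
  SELECT 1, ARRAY<STRUCT<key STRING, value INT64>>[('a', 3), ('b', 4)] UNION ALL
  SELECT 2, ARRAY<STRUCT<key STRING, value INT64>>[('a', 5), ('b', 6)] UNION ALL
  SELECT 3, ARRAY<STRUCT<key STRING, value INT64>>[('a', 7), ('b', 8)]
)
SELECT
  MAX(t.i) AS i,                             -- label each bucket by its last row
  SUM(IF(d.key = 'a', d.value, NULL)) AS a,  -- conditional aggregation pivots key 'a'
  SUM(IF(d.key = 'b', d.value, NULL)) AS b   -- same for key 'b'
FROM Input AS t, UNNEST(t.data) AS d         -- flatten the repeated field
GROUP BY DIV(t.i, 2)                         -- one group per pair of consecutive rows
ORDER BY i;
```

On the sample data this should return the same two rows as above (i=1 and i=3). If i can have gaps, bucketing on the value of i no longer matches bucketing on row position; in that case one option would be to number the rows with ROW_NUMBER() in a subselect first and group on that number instead.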