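Suppose I have two tables: intervals, which contains index intervals (its columns are i_min and i_max), and values, which contains indexed values (columns i and x). Here is an example: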
    values:        intervals:
    +---+---+      +-------+-------+
    | i | x |      | i_min | i_max |
    +---+---+      +-------+-------+
    | 1 | 1 |      |     1 |     4 |
    | 2 | 0 |      |     6 |     6 |
    | 3 | 4 |      |     6 |     6 |
    | 4 | 9 |      |     6 |     6 |
    | 6 | 7 |      |     7 |     9 |
    | 7 | 2 |      |    12 |    17 |
    | 8 | 2 |      +-------+-------+
    | 9 | 2 |
    +---+---+
I want to sum the values of x over each interval:
    result:
    +-------+-------+-----+
    | i_min | i_max | sum |
    +-------+-------+-----+
    |     1 |     4 |  14 |  // 1+0+4+9
    |     6 |     6 |   7 |
    |     6 |     6 |   7 |
    |     6 |     6 |   7 |
    |     7 |     9 |   6 |  // 2+2+2
    |    12 |    17 |   0 |
    +-------+-------+-----+
In some SQL engines, this can be done with a correlated subquery:
    SELECT
      i_min,
      i_max,
      (SELECT SUM(x)
       FROM values
       WHERE i BETWEEN intervals.i_min AND intervals.i_max) AS sum_x
    FROM
      intervals
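Except that BigQuery does not allow this kind of query. Depending on the syntax used, it fails with either "Subselect not allowed in SELECT clause." or "LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join."
There must be a way to do this with window functions, but I can't figure it out: all the examples I have seen partition on a column of the table itself. Is there an option that does not use a CROSS JOIN? If not, what is the most efficient way to do this CROSS JOIN?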
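Some notes about my data:
- There may be repetitions in intervals, but not in i.
- Two intervals in intervals are either identical or entirely disjoint (no overlaps).
- Taken together, the intervals cover the set of all values of i (so they form a partition of this space).

Answer 0 (score: 3)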
Try the following, written for BigQuery Standard SQL:
    #standardSQL
    SELECT
      i_min, i_max, SUM(x) AS sum_x
    FROM (
      -- Number the interval rows so that duplicate intervals stay separate in GROUP BY.
      SELECT i_min, i_max, ROW_NUMBER() OVER() AS line FROM `project.dataset.intervals`
    ) AS intervals
    -- The extra (NULL, 0) row matches every interval, so an empty interval still sums to 0.
    JOIN (SELECT i, x FROM `project.dataset.values` UNION ALL SELECT NULL, 0) AS values
    ON values.i BETWEEN intervals.i_min AND intervals.i_max OR values.i IS NULL
    GROUP BY i_min, i_max, line
    -- ORDER BY i_min
You can test or play with it using dummy data, as below:
    #standardSQL
    WITH intervals AS (
      SELECT 1 AS i_min, 4 AS i_max UNION ALL
      SELECT 6, 6 UNION ALL
      SELECT 6, 6 UNION ALL
      SELECT 6, 6 UNION ALL
      SELECT 7, 9 UNION ALL
      SELECT 12, 17
    ),
    values AS (
      SELECT 1 AS i, 1 AS x UNION ALL
      SELECT 2, 0 UNION ALL
      SELECT 3, 4 UNION ALL
      SELECT 4, 9 UNION ALL
      SELECT 6, 7 UNION ALL
      SELECT 7, 2 UNION ALL
      SELECT 8, 2 UNION ALL
      SELECT 9, 2
    )
    SELECT
      i_min, i_max, SUM(x) AS sum_x
    FROM (SELECT i_min, i_max, ROW_NUMBER() OVER() AS line FROM intervals) AS intervals
    JOIN (SELECT i, x FROM values UNION ALL SELECT NULL, 0) AS values
    ON values.i BETWEEN intervals.i_min AND intervals.i_max OR values.i IS NULL
    GROUP BY i_min, i_max, line
    -- ORDER BY i_min
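
To see why the query needs both the row number and the extra (NULL, 0) row, it may help to emulate the join outside of SQL. The following is only a rough Python sketch of the same logic on the sample data from the question. It is not how BigQuery executes the query, and the function name sum_per_interval is made up for illustration.

```python
from collections import defaultdict

# Sample data from the question.
intervals = [(1, 4), (6, 6), (6, 6), (6, 6), (7, 9), (12, 17)]
values = [(1, 1), (2, 0), (3, 4), (4, 9), (6, 7), (7, 2), (8, 2), (9, 2)]


def sum_per_interval(intervals, values):
    """Emulate the join from the answer (hypothetical helper, illustration only)."""
    # Counterpart of ROW_NUMBER() OVER(): give every interval row its own number,
    # so identical intervals remain distinct grouping keys.
    numbered = [(i_min, i_max, line)
                for line, (i_min, i_max) in enumerate(intervals, start=1)]
    # Counterpart of UNION ALL SELECT NULL, 0: a sentinel row that joins to
    # every interval and contributes 0 to the sum.
    padded = values + [(None, 0)]

    sums = defaultdict(int)
    for i_min, i_max, line in numbered:
        for i, x in padded:
            # Join condition: i BETWEEN i_min AND i_max OR i IS NULL.
            if i is None or i_min <= i <= i_max:
                sums[(i_min, i_max, line)] += x  # GROUP BY i_min, i_max, line

    # Drop the helper column, as the outer SELECT does.
    return [(i_min, i_max, total) for (i_min, i_max, _line), total in sums.items()]


result = sum_per_interval(intervals, values)
for row in result:
    print(row)
```

In this sketch, grouping without the line number would merge the three (6, 6) rows into a single group, and dropping the sentinel row would make the (12, 17) interval disappear from the output, because nothing would join to it.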