I want to concatenate arrays across rows and then take a distinct count of the elements. Ideally, this would work:
WITH test AS
(
  SELECT
    DATE('2018-01-01') as date,
    2 as value,
    [1,2,3] as key
  UNION ALL
  SELECT
    DATE('2018-01-02') as date,
    3 as value,
    [1,4,5] as key
)
SELECT
  SUM(value) as total_value,
  ARRAY_LENGTH(ARRAY_CONCAT_AGG(DISTINCT key)) as unique_key_count
FROM test
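Unfortunately, the ARRAY_CONCAT_AGG function does not support the DISTINCT operator. I can unnest the arrays, but then I get a fan-out, and the sum of the value column is wrong: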
WITH test AS
(
  SELECT
    DATE('2018-01-01') as date,
    2 as value,
    [1,2,3] as key
  UNION ALL
  SELECT
    DATE('2018-01-02') as date,
    3 as value,
    [1,4,5] as key
)
SELECT
  SUM(value) as total_value,
  COUNT(DISTINCT k) as unique_key_count
FROM test
CROSS JOIN UNNEST(key) k
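To make the fan-out concrete, here is a small sketch that lists the rows the join produces before any aggregation. It uses the same sample data as above and adds nothing beyond an ORDER BY for readability:

```sql
#standardSQL
WITH test AS
(
  SELECT DATE('2018-01-01') AS date, 2 AS value, [1,2,3] AS key UNION ALL
  SELECT DATE('2018-01-02') AS date, 3 AS value, [1,4,5] AS key
)
-- One output row per array element: each source row is repeated
-- once for every element of its key array.
SELECT date, value, k
FROM test
CROSS JOIN UNNEST(key) AS k
ORDER BY date, k
```

Each source row appears three times, so SUM(value) over this result adds 2 three times and 3 three times, giving 15 instead of the intended 5.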
Is there something I'm missing that would let me avoid joining against the unnested array?
Answer 0 (score: 3)
Here is an alternative:
CREATE TEMP FUNCTION DistinctCount(arr ANY TYPE) AS (
  (SELECT COUNT(DISTINCT x) FROM UNNEST(arr) AS x)
);
WITH test AS
(
  SELECT
    DATE('2018-01-01') as date,
    2 as value,
    [1,2,3] as key
  UNION ALL
  SELECT
    DATE('2018-01-02') as date,
    3 as value,
    [1,4,5] as key
)
SELECT
  SUM(value) as total_value,
  DistinctCount(ARRAY_CONCAT_AGG(key)) as unique_key_count
FROM test
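This avoids a subquery in the main query and the need to join the array with the table (which is what duplicates values in the sum). The arrays are concatenated once during aggregation, and the function then counts the distinct elements of the combined array.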
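If the distinct keys themselves are needed and not just their count, a variation on the same idea might look like the sketch below. DistinctKeys is a hypothetical function name introduced here for illustration; it relies on the ARRAY(SELECT DISTINCT ...) pattern, and the order of elements in the returned array is not guaranteed.

```sql
#standardSQL
-- Hypothetical helper (not from the original answer): returns the
-- distinct elements of an array, in no guaranteed order.
CREATE TEMP FUNCTION DistinctKeys(arr ANY TYPE) AS (
  ARRAY(SELECT DISTINCT x FROM UNNEST(arr) AS x)
);
WITH test AS
(
  SELECT DATE('2018-01-01') AS date, 2 AS value, [1,2,3] AS key UNION ALL
  SELECT DATE('2018-01-02') AS date, 3 AS value, [1,4,5] AS key
)
SELECT
  SUM(value) AS total_value,
  -- Deduplicate the concatenated array, then measure its length
  ARRAY_LENGTH(DistinctKeys(ARRAY_CONCAT_AGG(key))) AS unique_key_count,
  DistinctKeys(ARRAY_CONCAT_AGG(key)) AS unique_keys
FROM test
```

This is close in shape to the ARRAY_LENGTH(ARRAY_CONCAT_AGG(DISTINCT key)) expression the question hoped for, with the deduplication moved into the function.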
Answer 1 (score: 3)
The following is for BigQuery Standard SQL:
#standardSQL
WITH test AS
(
  SELECT DATE('2018-01-01') AS DATE, 2 AS value, [1,2,3] AS key UNION ALL
  SELECT DATE('2018-01-02') AS DATE, 3 AS value, [1,4,5] AS key
)
SELECT
  total_value,
  COUNT(DISTINCT key) unique_key_count
FROM (
  SELECT
    SUM(value) AS total_value,
    ARRAY_CONCAT_AGG(key) AS all_keys
  FROM test
), UNNEST(all_keys) key
GROUP BY total_value
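Result:
Row  total_value  unique_key_count
1    5            5
If your table has many rows, you can easily run into memory/resource problems. In that case, you can try approximate aggregation with the HyperLogLog++ functions; see the example below: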
#standardSQL
WITH test AS
(
  SELECT DATE('2018-01-01') AS DATE, 2 AS value, [1,2,3] AS key UNION ALL
  SELECT DATE('2018-01-02') AS DATE, 3 AS value, [1,4,5] AS key
)
SELECT
  SUM(value) total_value,
  HLL_COUNT.MERGE((SELECT HLL_COUNT.INIT(key) FROM UNNEST(key) key)) AS unique_key_count
FROM test
With result:
Row  total_value  unique_key_count
1    5            5
Note: this is an approximate aggregation, so pay attention to the precision parameter of the HLL_COUNT.INIT(input [, precision]) function.
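As a sketch of how the precision argument might be set, the query below passes an explicit value to HLL_COUNT.INIT. The value 20 is only an illustrative choice; the documented range is 10 to 24 with a default of 15, and higher precision generally means a more accurate estimate at the cost of larger sketches.

```sql
#standardSQL
WITH test AS
(
  SELECT DATE('2018-01-01') AS date, 2 AS value, [1,2,3] AS key UNION ALL
  SELECT DATE('2018-01-02') AS date, 3 AS value, [1,4,5] AS key
)
SELECT
  SUM(value) AS total_value,
  -- Build one sketch per row from its array elements (precision 20 is an
  -- illustrative assumption), then merge the per-row sketches.
  HLL_COUNT.MERGE((SELECT HLL_COUNT.INIT(k, 20) FROM UNNEST(key) AS k)) AS unique_key_count
FROM test
```

Because each row builds its sketch in a correlated subquery over its own array, the outer query still sees one row per source row and SUM(value) is not affected by fan-out.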