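As in the example here, I want to do a distinct count across BigQuery arrays: Distinct Count across Bigquery arrays
However, I have some additional requirements that make the solution provided in that post unworkable for me:
So, while this extended example (which adds User as a grouping dimension) works with HLL: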
#standardSQL
WITH
  test AS (
    SELECT
      'A' AS User, DATE('2018-01-01') AS ReportDate, 2 AS value, [1,2,3] AS key
    UNION ALL
    SELECT
      'A' AS User, DATE('2018-01-02') AS ReportDate, 3 AS value, [1,4,5] AS key
    UNION ALL
    SELECT
      'B' AS User, DATE('2018-01-02') AS ReportDate, 4 AS value, [4,5,6,7,8] AS key
    UNION ALL
    SELECT
      'B' AS User, DATE('2018-01-02') AS ReportDate, 5 AS value, [3,4,5,6,7] AS key )
SELECT
  User,
  SUM(value) total_value,
  -- Build one HLL sketch per row from its array elements, then merge the
  -- sketches per group. HLL_COUNT yields an approximate distinct count.
  HLL_COUNT.MERGE((
    SELECT
      HLL_COUNT.INIT(key)
    FROM
      UNNEST(key) key)) AS unique_key_count
FROM
  test
GROUP BY
  User
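I need a version of the distinct count across aggregated arrays that meets the requirements above.
Again, this means it should also work if I group only by ReportDate, by the User/ReportDate combination, or if the example is extended with other dimensions.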
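To pin down the result being asked for, here is a minimal plain-Python model. It is not BigQuery code. The function name `aggregate` and the row layout are illustrative assumptions. It groups the four sample rows by any list of dimension columns, sums `value` once per row, and takes an exact distinct count of the array elements in each group.

```python
from collections import defaultdict
from datetime import date

# The same four sample rows as the `test` CTE in the question.
ROWS = [
    {"User": "A", "ReportDate": date(2018, 1, 1), "value": 2, "key": [1, 2, 3]},
    {"User": "A", "ReportDate": date(2018, 1, 2), "value": 3, "key": [1, 4, 5]},
    {"User": "B", "ReportDate": date(2018, 1, 2), "value": 4, "key": [4, 5, 6, 7, 8]},
    {"User": "B", "ReportDate": date(2018, 1, 2), "value": 5, "key": [3, 4, 5, 6, 7]},
]

def aggregate(rows, dims):
    """Hypothetical helper: group rows by the columns named in `dims` and
    return {group: (SUM(value), exact distinct count of array elements)}."""
    totals = defaultdict(int)
    keys = defaultdict(set)
    for row in rows:
        group = tuple(row[d] for d in dims)
        totals[group] += row["value"]   # each row's value is added once
        keys[group].update(row["key"])  # union of every array in the group
    return {g: (totals[g], len(keys[g])) for g in totals}

print(aggregate(ROWS, ["User"]))  # {('A',): (5, 5), ('B',): (9, 6)}
```

The same function works for `["ReportDate"]` or `["User", "ReportDate"]`. Grouping by ReportDate alone gives (12, 7) for 2018-01-02, because the union of the three arrays on that date is {1, 3, 4, 5, 6, 7, 8}.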
Answer (score: 1)
#standardSQL
WITH test AS
(
  SELECT 'A' AS User, DATE('2018-01-01') AS ReportDate, 2 AS value, [1,2,3] AS key UNION ALL
  SELECT 'A' AS User, DATE('2018-01-02') AS ReportDate, 3 AS value, [1,4,5] AS key UNION ALL
  SELECT 'B' AS User, DATE('2018-01-02') AS ReportDate, 4 AS value, [4,5,6,7,8] AS key UNION ALL
  SELECT 'B' AS User, DATE('2018-01-02') AS ReportDate, 5 AS value, [3,4,5,6,7] AS key
)
SELECT
  User,
  -- flag is the element's position in its array; adding value only at
  -- offset 0 counts each original row's value once, not once per element
  SUM(IF(flag=0, value, 0)) total_value,
  -- exact distinct count over all array elements in the group
  COUNT(DISTINCT key) unique_key_count
FROM test, UNNEST(key) key WITH OFFSET flag
GROUP BY User
Result:
Row  User  total_value  unique_key_count
1    A     5            5
2    B     9            6
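The mechanics of the answer's query can be traced with a small plain-Python model. It is a sketch under stated assumptions: the names `unnest_with_offset` and `answer_query` are hypothetical, and the code only mimics the SQL, it does not call BigQuery. The model flattens each array with its offset, adds `value` only at offset 0, and takes an exact distinct count of the flattened elements per User. It reproduces the result above.

```python
from collections import defaultdict

# Sample rows from the answer's `test` CTE: (User, ReportDate, value, key).
ROWS = [
    ("A", "2018-01-01", 2, [1, 2, 3]),
    ("A", "2018-01-02", 3, [1, 4, 5]),
    ("B", "2018-01-02", 4, [4, 5, 6, 7, 8]),
    ("B", "2018-01-02", 5, [3, 4, 5, 6, 7]),
]

def unnest_with_offset(rows):
    """Model of `FROM test, UNNEST(key) key WITH OFFSET flag`: one output row
    per array element. As with the comma (cross) join, a row whose array is
    empty produces no output rows."""
    for user, report_date, value, keys in rows:
        for flag, key in enumerate(keys):
            yield user, report_date, value, key, flag

def answer_query(rows):
    """Model of the answer's SELECT ... GROUP BY User."""
    total_value = defaultdict(int)
    distinct_keys = defaultdict(set)
    for user, _report_date, value, key, flag in unnest_with_offset(rows):
        # SUM(IF(flag=0, value, 0)): only offset 0 contributes the row's value
        total_value[user] += value if flag == 0 else 0
        # COUNT(DISTINCT key) over the flattened elements of the group
        distinct_keys[user].add(key)
    return {u: (total_value[u], len(distinct_keys[u])) for u in total_value}

print(answer_query(ROWS))  # {'A': (5, 5), 'B': (9, 6)}
```

One consequence of the cross join, modeled above: a row whose `key` array is empty yields no flattened rows, so its `value` is not included in `total_value`. Because `COUNT(DISTINCT key)` is exact, the query can also be grouped by ReportDate or by User and ReportDate just by changing the SELECT list and the GROUP BY clause.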