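I have been tasked with summarizing values across multiple arrays, and I have hit a bit of a gap. Any insight or help from the group is much appreciated.
Challenge:
I have an array of domain TLDs in each row of a single-column BigQuery table. I want to group by each TLD and return the total count for each TLD as a new table.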
["biz","us","international","eu","com","co","world","us","international","eu","co","biz"]
["com","co","world"]
Expected response:
**TLD_Name** **Count**
biz 2
us 2
international 2
eu 2
com 2
co 3
world 2
Thanks in advance for your help.
Answer 0: (score: 2)
Assuming the array column is named tlds, you can run the following standard SQL query:
SELECT
  tld AS TLD_Name,
  COUNT(*) AS count
FROM YourTable
CROSS JOIN UNNEST(tlds) AS tld
GROUP BY tld;
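This "flattens" the arrays, producing one row per array element, and then counts the rows for each TLD.
Answer 1: (score: 1)
If the tld values within each row are highly repetitive and you have a very large number of rows, the query below may offer a small optimization: it first aggregates the tld counts within each row, and then sums those per-row counts at the whole-table level (for BigQuery Standard SQL).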
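To sanity-check the counts outside BigQuery, the same flatten-and-count logic can be sketched in Python. This is only an illustrative sketch, not part of the query: the `rows` list stands in for the `tlds` column and copies the sample data from the question, and `count_tlds` is a hypothetical helper name.

```python
from collections import Counter
from itertools import chain

# Stand-in for the table: one list per row of the `tlds` array column
# (sample data copied from the question).
rows = [
    ["biz", "us", "international", "eu", "com", "co", "world",
     "us", "international", "eu", "co", "biz"],
    ["com", "co", "world"],
]

def count_tlds(rows):
    """Flatten every row's array (like UNNEST) and count each TLD (like GROUP BY)."""
    return Counter(chain.from_iterable(rows))

counts = count_tlds(rows)
for tld, n in counts.items():
    print(tld, n)
```

`chain.from_iterable` plays the role of `CROSS JOIN UNNEST(tlds)`, and `Counter` plays the role of `COUNT(*) ... GROUP BY tld`.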
#standardSQL
WITH `yourproject.yourdataset.yourtable` AS (
  SELECT ["biz","us","international","eu","com","co","world","us","international","eu","co","biz"] tlds UNION ALL
  SELECT ["com","co","world","biz"]
)
SELECT
  tld_count.tld AS tld,
  SUM(tld_count.cnt) AS cnt
FROM `yourproject.yourdataset.yourtable`,
  UNNEST(ARRAY(SELECT AS STRUCT tld, COUNT(*) AS cnt FROM UNNEST(tlds) AS tld GROUP BY tld)) AS tld_count
GROUP BY tld
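The two-stage idea (count within each row first, then sum the per-row counts) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not BigQuery code: `rows` copies the sample arrays from the query's WITH clause, and `count_tlds_two_stage` is a hypothetical helper name.

```python
from collections import Counter

# Stand-in for the table: sample arrays copied from the WITH clause above.
rows = [
    ["biz", "us", "international", "eu", "com", "co", "world",
     "us", "international", "eu", "co", "biz"],
    ["com", "co", "world", "biz"],
]

def count_tlds_two_stage(rows):
    """Aggregate counts per row, then roll the per-row counts up to a table total."""
    total = Counter()
    for tlds in rows:
        per_row = Counter(tlds)   # stage 1: like the inner SELECT ... GROUP BY tld
        total.update(per_row)     # stage 2: like SUM(tld_count.cnt) ... GROUP BY tld
    return total

totals = count_tlds_two_stage(rows)
for tld, cnt in totals.items():
    print(tld, cnt)
```

The result is the same as a single flatten-and-count pass; the per-row step only reduces how many intermediate rows reach the final aggregation, which matters when arrays contain many repeated values.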