Summing array values from a column

Posted: 2017-12-18 15:16:13

Tags: sql google-bigquery

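I've been tasked with summing values across multiple arrays, and I've gotten partway there but am stuck. Any insight or help from the community would be much appreciated.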

The challenge:

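Each row of a single-column BigQuery table holds an array of domain TLDs. I want to group by TLD and return the total count of each TLD as a new table. For example, given these two rows:

["biz","us","international","eu","com","co","world","us","international","eu","co","biz"]
["com","co","world"]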

Desired output:

**TLD_Name** **count**
biz 2
us 2
international 2
eu 2
com 2
co 3
world 2

Thanks in advance for your help.

2 answers:

Answer 0 (score: 2)

Assuming the array column is named tlds, you can run the following Standard SQL query:

SELECT
  tld AS TLD_Name,
  COUNT(*) AS count
FROM YourTable
CROSS JOIN UNNEST(tlds) AS tld
GROUP BY tld;

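UNNEST "flattens" each array into one row per element, and GROUP BY then counts how many times each TLD occurs across all rows.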
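To try the query without creating a table, you can inline the question's two sample rows in a WITH clause. This is a minimal sketch: the CTE name YourTable and the column name tlds are the same assumptions the answer makes, and the ORDER BY is added only to make the result easier to read.

```sql
#standardSQL
-- Assumed sample data: the two rows from the question, inlined as a CTE
WITH YourTable AS (
  SELECT ["biz","us","international","eu","com","co","world","us","international","eu","co","biz"] AS tlds UNION ALL
  SELECT ["com","co","world"]
)
SELECT
  tld AS TLD_Name,
  COUNT(*) AS count
FROM YourTable
-- Each array element becomes its own row, paired with its source row
CROSS JOIN UNNEST(tlds) AS tld
GROUP BY tld
-- Highest counts first, ties broken alphabetically
ORDER BY count DESC, TLD_Name;
```

Against this data the query returns co with 3 and every other TLD (including world, which appears once in each row) with 2.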

Answer 1 (score: 1)

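If TLD values repeat heavily within each row and you have a very large number of rows, the query below may give a small optimization: it first aggregates the TLD counts within each row, then sums those partial counts at the whole-table level (BigQuery Standard SQL):
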
#standardSQL
-- Sample data standing in for the real table
WITH `yourproject.yourdataset.yourtable` AS (
  SELECT ["biz","us","international","eu","com","co","world","us","international","eu","co","biz"] tlds UNION ALL
  SELECT ["com","co","world","biz"]
)
SELECT
  tld_count.tld AS tld,
  SUM(tld_count.cnt) AS cnt
FROM `yourproject.yourdataset.yourtable`,
-- Per row: collapse the array into (tld, cnt) pairs before joining
UNNEST(ARRAY(SELECT AS STRUCT tld, COUNT(*) AS cnt FROM UNNEST(tlds) AS tld GROUP BY tld)) AS tld_count
-- Across rows: add up the per-row counts
GROUP BY tld
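To check that pre-aggregating per row gives the same totals as flattening every element directly, you could run both approaches side by side and keep only the TLDs where they disagree; an empty result means they match. This is a hedged sketch reusing the sample data from the query above; the CTE names direct and pre_aggregated are illustrative, not part of the original answer.

```sql
#standardSQL
WITH `yourproject.yourdataset.yourtable` AS (
  SELECT ["biz","us","international","eu","com","co","world","us","international","eu","co","biz"] tlds UNION ALL
  SELECT ["com","co","world","biz"]
),
-- Approach 1: flatten every element, then count
direct AS (
  SELECT tld, COUNT(*) AS cnt
  FROM `yourproject.yourdataset.yourtable`, UNNEST(tlds) AS tld
  GROUP BY tld
),
-- Approach 2: count within each row first, then sum the partial counts
pre_aggregated AS (
  SELECT tld_count.tld AS tld, SUM(tld_count.cnt) AS cnt
  FROM `yourproject.yourdataset.yourtable`,
  UNNEST(ARRAY(SELECT AS STRUCT tld, COUNT(*) AS cnt FROM UNNEST(tlds) AS tld GROUP BY tld)) AS tld_count
  GROUP BY tld
)
-- Any row returned is a TLD whose totals differ (or that is missing on one side)
SELECT tld, d.cnt AS direct_cnt, p.cnt AS pre_aggregated_cnt
FROM direct AS d
FULL OUTER JOIN pre_aggregated AS p USING (tld)
WHERE d.cnt IS NULL OR p.cnt IS NULL OR d.cnt != p.cnt;
```

Both approaches return the same totals; the second only reduces how many rows the final GROUP BY has to process when arrays contain many duplicates.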