I am running the following query in BigQuery:
SELECT ARRAY_AGG(state IGNORE NULLS LIMIT 10000)
FROM mytable
GROUP BY state
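What is the best way to limit the result to no more than 1 MB? Previously I applied a LIMIT inside ARRAY_AGG, but with large text fields the result often still exceeds the limit, so I would rather cap it by the size of the final result.
Answer 0 (score: 4)
One option (BigQuery Standard SQL) is to number the rows, keep a running total of LENGTH(state), and aggregate only the rows whose running total stays below the threshold. Note that LENGTH counts characters for a STRING, not bytes.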
#standardSQL
WITH temp AS (
  SELECT state, SUM(LENGTH(state)) OVER(ORDER BY pos) size
  FROM (
    SELECT state, ROW_NUMBER() OVER() pos
    FROM `project.dataset.table`
  )
)
SELECT ARRAY_AGG(state IGNORE NULLS)
FROM temp
WHERE size < 1000000
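You can test and play with the query above using the dummy data below (the threshold is lowered to 5000 to suit the small sample):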
#standardSQL
WITH `project.dataset.table` AS (
  SELECT REPEAT('a', CAST(100 * RAND() AS INT64)) state
  FROM UNNEST(GENERATE_ARRAY(1, 100))
), temp AS (
  SELECT state, SUM(LENGTH(state)) OVER(ORDER BY pos) size
  FROM (
    SELECT state, ROW_NUMBER() OVER() pos
    FROM `project.dataset.table`
  )
)
SELECT ARRAY_AGG(state IGNORE NULLS)
FROM temp
WHERE size < 5000
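Since the question asks for a limit in megabytes while LENGTH counts characters, a byte-based variant may be closer to the goal when the text contains multi-byte characters. The following is only a sketch under assumptions: it swaps LENGTH for BYTE_LENGTH, uses a two-byte character ('é') in the dummy data to make the difference visible, and counts only the string payload, not any overhead of the array itself. The column alias states is introduced here for illustration.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  -- Dummy data: strings of 0 to 100 two-byte characters (assumption for the demo)
  SELECT REPEAT('é', CAST(100 * RAND() AS INT64)) AS state
  FROM UNNEST(GENERATE_ARRAY(1, 100))
), temp AS (
  -- Running total of UTF-8 bytes, in the arbitrary order given by ROW_NUMBER()
  SELECT state, SUM(BYTE_LENGTH(state)) OVER (ORDER BY pos) AS size
  FROM (
    SELECT state, ROW_NUMBER() OVER () AS pos
    FROM `project.dataset.table`
  )
)
-- Keep only the rows whose cumulative byte count stays under the threshold
SELECT ARRAY_AGG(state IGNORE NULLS) AS states
FROM temp
WHERE size < 5000
```

With this dummy data, roughly half as many characters fit under the same threshold as in the LENGTH-based version, because each 'é' takes two bytes in UTF-8. As in the original answer, ROW_NUMBER() OVER () has no ORDER BY, so which rows make the cut is not deterministic; adding an ORDER BY inside that OVER clause would fix the selection order.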