Limit array size to 1MB in BigQuery

Date: 2019-06-03 23:44:48

Tags: sql google-bigquery

I am running the following in BigQuery:

SELECT ARRAY_AGG(state IGNORE NULLS LIMIT 10000) 
FROM mytable
GROUP BY state

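What is the best way to limit the result to no more than 1MB? Previously I put a LIMIT inside the ARRAY_AGG, but when there are large text fields the result often exceeds the limit anyway, so I would rather cap it by the size of the final result.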

1 answer:

Answer 0 (score: 4)

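One option (BigQuery Standard SQL) is to compute a running total of the string sizes and keep only the rows whose cumulative size stays under the limit: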

#standardSQL
WITH temp AS (
  -- BYTE_LENGTH counts bytes; LENGTH would count characters
  SELECT state, SUM(BYTE_LENGTH(state)) OVER(ORDER BY pos) size
  FROM (
    SELECT state, ROW_NUMBER() OVER() pos
    FROM `project.dataset.table`
  )
)
SELECT ARRAY_AGG(state IGNORE NULLS)
FROM temp
WHERE size < 1000000
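To make the cutoff logic concrete outside BigQuery, here is a minimal Python sketch that mimics what the query does, assuming rows arrive in the order given by `pos`. Note that in BigQuery `ROW_NUMBER() OVER()` assigns that order arbitrarily, so which rows survive the cutoff is not deterministic. The function name `cap_by_running_size` is hypothetical. It counts UTF-8 bytes, which is what BigQuery's BYTE_LENGTH returns; LENGTH returns characters, and the two agree only for ASCII text.

```python
from typing import Iterable, List, Optional


def cap_by_running_size(values: Iterable[Optional[str]], limit_bytes: int) -> List[str]:
    """Hypothetical helper mimicking the query: running SUM of sizes in row
    order, keep rows whose cumulative size is below the limit, drop NULLs."""
    kept: List[str] = []
    running = 0
    for value in values:
        if value is None:
            # SUM() skips NULLs and ARRAY_AGG(... IGNORE NULLS) drops them.
            continue
        # Equivalent of BYTE_LENGTH(state): size in UTF-8 bytes.
        running += len(value.encode("utf-8"))
        if running >= limit_bytes:
            # Sizes are non-negative, so the running total never decreases:
            # every later row would also fail `size < limit`.
            break
        kept.append(value)
    return kept


if __name__ == "__main__":
    print(cap_by_running_size(["aa", None, "bbb", "cccc"], 6))
```

Because the running total is monotone, stopping at the first row that crosses the limit selects exactly the same rows as filtering with `WHERE size < limit`; the row that pushes the total over the limit is itself excluded.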

You can test and play with it using the dummy data below:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT REPEAT('a', CAST(100 * RAND() AS INT64)) state
  FROM UNNEST(GENERATE_ARRAY(1, 100))
), temp AS (
  SELECT state, SUM(BYTE_LENGTH(state)) OVER(ORDER BY pos) size
  FROM (
    SELECT state, ROW_NUMBER() OVER() pos
    FROM `project.dataset.table`
  )
)
SELECT ARRAY_AGG(state IGNORE NULLS)
FROM temp
WHERE size < 5000
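The question's original query used GROUP BY, while the answer computes a single running total across the whole table. If you need a separate size budget per group, one option is to add PARTITION BY to the window (for example `OVER(PARTITION BY grp ORDER BY pos)`, where `grp` stands in for your grouping column) and keep the GROUP BY in the final SELECT. The Python sketch below, with a hypothetical function `cap_per_group`, mimics that per-group running total under the same ordering assumption as above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def cap_per_group(
    rows: Iterable[Tuple[str, Optional[str]]], limit_bytes: int
) -> Dict[str, List[str]]:
    """Hypothetical helper: one running byte total per group key, like
    SUM(BYTE_LENGTH(state)) OVER(PARTITION BY grp ORDER BY pos)."""
    running: Dict[str, int] = defaultdict(int)
    result: Dict[str, List[str]] = defaultdict(list)
    for key, value in rows:
        if value is None:
            # NULLs add nothing to the sum and are dropped by IGNORE NULLS.
            continue
        running[key] += len(value.encode("utf-8"))
        # Keep the row only while this group's cumulative size is under the limit.
        if running[key] < limit_bytes:
            result[key].append(value)
    # Groups with no surviving rows are absent, as the WHERE filter would
    # remove all of their rows before GROUP BY.
    return dict(result)


if __name__ == "__main__":
    print(cap_per_group([("a", "xx"), ("b", "yyyy"), ("a", "zzz"), ("b", "w")], 5))
```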