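I have a single-column table in BigQuery with 1.1 billion rows.
Table properties:
I want to create a new table like the following:
I have tried different solutions, but I keep running into
"Resources exceeded"
Is there a sensible way around this limit? Is there another way to solve the problem inside BigQuery?
My current code, which produces the error above: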
SELECT
  GENERATE_UUID() AS batch_id,
  STRING_AGG(id) AS ids_str
FROM (
  WITH vars AS (
    SELECT 25000 AS rec_count
  )
  SELECT
    CAST(CEILING(ROW_NUMBER() OVER () / 25000) AS INT64) AS batch_count,
    25000 AS rec_count,
    CAST(id AS STRING) AS id
  FROM
    tbl_profile
)
GROUP BY rec_count
Answer 0 (score: 1)
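Is there another way to solve the problem inside BigQuery?
If your use case allows you to relax the requirement a little, so that instead of
The second column to be 25,000 id concatenated into one column
it becomes
The second column to be about (close to) 25,000 id concatenated into one column
then the following query (BigQuery Standard SQL) can/should work for you: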
#standardSQL
SELECT
  GENERATE_UUID() AS batch_id,
  COUNT(1) AS batch_size,
  STRING_AGG(id) AS ids_str
FROM (
  SELECT
    -- Random batch number; each batch is expected to receive about 25,000 rows.
    CAST((cnt * RAND()) / 25000 + 0.5 AS INT64) AS batch_count,
    CAST(id AS STRING) AS id
  FROM `project.dataset.table`
  CROSS JOIN (SELECT COUNT(1) cnt FROM `project.dataset.table`)
)
GROUP BY batch_count
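This should produce a result like the following
As you can see here, the number of IDs in each row is not exactly 25,000, but it is close enough to it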
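One way to check how close the batches come to 25,000 is to summarise the batch sizes without building the concatenated strings. The query below is only a sketch that reuses the same random assignment. The output column names (`batch_total`, `smallest_batch`, `largest_batch`, `average_batch`) are made up for illustration. Because `RAND()` is re-evaluated on every run, the figures describe the typical spread rather than the exact batches of an earlier run.

```sql
#standardSQL
-- Summarise how far the random batches drift from the 25,000 target.
SELECT
  COUNT(*) AS batch_total,
  MIN(batch_size) AS smallest_batch,
  MAX(batch_size) AS largest_batch,
  ROUND(AVG(batch_size)) AS average_batch
FROM (
  SELECT
    batch_count,
    COUNT(1) AS batch_size
  FROM (
    SELECT
      -- Same random batch assignment as in the query above.
      CAST((cnt * RAND()) / 25000 + 0.5 AS INT64) AS batch_count
    FROM `project.dataset.table`
    CROSS JOIN (SELECT COUNT(1) cnt FROM `project.dataset.table`)
  )
  GROUP BY batch_count
)
```

Since each row lands in a batch independently, the sizes should scatter around 25,000 by roughly a few hundred rows. The last batch can be noticeably smaller when the row count is not a multiple of 25,000.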
I hope this is an option for you
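If the grouping needs to be reproducible across runs, one possible variation (a swapped-in technique, not part of the query above) is to derive the batch number from a hash of the ID instead of `RAND()`. The sketch below assumes the IDs are mostly distinct, since identical IDs always hash into the same batch. Batch sizes are still only approximately 25,000, and `batch_id` still changes on every run because `GENERATE_UUID()` is random.

```sql
#standardSQL
-- Variation: hash-based batch numbers, so the same ID always lands in the same batch.
SELECT
  GENERATE_UUID() AS batch_id,
  COUNT(1) AS batch_size,
  STRING_AGG(id) AS ids_str
FROM (
  SELECT
    -- MOD keeps the sign of the hash, so ABS folds the result into 0 .. n - 1,
    -- where n = CEILING(cnt / 25000) is the number of batches.
    ABS(MOD(FARM_FINGERPRINT(CAST(id AS STRING)),
            CAST(CEILING(cnt / 25000) AS INT64))) AS batch_count,
    CAST(id AS STRING) AS id
  FROM `project.dataset.table`
  CROSS JOIN (SELECT COUNT(1) cnt FROM `project.dataset.table`)
)
GROUP BY batch_count
```

Applying `ABS` after `MOD`, rather than before, avoids an overflow error on the smallest possible INT64 hash value.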