I have a table like this:
#standardSQL
WITH k AS (
SELECT 1 id, 1 subgrp, 'stuff1' content UNION ALL
SELECT 2, 2, 'stuff2' UNION ALL
SELECT 3, 3, 'stuff3' UNION ALL
SELECT 4, 4, 'stuff4' UNION ALL
SELECT 5, 1, 'ostuff1' UNION ALL
SELECT 6, 2, 'ostuff2' UNION ALL
SELECT 7, 3, 'ostuff3' UNION ALL
SELECT 8, 4, 'ostuff4'
)
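and I would like to group the rows based on the subgrp values in order to recreate the missing grp column: a row belongs to the same group as the previous row as long as its subgrp value is greater than the previous row's; when subgrp drops, a new group starts.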
The intermediate result would be:
| id | grp | subgrp | content |
| 1 | 1 | 1 | stuff1 |
| 2 | 1 | 2 | stuff2 |
| 3 | 1 | 3 | stuff3 |
| 4 | 1 | 4 | stuff4 |
| 5 | 2 | 1 | ostuff1 |
| 6 | 2 | 2 | ostuff2 |
| 7 | 2 | 3 | ostuff3 |
| 8 | 2 | 4 | ostuff4 |
Then I can apply
-- assumes k now also carries the recreated grp column
SELECT MIN(id) id, grp, ARRAY_AGG(STRUCT(subgrp, content)) rcd
FROM k
GROUP BY grp
ORDER BY id, grp
to get a nice nested structure.
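Note: the subgrp values (such as 3, 2, 4) are just for illustration, so nothing can be hard-coded.
Question: how can I (re)create the grp column here? I have played with several window functions, to no avail.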
Edit
Although Gordon's answer works, it took 3 minutes to run over 104M records, and I had to remove the ORDER BY on the final result set because of {{1}}.
Does anyone have an alternative solution for large data sets?
Answer 0 (score: 1)
A simple way to assign the groups is a cumulative count of the rows where subgrp = 1:
select k.*,
sum(case when subgrp = 1 then 1 else 0 end) over (order by id) as grp
from k;
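To make the running count concrete, here is a minimal Python sketch of what that window sum computes on the sample rows. It assumes, as the query does, that every group starts with subgrp = 1 and that rows are processed in id order; the names rows and assign_grp_by_start are made up for this illustration and are not part of any BigQuery API.

```python
from itertools import accumulate

# Sample rows from the question: (id, subgrp, content), already sorted by id.
rows = [
    (1, 1, "stuff1"), (2, 2, "stuff2"), (3, 3, "stuff3"), (4, 4, "stuff4"),
    (5, 1, "ostuff1"), (6, 2, "ostuff2"), (7, 3, "ostuff3"), (8, 4, "ostuff4"),
]

def assign_grp_by_start(rows):
    """Mimic SUM(CASE WHEN subgrp = 1 THEN 1 ELSE 0 END) OVER (ORDER BY id)."""
    # 1 marks a row that opens a new group, 0 marks a continuation row.
    flags = [1 if subgrp == 1 else 0 for _, subgrp, _ in rows]
    running = accumulate(flags)  # running total of the group-start flags
    return [(id_, grp, subgrp, content)
            for (id_, subgrp, content), grp in zip(rows, running)]

for row in assign_grp_by_start(rows):
    print(row)
```

Each printed tuple is (id, grp, subgrp, content). If a group could start with a value other than 1, this flag would miss it, which is where the lag()-based variant is more general.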
You can also do it the way you describe, using lag() and a cumulative sum. This requires a subquery:
select k.*,
       -- a row starts a new group unless subgrp increased (prev_subgrp is NULL on the first row)
       sum(case when prev_subgrp < subgrp then 0 else 1 end) over (order by id) as grp
from (select k.*,
             lag(subgrp) over (order by id) as prev_subgrp
      from k
     ) k;
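As a rough illustration of the lag-plus-running-sum idea, the Python sketch below walks the subgrp values in id order and opens a new group whenever the value does not increase from the previous row. That rule is inferred from the sample data, where subgrp restarts at the beginning of each group; the function name assign_grp_by_lag is hypothetical.

```python
def assign_grp_by_lag(subgrps):
    """Open a new group whenever subgrp does not increase from the previous row."""
    grps = []
    grp = 0
    prev = None  # plays the role of LAG(subgrp); None stands in for NULL on the first row
    for subgrp in subgrps:
        if prev is None or not (prev < subgrp):
            grp += 1  # same effect as adding 1 to the cumulative sum
        grps.append(grp)
        prev = subgrp
    return grps

print(assign_grp_by_lag([1, 2, 3, 4, 1, 2, 3, 4]))  # [1, 1, 1, 1, 2, 2, 2, 2]
print(assign_grp_by_lag([2, 3, 1, 2, 5, 4]))        # [1, 1, 2, 2, 2, 3]
```

The second call shows that this variant does not depend on groups starting at 1 or on any fixed group size, so nothing has to be hard-coded.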
Answer 1 (score: 0)
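The query below might perform better, but it comes with a limitation: I assume there are no gaps in the numbering within a subgroup or in the corresponding ids.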
#standardSQL
WITH k AS (
SELECT 1 id, 1 subgrp, 'stuff1' content UNION ALL
SELECT 2, 2, 'stuff2' UNION ALL
SELECT 3, 3, 'stuff3' UNION ALL
SELECT 4, 4, 'stuff4' UNION ALL
SELECT 5, 1, 'ostuff1' UNION ALL
SELECT 6, 2, 'ostuff2' UNION ALL
SELECT 7, 3, 'ostuff3' UNION ALL
SELECT 8, 4, 'ostuff4'
)
SELECT
ROW_NUMBER() OVER(ORDER BY id) grp,
rcd
FROM (
SELECT
MIN(id) id,
ARRAY_AGG(STRUCT(subgrp, content)) rcd
FROM k
GROUP BY id - subgrp
)
The result is:
| Row | grp | rcd.subgrp | rcd.content |
| 1 | 1 | 1 | stuff1 |
|   |   | 2 | stuff2 |
|   |   | 3 | stuff3 |
|   |   | 4 | stuff4 |
| 2 | 2 | 1 | ostuff1 |
|   |   | 2 | ostuff2 |
|   |   | 3 | ostuff3 |
|   |   | 4 | ostuff4 |
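The reason GROUP BY id - subgrp works is that, inside a gapless run, id and subgrp grow in lockstep, so their difference stays constant. The Python sketch below mirrors that query on the sample rows, under the same no-gaps assumption; rows and nest_by_offset are names invented for this illustration.

```python
from itertools import groupby

# Sample rows from the question: (id, subgrp, content).
rows = [
    (1, 1, "stuff1"), (2, 2, "stuff2"), (3, 3, "stuff3"), (4, 4, "stuff4"),
    (5, 1, "ostuff1"), (6, 2, "ostuff2"), (7, 3, "ostuff3"), (8, 4, "ostuff4"),
]

def nest_by_offset(rows):
    """Group rows by id - subgrp, then number the groups by their smallest id."""
    # Sort by the grouping key first (groupby needs adjacent keys), then by id.
    keyed = sorted(rows, key=lambda r: (r[0] - r[1], r[0]))
    groups = []
    for _, members in groupby(keyed, key=lambda r: r[0] - r[1]):
        members = list(members)
        min_id = members[0][0]  # MIN(id): members are sorted by id within a key
        groups.append((min_id, [(subgrp, content) for _, subgrp, content in members]))
    groups.sort(key=lambda g: g[0])  # ROW_NUMBER() OVER (ORDER BY id)
    return [(grp, rcd) for grp, (_, rcd) in enumerate(groups, start=1)]

for grp, rcd in nest_by_offset(rows):
    print(grp, rcd)

# A gap in the ids (2 -> 4) changes the offset and wrongly splits the group.
print(nest_by_offset([(1, 1, "a"), (2, 2, "b"), (4, 3, "c")]))
```

The last call returns two groups instead of one, which is the limitation mentioned above: the trick is only safe when both id and subgrp are gapless.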