Grouping consecutive records

Date: 2017-04-04 18:43:31

Tags: sql google-bigquery

These are my records (input). field2 always starts at 100, and after that it can take any value above 100.

field1   field2
===============
val1     100
val2     110
------------
val3     100
val4     110
val3     130
val3     140
------------
val1     100

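I need to group consecutive records: each group starts with a 100 and continues with any values that are not 100. For the example above, the output I need is: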

field1                      field2
===================================================
(val1, val2)                (100, 110)
(val3, val4, val3, val3)    (100, 110, 130, 140)
(val1)                      (100)

How can I achieve this?

2 answers:

Answer 0 (score: 2)

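I am assuming that you have a column that specifies the ordering. Then you can identify the groups by counting the number of "100" records up to and including each record, and then aggregate each group with array_agg():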

select array_agg(field1 order by id) as field1s,
       array_agg(field2 order by id) as field2s
from (select t.*,
             sum(case when field2 = 100 then 1 else 0 end) over (order by id) as grp
      from t
     ) t
group by grp;

Note: a solution in MySQL would look very, very different. It would still start with select, though.
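
For readers who want to see the running count outside SQL, here is a minimal Python sketch of the same idea. It assumes the rows are already in the intended order; the function name group_runs and the hard-coded rows are illustrative only and are not part of the query above.

```python
from itertools import accumulate, groupby

def group_runs(rows, start=100):
    """Split ordered (field1, field2) rows into groups that each begin at `start`."""
    # Running count of start markers seen so far, including the current row.
    # This mirrors sum(case when field2 = 100 then 1 else 0 end) over (order by id).
    grp_ids = accumulate(1 if f2 == start else 0 for _, f2 in rows)
    groups = []
    # Rows that share the same running count belong to the same group.
    for _, members in groupby(zip(grp_ids, rows), key=lambda pair: pair[0]):
        members = [row for _, row in members]
        groups.append(([f1 for f1, _ in members], [f2 for _, f2 in members]))
    return groups

# Sample rows from the question, listed in the order they are assumed to have.
rows = [("val1", 100), ("val2", 110), ("val3", 100), ("val4", 110),
        ("val3", 130), ("val3", 140), ("val1", 100)]

for field1s, field2s in group_runs(rows):
    print(field1s, field2s)
```

Each printed pair corresponds to one row of the array_agg() result: the running count only changes when a 100 is seen, so every record inherits the group number of the most recent 100.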

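Answer 1 (score: 1)

You must have some field that can be used to define the order. In the example below I assume it is the id field.

The query below should do what you expect: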

   
#standardSQL
SELECT 
  CONCAT('(',STRING_AGG(field1 ORDER BY id), ')') AS field1,
  CONCAT('(',STRING_AGG(CAST(field2 AS STRING) ORDER BY id), ')') AS field2
FROM (
  SELECT 
    id, field1, field2,
    COUNTIF(field2 = 100) OVER (ORDER BY id) AS grp
  FROM yourTable
) t
GROUP BY grp
ORDER BY MIN(id)   

You can test it with the dummy data from your question:

#standardSQL
WITH yourTable AS (
  SELECT 1 AS id, 'val1' AS field1, 100 AS field2 UNION ALL
  SELECT 2 AS id, 'val2' AS field1, 110 AS field2 UNION ALL
  SELECT 3 AS id, 'val3' AS field1, 100 AS field2 UNION ALL
  SELECT 4 AS id, 'val4' AS field1, 110 AS field2 UNION ALL
  SELECT 5 AS id, 'val3' AS field1, 130 AS field2 UNION ALL
  SELECT 6 AS id, 'val3' AS field1, 140 AS field2 UNION ALL
  SELECT 7 AS id, 'val1' AS field1, 100 AS field2 
)
SELECT 
  CONCAT('(',STRING_AGG(field1 ORDER BY id), ')') AS field1,
  CONCAT('(',STRING_AGG(CAST(field2 AS STRING) ORDER BY id), ')') AS field2
FROM (
  SELECT 
    id, field1, field2,
    COUNTIF(field2 = 100) OVER (ORDER BY id) AS grp
  FROM yourTable
) 
GROUP BY grp
ORDER BY MIN(id) 

Output

field1                  field2   
------                  ------   
(val1,val2)             (100,110)    
(val3,val4,val3,val3)   (100,110,130,140)    
(val1)                  (100)    
  

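Wondering if this is possible without using an order column?

If your table has only these two fields, you are most likely out of luck and need to rethink the logic that populates this table so that it includes an additional field (such as a timestamp).

As an absolute last resort, you can try the example below, where such a column is generated on the fly. But please understand that there is absolutely no guarantee of getting the order you expect.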

#standardSQL
WITH yourTable AS (
  SELECT 'val1' AS field1, 100 AS field2 UNION ALL
  SELECT 'val2' AS field1, 110 AS field2 UNION ALL
  SELECT 'val3' AS field1, 100 AS field2 UNION ALL
  SELECT 'val4' AS field1, 110 AS field2 UNION ALL
  SELECT 'val3' AS field1, 130 AS field2 UNION ALL
  SELECT 'val3' AS field1, 140 AS field2 UNION ALL
  SELECT 'val1' AS field1, 100 AS field2 
),
tempTable AS (
  SELECT field1, field2, ROW_NUMBER() OVER() AS id  
  FROM yourTable
)
SELECT 
  CONCAT('(',STRING_AGG(field1 ORDER BY id), ')') AS field1,
  CONCAT('(',STRING_AGG(CAST(field2 AS STRING) ORDER BY id), ')') AS field2
FROM (
  SELECT 
    id, field1, field2,
    COUNTIF(field2 = 100) OVER (ORDER BY id) AS grp
  FROM tempTable
) 
GROUP BY grp
ORDER BY MIN(id) 

The output is the same, but again: not guaranteed!

field1                  field2   
------                  ------   
(val1,val2)             (100,110)    
(val3,val4,val3,val3)   (100,110,130,140)    
(val1)                  (100)
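
Why the ordering column matters can be shown with a small Python sketch. It applies the same "a new group opens at each 100" rule to the field2 values in two different orders. The function name group_field2 and the reordered list are hypothetical, chosen only to illustrate what could happen if nothing enforces the row order.

```python
def group_field2(values, start=100):
    """Group field2 values in the given order; a new group opens at each `start`."""
    groups = []
    for v in values:
        # Open a new group at every start marker (or for leading non-start values).
        if v == start or not groups:
            groups.append([])
        groups[-1].append(v)
    return groups

# field2 values in the order implied by the id column of the test data.
in_id_order = [100, 110, 100, 110, 130, 140, 100]
# Hypothetical order the same rows could arrive in without a reliable ordering column.
reordered = [100, 100, 110, 110, 130, 140, 100]

print(group_field2(in_id_order))
print(group_field2(reordered))
```

The first call reproduces the three expected groups. The second call puts the first 100 in a group of its own and merges both 110 values into the following group, which is why an id generated without a defined order cannot be relied on.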