Grouping a sequence of rows - BigQuery / SQL

Time: 2017-08-29 18:56:33

Tags: sql google-bigquery

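I need to group runs of consecutive rows that share the same value in a given field. For example, I have a run of Selina Kyle records between two Bruce Wayne records. I need to group these records by user name, but only while they form an uninterrupted sequence. For example, I have this table: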

| User        | Time | Date_In             | Date_Out            |
|-------------|------|---------------------|---------------------|
| Bruce Wayne | 2793 | 2017-08-30 09:55:52 | 2017-08-30 10:42:25 |
| Selina Kyle | 2430 | 2017-08-30 10:42:25 | 2017-08-30 11:22:55 |
| Selina Kyle | 4461 | 2017-08-30 11:22:55 | 2017-08-30 12:37:16 |
| Selina Kyle | 4356 | 2017-08-30 12:37:16 | 2017-08-30 13:49:52 |
| Selina Kyle | 2295 | 2017-08-30 13:49:52 | 2017-08-30 14:28:07 |
| Bruce Wayne | 2098 | 2017-08-30 14:28:07 | 2017-08-30 15:03:05 |

I need to group it by user name and sum Time, but the Bruce Wayne records must stay separate because they are not in an immediate sequence:

| User        | Time  | Date_In             | Date_Out            |
|-------------|-------|---------------------|---------------------|
| Bruce Wayne | 2793  | 2017-08-30 09:55:52 | 2017-08-30 10:42:25 |
| Selina Kyle | 13542 | 2017-08-30 10:42:25 | 2017-08-30 14:28:07 |
| Bruce Wayne | 2098  | 2017-08-30 14:28:07 | 2017-08-30 15:03:05 |

1 answer:

Answer 0: (score: 1)

Try the BigQuery Standard SQL below:

#standardSQL
SELECT MIN(User) AS User, SUM(TIME) AS TIME, MIN(Date_In) AS Date_In, MAX(Date_Out) AS Date_Out
FROM (
  SELECT *,
    -- Running count of user changes: rows in the same consecutive run share a groupid
    COUNTIF(User != IFNULL(prev_User, User)) OVER(ORDER BY Date_In) AS groupid
  FROM (
    SELECT *,
      -- User of the preceding row by Date_In (NULL for the first row)
      LAG(User) OVER(ORDER BY Date_In) AS prev_User
    FROM `yourTable`
  )
)
GROUP BY groupid
-- ORDER BY Date_In

You can play with and test this using the dummy data from your question, as below:

#standardSQL
WITH `yourTable` AS (
  -- Dummy data copied from the question
  SELECT 'Bruce Wayne' AS User, 2793 AS TIME, '2017-08-30 09:55:52' AS Date_In, '2017-08-30 10:42:25' AS Date_Out UNION ALL
  SELECT 'Selina Kyle', 2430, '2017-08-30 10:42:25', '2017-08-30 11:22:55' UNION ALL
  SELECT 'Selina Kyle', 4461, '2017-08-30 11:22:55', '2017-08-30 12:37:16' UNION ALL
  SELECT 'Selina Kyle', 4356, '2017-08-30 12:37:16', '2017-08-30 13:49:52' UNION ALL
  SELECT 'Selina Kyle', 2295, '2017-08-30 13:49:52', '2017-08-30 14:28:07' UNION ALL
  SELECT 'Bruce Wayne', 2098, '2017-08-30 14:28:07', '2017-08-30 15:03:05'
)
SELECT MIN(User) AS User, SUM(TIME) AS TIME, MIN(Date_In) AS Date_In, MAX(Date_Out) AS Date_Out
FROM (
  SELECT *,
    -- Running count of user changes: rows in the same consecutive run share a groupid
    COUNTIF(User != IFNULL(prev_User, User)) OVER(ORDER BY Date_In) AS groupid
  FROM (
    SELECT *,
      -- User of the preceding row by Date_In (NULL for the first row)
      LAG(User) OVER(ORDER BY Date_In) AS prev_User
    FROM `yourTable`
  )
)
GROUP BY groupid
ORDER BY Date_In

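Note: judging from your example, it looks like you have no cases where the Date_Out of one row and the Date_In of the next row do not line up. If you do have such cases, the query above will need to be adjusted further to reflect how they should be handled.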
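
As a rough sketch of one such adjustment: assuming a gap between the previous row's Date_Out and the current row's Date_In should also start a new group (this is an assumption, since the question does not say how gaps should be treated), the change flag can compare those two columns in addition to User. The `is_new_group` column name is introduced here purely for illustration.

```sql
#standardSQL
SELECT MIN(User) AS User, SUM(TIME) AS TIME, MIN(Date_In) AS Date_In, MAX(Date_Out) AS Date_Out
FROM (
  SELECT *,
    -- Running count of group starts; rows in the same run share a groupid
    COUNTIF(is_new_group) OVER(ORDER BY Date_In) AS groupid
  FROM (
    SELECT *,
      -- Assumption: a new group starts when the user changes OR when this row
      -- does not begin exactly where the previous row ended.
      -- For the first row both LAG values are NULL, so the flag falls back to FALSE.
      IFNULL(
        User != LAG(User) OVER(ORDER BY Date_In)
        OR Date_In != LAG(Date_Out) OVER(ORDER BY Date_In),
        FALSE
      ) AS is_new_group
    FROM `yourTable`
  )
)
GROUP BY groupid
ORDER BY Date_In
```

On the dummy data above every row starts where the previous one ends, so this variant should return the same three groups as the original query. A different rule, such as tolerating small gaps, would need a different comparison on Date_In and the previous Date_Out.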