Get the first row for a specific event type in BigQuery?

Time: 2018-07-09 08:36:54

Tags: sql google-bigquery

Row   EventType   CloudId           ts
1     stop        5201156607311872  2018-07-07 12:25:21 UTC  
2     start       5201156607311872  2018-07-07 12:27:39 UTC  
3     start       5201156607311872  2018-07-07 12:28:15 UTC  
4     stop        5738776789778432  2018-07-07 12:28:54 UTC  
5     stop        5201156607311872  2018-07-07 12:30:30 UTC  
6     stop        5738776789778432  2018-07-07 12:37:45 UTC  
7     stop        5738776789778432  2018-07-07 12:40:52 UTC

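I have a table structured as shown above. I want to keep only the first row of each run before the EventType changes. That is, rows 2 and 3 have the same EventType, so I need to remove row 3 from the table. Rows 4, 5, 6, and 7 have the same EventType, so I want to keep row 4 and remove rows 5, 6, and 7.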

4 answers:

Answer 0: (score: 3)

Use lag():

select t.*
from (select t.*,
             lag(eventtype) over (order by row) as prev_eventtype
      from t
     ) t
where prev_eventtype is null or prev_eventtype <> eventtype;

Answer 1: (score: 3)

The following is for BigQuery Standard SQL:

#standardSQL
SELECT * EXCEPT(prev_eventtype) FROM (
  SELECT *, LAG(eventtype) OVER (ORDER BY ts) AS prev_eventtype
  FROM `project.dataset.table` 
)
WHERE prev_eventtype IS NULL OR prev_eventtype <> eventtype

You can test and play with the above using the dummy data from the question:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'stop' EventType, 5201156607311872 CloudId, TIMESTAMP '2018-07-07 12:25:21 UTC' ts UNION ALL  
  SELECT 'start', 5201156607311872, '2018-07-07 12:27:39 UTC' UNION ALL  
  SELECT 'start', 5201156607311872, '2018-07-07 12:28:15 UTC' UNION ALL  
  SELECT 'stop', 5738776789778432, '2018-07-07 12:28:54 UTC' UNION ALL  
  SELECT 'stop', 5201156607311872, '2018-07-07 12:30:30 UTC' UNION ALL  
  SELECT 'stop', 5738776789778432, '2018-07-07 12:37:45 UTC' UNION ALL  
  SELECT 'stop', 5738776789778432, '2018-07-07 12:40:52 UTC' 
)
SELECT * EXCEPT(prev_eventtype) FROM (
  SELECT *, LAG(eventtype) OVER (ORDER BY ts) AS prev_eventtype
  FROM `project.dataset.table` 
)
WHERE prev_eventtype IS NULL OR prev_eventtype <> eventtype

Result:

EventType   CloudId             ts   
stop        5201156607311872    2018-07-07 12:25:21 UTC  
start       5201156607311872    2018-07-07 12:27:39 UTC  
stop        5738776789778432    2018-07-07 12:28:54 UTC  
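
On current BigQuery, the same filter can probably be written without the wrapping subquery by using QUALIFY together with IS DISTINCT FROM. This is only a sketch and not part of the original answer: `project.dataset.table` is the same placeholder as above, and `WHERE TRUE` is included defensively, on the assumption that QUALIFY may need a companion clause in some environments.

```sql
SELECT *
FROM `project.dataset.table`
WHERE TRUE
-- keep a row when the previous EventType (ordered by ts) is NULL or different
QUALIFY LAG(EventType) OVER (ORDER BY ts) IS DISTINCT FROM EventType
```

IS DISTINCT FROM treats NULL as a comparable value, so the very first row (which has no previous EventType) is kept without a separate IS NULL check.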

Answer 2: (score: 1)

select
  Row,
  EventType,
  CloudId,
  ts
from
(
  select
    Row,
    EventType,
    CloudId,
    ts,
    row_number() over (partition by EventType order by CloudId, Row) as rnk
  from table_name
) evnt
where rnk = 1
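
Note that partitioning by EventType alone numbers all rows of one type together, so rnk = 1 keeps one row per event type rather than one row per consecutive run; on the sample data from the question that would drop row 4. A run-based variant might look like the sketch below. It assumes ts defines the row order, and the names changed and run_id are hypothetical helpers introduced only for illustration.

```sql
SELECT EventType, CloudId, ts
FROM (
  SELECT *,
         -- number the rows inside each consecutive run of the same EventType
         ROW_NUMBER() OVER (PARTITION BY run_id ORDER BY ts) AS rnk
  FROM (
    SELECT *,
           -- a running count of changes gives every run its own id
           SUM(changed) OVER (ORDER BY ts) AS run_id
    FROM (
      SELECT *,
             -- 1 when EventType differs from the previous row, or there is no previous row
             IF(LAG(EventType) OVER (ORDER BY ts) = EventType, 0, 1) AS changed
      FROM table_name
    )
  )
)
WHERE rnk = 1
ORDER BY ts
```

For this particular question the plain LAG() filter in the answers above is shorter; the run id is mainly useful if you also need per-run aggregates, such as the number of rows or the last ts in each run.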

Answer 3: (score: 1)

You can use a SELECT statement to hide just the rows you don't want:

select t.*
from table t
where t.row = (select min(t1.row) from table t1 where t1.EventType = t.EventType);