Date convolution via a timestamped transition table: how to?

Time: 2018-06-27 17:38:47

Tags: google-bigquery

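We need to convolve every day, from some point in the past up to now, against a timestamped list of boolean device transitions. The final output should be a table with one date:device_id entry for each day the device was online (and no entry for that date otherwise).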

Here is a sample transition table for a single device:

Device Transition Table

To generate the calendar to convolve against:

  calendar AS (
    SELECT day
    FROM UNNEST (GENERATE_DATE_ARRAY('2011-05-15', CURRENT_DATE())) AS day
  ),

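Then, as a first step, build a table that pairs each transition event only with the calendar dates on or after it, so that the rows can later be ranked and the most recent one selected (a cross join here, yuck!):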

joined_with_cal AS (

SELECT 
  cal.day as online_date,
  otr.when_changed,
  otr.device_id,
  otr.is_online,
  otr.rank_by_date
FROM 
  calendar AS cal
CROSS JOIN 
  ordered_transitions otr
WHERE
  cal.day >= DATE(otr.when_changed)
),

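Then the code that tries to rank the rows and select the most recent record in each partition by timestamp (ordering by either when_changed or rank_by_date; neither seems to work):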

SELECT  
  online_date,
  when_changed,
  device_id,
  is_online,
  rank_by_date,
FROM (
  SELECT
    online_date,
    when_changed,
    device_id,
    is_online,
    rank_by_date,
    RANK() OVER (PARTITION BY device_id ORDER BY rank_by_date ASC) as final_rank
  FROM
    joined_with_cal
)
WHERE
  final_rank = 1 AND
  --  online_date < '2017-08-01' AND
  device_id = 419609
ORDER BY
  online_date,
  when_changed,
  device_id

However, this does not work, and it is clearly ugly.

Can anyone suggest a correct, elegant solution?

Thanks!
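One likely reason the ranking misbehaves: the window is partitioned by device_id only, so final_rank = 1 keeps the rows of a single transition per device instead of the latest transition for each device on each day. Below is a minimal sketch of partitioning by day as well. The transitions rows are made-up sample data, and the names transitions and ranked are hypothetical stand-ins for the real tables.

```sql
-- Sketch only: sample rows are invented; `transitions` and `ranked` are hypothetical names.
WITH transitions AS (
  SELECT 419609 AS device_id, TIMESTAMP '2018-06-01 08:00:00' AS when_changed, 1 AS is_online
  UNION ALL SELECT 419609, TIMESTAMP '2018-06-03 17:30:00', 0
  UNION ALL SELECT 419609, TIMESTAMP '2018-06-05 09:00:00', 1
),
calendar AS (
  SELECT day
  FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2018-06-01', DATE '2018-06-07')) AS day
),
ranked AS (
  SELECT
    cal.day AS online_date,
    t.device_id,
    t.is_online,
    -- Rank per device AND per calendar day, newest transition first.
    ROW_NUMBER() OVER (
      PARTITION BY t.device_id, cal.day
      ORDER BY t.when_changed DESC
    ) AS final_rank
  FROM calendar AS cal
  CROSS JOIN transitions AS t
  WHERE cal.day >= DATE(t.when_changed)
)
-- Keep a day only if the latest transition on or before it left the device online.
SELECT online_date, device_id
FROM ranked
WHERE final_rank = 1 AND is_online = 1
ORDER BY online_date, device_id
```

Note that this keeps a day only when the device ends that day online. With the sample rows, 2018-06-03 is dropped because the last transition on that date switches the device off, even though it was online for part of the day.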

1 Answer:

Answer 0: (score: 1)

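@Mikhail: Thanks for taking a look, and sorry that my explanation was not clear enough.

After discussing it with a colleague, I ended up using a self-join that seems to work well: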

trans_as_range_not_first AS (

  SELECT
    t1.device_id,
    t1.rank_by_when,
    t2.when_changed as online_start,
    t1.when_changed as online_stop,
    t1.account_id,
    t1.account_name,
    t1.server_type
  FROM
    ordered_trans AS t1  -- lower in rank index, later in time
  LEFT JOIN
    ordered_trans AS t2  -- greater in rank index, earlier in time
  ON
    t1.device_id = t2.device_id AND 
    t1.rank_by_when+1 = t2.rank_by_when  -- current and next row
  WHERE
    t1.is_online = 0 AND t2.is_online = 1
  GROUP BY
    device_id,
    rank_by_when,
    online_start,
    online_stop,
    account_id,
    account_name,
    server_type
),
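The online ranges produced above still need to be expanded into one row per device per day. Here is a minimal sketch of that step, assuming online_start and online_stop are TIMESTAMP columns. The ranges CTE holds made-up sample rows standing in for trans_as_range_not_first.

```sql
-- Sketch only: `ranges` holds invented rows standing in for trans_as_range_not_first.
WITH ranges AS (
  SELECT 419609 AS device_id,
         TIMESTAMP '2018-06-01 08:00:00' AS online_start,
         TIMESTAMP '2018-06-03 17:30:00' AS online_stop
  UNION ALL
  SELECT 419609,
         TIMESTAMP '2018-06-03 20:00:00',
         TIMESTAMP '2018-06-05 09:00:00'
)
-- Expand each range into its calendar days; DISTINCT collapses days
-- that are covered by more than one range for the same device.
SELECT DISTINCT
  r.device_id,
  day AS online_date
FROM ranges AS r
CROSS JOIN
  UNNEST(GENERATE_DATE_ARRAY(DATE(r.online_start), DATE(r.online_stop))) AS day
ORDER BY device_id, online_date
```

Unlike ranking against a full calendar, this only generates days that fall inside an online range, so no large cross join is needed. It counts a day as online if the device was online at any point during it. DATE() on a TIMESTAMP uses UTC unless a time zone argument is supplied.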