Question

I have the following table on AWS Redshift:

node_id    power_source    timestamp
----------------------------------------------
108          LINE         2019-09-10 09:15:30
108          BATT         2019-09-10 10:20:15
108          LINE         2019-09-10 13:45:00
108          LINE         2019-09-11 06:00:15
108          BATT         2019-09-12 05:50:15
108          BATT         2019-09-12 12:15:15
108          LINE         2019-09-12 18:45:15
108          LINE         2019-09-13 09:20:15
108          BATT         2019-09-14 11:20:15
108          BATT         2019-09-14 13:30:15
108          BATT         2019-09-14 15:30:15
108          LINE         2019-09-14 16:48:36
108          LINE         2019-09-15 09:20:15

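I'm trying to find out how long (cumulatively) the node's power_source has been on "BATT". I was thinking I could do a datediff on the timestamps, but for that I need the timestamp of the first "LINE" row that follows a "BATT" row (ordered by timestamp). I'm not sure how to get that value. Once I have it, I can sum the datediff results.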

Edit:

This is the expected result:

node_id    power_source    timestamp             ts_line_power          ts_diff(in mins)
-----------------------------------------------------------------------------------------
108          BATT         2019-09-10 10:20:15    2019-09-10 13:45:00    205
108          BATT         2019-09-12 05:50:15    2019-09-12 18:45:15    775
108          BATT         2019-09-14 11:20:15    2019-09-14 16:48:36    328

Any help would be greatly appreciated.

Answer 1

If I understand correctly, you can use lead():

select node_id,
       sum(datediff(minute, timestamp, next_ts)) as diff_in_minutes
from (select t.*,
             lead(timestamp) over (partition by node_id order by timestamp) as next_ts
      from t
     ) t
where power_source = 'BATT'
group by node_id;

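This gets the timestamp of the row that follows each BATT record and uses it as that record's end time. When several BATT rows appear in a row, their individual intervals add up to the span from the first BATT row to the next LINE row.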
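To make the lead() logic concrete, here is a minimal Python sketch that replays it on the sample rows from the question. It is not Redshift code: the helper names (minutes_between, total_batt_minutes) are made up for illustration, and it assumes datediff(minute, ...) counts minute boundaries crossed, which is emulated here by truncating both timestamps to the minute.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Sample rows copied from the question: (node_id, power_source, timestamp).
ROWS = [
    (108, "LINE", "2019-09-10 09:15:30"),
    (108, "BATT", "2019-09-10 10:20:15"),
    (108, "LINE", "2019-09-10 13:45:00"),
    (108, "LINE", "2019-09-11 06:00:15"),
    (108, "BATT", "2019-09-12 05:50:15"),
    (108, "BATT", "2019-09-12 12:15:15"),
    (108, "LINE", "2019-09-12 18:45:15"),
    (108, "LINE", "2019-09-13 09:20:15"),
    (108, "BATT", "2019-09-14 11:20:15"),
    (108, "BATT", "2019-09-14 13:30:15"),
    (108, "BATT", "2019-09-14 15:30:15"),
    (108, "LINE", "2019-09-14 16:48:36"),
    (108, "LINE", "2019-09-15 09:20:15"),
]


def minutes_between(start, end):
    """Assumed stand-in for datediff(minute, start, end): truncate to the minute, then subtract."""
    start = start.replace(second=0, microsecond=0)
    end = end.replace(second=0, microsecond=0)
    return int((end - start).total_seconds()) // 60


def total_batt_minutes(rows):
    """Per node, sum the minutes from each BATT row to the row that follows it."""
    parsed = [(node, src, datetime.strptime(ts, FMT)) for node, src, ts in rows]
    # Equivalent of: partition by node_id order by timestamp
    parsed.sort(key=lambda r: (r[0], r[2]))
    totals = {}
    # Pairing each row with its successor plays the role of lead(timestamp).
    for current, following in zip(parsed, parsed[1:]):
        same_node = current[0] == following[0]
        if same_node and current[1] == "BATT":
            gap = minutes_between(current[2], following[2])
            totals[current[0]] = totals.get(current[0], 0) + gap
    return totals


print(total_batt_minutes(ROWS))
```

A BATT row with no following row (the last row of a node) simply contributes nothing here, which mirrors sum() skipping the NULL that lead() would return in that case.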

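Edit:

The query above gives the overall total across all BATT periods. To get one row per period, you have a gaps-and-islands problem. For this, you can assign a group to each row by counting the number of non-BATT records at or after that row. This keeps the LINE record that ends a BATT run in the same group as the BATT rows before it.

It is all window functions and aggregation: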

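select node_id,
       min(timestamp) as batt_start,     -- first BATT row of the island
       max(timestamp) as ts_line_power,  -- the LINE row that ends the island
       datediff(minute, min(timestamp), max(timestamp)) as diff_in_minutes  -- no outer sum(): aggregates cannot be nested
from (select t.*,
             -- running count of LINE rows, walking backwards in time;
             -- Redshift requires an explicit frame when an aggregate window function has ORDER BY
             sum( (power_source = 'LINE')::int ) over (partition by node_id order by timestamp desc rows unbounded preceding) as grp
      from t
     ) t
group by node_id, grp
having sum( (power_source = 'BATT')::int) > 0;  -- only include groups that have at least one BATT row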

Note that this assumes "LINE" and "BATT" are the only valid values for power_source.
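The grouping step can be checked outside the database. The following is a rough Python sketch of the same idea, run on the sample rows from the question: walk each node's rows from newest to oldest, keep a running count of LINE rows as the group number, and report min/max timestamps for groups that contain a BATT row. The function names are hypothetical, and the minute difference assumes datediff counts minute boundaries crossed.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Sample rows copied from the question: (node_id, power_source, timestamp).
ROWS = [
    (108, "LINE", "2019-09-10 09:15:30"),
    (108, "BATT", "2019-09-10 10:20:15"),
    (108, "LINE", "2019-09-10 13:45:00"),
    (108, "LINE", "2019-09-11 06:00:15"),
    (108, "BATT", "2019-09-12 05:50:15"),
    (108, "BATT", "2019-09-12 12:15:15"),
    (108, "LINE", "2019-09-12 18:45:15"),
    (108, "LINE", "2019-09-13 09:20:15"),
    (108, "BATT", "2019-09-14 11:20:15"),
    (108, "BATT", "2019-09-14 13:30:15"),
    (108, "BATT", "2019-09-14 15:30:15"),
    (108, "LINE", "2019-09-14 16:48:36"),
    (108, "LINE", "2019-09-15 09:20:15"),
]


def minutes_between(start, end):
    """Assumed stand-in for datediff(minute, start, end): truncate to the minute, then subtract."""
    start = start.replace(second=0, microsecond=0)
    end = end.replace(second=0, microsecond=0)
    return int((end - start).total_seconds()) // 60


def batt_islands(rows):
    """Return (node_id, batt_start, ts_line_power, minutes) for each run of BATT rows."""
    parsed = [(node, src, datetime.strptime(ts, FMT)) for node, src, ts in rows]
    # Equivalent of: partition by node_id order by timestamp desc
    parsed.sort(key=lambda r: (r[0], r[2]), reverse=True)

    line_count = {}  # running count of LINE rows per node (the "grp" value)
    groups = {}      # (node_id, grp) -> list of (power_source, timestamp)
    for node, src, ts in parsed:
        if src == "LINE":
            line_count[node] = line_count.get(node, 0) + 1
        groups.setdefault((node, line_count.get(node, 0)), []).append((src, ts))

    islands = []
    for (node, _grp), members in groups.items():
        # Equivalent of the HAVING clause: keep groups with at least one BATT row.
        if any(src == "BATT" for src, _ in members):
            stamps = [ts for _, ts in members]
            start, end = min(stamps), max(stamps)
            islands.append((node, start.strftime(FMT), end.strftime(FMT),
                            minutes_between(start, end)))
    return sorted(islands)


for island in batt_islands(ROWS):
    print(island)
```

On this data the sketch yields three islands of 205, 775 and 328 minutes (05:50 to 18:45 on 2019-09-12 is 12 h 55 min). A trailing BATT run with no later LINE row would land in group 0 and be measured only up to its last BATT row.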
