Partition by timestamps that are close to each other (e.g. within 30 minutes)

Time: 2019-11-26 21:35:19

Tags: sql teradata partition teradata-sql-assistant

I have a dataset that I want to partition into groups of timestamps that are close to each other (e.g. less than 30 minutes apart):

Driver | Timestamp
A      | 10/30/2019 05:02:28
A      | 10/30/2019 05:05:28
A      | 10/30/2019 05:09:28
A      | 10/30/2019 05:12:28
A      | 10/30/2019 07:54:28
A      | 10/30/2019 07:57:28
A      | 10/30/2019 08:02:28
A      | 10/30/2019 12:14:28
A      | 10/30/2019 12:17:28
A      | 10/30/2019 12:22:28

How can we split it like this:

id     | Driver    |    Timestamp
1      |    A      | 10/30/2019 05:02:28
1      |    A      | 10/30/2019 05:05:28
1      |    A      | 10/30/2019 05:09:28
1      |    A      | 10/30/2019 05:12:28
2      |    A      | 10/30/2019 07:54:28
2      |    A      | 10/30/2019 07:57:28
2      |    A      | 10/30/2019 08:02:28
3      |    A      | 10/30/2019 12:14:28
3      |    A      | 10/30/2019 12:17:28
3      |    A      | 10/30/2019 12:22:28

Any help would be greatly appreciated, thank you very much!

3 Answers:

Answer 0: (score: 2)

It depends on your actual requirements.

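If you want to start a new group whenever the gap between two consecutive timestamps of the same driver is 30 minutes or more, you can use lag() and a cumulative sum():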

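select
    -- running count of group starts per driver = group id
    sum(case 
        when timestamp < lag_timestamp + interval '30' minute
            then 0
            else 1  -- first row of a driver (lag is null) or a gap of 30+ minutes
        end
    ) over(
        partition by driver
        order by timestamp
        rows unbounded preceding  -- explicit frame so the sum is cumulative in Teradata
    ) id,
    driver,
    timestamp
from (
    select
        t.*,
        -- previous timestamp of the same driver
        lag(timestamp) over(partition by driver order by timestamp) lag_timestamp
    from mytable t
) t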
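
To make the gap rule easier to check by hand, here is a minimal Python sketch of the same logic applied to the sample data from the question. It is only an illustration, not part of the SQL answer: the function name assign_group_ids and the variables sample and result are hypothetical, and the 30-minute threshold is taken from the question.

```python
from datetime import datetime, timedelta


def assign_group_ids(rows, gap=timedelta(minutes=30)):
    """Give each (driver, timestamp) row a per-driver group id.

    A new group starts on a driver's first row, or when the time since
    that driver's previous row is `gap` or more (same rule as the SQL).
    """
    previous = {}   # driver -> timestamp of the previous row
    group_id = {}   # driver -> current group id
    out = []
    for driver, ts in sorted(rows):  # order by driver, then timestamp
        prev = previous.get(driver)
        if prev is None or ts - prev >= gap:
            group_id[driver] = group_id.get(driver, 0) + 1
        previous[driver] = ts
        out.append((group_id[driver], driver, ts))
    return out


# Sample timestamps copied from the question (all for driver A).
raw = [
    "10/30/2019 05:02:28", "10/30/2019 05:05:28", "10/30/2019 05:09:28",
    "10/30/2019 05:12:28", "10/30/2019 07:54:28", "10/30/2019 07:57:28",
    "10/30/2019 08:02:28", "10/30/2019 12:14:28", "10/30/2019 12:17:28",
    "10/30/2019 12:22:28",
]
sample = [("A", datetime.strptime(s, "%m/%d/%Y %H:%M:%S")) for s in raw]
result = assign_group_ids(sample)
```

With the question's data, the ids in result come out as 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, which matches the desired output table.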

Answer 1: (score: 1)

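Check whether your Teradata version supports the Sessionize table operator (the timeout is given in seconds, so 1800 means 30 minutes):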

SELECT * 
FROM Sessionize
 ( ON
    (
      SELECT *
      FROM tab
    )
   PARTITION BY driver
   ORDER BY ts
   USING
     TimeColumn('ts')
     Timeout(1800)
 )

Answer 2: (score: 0)

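I think you are looking to sessionize the data for each driver. Try this approach. It concatenates the driver with its running session number to create a driver-specific session_id.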

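select 
   -- running sum of session starts; explicit frame so the sum is cumulative in Teradata
   driver||sum(session_code) over (partition by driver order by timestamp
                                   rows unbounded preceding) as session_id,
   driver,
   timestamp
from 
   (select 
      driver,
      timestamp, 
      -- 1 when more than 1800 seconds (30 minutes) have passed since the driver's previous row
      case when timestamp > lag(timestamp) over (partition by driver order by timestamp) + interval '1800' second 
          then 1 else 0 end as session_code
    from your_table) a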