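I have a dataset that I want to partition into groups of timestamps that are close to each other (for example, less than 30 minutes apart).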
Driver | Timestamp
A | 10/30/2019 05:02:28
A | 10/30/2019 05:05:28
A | 10/30/2019 05:09:28
A | 10/30/2019 05:12:28
A | 10/30/2019 07:54:28
A | 10/30/2019 07:57:28
A | 10/30/2019 08:02:28
A | 10/30/2019 12:14:28
A | 10/30/2019 12:17:28
A | 10/30/2019 12:22:28
How can we split it up like this?
id | Driver | Timestamp
1 | A | 10/30/2019 05:02:28
1 | A | 10/30/2019 05:05:28
1 | A | 10/30/2019 05:09:28
1 | A | 10/30/2019 05:12:28
2 | A | 10/30/2019 07:54:28
2 | A | 10/30/2019 07:57:28
2 | A | 10/30/2019 08:02:28
3 | A | 10/30/2019 12:14:28
3 | A | 10/30/2019 12:17:28
3 | A | 10/30/2019 12:22:28
Any help would be greatly appreciated. Thank you very much!
Answer 0 (score: 2)
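It depends on what you actually need.
If you want to start a new group whenever two consecutive timestamps are 30 minutes or more apart, you can use lag() together with a cumulative sum():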
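select
    -- running total of "new group" flags; the first row per driver has a
    -- NULL lag_timestamp, falls into the else branch, and so starts group 1
    sum(case
            when timestamp < lag_timestamp + interval '30' minute
            then 0
            else 1
        end
    ) over(partition by driver order by timestamp) id,
    driver,
    timestamp
from (
    select
        t.*,
        lag(timestamp) over(partition by driver order by timestamp) lag_timestamp
    from mytable t
) t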
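To make the lag() plus cumulative sum() logic easier to follow, here is a minimal Python sketch of the same idea, run against the sample data from the question. The function name assign_session_ids and its gap parameter are made up for illustration; this is not the SQL itself, only a model of what the query is meant to compute.

```python
from datetime import datetime, timedelta

def assign_session_ids(rows, gap=timedelta(minutes=30)):
    """rows: iterable of (driver, timestamp). Returns a list of (id, driver, timestamp)."""
    result = []
    prev_ts = {}   # last timestamp seen per driver (plays the role of lag())
    counter = {}   # running sum of "new group" flags per driver
    for driver, ts in sorted(rows):
        prev = prev_ts.get(driver)
        # Flag a new group on the first row of a driver, or when the gap
        # to the previous row is 30 minutes or more.
        if prev is None or ts >= prev + gap:
            counter[driver] = counter.get(driver, 0) + 1
        prev_ts[driver] = ts
        result.append((counter[driver], driver, ts))
    return result

FMT = "%m/%d/%Y %H:%M:%S"
raw = [
    "10/30/2019 05:02:28", "10/30/2019 05:05:28", "10/30/2019 05:09:28",
    "10/30/2019 05:12:28", "10/30/2019 07:54:28", "10/30/2019 07:57:28",
    "10/30/2019 08:02:28", "10/30/2019 12:14:28", "10/30/2019 12:17:28",
    "10/30/2019 12:22:28",
]
rows = [("A", datetime.strptime(s, FMT)) for s in raw]
ids = [r[0] for r in assign_session_ids(rows)]
print(ids)  # [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

The ids match the desired output in the question. Note that the counter restarts for each driver, just as partition by driver does in the query.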
Answer 1 (score: 1)
Check whether your release supports the Sessionize table operator:
SELECT *
FROM Sessionize
( ON
(
SELECT *
FROM tab
)
PARTITION BY driver
ORDER BY ts
USING
TimeColumn('ts')
Timeout(1800)
)
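Answer 2 (score: 0)
I think you are looking to sessionize the data for each driver. Try this approach. It prefixes the running session number with the driver to create a driver-specific session_id.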
select
    driver||sum(session_code) over (partition by driver order by timestamp) as session_id,
    driver,
    timestamp
from
    (select
        driver,
        timestamp,
        -- 30 minutes is the same threshold as 1800 seconds, written so that it
        -- fits the default two-digit interval precision
        case when timestamp > lag(timestamp) over (partition by driver order by timestamp) + interval '30' minute
             then 1 else 0 end as session_code
    from your_table) a
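As a rough illustration of how the driver-specific session_id comes out, the following Python sketch mimics the query above for more than one driver. The helper driver_session_ids and the rows for driver B are hypothetical and exist only for this example. As in the query, the first row of each driver has no previous timestamp, so its flag is 0 and the numbering starts at 0.

```python
from datetime import datetime, timedelta
from itertools import groupby

def driver_session_ids(rows, timeout=timedelta(seconds=1800)):
    """rows: iterable of (driver, timestamp). Returns a list of (session_id, driver, timestamp)."""
    out = []
    # Sorting by (driver, timestamp) stands in for partition by / order by.
    for driver, group in groupby(sorted(rows), key=lambda r: r[0]):
        running = 0   # cumulative sum of session_code within this driver
        prev = None   # previous timestamp, i.e. lag(timestamp)
        for _, ts in group:
            if prev is not None and ts > prev + timeout:
                running += 1
            out.append((f"{driver}{running}", driver, ts))
            prev = ts
    return out

rows = [
    ("A", datetime(2019, 10, 30, 5, 2, 28)),
    ("A", datetime(2019, 10, 30, 5, 5, 28)),
    ("A", datetime(2019, 10, 30, 7, 54, 28)),
    ("B", datetime(2019, 10, 30, 5, 3, 0)),    # hypothetical second driver
    ("B", datetime(2019, 10, 30, 5, 40, 0)),
]
session_ids = [r[0] for r in driver_session_ids(rows)]
print(session_ids)  # ['A0', 'A0', 'A1', 'B0', 'B1']
```

If ids that start at 1 are preferred, as in the question's expected output, add 1 to the running total before concatenating.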