My problem is separating data based on time information in SQL.
I have transaction data for trams, like this:
DateTime
2018-04-03T08:06:04
2018-04-03T08:07:27
2018-04-03T08:18:18
2018-04-03T10:08:27
2018-04-03T10:22:24
2018-04-03T12:08:50
2018-04-03T12:24:49
2018-04-03T12:24:51
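This is the record of a customer tapping on a particular tram on a particular date. It contains 3 journeys on the same tram.
How can I split these into 3 distinct journeys to get the desired result?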
Rank DateTime
1 2018-04-03T08:06:04
1 2018-04-03T08:07:27
1 2018-04-03T08:18:18
2 2018-04-03T10:08:27
2 2018-04-03T10:22:24
3 2018-04-03T12:08:50
3 2018-04-03T12:24:49
3 2018-04-03T12:24:51
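I tried adding ±40 minutes to each time, so that all transactions falling within that range would count as one unique journey, but I could not get it to work.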
Answer 0 (score: 1)
Below is for BigQuery Standard SQL
#standardSQL
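SELECT dt, 1 + SUM(start) OVER(ORDER BY dt) journey  -- running count of journey starts
FROM (
  -- start = 1 when the gap to the previous row exceeds 40 minutes
  SELECT dt, IF(TIMESTAMP_DIFF(dt, LAG(dt) OVER(ORDER BY dt), MINUTE) > 40, 1, 0) start
  FROM `project.dataset.table`
)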
You can test and play with the above using the dummy data from your question, as below
#standardSQL
WITH `project.dataset.table` AS (
SELECT TIMESTAMP '2018-04-03T08:06:04' dt UNION ALL
SELECT '2018-04-03T08:07:27' UNION ALL
SELECT '2018-04-03T08:18:18' UNION ALL
SELECT '2018-04-03T10:08:27' UNION ALL
SELECT '2018-04-03T10:22:24' UNION ALL
SELECT '2018-04-03T12:08:50' UNION ALL
SELECT '2018-04-03T12:24:49' UNION ALL
SELECT '2018-04-03T12:24:51'
)
SELECT dt, 1 + SUM(start) OVER(ORDER BY dt) journey
FROM (
  SELECT dt, IF(TIMESTAMP_DIFF(dt, LAG(dt) OVER(ORDER BY dt), MINUTE) > 40, 1, 0) start
  FROM `project.dataset.table`
)
-- ORDER BY dt
with result
Row dt journey
1 2018-04-03 08:06:04 UTC 1
2 2018-04-03 08:07:27 UTC 1
3 2018-04-03 08:18:18 UTC 1
4 2018-04-03 10:08:27 UTC 2
5 2018-04-03 10:22:24 UTC 2
6 2018-04-03 12:08:50 UTC 3
7 2018-04-03 12:24:49 UTC 3
8 2018-04-03 12:24:51 UTC 3
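Note: I derived the logic from the following statement in the question:
"I tried adding ±40 minutes to the time, so that all transactions falling within that range would be one unique journey"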
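To make that logic easier to check offline, here is a minimal Python sketch of the same idea: sort the timestamps, start a new journey whenever the gap to the previous tap exceeds 40 whole minutes, and number journeys from 1. The names `assign_journeys`, `GAP_MINUTES` and `taps` are made up for illustration and are not part of the BigQuery query. The sketch assumes the gap is truncated to whole minutes before the comparison, which is how I understand `TIMESTAMP_DIFF(..., MINUTE)` to behave.

```python
from datetime import datetime

GAP_MINUTES = 40  # threshold taken from the question (assumption: strictly greater than)


def assign_journeys(timestamps, gap_minutes=GAP_MINUTES):
    """Return (timestamp, journey) pairs, with journeys numbered from 1."""
    ordered = sorted(datetime.fromisoformat(t) for t in timestamps)
    result = []
    journey = 1
    previous = None
    for current in ordered:
        if previous is not None:
            # Truncate the gap to whole minutes before comparing.
            whole_minutes = int((current - previous).total_seconds() // 60)
            if whole_minutes > gap_minutes:
                journey += 1  # a long gap starts a new journey
        result.append((current.isoformat(), journey))
        previous = current
    return result


# Sample data copied from the question.
taps = [
    "2018-04-03T08:06:04",
    "2018-04-03T08:07:27",
    "2018-04-03T08:18:18",
    "2018-04-03T10:08:27",
    "2018-04-03T10:22:24",
    "2018-04-03T12:08:50",
    "2018-04-03T12:24:49",
    "2018-04-03T12:24:51",
]

for ts, journey in assign_journeys(taps):
    print(journey, ts)
```

On the sample data this should assign the first three taps to journey 1, the next two to journey 2, and the last three to journey 3, matching the result table above. Note that the comparison is against the previous tap, not against the first tap of the journey, so a long chain of taps each less than 40 minutes apart stays in one journey.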
A less verbose version:
#standardSQL
SELECT dt, 1 + COUNTIF(start) OVER(ORDER BY dt) journey
FROM (
  SELECT dt, (TIMESTAMP_DIFF(dt, LAG(dt) OVER(ORDER BY dt), MINUTE) > 40) start
  FROM `project.dataset.table`
)
ORDER BY dt