I'm currently working with a Postgres database that contains car-tracking data similar to the following:
+----+--------+------------+----------+
| id | car_id | date | time |
+----+--------+------------+----------+
| 11 | 1 | 2014-12-20 | 12:12:12 |
| 12 | 1 | 2014-12-20 | 12:12:13 |
| 13 | 1 | 2014-12-20 | 12:12:14 |
| 23 | 1 | 2015-12-20 | 23:42:10 |
| 24 | 1 | 2015-12-20 | 23:42:11 |
| 31 | 2 | 2014-12-20 | 15:12:12 |
| 32 | 2 | 2014-12-20 | 15:12:14 |
+----+--------+------------+----------+
Here is the setup:
CREATE TABLE test (
id int
, car_id int
, date text
, time text
);
INSERT INTO test VALUES
(11, 1, '2014-12-20', '12:12:12'),
(12, 1, '2014-12-20', '12:12:13'),
(13, 1, '2014-12-20', '12:12:14'),
(23, 1, '2015-12-20', '23:42:10'),
(24, 1, '2015-12-20', '23:42:11'),
(31, 2, '2014-12-20', '15:12:12'),
(32, 2, '2014-12-20', '15:12:14');
I want to create a column that assigns a trip number to the tracking rows, ordered by id:
id car_id date time (trip)
11 1 2014-12-20 12:12:12 1
12 1 2014-12-20 12:12:13 1
13 1 2014-12-20 12:12:14 1
23 1 2015-12-20 23:42:10 2 (trip +1 because time difference is greater than 5 sec)
24 1 2015-12-20 23:42:11 2
31 2 2014-12-20 15:12:12 3 (trip +1 because car id is different)
32 2 2014-12-20 15:12:14 3
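The rules are:
The first row (lowest id) gets trip = 1.
For each following row: if car_id equals the car_id of the row above and the time difference between this row and the row above is less than 5 seconds, trip is the same as in the row above; otherwise trip is the trip of the row above + 1.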
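For reference, the rules above amount to a single pass over the rows in id order. The following is a minimal Python sketch, not part of the original question: it assumes the sample rows from the setup, a 5-second threshold, and a hypothetical helper `assign_trips`.

```python
from datetime import datetime

# Sample rows from the setup: (id, car_id, date, time)
ROWS = [
    (11, 1, "2014-12-20", "12:12:12"),
    (12, 1, "2014-12-20", "12:12:13"),
    (13, 1, "2014-12-20", "12:12:14"),
    (23, 1, "2015-12-20", "23:42:10"),
    (24, 1, "2015-12-20", "23:42:11"),
    (31, 2, "2014-12-20", "15:12:12"),
    (32, 2, "2014-12-20", "15:12:14"),
]

MAX_GAP_SECONDS = 5  # assumed threshold, taken from the rules above


def assign_trips(rows):
    """Hypothetical helper: apply the rules row by row, in id order."""
    result = {}
    prev = None  # (car_id, timestamp, trip) of the row above
    for row_id, car_id, d, t in sorted(rows):  # tuples sort by id first
        ts = datetime.strptime(f"{d} {t}", "%Y-%m-%d %H:%M:%S")
        if prev is None:
            trip = 1  # first row (lowest id)
        else:
            prev_car, prev_ts, prev_trip = prev
            gap = (ts - prev_ts).total_seconds()
            same_trip = car_id == prev_car and gap < MAX_GAP_SECONDS
            trip = prev_trip if same_trip else prev_trip + 1
        result[row_id] = trip
        prev = (car_id, ts, trip)
    return result


print(assign_trips(ROWS))
```

This reproduces the expected trip column (1, 1, 1, 2, 2, 3, 3). The question is how to express the same "compare with the previous row" logic in SQL.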
I tried the following:
Create table test as select
"id", "date", "time", car_id,
extract(epoch from "date" + "time") - lag(extract(epoch from "date" + "time")) over (order by "id") as diff,
Case
when t_diff < 5 and car_id - lag(car_id) over (order by "id") = 0
then lag(trip) over (order by "id")
else lag(trip) over (order by "id") + 1
end as trip
From road_1 order by "id"
But it doesn't work :( How can I compute the trip column?
Answer:
First, form a timestamp out of the date and time columns using (date || ' ' || time)::timestamp AS datetime:
SELECT id, test.car_id
, (date || ' ' || time)::timestamp AS datetime
FROM test
yields
| id | car_id | datetime |
|----+--------+---------------------|
| 11 | 1 | 2014-12-20 12:12:12 |
| 12 | 1 | 2014-12-20 12:12:13 |
| 13 | 1 | 2014-12-20 12:12:14 |
| 23 | 1 | 2015-12-20 23:42:10 |
| 24 | 1 | 2015-12-20 23:42:11 |
| 31 | 2 | 2014-12-20 15:12:12 |
| 32 | 2 | 2014-12-20 15:12:14 |
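This is very helpful because we will use datetime - prev > '5 seconds'::interval to identify rows that are more than 5 seconds apart. Note that 2014-12-20 23:59:59 and 2014-12-21 00:00:00 are only 1 second apart, but determining this would be difficult/tedious if all we had were the separate date and time columns.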
Now we can express the rule: trip increases by 1 when
NOT ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval))
(Why the condition is expressed in this seemingly backwards way is explained in detail below.)
SELECT id, car_id, prev_car_id, datetime, prev_date
, (CASE WHEN ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval)) THEN 0 ELSE 1 END) AS new_trip
FROM (
SELECT id, car_id, datetime
-- ORDER BY id: without it the window's row order (and so the "previous" row) is unspecified
, lag(datetime) OVER (ORDER BY id) AS prev_date
, lag(car_id) OVER (ORDER BY id) AS prev_car_id
FROM (
SELECT id, car_id
, (date || ' ' || time)::timestamp AS datetime
FROM test ) t1
) t2
ORDER BY id
yields
| id | car_id | prev_car_id | datetime | prev_date | new_trip |
|----+--------+-------------+---------------------+---------------------+----------|
| 11 | 1 | | 2014-12-20 12:12:12 | | 1 |
| 12 | 1 | 1 | 2014-12-20 12:12:13 | 2014-12-20 12:12:12 | 0 |
| 13 | 1 | 1 | 2014-12-20 12:12:14 | 2014-12-20 12:12:13 | 0 |
| 23 | 1 | 1 | 2015-12-20 23:42:10 | 2014-12-20 12:12:14 | 1 |
| 24 | 1 | 1 | 2015-12-20 23:42:11 | 2015-12-20 23:42:10 | 0 |
| 31 | 2 | 1 | 2014-12-20 15:12:12 | 2015-12-20 23:42:11 | 1 |
| 32 | 2 | 2 | 2014-12-20 15:12:14 | 2014-12-20 15:12:12 | 0 |
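To see what lag() and the CASE expression compute, here is a Python sketch that mirrors them row by row. It is an illustration only: the sample rows come from the setup, and `new_trip_flags` is a hypothetical helper. Starting `prev_car_id` and `prev_date` as None imitates lag() returning NULL on the first row.

```python
from datetime import datetime, timedelta

# Sample rows from the setup: (id, car_id, date, time)
ROWS = [
    (11, 1, "2014-12-20", "12:12:12"),
    (12, 1, "2014-12-20", "12:12:13"),
    (13, 1, "2014-12-20", "12:12:14"),
    (23, 1, "2015-12-20", "23:42:10"),
    (24, 1, "2015-12-20", "23:42:11"),
    (31, 2, "2014-12-20", "15:12:12"),
    (32, 2, "2014-12-20", "15:12:14"),
]
FIVE_SECONDS = timedelta(seconds=5)


def new_trip_flags(rows):
    """Return {id: 0 or 1}, mimicking lag(...) OVER (ORDER BY id) plus the CASE."""
    flags = {}
    prev_car_id = prev_date = None  # lag() is NULL on the first row
    for row_id, car_id, d, t in sorted(rows):
        dt = datetime.strptime(f"{d} {t}", "%Y-%m-%d %H:%M:%S")
        continues = (
            prev_car_id is not None
            and car_id == prev_car_id
            and dt - prev_date <= FIVE_SECONDS
        )
        flags[row_id] = 0 if continues else 1
        prev_car_id, prev_date = car_id, dt
    return flags


print(new_trip_flags(ROWS))
```

The resulting flags (1, 0, 0, 1, 0, 1, 0 in id order) match the new_trip column in the table above.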
Now trip can be expressed as a cumulative sum over the new_trip column:
SELECT id, car_id, datetime, sum(new_trip) OVER (ORDER BY id) AS trip
FROM (
SELECT id, car_id, prev_car_id, datetime, prev_date
, (CASE WHEN ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval)) THEN 0 ELSE 1 END) AS new_trip
FROM (
SELECT id, car_id, datetime
-- use the same ORDER BY id everywhere so lag() and the running sum agree on row order
, lag(datetime) OVER (ORDER BY id) AS prev_date
, lag(car_id) OVER (ORDER BY id) AS prev_car_id
FROM (
SELECT id, car_id
, (date || ' ' || time)::timestamp AS datetime
FROM test ) t1
) t2
) t3
ORDER BY id
yields
| id | car_id | datetime | trip |
|----+--------+---------------------+------|
| 11 | 1 | 2014-12-20 12:12:12 | 1 |
| 12 | 1 | 2014-12-20 12:12:13 | 1 |
| 13 | 1 | 2014-12-20 12:12:14 | 1 |
| 23 | 1 | 2015-12-20 23:42:10 | 2 |
| 24 | 1 | 2015-12-20 23:42:11 | 2 |
| 31 | 2 | 2014-12-20 15:12:12 | 3 |
| 32 | 2 | 2014-12-20 15:12:14 | 3 |
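The cumulative-sum step is the key idea: every row that starts a new trip contributes 1, so a running total numbers the trips. This is a Python sketch of that step, using the new_trip flags from the intermediate result shown earlier, taken in id order as the question asks:

```python
from itertools import accumulate

# (id, new_trip) pairs in id order, copied from the intermediate result above
NEW_TRIP = [(11, 1), (12, 0), (13, 0), (23, 1), (24, 0), (31, 1), (32, 0)]

ids = [row_id for row_id, _ in NEW_TRIP]
# running total, analogous to sum(new_trip) OVER (ORDER BY id)
trips = list(accumulate(flag for _, flag in NEW_TRIP))
print(dict(zip(ids, trips)))
```

One caveat on the SQL side: sum(...) OVER (ORDER BY x) uses a RANGE frame by default, so rows with equal x share one running total. Ordering by a unique column such as id avoids this, and that makes the SQL match this simple running total.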
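I used
(CASE WHEN ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval)) THEN 0 ELSE 1 END)
instead of
(CASE WHEN ((car_id != prev_car_id) OR (datetime-prev_date > '5 seconds'::interval)) THEN 1 ELSE 0 END)
because prev_car_id and prev_date may be NULL. On the first row, (car_id != prev_car_id) therefore returns NULL, whereas we need TRUE, and CASE WHEN only takes the THEN branch when the condition is TRUE.
By expressing the condition the opposite way, we correctly identify the rows that continue the current trip:
((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval))
and the ELSE clause returns 1 whenever that condition is FALSE or NULL. You can see the difference here: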
SELECT id
, (CASE WHEN ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval)) THEN 0 ELSE 1 END) AS new_trip
, (CASE WHEN ((car_id != prev_car_id) OR (datetime-prev_date > '5 seconds'::interval)) THEN 1 ELSE 0 END) AS new_trip_wrong
, car_id, prev_car_id, datetime, prev_date
FROM (
SELECT id, car_id, datetime
-- ORDER BY id makes "previous row" well defined
, lag(datetime) OVER (ORDER BY id) AS prev_date
, lag(car_id) OVER (ORDER BY id) AS prev_car_id
FROM (
SELECT id, car_id
, (date || ' ' || time)::timestamp AS datetime
FROM test ) t1
) t2
ORDER BY id
yields
| id | new_trip | new_trip_wrong | car_id | prev_car_id | datetime | prev_date |
|----+----------+----------------+--------+-------------+---------------------+---------------------|
| 11 | 1 | 0 | 1 | | 2014-12-20 12:12:12 | |
| 12 | 0 | 0 | 1 | 1 | 2014-12-20 12:12:13 | 2014-12-20 12:12:12 |
| 13 | 0 | 0 | 1 | 1 | 2014-12-20 12:12:14 | 2014-12-20 12:12:13 |
| 23 | 1 | 1 | 1 | 1 | 2015-12-20 23:42:10 | 2014-12-20 12:12:14 |
| 24 | 0 | 0 | 1 | 1 | 2015-12-20 23:42:11 | 2015-12-20 23:42:10 |
| 31 | 1 | 1 | 2 | 1 | 2014-12-20 15:12:12 | 2015-12-20 23:42:11 |
| 32 | 0 | 0 | 2 | 2 | 2014-12-20 15:12:14 | 2014-12-20 15:12:12 |
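Note the difference between the new_trip and new_trip_wrong columns: on the first row (id 11), new_trip_wrong is 0, so that row would not start a trip.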
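The NULL behaviour behind that difference can be modelled in a few lines. This is a Python sketch of SQL three-valued logic in which None stands for NULL. The helpers `sql_eq`, `sql_ne`, `sql_and`, `sql_or` and `case_when` are hypothetical illustrations, not a library API. The inputs correspond to the first row, where lag() returned NULL for both prev_car_id and prev_date.

```python
# Minimal model of SQL three-valued logic; None stands for NULL.

def sql_eq(a, b):
    return None if a is None or b is None else a == b


def sql_ne(a, b):
    return None if a is None or b is None else a != b


def sql_and(p, q):
    if p is False or q is False:
        return False  # FALSE AND anything is FALSE
    if p is None or q is None:
        return None
    return True


def sql_or(p, q):
    if p is True or q is True:
        return True  # TRUE OR anything is TRUE
    if p is None or q is None:
        return None
    return False


def case_when(cond, then, otherwise):
    # CASE WHEN takes the THEN branch only when cond is TRUE (not NULL)
    return then if cond is True else otherwise


# First row: car_id = 1, prev_car_id is NULL, so both time comparisons are NULL too
car_id, prev_car_id = 1, None
gap_le_5s = None  # (datetime - prev_date <= '5 seconds') with prev_date NULL
gap_gt_5s = None  # (datetime - prev_date > '5 seconds') with prev_date NULL

new_trip = case_when(sql_and(sql_eq(car_id, prev_car_id), gap_le_5s), 0, 1)
new_trip_wrong = case_when(sql_or(sql_ne(car_id, prev_car_id), gap_gt_5s), 1, 0)
print(new_trip, new_trip_wrong)  # 1 0
```

In both versions the condition evaluates to NULL. The "positive" form falls through to ELSE 1, while the "negative" form falls through to ELSE 0. This is exactly the row 11 difference in the table above.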