Create a trip number from tracking data

Time: 2019-02-19 14:59:14

Tags: postgresql

I am currently working with a Postgres database that holds car tracking data similar to the following:

+----+--------+------------+----------+
| id | car_id |    date    |   time   |
+----+--------+------------+----------+
| 11 |      1 | 2014-12-20 | 12:12:12 |
| 12 |      1 | 2014-12-20 | 12:12:13 |
| 13 |      1 | 2014-12-20 | 12:12:14 |
| 23 |      1 | 2015-12-20 | 23:42:10 |
| 24 |      1 | 2015-12-20 | 23:42:11 |
| 31 |      2 | 2014-12-20 | 15:12:12 |
| 32 |      2 | 2014-12-20 | 15:12:14 |
+----+--------+------------+----------+

Here is the setup:

CREATE TABLE test (
    id int
    , car_id int
    , date text
    , time text
);
INSERT INTO test VALUES
    (11, 1, '2014-12-20', '12:12:12'),
    (12, 1, '2014-12-20', '12:12:13'),
    (13, 1, '2014-12-20', '12:12:14'),
    (23, 1, '2015-12-20', '23:42:10'),
    (24, 1, '2015-12-20', '23:42:11'),
    (31, 2, '2014-12-20', '15:12:12'),
    (32, 2, '2014-12-20', '15:12:14');

I want to create a column that assigns each tracking row a trip number, with the rows ordered by id:

id   car_id    date          time       (trip)
11   1         2014-12-20    12:12:12   1
12   1         2014-12-20    12:12:13   1
13   1         2014-12-20    12:12:14   1
23   1         2015-12-20    23:42:10   2   (trip +1 because the time difference is greater than 5 sec)
24   1         2015-12-20    23:42:11   2
31   2         2014-12-20    15:12:12   3   (trip +1 because car id is different)
32   2         2014-12-20    15:12:14   3

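I am following these rules:

  • The first row (lowest id) gets trip = 1

  • For every following row: if car_id equals that of the row above and the time difference between the row and the row above is less than 5 seconds, then trip is the same as in the row above; otherwise trip is the value of the row above + 1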

I have tried the following:

Create table test as select
"id", "date", "time", car_id,

extract(epoch from "date" + "time") - lag(extract(epoch from "date" + "time")) over (order by "id") as diff,

Case
when t_diff < 5 and car_id - lag(car_id) over (order by "id") = 0
then lag(trip) over (order by "id")
else lag(trip) over (order by "id") + 1
end as trip

From road_1 order by "id"

But it does not work :( How can I calculate the trip column?

1 Answer:

Answer 0: (score: 0)

First, use (date || ' ' || time)::timestamp AS datetime to form a timestamp out of the date and time:

SELECT id, test.car_id 
       , (date || ' ' || time)::timestamp AS datetime
FROM test 

yields

| id | car_id | datetime            |
|----+--------+---------------------|
| 11 |      1 | 2014-12-20 12:12:12 |
| 12 |      1 | 2014-12-20 12:12:13 |
| 13 |      1 | 2014-12-20 12:12:14 |
| 23 |      1 | 2015-12-20 23:42:10 |
| 24 |      1 | 2015-12-20 23:42:11 |
| 31 |      2 | 2014-12-20 15:12:12 |
| 32 |      2 | 2014-12-20 15:12:14 |
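As a side note, the text concatenation is only needed because the setup stores date and time as text. If the real table declares them as date and time types (an assumption; the attempt in the question hints at it), adding the two gives a timestamp directly. A minimal sketch with literal values:

```sql
-- Assumption: columns typed as date and time rather than text.
-- In PostgreSQL, date + time yields a timestamp.
SELECT d + t AS datetime
FROM (VALUES ('2014-12-20'::date, '12:12:12'::time)) AS v(d, t);
-- datetime: 2014-12-20 12:12:12
```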

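Doing this is very helpful because we will use datetime - prev_date > '5 seconds'::interval to identify rows that are more than 5 seconds apart. Note that 2014-12-20 23:59:59 and 2014-12-21 00:00:00 are only 1 second apart, so they fall within the 5-second limit, but that would be difficult/tedious to determine if all we had were separate date and time columns.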
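To make the midnight case concrete, here is a small sketch. The two timestamps are illustrative values, not rows from the test table, and the column aliases are made up for the example:

```sql
-- Two readings that straddle midnight.
SELECT ts_after - ts_before                         AS gap,           -- 00:00:01
       ts_after - ts_before <= interval '5 seconds' AS within_limit,  -- true
       ts_after::time - ts_before::time             AS time_only_gap  -- -23:59:59
FROM (VALUES ('2014-12-20 23:59:59'::timestamp,
              '2014-12-21 00:00:00'::timestamp)) AS v(ts_before, ts_after);
```

Subtracting the full timestamps gives the real one-second gap, while comparing only the time parts gives a misleading negative interval.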

Now we can express the rule: trip is increased by 1 when

NOT ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval))

(See below for why the condition is expressed in this seemingly backwards way.)

SELECT id, car_id, prev_car_id, datetime, prev_date
    -- 0 = same trip as the previous row, 1 = a new trip starts here
    , (CASE WHEN ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval)) THEN 0 ELSE 1 END) AS new_trip
FROM ( 
    SELECT id, car_id, datetime
        -- order the window by id so that "previous row" is well defined
        , lag(datetime) OVER (ORDER BY id) AS prev_date
        , lag(car_id) OVER (ORDER BY id) AS prev_car_id
    FROM (
        SELECT id, car_id 
               , (date || ' ' || time)::timestamp AS datetime
        FROM test ) t1
    ) t2
ORDER BY id

yields

| id | car_id | prev_car_id | datetime            | prev_date           | new_trip |
|----+--------+-------------+---------------------+---------------------+----------|
| 11 |      1 |             | 2014-12-20 12:12:12 |                     |        1 |
| 12 |      1 |           1 | 2014-12-20 12:12:13 | 2014-12-20 12:12:12 |        0 |
| 13 |      1 |           1 | 2014-12-20 12:12:14 | 2014-12-20 12:12:13 |        0 |
| 23 |      1 |           1 | 2015-12-20 23:42:10 | 2014-12-20 12:12:14 |        1 |
| 24 |      1 |           1 | 2015-12-20 23:42:11 | 2015-12-20 23:42:10 |        0 |
| 31 |      2 |           1 | 2014-12-20 15:12:12 | 2015-12-20 23:42:11 |        1 |
| 32 |      2 |           2 | 2014-12-20 15:12:14 | 2014-12-20 15:12:12 |        0 |

Now trip can be expressed as the cumulative sum over the new_trip column:

SELECT id, car_id, datetime, sum(new_trip) OVER (ORDER BY id) AS trip
FROM (
    SELECT id, car_id, prev_car_id, datetime, prev_date
        , (CASE WHEN ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval)) THEN 0 ELSE 1 END) AS new_trip
    FROM ( 
        SELECT id, car_id, datetime
            , lag(datetime) OVER (ORDER BY id) AS prev_date
            , lag(car_id) OVER (ORDER BY id) AS prev_car_id
        FROM (
            SELECT id, car_id 
                   , (date || ' ' || time)::timestamp AS datetime
            FROM test ) t1
        ) t2
    ) t3
ORDER BY id

yields

| id | car_id | datetime            | trip |
|----+--------+---------------------+------|
| 11 |      1 | 2014-12-20 12:12:12 |    1 |
| 12 |      1 | 2014-12-20 12:12:13 |    1 |
| 13 |      1 | 2014-12-20 12:12:14 |    1 |
| 23 |      1 | 2015-12-20 23:42:10 |    2 |
| 24 |      1 | 2015-12-20 23:42:11 |    2 |
| 31 |      2 | 2014-12-20 15:12:12 |    3 |
| 32 |      2 | 2014-12-20 15:12:14 |    3 |
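The same three steps can also be written with CTEs and a named window, which some may find easier to read than nested subqueries. This is only a sketch of the approach described in this answer: the CTE names stamped and flagged and the window name w are made up for illustration, and every window is ordered by id, as the question specifies.

```sql
WITH stamped AS (
    -- step 1: build a timestamp from the text columns
    SELECT id, car_id, (date || ' ' || time)::timestamp AS datetime
    FROM test
), flagged AS (
    -- step 2: 0 if the row continues the previous row's trip, else 1
    -- (the ELSE branch also catches the first row, where lag() is NULL)
    SELECT id, car_id, datetime,
           CASE WHEN car_id = lag(car_id) OVER w
                 AND datetime - lag(datetime) OVER w <= interval '5 seconds'
                THEN 0 ELSE 1 END AS new_trip
    FROM stamped
    WINDOW w AS (ORDER BY id)
)
-- step 3: the running total of the flags is the trip number
SELECT id, car_id, datetime,
       sum(new_trip) OVER (ORDER BY id) AS trip
FROM flagged
ORDER BY id;
```

For the sample data this numbers ids 11 to 13 as trip 1, ids 23 and 24 as trip 2, and ids 31 and 32 as trip 3. To keep the result as a table, the whole statement could be prefixed with CREATE TABLE some_name AS, where some_name is a placeholder.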

I used

(CASE WHEN ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval)) THEN 0 ELSE 1 END)

instead of

(CASE WHEN ((car_id != prev_car_id) OR (datetime-prev_date > '5 seconds'::interval)) THEN 1 ELSE 0 END)

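because prev_car_id and prev_date may be NULL. Thus, on the first row, (car_id != prev_car_id) returns NULL when we want TRUE instead. By expressing the condition the opposite way, we can correctly identify the rows that continue the current trip: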

((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval))

and use the ELSE clause to return 1 whenever that condition is FALSE or NULL. You can see the difference here:

SELECT id  
    , (CASE WHEN ((car_id = prev_car_id) AND (datetime-prev_date <= '5 seconds'::interval)) THEN 0 ELSE 1 END) AS new_trip
    , (CASE WHEN ((car_id != prev_car_id) OR (datetime-prev_date > '5 seconds'::interval)) THEN 1 ELSE 0 END) AS new_trip_wrong
    , car_id, prev_car_id, datetime, prev_date
FROM ( 
    SELECT id, car_id, datetime
        -- order the window by id so that "previous row" is well defined
        , lag(datetime) OVER (ORDER BY id) AS prev_date
        , lag(car_id) OVER (ORDER BY id) AS prev_car_id
    FROM (
        SELECT id, car_id 
               , (date || ' ' || time)::timestamp AS datetime
        FROM test ) t1
    ) t2
ORDER BY id

yields

| id | new_trip | new_trip_wrong | car_id | prev_car_id | datetime            | prev_date           |
|----+----------+----------------+--------+-------------+---------------------+---------------------|
| 11 |        1 |              0 |      1 |             | 2014-12-20 12:12:12 |                     |
| 12 |        0 |              0 |      1 |           1 | 2014-12-20 12:12:13 | 2014-12-20 12:12:12 |
| 13 |        0 |              0 |      1 |           1 | 2014-12-20 12:12:14 | 2014-12-20 12:12:13 |
| 23 |        1 |              1 |      1 |           1 | 2015-12-20 23:42:10 | 2014-12-20 12:12:14 |
| 24 |        0 |              0 |      1 |           1 | 2015-12-20 23:42:11 | 2015-12-20 23:42:10 |
| 31 |        1 |              1 |      2 |           1 | 2014-12-20 15:12:12 | 2015-12-20 23:42:11 |
| 32 |        0 |              0 |      2 |           2 | 2014-12-20 15:12:14 | 2014-12-20 15:12:12 |

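Note the difference between the new_trip and new_trip_wrong columns on the first row (id 11): new_trip is 1 there, while new_trip_wrong is 0 because its condition evaluates to NULL.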
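The NULL behaviour can be reproduced without the table. The sketch below compares a value against NULL in the three ways discussed; the last column uses IS DISTINCT FROM, a standard PostgreSQL operator that treats NULL as an ordinary comparable value. It is not used in the queries above and is shown only as a possible alternative, and the column aliases are made up for the example.

```sql
SELECT 1 <> NULL::int                                           AS ne_result,     -- NULL
       CASE WHEN 1 <> NULL::int THEN 1 ELSE 0 END               AS flag_wrong,    -- 0
       CASE WHEN 1 = NULL::int THEN 0 ELSE 1 END                AS flag_right,    -- 1
       CASE WHEN 1 IS DISTINCT FROM NULL::int THEN 1 ELSE 0 END AS flag_distinct; -- 1
```

A CASE branch is taken only when its condition is TRUE, so a NULL condition always falls through to ELSE. That is why putting the "new trip" result in the ELSE branch handles the first row correctly.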