Converting a coordinate time series to Gantt-chart-like data

Time: 2019-02-19 09:07:41

Tags: python sql pandas time-series coordinates

I have multiple objects, similar to remote-operated toy cars, which send their location every minute or so. Within a certain polygon/area I can receive their location quite well. However, because the coordinates jump around a bit and the receiver does not always have perfect reception, a point may be missed here and there. The dataset looks something like this:

car_id | datetime             | x    | y
1      | 2018-01-01 00:00:01  | .... | .... -> 1 enters the polygon
1      | 2018-01-01 00:01:02  | .... | ....
2      | 2018-01-01 00:01:13  | .... | .... -> 2 enters the polygon
3      | 2018-01-01 00:01:40  | .... | .... -> 3 enters the polygon
2      | 2018-01-01 00:02:15  | .... | ....
3      | 2018-01-01 00:03:35  | .... | ....
1      | 2018-01-01 00:03:40  | .... | ....
2      | 2018-01-01 00:03:40  | .... | ....
3      | 2018-01-01 00:04:25  | .... | ....
3      | 2018-01-01 00:05:15  | .... | ....
2      | 2018-01-01 00:05:20  | .... | ....
2      | 2018-01-01 00:06:25  | .... | ....
2      | 2018-01-01 00:07:18  | .... | ....
2      | 2018-01-01 00:08:20  | .... | ....
2      | 2018-01-01 00:09:45  | .... | .... -> haven't seen 1 for a while so must have left the polygon 
2      | 2018-01-01 00:10:35  | .... | .... -> haven't seen 3 for a while so must have left the polygon 
2      | 2018-01-01 00:11:47  | .... | ....

I have left x and y as "...." in this example, but each is a coordinate within the polygon. To create a "gantt chart" (it doesn't actually have to be the chart, just the dataset), the idea is to check whether an object (i.e. a car) hasn't been seen in the polygon for a while (say 5 minutes) and, if so, conclude that it has left the area. After that, an object can re-enter the area, but that counts as a new entry into the polygon.

The result I would expect is something like this:

car_id | entry_date_time     | exit_date_time
1      | 2018-01-01 00:00:01 | 2018-01-01 00:09:45
2      | 2018-01-01 00:01:13 | NULL
3      | 2018-01-01 00:01:40 | 2018-01-01 00:10:35

Does anyone have an idea how I could best go from the time series to the "gantt chart"? I'm trying to do this in either SQL (AWS Athena) or pandas (Python).
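To make the rule concrete, here is a rough pandas sketch of the gap-based idea described above. It is only one possible reading of the requirement, not a definitive solution. Assumptions: the function name `to_intervals` is made up for illustration; the `datetime` column is already parsed as timestamps; the 5-minute timeout is the "say 5 minutes" from the description; and the exit time is taken to be the first reading from any car that arrives more than 5 minutes after the car was last seen, which is inferred from the expected table (00:09:45 for car 1, 00:10:35 for car 3).

```python
import numpy as np
import pandas as pd

TIMEOUT = pd.Timedelta(minutes=5)  # assumed threshold ("say 5 minutes")


def to_intervals(df, timeout=TIMEOUT):
    """Collapse per-car readings into (entry, exit) rows. Hypothetical helper."""
    df = df.sort_values(["car_id", "datetime"])

    # A new visit starts when the gap to the same car's previous reading
    # exceeds the timeout; the running count of such gaps numbers the visits.
    gap = df.groupby("car_id")["datetime"].diff()
    visit = (gap > timeout).astype(int).groupby(df["car_id"]).cumsum()

    out = (
        df.groupby(["car_id", visit.rename("visit")])["datetime"]
        .agg(entry_date_time="min", last_seen="max")
        .reset_index()
    )

    # Assumed exit convention: the first reading (from any car) that is
    # strictly later than last_seen + timeout. If no such reading exists,
    # the car is treated as still inside and the exit stays NaT (NULL).
    all_times = df["datetime"].sort_values().to_numpy()
    deadline = (out["last_seen"] + timeout).to_numpy()
    idx = all_times.searchsorted(deadline, side="right")
    padded = np.append(all_times, np.datetime64("NaT", "ns"))
    out["exit_date_time"] = padded[idx]

    return out[["car_id", "entry_date_time", "exit_date_time"]]
```

If the exit should instead be the last reading itself, or the last reading plus the timeout, only the final step would change. x and y are not used here because the readings are assumed to be filtered to the polygon already.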

Thanks!

0 answers:

No answers