I have multiple objects, similar to remote-controlled toy cars, which send their location every minute or so. Within a certain polygon/area I can receive their location quite well. However, because the coordinates jump around a bit and the receiver does not always have perfect reception, you might miss a point here and there. The dataset looks something like this:
car_id | datetime | x | y
1 | 2018-01-01 00:00:01 | .... | .... -> 1 enters the polygon
1 | 2018-01-01 00:01:02 | .... | ....
2 | 2018-01-01 00:01:13 | .... | .... -> 2 enters the polygon
3 | 2018-01-01 00:01:40 | .... | .... -> 3 enters the polygon
2 | 2018-01-01 00:02:15 | .... | ....
3 | 2018-01-01 00:03:35 | .... | ....
1 | 2018-01-01 00:03:40 | .... | ....
2 | 2018-01-01 00:03:40 | .... | ....
3 | 2018-01-01 00:04:25 | .... | ....
3 | 2018-01-01 00:05:15 | .... | ....
2 | 2018-01-01 00:05:20 | .... | ....
2 | 2018-01-01 00:06:25 | .... | ....
2 | 2018-01-01 00:07:18 | .... | ....
2 | 2018-01-01 00:08:20 | .... | ....
2 | 2018-01-01 00:09:45 | .... | .... -> haven't seen 1 for a while so must have left the polygon
2 | 2018-01-01 00:10:35 | .... | .... -> haven't seen 3 for a while so must have left the polygon
2 | 2018-01-01 00:11:47 | .... | ....
I have left x and y as "...." in this example, but they are coordinates within the polygon. To create a "Gantt chart" (it doesn't actually have to be the chart, just the dataset behind it), the idea is to check whether an object (i.e. a car) hasn't been seen in the polygon for a while (say 5 minutes) and, if so, conclude that it has left the area. After that the object can re-enter the area, but that counts as a new entry into the polygon.
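A minimal pandas sketch of that rule, assuming the data is in a DataFrame with a `car_id` column and a `datetime` column of dtype datetime64, and using a fixed 5-minute threshold. It splits each car's pings into separate visits wherever the gap since that car's previous ping exceeds 5 minutes (the usual "gaps and islands" approach), so a re-entry becomes a new row. The function name `split_visits` and the `visit`/`entry`/`last_seen` columns are hypothetical. Note that this reports the car's last ping as the end of a visit, not the moment the absence is noticed.

```python
import pandas as pd

# Assumed threshold: a car unseen for more than 5 minutes is treated as gone.
GAP = pd.Timedelta(minutes=5)


def split_visits(df: pd.DataFrame, gap: pd.Timedelta = GAP) -> pd.DataFrame:
    """Number each car's visits: a new visit starts on the car's first ping
    or whenever more than `gap` has passed since its previous ping."""
    df = df.sort_values(["car_id", "datetime"]).copy()
    # Time since the same car's previous ping (NaT on its first ping).
    since_prev = df.groupby("car_id")["datetime"].diff()
    # 1 where a visit starts, 0 otherwise.
    starts = (since_prev.isna() | (since_prev > gap)).astype(int)
    # Running count of starts per car = visit number (1, 2, ...).
    df["visit"] = starts.groupby(df["car_id"]).cumsum()
    # Collapse each visit to its first and last ping.
    return df.groupby(["car_id", "visit"], as_index=False).agg(
        entry=("datetime", "min"), last_seen=("datetime", "max")
    )
```

For example, a car pinging at 00:00, 00:01, 00:10 and 00:11 yields two visits (00:00 to 00:01 and 00:10 to 00:11), because the 9-minute gap exceeds the threshold.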
The result I would expect looks something like this:
car_id | entry_date_time | exit_date_time
1 | 2018-01-01 00:00:01 | 2018-01-01 00:09:45
2 | 2018-01-01 00:01:13 | NULL
3 | 2018-01-01 00:01:40 | 2018-01-01 00:10:35
Does anyone have an idea how I could best go from the time series to the "Gantt chart"? I'm trying to do this in either SQL (AWS Athena) or pandas (Python).
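A pandas sketch that tries to reproduce the expected table, again assuming a DataFrame with `car_id` and datetime64 `datetime` columns and a 5-minute threshold. Here the exit time is taken to be the first timestamp in the whole feed (from any car) that arrives more than 5 minutes after the car's last ping. That matches the example: car 1 was last seen at 00:03:40, and the first row after 00:08:40 is 00:09:45. When no such row exists the exit stays `NaT` (pandas' NULL), as for car 2. `gantt_table` is a hypothetical name; `pd.merge_asof` with `direction="forward"` and `allow_exact_matches=False` performs the "first timestamp strictly after the deadline" lookup.

```python
import pandas as pd

GAP = pd.Timedelta(minutes=5)  # assumed "has left the area" threshold


def gantt_table(df: pd.DataFrame, gap: pd.Timedelta = GAP) -> pd.DataFrame:
    """One row per visit: entry time, plus the first feed timestamp at which
    the car had been unseen for more than `gap` (NaT if it is still inside)."""
    df = df.sort_values(["car_id", "datetime"])
    # A visit starts on a car's first ping or after a gap longer than `gap`.
    since_prev = df.groupby("car_id")["datetime"].diff()
    starts = (since_prev.isna() | (since_prev > gap)).astype(int)
    visit = starts.groupby(df["car_id"]).cumsum().rename("visit")
    visits = (
        df.groupby([df["car_id"], visit])["datetime"]
        .agg(entry_date_time="min", last_seen="max")
        .reset_index()
    )
    # The absence can only be noticed once the feed passes last_seen + gap.
    visits["deadline"] = visits["last_seen"] + gap
    # Every distinct timestamp in the feed, from any car, sorted ascending.
    feed = pd.DataFrame(
        {"exit_date_time": df["datetime"].drop_duplicates().sort_values()}
    )
    # For each deadline, take the first feed timestamp strictly after it.
    out = pd.merge_asof(
        visits.sort_values("deadline"),
        feed,
        left_on="deadline",
        right_on="exit_date_time",
        direction="forward",
        allow_exact_matches=False,
    )
    cols = ["car_id", "entry_date_time", "exit_date_time"]
    return out.sort_values(["car_id", "entry_date_time"])[cols].reset_index(drop=True)
```

If the exit should instead be the car's last ping, `last_seen` can be returned directly. In Athena the visit numbering would likely map to `LAG(...) OVER (PARTITION BY car_id ORDER BY datetime)` plus a running `SUM(...) OVER (...)`, but that SQL is not shown here.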
Thanks!