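I am trying to assign a schedule number and a route number to each GPS packet based on its GPS timestamp. With nearly a million GPS packets from a variety of devices, how can I do this efficiently?
I have not found a good approach. Right now I loop over every row, compare its timestamp against all the intervals in the schedule table, and append the matching schedule number to each GPS packet.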
GPS DataFrame:
import pandas as pd
gps_df = pd.DataFrame({'Device':[1,1,2,2,3,3,3],'time-stamp': ['6:00:00','7:00:30','12:12:12','13:13:13','20:15:10','22:16:10','22:18:23']})
Schedule DataFrame:
schedule_df = pd.DataFrame({'Device' :[1, 1, 1, 1, 2, 2, 2, 3,3, 3],
'schedule' :['A1','A1','A2','A2','B1','B2','B2','C1','C2','C3'],
'route no' :[1, 2, 1, 2, 1, 5, 6, 1, 1, 2],
'start time' : ['6:00:00','7:00:01','8:30:00','10:00:00','12:00:00','14:00:00','16:00:00','20:00:00','21:00:00','22:00:00'],
'end time' :['7:00:00','8:30:00','9:30:00','12:00:00','13:00:00','16:00:00','20:00:00','21:00:00','22:00:00','23:00:00']})
I would like output like this:
gps_df = pd.DataFrame({'Device':[1,1,2,2,3,3,3],
'time-stamp':['6:00:00','7:00:30','12:12:12','13:13:13','20:15:10','22:16:10','22:18:23'],
'schedule': ['A1','A1','B1','Na','C1','C3','C3'],
'route': [1, 2, 1, 'Na',1, 2, 2]})
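One detail worth checking before any interval comparison: the times in both frames are plain strings, and strings compare character by character rather than chronologically. A minimal sketch, assuming the values are times of day within a single day, that parses them with pd.to_timedelta first:

```python
import pandas as pd

# A few of the sample timestamps from the GPS frame
stamps = pd.Series(['6:00:00', '7:00:30', '12:12:12'])

# As plain strings the comparison is lexical, so '6...' sorts after '1...'
lexical = stamps[0] < stamps[2]

# Parsed as timedeltas, the same comparison is chronological
parsed = pd.to_timedelta(stamps)
chronological = bool(parsed[0] < parsed[2])

print(lexical, chronological)  # False True
```

The same conversion would apply to the 'start time' and 'end time' columns of the schedule frame.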
Answer 0 (score: 0)
Try this:
import pandas as pd
gps_df = pd.DataFrame({'Device':[1,1,2,2,3,3,3],'time-stamp': ['6:00:00','7:00:30','12:12:12','13:13:13','20:15:10','22:16:10','22:18:23']})
schedule_df = pd.DataFrame({'Device' :[1, 1, 1, 1, 2, 2, 2, 3,3, 3],
'schedule' :['A1','A1','A2','A2','B1','B2','B2','C1','C2','C3'],
'route no' :[1, 2, 1, 2, 1, 5, 6, 1, 1, 2],
'start time' : ['6:00:00','7:00:01','8:30:00','10:00:00','12:00:00','14:00:00','16:00:00','20:00:00','21:00:00','22:00:00'],
'end time' :['7:00:00','8:30:00','9:30:00','12:00:00','13:00:00','16:00:00','20:00:00','21:00:00','22:00:00','23:00:00']})
print(gps_df)
print(schedule_df)
gps_df = pd.concat([gps_df, schedule_df],sort=True)
gps_df = gps_df.drop('end time', axis=1)
print(gps_df)
Output
   Device time-stamp
0       1    6:00:00
1       1    7:00:30
2       2   12:12:12
3       2   13:13:13
4       3   20:15:10
5       3   22:16:10
6       3   22:18:23
   Device schedule  route no start time  end time
0       1       A1         1    6:00:00   7:00:00
1       1       A1         2    7:00:01   8:30:00
2       1       A2         1    8:30:00   9:30:00
3       1       A2         2   10:00:00  12:00:00
4       2       B1         1   12:00:00  13:00:00
5       2       B2         5   14:00:00  16:00:00
6       2       B2         6   16:00:00  20:00:00
7       3       C1         1   20:00:00  21:00:00
8       3       C2         1   21:00:00  22:00:00
9       3       C3         2   22:00:00  23:00:00
   Device time-stamp schedule route
0       1    6:00:00       A1     1
1       1    7:00:30       A1     2
2       2   12:12:12       B1     1
3       2   13:13:13       Na    Na
4       3   20:15:10       C1     1
5       3   22:16:10       C3     2
6       3   22:18:23       C3     2
Hope this helps.
Answer 1 (score: 0)
Use merge:
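# Assumes df1 is the GPS frame and df2 the schedule frame, with columns named
# timestamp, route, start_time and end_time, and times stored as comparable values
cols = ['Device', 'schedule', 'route', 'timestamp']
# Pair every packet with every interval of the same device
df = df2.merge(df1, on='Device')
# Keep pairs strictly inside the interval, then reindex so unmatched packets return as NaN
df = (df.loc[df.timestamp.lt(df.end_time) & df.timestamp.gt(df.start_time), cols]
        .set_index(['timestamp', 'Device'])
        .reindex(index=df1.set_index(['timestamp', 'Device']).index)
        .reset_index())
print(df)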
  timestamp  Device schedule  route
0  06:00:01       1       A1    1.0
1  07:00:30       1       A1    2.0
2  12:12:12       2       B1    1.0
3  13:13:13       2      NaN    NaN
4  20:15:10       3       C1    1.0
5  22:16:10       3       C3    2.0
6  22:18:23       3       C3    2.0
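The merge on Device pairs every packet with every interval of its device before filtering, so the intermediate frame can get large with a million packets. As a possible alternative (not part of the answer above), here is a sketch using pd.merge_asof. It assumes the intervals of one device never overlap, treats both interval ends as inclusive, and renames the columns for convenience; the names gps, sched and result are illustrative.

```python
import pandas as pd

gps_df = pd.DataFrame({'Device': [1, 1, 2, 2, 3, 3, 3],
                       'time-stamp': ['6:00:00', '7:00:30', '12:12:12', '13:13:13',
                                      '20:15:10', '22:16:10', '22:18:23']})
schedule_df = pd.DataFrame({
    'Device': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'schedule': ['A1', 'A1', 'A2', 'A2', 'B1', 'B2', 'B2', 'C1', 'C2', 'C3'],
    'route no': [1, 2, 1, 2, 1, 5, 6, 1, 1, 2],
    'start time': ['6:00:00', '7:00:01', '8:30:00', '10:00:00', '12:00:00',
                   '14:00:00', '16:00:00', '20:00:00', '21:00:00', '22:00:00'],
    'end time': ['7:00:00', '8:30:00', '9:30:00', '12:00:00', '13:00:00',
                 '16:00:00', '20:00:00', '21:00:00', '22:00:00', '23:00:00']})

# Rename to identifier-friendly names and parse the strings into timedeltas
gps = gps_df.rename(columns={'time-stamp': 'timestamp'})
sched = schedule_df.rename(columns={'route no': 'route',
                                    'start time': 'start_time',
                                    'end time': 'end_time'})
gps['timestamp'] = pd.to_timedelta(gps['timestamp'])
sched['start_time'] = pd.to_timedelta(sched['start_time'])
sched['end_time'] = pd.to_timedelta(sched['end_time'])

# merge_asof requires both frames to be sorted by the key it searches on;
# reset_index keeps the original row order in an 'index' column
gps_sorted = gps.reset_index().sort_values('timestamp')
sched_sorted = sched.sort_values('start_time')

# For each packet, pick the same-device interval with the latest start <= timestamp
matched = pd.merge_asof(gps_sorted, sched_sorted,
                        left_on='timestamp', right_on='start_time',
                        by='Device', direction='backward')

# That interval may already have ended, so blank out matches past end_time
inside = matched['timestamp'] <= matched['end_time']
matched['schedule'] = matched['schedule'].where(inside)
matched['route'] = matched['route'].where(inside)

# Restore the original packet order
result = (matched.sort_values('index')
                 .set_index('index')[['Device', 'timestamp', 'schedule', 'route']])
print(result)
```

Because merge_asof does a sorted search per device rather than building all packet/interval pairs, it should scale better, but it is only correct under the non-overlap assumption stated above.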
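Answer 2 (score: 0)
You could try using numpy arrays. I have left out the code that initializes the extra output columns to be added to the GPS DataFrame, but the idea is to build a 2-D array in which the intersection of the AND conditions yields a truth table that maps matches by device ID and time range, where 'i' is the corresponding row index in the GPS df and 'j' is the corresponding row index in the Schedule df.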
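import numpy as np

# Pull the columns out as plain NumPy arrays
gpsd = GPS_df.Device.values
schedd = Sched_df.Device.values
gpst = GPS_df.timestamp.values
tl = Sched_df.start_time.values
th = Sched_df.end_time.values
# Broadcast an (n_gps, 1) column against the (n_sched,) arrays to get the truth table
i, j = np.where((gpsd[:, None] == schedd) &
                (gpst[:, None] >= tl) &
                (gpst[:, None] <= th))
# .values makes the assignment positional; a Series would be aligned on its index instead.
# Assumes both frames use the default 0..n-1 index.
GPS_df.loc[i, 'schedule'] = Sched_df.loc[j, 'schedule'].values
GPS_df.loc[i, 'route'] = Sched_df.loc[j, 'route'].values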
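The truth table has one cell per (GPS row, schedule row) pair, so with about a million packets it may not fit in memory in one piece. A rough sketch of the same broadcasting idea applied to slices of the GPS frame follows; the helper name assign_in_chunks and the chunk_size default are illustrative assumptions, not part of the answer above.

```python
import numpy as np
import pandas as pd

def assign_in_chunks(gps, sched, chunk_size=100_000):
    """Hypothetical helper: broadcast-compare GPS rows against the schedule in slices."""
    sched_dev = sched['Device'].to_numpy()
    lo = sched['start_time'].to_numpy()
    hi = sched['end_time'].to_numpy()
    sched_name = sched['schedule'].to_numpy()
    sched_route = sched['route'].to_numpy()

    # Output columns, pre-filled with "no match"
    schedule = np.full(len(gps), None, dtype=object)
    route = np.full(len(gps), np.nan)

    for start in range(0, len(gps), chunk_size):
        block = gps.iloc[start:start + chunk_size]
        dev = block['Device'].to_numpy()[:, None]
        ts = block['timestamp'].to_numpy()[:, None]
        # Truth table of shape (rows in this chunk, schedule rows)
        i, j = np.where((dev == sched_dev) & (ts >= lo) & (ts <= hi))
        # If a packet sits on a shared boundary it matches two intervals;
        # with repeated indices the later assignment is the one that remains
        schedule[start + i] = sched_name[j]
        route[start + i] = sched_route[j]

    return gps.assign(schedule=schedule, route=route)

# Sample data from the question, with renamed columns and parsed times
gps = pd.DataFrame({'Device': [1, 1, 2, 2, 3, 3, 3],
                    'timestamp': pd.to_timedelta(['6:00:00', '7:00:30', '12:12:12',
                                                  '13:13:13', '20:15:10', '22:16:10',
                                                  '22:18:23'])})
sched = pd.DataFrame({
    'Device': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'schedule': ['A1', 'A1', 'A2', 'A2', 'B1', 'B2', 'B2', 'C1', 'C2', 'C3'],
    'route': [1, 2, 1, 2, 1, 5, 6, 1, 1, 2],
    'start_time': pd.to_timedelta(['6:00:00', '7:00:01', '8:30:00', '10:00:00',
                                   '12:00:00', '14:00:00', '16:00:00', '20:00:00',
                                   '21:00:00', '22:00:00']),
    'end_time': pd.to_timedelta(['7:00:00', '8:30:00', '9:30:00', '12:00:00',
                                 '13:00:00', '16:00:00', '20:00:00', '21:00:00',
                                 '22:00:00', '23:00:00'])})

out = assign_in_chunks(gps, sched, chunk_size=3)
print(out)
```

A small chunk_size is used here only to exercise the loop on seven rows; in practice it would be chosen so that chunk_size times the number of schedule rows fits comfortably in memory.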