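I have the following journey data and need to generate a unique ID according to the conditions below.
Pseudocode:
For each combination of dir, id1 and id2:
Compute the time difference between consecutive rows.
Set counter = 1.
If the time difference > 30 minutes, then counter + 1; otherwise keep the counter unchanged.
Below is the code I have written: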
import pandas as pd

#Input dataframe
data = pd.DataFrame({
"dir":["A_dir","A_dir","A_dir","A_dir","C_dir","C_dir","C_dir","H_dir","H_dir","H_dir","A_dir","A_dir","A_dir","A_dir"],
"Timestamp":["13-12-2018 08:00:00","13-12-2018 08:03:00","13-12-2018 08:06:00","13-12-2018 08:09:00","13-12-2018 11:58:00","13-12-2018 12:00:00","13-12-2018 12:02:00","13-12-2018 12:05:00","13-12-2018 12:07:05","13-12-2018 12:10:00","13-12-2018 13:00:00","13-12-2018 13:10:00","13-12-2018 13:20:00","13-12-2018 13:32:00"],
"time diff":["","00:03:00","00:03:00","00:03:00","03:49:00","00:02:00","00:02:00","00:03:00","00:02:05","00:02:55","00:50:00","00:10:00","00:10:00","00:12:00"],
"des":["G","F","C","A","A","E","C","B","G","H","G","F","C","A"],
"origin":["H","G","F","C","D","B","E","A","B","G","H","G","F","C"],"Journey":[1,1,1,1,2,2,2,3,3,3,4,4,4,4],
"id2":[100,100,100,100,2,2,2,100,100,100,100,100,100,100],"id1":[1,1,1,1,2,2,2,1,1,1,1,1,1,1]
})
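#Identify time diff: parse the day-first timestamps, then subtract the previous row
data['Timestamp'] = pd.to_datetime(data['Timestamp'], format='%d-%m-%Y %H:%M:%S')
data['time diff'] = data['Timestamp'] - data['Timestamp'].shift(1)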
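#Unique id
data.index = pd.to_datetime(data["Timestamp"], dayfirst=True)
# pd.Grouper(freq=...) stands in for pd.TimeGrouper, which newer pandas versions no longer provide
data['uniqueid'] = data.groupby([data['id1'],data['id2'],data['dir'],pd.Grouper(freq='30min')]).ngroup()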
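This is the output. The time grouper (TimeGrouper) splits the timeline into fixed 30-minute buckets. When two consecutive rows are less than 30 minutes apart but fall into different buckets, they receive two separate uniqueid values even though dir, id1 and id2 are identical. For example, a row that gets uniqueid 2 should actually have uniqueid 1.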
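The bucket-boundary effect described above can be reproduced in isolation. The snippet below is only an illustration: it takes two timestamps from the sample data and approximates the fixed 30-minute buckets with `dt.floor("30min")`, which is an assumption about how the grouper aligns its bins rather than a reproduction of the original groupby call.

```python
import pandas as pd

# Two consecutive C_dir rows from the sample data, only 2 minutes apart.
ts = pd.to_datetime(
    pd.Series(["13-12-2018 11:58:00", "13-12-2018 12:00:00"]),
    format="%d-%m-%Y %H:%M:%S",
)

# Fixed 30-minute buckets, approximated by flooring each timestamp.
bins = ts.dt.floor("30min")

# Actual gap between the two rows.
gap = ts.diff()

print(bins.tolist())  # the two rows land in different buckets
print(gap.iloc[1])    # yet the gap is well under 30 minutes
```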
Please suggest how to generate the unique ID. I am also open to other ways of generating it. Thanks in advance.
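One possible direction, sketched here under assumptions rather than offered as a definitive answer, is to follow the pseudocode literally: measure the gap to the previous row within each (dir, id1, id2) combination and start a new ID whenever that gap exceeds 30 minutes, instead of using fixed time buckets. The helper name `assign_journey_ids`, its parameters, and the small `demo` frame are hypothetical; the timestamp format is assumed to be day-first as in the sample data.

```python
import pandas as pd

def assign_journey_ids(df, keys=("dir", "id1", "id2"), ts_col="Timestamp",
                       max_gap="30min", ts_format="%d-%m-%Y %H:%M:%S"):
    """Hypothetical helper: return journey IDs (1, 2, 3, ...) aligned to df's rows.

    A new journey starts at the first row of each key combination and
    whenever the gap to the previous row of that combination exceeds max_gap.
    """
    keys = list(keys)
    # Work on a positional copy so the caller's index does not matter.
    work = pd.DataFrame({k: df[k].to_numpy() for k in keys})
    work["_ts"] = pd.to_datetime(df[ts_col], format=ts_format).to_numpy()
    # Order rows by key combination, then by time within each combination.
    work = work.sort_values(keys + ["_ts"])
    # Gap to the previous row of the same combination (NaT for its first row).
    gap = work.groupby(keys)["_ts"].diff()
    starts_new = gap.isna() | (gap > pd.Timedelta(max_gap))
    # Running count of journey starts, restored to the original row order.
    raw_id = starts_new.cumsum().sort_index()
    # Renumber by first appearance so IDs increase from top to bottom.
    codes, _ = pd.factorize(raw_id)
    return pd.Series(codes + 1, index=df.index, name="uniqueid")

# Small demo frame (a subset of the sample data, chosen for illustration).
demo = pd.DataFrame({
    "dir": ["C_dir", "C_dir", "C_dir", "A_dir", "A_dir"],
    "id1": [2, 2, 2, 1, 1],
    "id2": [2, 2, 2, 100, 100],
    "Timestamp": ["13-12-2018 11:58:00", "13-12-2018 12:00:00",
                  "13-12-2018 12:02:00", "13-12-2018 13:00:00",
                  "13-12-2018 13:32:00"],
})
demo["uniqueid"] = assign_journey_ids(demo)
print(demo)
```

In this sketch the three C_dir rows share one ID even though 11:58 and 12:00 sit in different 30-minute buckets, while the two A_dir rows are split because they are 32 minutes apart. The final renumbering step is a design choice that keeps IDs in order of first appearance; it can be dropped if any unique label is acceptable.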