How to use a condition in a pandas groupby

Asked: 2018-12-17 02:38:12

Tags: python-3.x pandas pandas-groupby

I have the following journey data and need to generate a unique ID for each journey using the conditions below.

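Pseudocode: for each combination of dir, id1, id2
 compute the time difference between consecutive rows.

Set counter = 1

If the time difference > 30 minutes, counter + 1; otherwise keep the counter unchanged.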

Below is the code I wrote:

import pandas as pd

# Input dataframe
data = pd.DataFrame({        
        "dir":["A_dir","A_dir","A_dir","A_dir","C_dir","C_dir","C_dir","H_dir","H_dir","H_dir","A_dir","A_dir","A_dir","A_dir"],
        "Timestamp":["13-12-2018 08:00:00","13-12-2018 08:03:00","13-12-2018 08:06:00","13-12-2018 08:09:00","13-12-2018 11:58:00","13-12-2018 12:00:00","13-12-2018 12:02:00","13-12-2018 12:05:00","13-12-2018 12:07:05","13-12-2018 12:10:00","13-12-2018 13:00:00","13-12-2018 13:10:00","13-12-2018 13:20:00","13-12-2018 13:32:00"],
        "time diff":["","00:03:00","00:03:00","00:03:00","03:49:00","00:02:00","00:02:00","00:03:00","00:02:05","00:02:55","00:50:00","00:10:00","00:10:00","00:12:00"],
        "des":["G","F","C","A","A","E","C","B","G","H","G","F","C","A"],
        "origin":["H","G","F","C","D","B","E","A","B","G","H","G","F","C"],"Journey":[1,1,1,1,2,2,2,3,3,3,4,4,4,4],
        "id2":[100,100,100,100,2,2,2,100,100,100,100,100,100,100],"id1":[1,1,1,1,2,2,2,1,1,1,1,1,1,1]
        })

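# Parse the day-first timestamp strings so they can be subtracted
data["Timestamp"] = pd.to_datetime(data["Timestamp"], format="%d-%m-%Y %H:%M:%S")

# Identify time diff between consecutive rows (vectorized, no loop needed)
data["time diff"] = data["Timestamp"].diff()

# Unique id: group by id1, id2, dir and fixed 30-minute bins
# (pd.Grouper(freq=...) stands in for pd.TimeGrouper, which newer pandas no longer provides)
data.index = data["Timestamp"]
data["uniqueid"] = data.groupby(["id1", "id2", "dir", pd.Grouper(freq="30min")]).ngroup()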

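This is the output. TimeGrouper splits time into fixed 30-minute bins, so when consecutive rows are less than 30 minutes apart but fall into different bins, they receive two separate uniqueid values even though dir, id1 and id2 are the same. For example, the rows with uniqueid 2 should actually have uniqueid 1.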
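The effect described above can be reproduced in isolation. The following is a minimal sketch, assuming that flooring to 30 minutes is a fair stand-in for how the fixed bins are formed; it uses two timestamps from the data that are only 12 minutes apart.

```python
import pandas as pd

# Two consecutive timestamps from the question's data
ts = pd.to_datetime(
    ["13-12-2018 13:20:00", "13-12-2018 13:32:00"],
    format="%d-%m-%Y %H:%M:%S",
)

# Fixed 30-minute bins are anchored to the clock, not to the previous row
bins = ts.floor("30min")

# The actual gap between the two rows
gap = ts[1] - ts[0]

print(bins)  # the two rows land in different bins
print(gap)   # yet they are less than 30 minutes apart
```

Because the bin edges sit at :00 and :30, any journey that crosses one of them is split regardless of how close the rows are.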

Please suggest how to generate the unique ID. I am also open to other ways of generating it. Thanks in advance.
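One possible approach, sketched here under assumptions rather than offered as a definitive answer, is to follow the pseudocode directly: measure the gap to the previous row within each (dir, id1, id2) combination, bump a per-combination counter whenever the gap exceeds 30 minutes, and then number the resulting groups. The `session` column and the `keys` list are names introduced only for this illustration, and the counter starts at 0 instead of 1.

```python
import pandas as pd

# Columns copied from the question's data; "Journey" is the expected result
data = pd.DataFrame({
    "dir": ["A_dir"] * 4 + ["C_dir"] * 3 + ["H_dir"] * 3 + ["A_dir"] * 4,
    "Timestamp": [
        "13-12-2018 08:00:00", "13-12-2018 08:03:00", "13-12-2018 08:06:00",
        "13-12-2018 08:09:00", "13-12-2018 11:58:00", "13-12-2018 12:00:00",
        "13-12-2018 12:02:00", "13-12-2018 12:05:00", "13-12-2018 12:07:05",
        "13-12-2018 12:10:00", "13-12-2018 13:00:00", "13-12-2018 13:10:00",
        "13-12-2018 13:20:00", "13-12-2018 13:32:00",
    ],
    "id1": [1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1],
    "id2": [100, 100, 100, 100, 2, 2, 2, 100, 100, 100, 100, 100, 100, 100],
    "Journey": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4],
})
data["Timestamp"] = pd.to_datetime(data["Timestamp"], format="%d-%m-%Y %H:%M:%S")
data = data.sort_values("Timestamp").reset_index(drop=True)

keys = ["dir", "id1", "id2"]  # hypothetical helper name

# Gap to the previous row within the same combination
# (NaT for the first row of each combination)
gap = data.groupby(keys)["Timestamp"].diff()

# Per-combination counter: +1 each time the gap exceeds 30 minutes
data["session"] = (gap > pd.Timedelta(minutes=30)).astype(int)
data["session"] = data.groupby(keys)["session"].cumsum()

# Number each (combination, session) pair in order of first appearance
data["uniqueid"] = data.groupby(keys + ["session"], sort=False).ngroup() + 1

print(data[["dir", "Timestamp", "Journey", "uniqueid"]])
```

Since the counter is kept per combination, interleaved rows from different combinations do not reset each other. The `sort=False` argument is what keeps the ids in order of first appearance; without it the groups are numbered in sorted key order.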

0 answers:

No answers yet