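I need to compare two dataframes: one records events and the other records failures. I have to flag the events that occurred during a failure interval. Here is an example:
df1 (events)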
EventID arrivalTime
3949362 22/12/2015 3:29
3948289 22/12/2015 3:29
3947252 22/12/2015 3:29
3951196 22/12/2015 3:29
3949908 22/12/2015 3:30
3948820 22/12/2015 3:30
3946194 22/12/2015 3:31
3949364 22/12/2015 3:31
3948292 22/12/2015 3:31
3947774 22/12/2015 3:31
3946736 22/12/2015 3:31
3947254 22/12/2015 3:32
3949366 22/12/2015 3:32
3948294 22/12/2015 3:32
3946196 22/12/2015 3:32
3948824 22/12/2015 3:33
3949909 22/12/2015 3:33
3951200 22/12/2015 3:33
3947255 22/12/2015 3:33
3949368 22/12/2015 3:34
3946198 22/12/2015 3:34
df2 (failures)
failures initial end
1 22/12/2015 3:31 22/12/2015 3:33
I want to get the following result:
EventID arrivalTime interval
3949362 22/12/2015 3:29 0
3948289 22/12/2015 3:29 0
3947252 22/12/2015 3:29 0
3951196 22/12/2015 3:29 0
3949908 22/12/2015 3:30 0
3948820 22/12/2015 3:30 0
3946194 22/12/2015 3:31 1
3949364 22/12/2015 3:31 1
3948292 22/12/2015 3:31 1
3947774 22/12/2015 3:31 1
3946736 22/12/2015 3:31 1
3947254 22/12/2015 3:32 1
3949366 22/12/2015 3:32 1
3948294 22/12/2015 3:32 1
3946196 22/12/2015 3:32 1
3948824 22/12/2015 3:33 1
3949909 22/12/2015 3:33 1
3951200 22/12/2015 3:33 1
3947255 22/12/2015 3:33 0
3949368 22/12/2015 3:34 0
3946198 22/12/2015 3:34 0
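A minimal pandas sketch for the single failure window in this example. It assumes the times are day-first strings exactly as shown (format `%d/%m/%Y %H:%M`). It parses both frames with `pd.to_datetime` and flags rows with `Series.between`, which is inclusive at both ends by default. With the minute-resolution times shown, all four 3:33 rows are flagged, so 3947255 gets 1 here rather than the 0 in the expected output.

```python
import pandas as pd

# Events (df1) and the single failure window (df2) from the example above.
# The times are day-first strings, so they are parsed with an explicit format.
fmt = "%d/%m/%Y %H:%M"
df1 = pd.DataFrame({
    "EventID": [3949362, 3948289, 3947252, 3951196, 3949908, 3948820,
                3946194, 3949364, 3948292, 3947774, 3946736, 3947254,
                3949366, 3948294, 3946196, 3948824, 3949909, 3951200,
                3947255, 3949368, 3946198],
    "arrivalTime": ["22/12/2015 3:29"] * 4 + ["22/12/2015 3:30"] * 2
                   + ["22/12/2015 3:31"] * 5 + ["22/12/2015 3:32"] * 4
                   + ["22/12/2015 3:33"] * 4 + ["22/12/2015 3:34"] * 2,
})
df2 = pd.DataFrame({"failures": [1],
                    "initial": ["22/12/2015 3:31"],
                    "end": ["22/12/2015 3:33"]})

df1["arrivalTime"] = pd.to_datetime(df1["arrivalTime"], format=fmt)
start = pd.to_datetime(df2.loc[0, "initial"], format=fmt)
end = pd.to_datetime(df2.loc[0, "end"], format=fmt)

# Series.between is inclusive on both ends by default; cast True/False to 1/0
df1["interval"] = df1["arrivalTime"].between(start, end).astype(int)
print(df1)
```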
So far I have done it with two nested loops, but I would like to do it more efficiently.
Thanks in advance.
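When df2 contains several failure windows, one way to replace the nested loops is NumPy broadcasting. Every arrival time is compared against every window at once, and a row is kept if it matches any window. The sample data and the helper name `flag_in_any_interval` are hypothetical. The intermediate boolean matrix has n_events × n_failures entries, so very large inputs may call for a sort-based approach instead.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: five events and two failure windows.
events = pd.DataFrame({
    "EventID": [1, 2, 3, 4, 5],
    "arrivalTime": pd.to_datetime([
        "2015-12-22 03:29", "2015-12-22 03:31", "2015-12-22 03:40",
        "2015-12-22 03:52", "2015-12-22 04:10"]),
})
failures = pd.DataFrame({
    "initial": pd.to_datetime(["2015-12-22 03:31", "2015-12-22 03:50"]),
    "end": pd.to_datetime(["2015-12-22 03:33", "2015-12-22 04:00"]),
})


def flag_in_any_interval(times, starts, ends):
    """Return a 0/1 array: 1 where times[i] lies in any [starts[j], ends[j]]."""
    t = times.to_numpy()[:, None]    # shape (n_events, 1)
    s = starts.to_numpy()[None, :]   # shape (1, n_failures)
    e = ends.to_numpy()[None, :]     # shape (1, n_failures)
    inside = (t >= s) & (t <= e)     # shape (n_events, n_failures)
    return inside.any(axis=1).astype(int)


events["interval"] = flag_in_any_interval(
    events["arrivalTime"], failures["initial"], failures["end"])
print(events["interval"].tolist())  # [0, 1, 0, 1, 0]
```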
Answer (score: 0)
This is the function I used before vectorizing:
import pandas as pd


def periodofailure(df, failure):
    # Adds an 'Incidencia' column: 'si' (yes) if arrivalTime falls inside
    # one of the failure windows below, 'no' otherwise.
    l1 = []
    if failure == 'no':
        # No failures to check: flag every event as 'no'
        l1 = ['no' for n in range(len(df))]
        df1 = pd.DataFrame(l1)
        frames = [df, df1]
        df = pd.concat(frames, axis=1)
        df.rename(columns={0: 'Incidencia'}, inplace=True)
        return df
    elif failure == 'si':
        # here I manually put failures (IF = start, EF = end).
        # These are plain strings, so the checks below are string comparisons;
        # they are only reliable if arrivalTime holds strings in the same
        # 'YYYY-MM-DD HH:MM:SS' format.
        IFfailure1 = '2015-12-02 06:30:00'
        EFfailure1 = '2015-12-02 06:42:00'
        IFfailure2 = '2015-12-10 18:53:00'
        EFfailure2 = '2015-12-10 19:05:00'
        IFfailure3 = '2015-12-11 21:18:00'
        EFfailure3 = '2015-12-12 00:09:00'
        IFfailure4 = '2015-12-15 11:45:00'
        EFfailure4 = '2015-12-15 12:17:00'
        IFfailure5 = '2015-11-18 12:28:00'
        EFfailure5 = '2015-11-18 12:59:00'
        for index, row in df.iterrows():
            arrival = row['arrivalTime']
            if IFfailure1 <= arrival <= EFfailure1:
                l1.append('si')
            elif IFfailure2 <= arrival <= EFfailure2:
                l1.append('si')
            elif IFfailure3 <= arrival <= EFfailure3:
                l1.append('si')
            elif IFfailure4 <= arrival <= EFfailure4:
                l1.append('si')
            elif IFfailure5 <= arrival <= EFfailure5:
                l1.append('si')
            else:
                l1.append('no')
        df1 = pd.DataFrame(l1)
        frames = [df, df1]
        # concat with axis=1 aligns on the index, so df needs a default
        # 0..n-1 index for the flags to line up with the rows.
        df = pd.concat(frames, axis=1)
        df.rename(columns={0: 'Incidencia'}, inplace=True)
        return df
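A possible vectorized version of the function above, sketched under the assumption that `arrivalTime` can be parsed by `pd.to_datetime`. It loops over the handful of failure windows rather than over the rows, builds one boolean mask with `Series.between`, and maps it to 'si'/'no' with `np.where`. The name `periodofailure_vectorized` and the `FAILURES` tuple are illustrative, not part of the original answer. Passing `failures=()` reproduces the `failure == 'no'` branch.

```python
import numpy as np
import pandas as pd

# The five failure windows hard-coded in periodofailure, as (start, end) pairs.
FAILURES = (
    ("2015-12-02 06:30:00", "2015-12-02 06:42:00"),
    ("2015-12-10 18:53:00", "2015-12-10 19:05:00"),
    ("2015-12-11 21:18:00", "2015-12-12 00:09:00"),
    ("2015-12-15 11:45:00", "2015-12-15 12:17:00"),
    ("2015-11-18 12:28:00", "2015-11-18 12:59:00"),
)


def periodofailure_vectorized(df, failures=FAILURES):
    """Hypothetical variant: 'si' if arrivalTime is inside any window, else 'no'."""
    out = df.copy()
    times = pd.to_datetime(out["arrivalTime"])
    mask = np.zeros(len(out), dtype=bool)
    for start, end in failures:
        # One vectorized comparison per failure window instead of one per row
        mask |= times.between(pd.Timestamp(start), pd.Timestamp(end)).to_numpy()
    out["Incidencia"] = np.where(mask, "si", "no")
    return out


events = pd.DataFrame({"arrivalTime": [
    "2015-12-02 06:35:00", "2015-12-02 07:00:00", "2015-12-11 23:59:00"]})
result = periodofailure_vectorized(events)["Incidencia"].tolist()
print(result)  # ['si', 'no', 'si']
```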