Compare two dataframes by datetime interval (python pandas)

Time: 2016-01-28 08:14:21

Tags: python datetime join pandas merge

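I need to compare two dataframes. One records events and the other records failures. I have to flag the events whose arrival time falls within a failure interval. Here is an example: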

df1 (events)

EventID arrivalTime 
3949362 22/12/2015 3:29 
3948289 22/12/2015 3:29 
3947252 22/12/2015 3:29 
3951196 22/12/2015 3:29 
3949908 22/12/2015 3:30 
3948820 22/12/2015 3:30 
3946194 22/12/2015 3:31 
3949364 22/12/2015 3:31 
3948292 22/12/2015 3:31 
3947774 22/12/2015 3:31 
3946736 22/12/2015 3:31 
3947254 22/12/2015 3:32 
3949366 22/12/2015 3:32 
3948294 22/12/2015 3:32 
3946196 22/12/2015 3:32 
3948824 22/12/2015 3:33 
3949909 22/12/2015 3:33 
3951200 22/12/2015 3:33 
3947255 22/12/2015 3:33 
3949368 22/12/2015 3:34 
3946198 22/12/2015 3:34 

df2 (failures)

failures initial  end
1 22/12/2015 3:31 22/12/2015 3:33

I would like to get the following result:

EventID arrivalTime interval
3949362 22/12/2015 3:29 0
3948289 22/12/2015 3:29 0
3947252 22/12/2015 3:29 0
3951196 22/12/2015 3:29 0
3949908 22/12/2015 3:30 0
3948820 22/12/2015 3:30 0
3946194 22/12/2015 3:31 1
3949364 22/12/2015 3:31 1
3948292 22/12/2015 3:31 1
3947774 22/12/2015 3:31 1
3946736 22/12/2015 3:31 1
3947254 22/12/2015 3:32 1
3949366 22/12/2015 3:32 1
3948294 22/12/2015 3:32 1
3946196 22/12/2015 3:32 1
3948824 22/12/2015 3:33 1
3949909 22/12/2015 3:33 1
3951200 22/12/2015 3:33 1
3947255 22/12/2015 3:33 0
3949368 22/12/2015 3:34 0
3946198 22/12/2015 3:34 0

At the moment I do this with two nested loops, but I would like to do it more efficiently.

Thanks in advance.

2 answers:

Answer 0: (score: 1)

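You can use stack to turn the initial and end columns into rows, then resample to fill in the missing timestamps between initial and end, and finally numpy.where with isin to flag the events.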
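The answer's original code did not survive, so the following is only a minimal sketch of how the three steps could fit together, not the answerer's exact solution. It assumes both frames hold real datetimes at one-minute resolution (the sample data is parsed with an explicit day-first format), and it uses a small subset of the events from the question.

```python
import numpy as np
import pandas as pd

fmt = "%d/%m/%Y %H:%M"  # the sample data is day-first

# A few of the events from the question
df1 = pd.DataFrame({
    "EventID": [3948820, 3946194, 3947254, 3948824, 3949368],
    "arrivalTime": pd.to_datetime(
        ["22/12/2015 3:30", "22/12/2015 3:31", "22/12/2015 3:32",
         "22/12/2015 3:33", "22/12/2015 3:34"], format=fmt),
})
df2 = pd.DataFrame({
    "failures": [1],
    "initial": pd.to_datetime(["22/12/2015 3:31"], format=fmt),
    "end": pd.to_datetime(["22/12/2015 3:33"], format=fmt),
})

# stack: move the initial/end columns into rows, keyed by failure id
stacked = df2.set_index("failures")[["initial", "end"]].stack()

# resample: for each failure, generate every minute from initial to end
failure_minutes = pd.DatetimeIndex([])
for _, bounds in stacked.groupby(level=0):
    endpoints = pd.Series(1, index=pd.DatetimeIndex(bounds.to_numpy()))
    per_minute = endpoints.resample("min").size()
    failure_minutes = failure_minutes.union(per_minute.index)

# numpy.where + isin: 1 if the event's minute lies inside any failure
in_failure = df1["arrivalTime"].dt.floor("min").isin(failure_minutes)
df1["interval"] = np.where(in_failure, 1, 0)
```

Because the intervals are expanded minute by minute, this sketch treats both endpoints as inclusive and is only practical when failures are short relative to the chosen resolution.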

Answer 1: (score: 0)

This is the function I used before vectorizing.

import pandas as pd

# here I manually put failures, as (start, end) pairs
FAILURE_WINDOWS = [
    (pd.Timestamp('2015-12-02 06:30:00'), pd.Timestamp('2015-12-02 06:42:00')),
    (pd.Timestamp('2015-12-10 18:53:00'), pd.Timestamp('2015-12-10 19:05:00')),
    (pd.Timestamp('2015-12-11 21:18:00'), pd.Timestamp('2015-12-12 00:09:00')),
    (pd.Timestamp('2015-12-15 11:45:00'), pd.Timestamp('2015-12-15 12:17:00')),
    (pd.Timestamp('2015-11-18 12:28:00'), pd.Timestamp('2015-11-18 12:59:00')),
]

def periodofailure(df, failure):
    # Returns a copy of df with an 'Incidencia' column ('si' or 'no').
    # Assumes df['arrivalTime'] already has a datetime64 dtype.
    df = df.copy()

    if failure == 'no':
        # No failures to check: every event is labelled 'no'
        df['Incidencia'] = 'no'
        return df

    elif failure == 'si':
        l1 = []

        for arrival in df['arrivalTime']:
            # 'si' if the arrival falls inside any window (bounds inclusive)
            if any(start <= arrival <= end for start, end in FAILURE_WINDOWS):
                l1.append('si')
            else:
                l1.append('no')

        # Assigning the list directly avoids index misalignment from concat
        df['Incidencia'] = l1
        return df
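
For comparison, a vectorized version of the same check might look like the sketch below. It is an illustration rather than the code this answer ended up with: flag_failures and its windows parameter are hypothetical names, arrivalTime is assumed to have a datetime64 dtype, and the bounds are inclusive as in the loop above.

```python
import numpy as np
import pandas as pd

def flag_failures(df, windows, column="arrivalTime"):
    """Hypothetical helper: add 'Incidencia' = 'si' if the row is in any window."""
    arrivals = df[column].to_numpy()                       # shape (n_events,)
    starts = pd.to_datetime([s for s, _ in windows]).to_numpy()
    ends = pd.to_datetime([e for _, e in windows]).to_numpy()

    # Broadcast to an (n_events, n_windows) boolean matrix
    inside = (arrivals[:, None] >= starts) & (arrivals[:, None] <= ends)

    out = df.copy()
    out["Incidencia"] = np.where(inside.any(axis=1), "si", "no")
    return out

# Two of the hand-entered failure windows from the function above
windows = [
    ("2015-12-02 06:30:00", "2015-12-02 06:42:00"),
    ("2015-12-11 21:18:00", "2015-12-12 00:09:00"),
]
# Made-up arrival times, chosen to sit on and around the window bounds
events = pd.DataFrame({
    "arrivalTime": pd.to_datetime([
        "2015-12-02 06:29:00", "2015-12-02 06:30:00", "2015-12-02 06:42:00",
        "2015-12-02 06:43:00", "2015-12-11 23:00:00",
    ]),
})
result = flag_failures(events, windows)
```

The comparison matrix has one cell per event and window pair, so memory grows with the product of the two counts. That is fine for a handful of windows like the ones above.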