Calculating the overlap between date ranges in a pandas DataFrame

Time: 2020-05-18 15:55:46

Tags: python pandas datetime

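I'm trying to find the overlap, in minutes, between two date ranges. To avoid a for loop, I chose not to use min/max to determine it. I wrote the function below to find the total number of overlapping minutes between each EventA and the EventB ranges, where one EventA may overlap several EventBs. I believe it works, but because it is quite unconventional I'd like the community to review it. Are there any cases where this method fails, or where I should use it with caution? Any feedback is much appreciated (I'm new to Stack Overflow, so please also tell me if I should format the question differently).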
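Although the function avoids min/max, it computes the same quantity as the usual "earliest end minus latest start" overlap formula. A short derivation follows, using notation introduced here only for illustration (s and e for start and end, L = e - s for a length) and the identities min(x, y) = (x + y - |x - y|)/2 and max(x, y) = (x + y + |x - y|)/2:

```latex
\begin{aligned}
\operatorname{overlap}(A, B) &= \min(e_A, e_B) - \max(s_A, s_B) \\
&= \frac{e_A + e_B - |e_B - e_A|}{2} - \frac{s_A + s_B + |s_B - s_A|}{2} \\
&= \frac{L_A + L_B - |s_B - s_A| - |e_B - e_A|}{2},
  \qquad L_A = e_A - s_A,\; L_B = e_B - s_B \\
\operatorname{total}(A) &= \sum_j \max\bigl(0, \operatorname{overlap}(A, B_j)\bigr)
\end{aligned}
```

In the code, L_A + L_B corresponds to total_minutes, s_B - s_A to start_diff, e_B - e_A to end_diff, and breaks to the overlap. For disjoint intervals the overlap is negative (minus the length of the gap), which is why keeping only breaks > 0 is correct. Because the identities hold for all real numbers, the formula matches the min/max result exactly, including nested and touching intervals.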

import pandas as pd

df_EventA = pd.DataFrame()
df_EventB = pd.DataFrame()

# Every timestamp uses the same "YYYYMMDD HH:MM:SS" layout, so parsing does not
# depend on pd.to_datetime inferring one format from mixed strings.
df_EventA['EventAStart'] = pd.Series(pd.to_datetime(['20200101 09:30:00', '20200101 10:30:00', '20200101 11:30:00', '20200101 12:30:00', '20200101 13:30:00', '20200101 14:30:00', '20200101 15:30:00', '20200101 16:30:00']))
df_EventA['EventAEnd'] = pd.Series(pd.to_datetime(['20200101 10:00:00', '20200101 11:00:00', '20200101 12:00:00', '20200101 13:00:00', '20200101 14:00:00', '20200101 15:00:00', '20200101 16:00:00', '20200101 17:00:00']))
df_EventB['EventBStart'] = pd.Series(pd.to_datetime(['20200101 09:45:00', '20200101 10:45:00', '20200101 11:45:00', '20200101 12:45:00', '20200101 13:45:00', '20200101 14:45:00', '20200101 15:45:00', '20200101 16:45:00']))
df_EventB['EventBEnd'] = pd.Series(pd.to_datetime(['20200101 10:00:00', '20200101 11:00:00', '20200101 12:00:00', '20200101 13:00:00', '20200101 14:00:00', '20200101 15:00:00', '20200101 16:00:00', '20200101 17:00:00']))

# Length of each event in minutes.
df_EventA['EventATotal'] = (df_EventA['EventAEnd'] - df_EventA['EventAStart']).dt.total_seconds() / 60
df_EventB['EventBTotal'] = (df_EventB['EventBEnd'] - df_EventB['EventBStart']).dt.total_seconds() / 60

def overlap(x, df_EventB):
    """Total minutes that EventA row x overlaps any EventB interval."""
    # Sum of the two interval lengths, one value per EventB row.
    total_minutes = x['EventATotal'] + df_EventB['EventBTotal']
    # Signed offsets of each EventB start/end relative to this EventA row.
    start_diff = (df_EventB['EventBStart'] - x['EventAStart']).dt.total_seconds() / 60
    end_diff = (df_EventB['EventBEnd'] - x['EventAEnd']).dt.total_seconds() / 60
    # Equals min(ends) - max(starts); negative values are gaps, not overlaps.
    breaks = (total_minutes - start_diff.abs() - end_diff.abs()) / 2
    return breaks[breaks > 0].sum()

# Return a scalar per row instead of mutating the row passed to apply.
df_EventA['overlap_minutes'] = df_EventA.apply(overlap, axis=1, args=(df_EventB,))
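A for loop is not actually needed to use min/max: NumPy's element-wise np.minimum and np.maximum already work on whole arrays. The sketch below is one possible alternative to apply, not the asker's method; the function names overlap_minutes, overlap_minutes_abs and _grid are hypothetical. It broadcasts every A interval against every B interval, computes the overlap both with min/max and with the question's absolute-value formula, and assumes every interval has start <= end. It builds a len(A) by len(B) matrix, so memory grows with the product of the two table sizes.

```python
import numpy as np
import pandas as pd

MINUTE = np.timedelta64(1, "m")


def _grid(a_start, a_end, b_start, b_end):
    """Hypothetical helper: A bounds as column vectors, B bounds as row vectors."""
    a_s = pd.DatetimeIndex(a_start).to_numpy()[:, None]
    a_e = pd.DatetimeIndex(a_end).to_numpy()[:, None]
    b_s = pd.DatetimeIndex(b_start).to_numpy()[None, :]
    b_e = pd.DatetimeIndex(b_end).to_numpy()[None, :]
    return a_s, a_e, b_s, b_e


def overlap_minutes(a_start, a_end, b_start, b_end):
    """Minutes each A interval shares with all B intervals (min/max form)."""
    a_s, a_e, b_s, b_e = _grid(a_start, a_end, b_start, b_end)
    # Pairwise overlap = earliest end - latest start; shape (len(A), len(B)).
    pair = (np.minimum(a_e, b_e) - np.maximum(a_s, b_s)) / MINUTE
    # Negative values are gaps between disjoint intervals, so clip them to zero.
    return np.clip(pair, 0, None).sum(axis=1)


def overlap_minutes_abs(a_start, a_end, b_start, b_end):
    """The question's absolute-value formula, broadcast the same way."""
    a_s, a_e, b_s, b_e = _grid(a_start, a_end, b_start, b_end)
    total = ((a_e - a_s) + (b_e - b_s)) / MINUTE   # L_A + L_B
    start_diff = np.abs((b_s - a_s) / MINUTE)      # |s_B - s_A|
    end_diff = np.abs((b_e - a_e) / MINUTE)        # |e_B - e_A|
    pair = (total - start_diff - end_diff) / 2
    return np.clip(pair, 0, None).sum(axis=1)


# The first three A/B pairs from the question.
a_start = pd.to_datetime(["2020-01-01 09:30", "2020-01-01 10:30", "2020-01-01 11:30"])
a_end = pd.to_datetime(["2020-01-01 10:00", "2020-01-01 11:00", "2020-01-01 12:00"])
b_start = pd.to_datetime(["2020-01-01 09:45", "2020-01-01 10:45", "2020-01-01 11:45"])
b_end = pd.to_datetime(["2020-01-01 10:00", "2020-01-01 11:00", "2020-01-01 12:00"])

print(overlap_minutes(a_start, a_end, b_start, b_end))      # [15. 15. 15.]
print(overlap_minutes_abs(a_start, a_end, b_start, b_end))  # [15. 15. 15.]
```

Both forms give the same totals. One behavior they share is worth knowing: if EventB intervals overlap each other, minutes covered by more than one EventB are counted once per EventB. For example, A = 10:00 to 11:30 against B = 10:00 to 11:00 and 10:30 to 11:30 gives 120 minutes, more than A's own 90. If that matters, merge overlapping EventB intervals before summing.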
