Pandas - non-overlapping date ranges

Time: 2019-04-18 20:34:44

Tags: python pandas

I'm lost trying to find a simple way to determine where the date ranges of 2 dataframes do not overlap.

I have 2 dataframes:

df1 = pd.DataFrame({
    'START':['2019-01-01 09:00:00', '2019-01-01 18:00:00'],
    'END':['2019-01-01 16:00:00', '2019-01-02 02:00:00']})
df2 = pd.DataFrame({
    'START':['2019-01-01 08:00:00', '2019-01-01 14:00:00', '2019-01-01 22:00:00', '2019-01-02 01:00:00'],
    'END':['2019-01-01 11:00:00', '2019-01-01 15:00:00', '2019-01-01 23:00:00', '2019-01-02 04:00:00']})

df1 :
                 START                  END
0  2019-01-01 09:00:00  2019-01-01 16:00:00
1  2019-01-01 18:00:00  2019-01-02 02:00:00

df2 :
                 START                  END
0  2019-01-01 08:00:00  2019-01-01 11:00:00
1  2019-01-01 14:00:00  2019-01-01 15:00:00
2  2019-01-01 22:00:00  2019-01-01 23:00:00
3  2019-01-02 01:00:00  2019-01-02 04:00:00

And I would like to get the date ranges of df1 that do not overlap with df2:

                 START                  END
0  2019-01-01 11:00:00  2019-01-01 14:00:00
1  2019-01-01 15:00:00  2019-01-01 16:00:00
2  2019-01-01 18:00:00  2019-01-01 22:00:00
3  2019-01-01 23:00:00  2019-01-02 01:00:00

Thanks for your help.

1 answer:

Answer 0: (score: 1)

You can use `Interval` from sympy. Here is an example:

from sympy import Interval
from dateutil import parser
from datetime import datetime
import pandas as pd

def remove_overlap(time_intervals, overlapping_intervals):
    time_span = None

    for _, interval in time_intervals.iterrows():
        start_ts = parser.parse(interval["START"]).timestamp()
        end_ts = parser.parse(interval["END"]).timestamp()
        interval_ts = Interval(start_ts, end_ts)
        time_span = interval_ts + time_span if time_span else interval_ts

    for _, interval in overlapping_intervals.iterrows():
        start_ts = parser.parse(interval["START"]).timestamp()
        end_ts = parser.parse(interval["END"]).timestamp()
        interval_ts = Interval(start_ts, end_ts)
        time_span = time_span - interval_ts

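    # The boundary points are sympy numbers: convert them to plain floats and sort them
    bounds_ts = sorted(float(t) for t in time_span.boundary)
    bounds_dates = [datetime.fromtimestamp(t).strftime("%Y-%m-%d %H:%M:%S") for t in bounds_ts]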

    df = pd.DataFrame({"START": bounds_dates[0::2], "END":bounds_dates[1::2]})
    return df

df1 = pd.DataFrame({
    'START':['2019-01-01 09:00:00', '2019-01-01 18:00:00'],
    'END':['2019-01-01 16:00:00', '2019-01-02 02:00:00']})
df2 = pd.DataFrame({
    'START':['2019-01-01 08:00:00', '2019-01-01 14:00:00', '2019-01-01 22:00:00', '2019-01-02 01:00:00'],
    'END':['2019-01-01 11:00:00', '2019-01-01 15:00:00', '2019-01-01 23:00:00', '2019-01-02 04:00:00']})

remove_overlap(df1,df2)
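
If you would rather not depend on sympy, the same interval subtraction can be sketched with pandas alone. This is a minimal sketch, not the method from the answer above: the function name `subtract_intervals` is hypothetical, and it assumes both dataframes have `START` and `END` columns that `pd.to_datetime` can parse, with `START <= END` in every row.

```python
import pandas as pd

def subtract_intervals(base, cut):
    """Return the parts of the intervals in `base` not covered by any interval in `cut`.

    Hypothetical helper; assumes START/END columns and START <= END in each row.
    """
    # Parse the string columns into timestamps.
    base_start = pd.to_datetime(base["START"])
    base_end = pd.to_datetime(base["END"])
    cut_parsed = pd.DataFrame({
        "START": pd.to_datetime(cut["START"]),
        "END": pd.to_datetime(cut["END"]),
    }).sort_values("START")  # the sweep below relies on this ordering

    rows = []
    for start, end in zip(base_start, base_end):
        cursor = start  # left edge of the part of [start, end] not yet handled
        for c_start, c_end in zip(cut_parsed["START"], cut_parsed["END"]):
            if c_end <= cursor or c_start >= end:
                continue  # this cut does not touch the remaining part
            if c_start > cursor:
                rows.append((cursor, c_start))  # uncovered gap before this cut
            cursor = max(cursor, c_end)
            if cursor >= end:
                break  # the rest of the base interval is fully covered
        if cursor < end:
            rows.append((cursor, end))  # uncovered tail after the last cut
    return pd.DataFrame(rows, columns=["START", "END"])

df1 = pd.DataFrame({
    'START': ['2019-01-01 09:00:00', '2019-01-01 18:00:00'],
    'END': ['2019-01-01 16:00:00', '2019-01-02 02:00:00']})
df2 = pd.DataFrame({
    'START': ['2019-01-01 08:00:00', '2019-01-01 14:00:00', '2019-01-01 22:00:00', '2019-01-02 01:00:00'],
    'END': ['2019-01-01 11:00:00', '2019-01-01 15:00:00', '2019-01-01 23:00:00', '2019-01-02 04:00:00']})

result = subtract_intervals(df1, df2)
print(result)
```

This version works on timestamps directly, so it avoids the round trip through `timestamp()` and `fromtimestamp()`, which depends on the local time zone. For the sample data it should give the four ranges requested in the question.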