我迷失了试图寻找一种简单的方法来确定2个数据框的日期范围何时不重叠。
我有2个数据框:
df1 = pd.DataFrame({
'START':['2019-01-01 09:00:00', '2019-01-01 18:00:00'],
'END':['2019-01-01 16:00:00', '2019-01-02 02:00:00']})
df2 = pd.DataFrame({
'START':['2019-01-01 08:00:00', '2019-01-01 14:00:00', '2019-01-01 22:00:00', '2019-01-02 01:00:00'],
'END':['2019-01-01 11:00:00', '2019-01-01 15:00:00', '2019-01-01 23:00:00', '2019-01-02 04:00:00']})
df1 :
START END
0 2019-01-01 09:00:00 2019-01-01 16:00:00
1 2019-01-01 18:00:00 2019-01-02 02:00:00
df2 :
START END
0 2019-01-01 08:00:00 2019-01-01 11:00:00
1 2019-01-01 14:00:00 2019-01-01 15:00:00
2 2019-01-01 22:00:00 2019-01-01 23:00:00
3 2019-01-02 01:00:00 2019-01-02 04:00:00
并且我想获取df1与df2不重叠的日期范围:
START END
0 2019-01-01 11:00:00 2019-01-01 14:00:00
1 2019-01-01 15:00:00 2019-01-01 16:00:00
2 2019-01-01 18:00:00 2019-01-01 22:00:00
3 2019-01-01 23:00:00 2019-01-02 01:00:00
感谢您的帮助。
答案 0 :(得分:1)
last_record_at
可以使用sympy
,这是一个示例
Interval
此输出
from sympy import Interval
from dateutil import parser
from datetime import datetime
import pandas as pd
def remove_overlap(time_intervals, overlapping_intervals):
time_span = None
for _, interval in time_intervals.iterrows():
start_ts = parser.parse(interval["START"]).timestamp()
end_ts = parser.parse(interval["END"]).timestamp()
interval_ts = Interval(start_ts, end_ts)
time_span = interval_ts + time_span if time_span else interval_ts
for _, interval in overlapping_intervals.iterrows():
start_ts = parser.parse(interval["START"]).timestamp()
end_ts = parser.parse(interval["END"]).timestamp()
interval_ts = Interval(start_ts, end_ts)
time_span = time_span - interval_ts
bounds_ts = list(time_span.boundary)
bounds_dates = [datetime.fromtimestamp(t).strftime("%Y-%m-%d %H:%M:%S") for t in bounds_ts]
df = pd.DataFrame({"START": bounds_dates[0::2], "END":bounds_dates[1::2]})
return df
df1 = pd.DataFrame({
'START':['2019-01-01 09:00:00', '2019-01-01 18:00:00'],
'END':['2019-01-01 16:00:00', '2019-01-02 02:00:00']})
df2 = pd.DataFrame({
'START':['2019-01-01 08:00:00', '2019-01-01 14:00:00', '2019-01-01 22:00:00', '2019-01-02 01:00:00'],
'END':['2019-01-01 11:00:00', '2019-01-01 15:00:00', '2019-01-01 23:00:00', '2019-01-02 04:00:00']})
remove_overlap(df1,df2)