Find date range overlap in Python and return the overlap

Time: 2018-11-12 09:28:07

Tags: python pandas datetime overlap

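I am working on a problem similar to the one [here][1]. I have a dataframe with two datetime columns, and I need to identify where consecutive date ranges overlap.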

import pandas as pd
from datetime import datetime
df = pd.DataFrame(columns=['id','from','to'], index=range(5),
                  data=[[878,'2006-01-01','2007-10-01'],
                        [878,'2007-10-02','2008-12-01'],
                        [878,'2008-12-02','2010-04-03'],
                        [879,'2010-04-04','2199-05-11'],
                        [879,'2016-05-12','2199-12-31']])

df['from'] = pd.to_datetime(df['from'])
df['to'] = pd.to_datetime(df['to'])

The following works well for flagging overlaps as a boolean variable:

df['overlap'] = (df.groupby('id')
                   .apply(lambda x: (x['to'].shift() - x['from']) > pd.Timedelta(seconds=0))
                   .reset_index(level=0, drop=True))

(which correctly returns):

[49]: 
    id       from         to  overlap
0  878 2006-01-01 2007-10-01    False
1  878 2007-10-02 2008-12-01    False
2  878 2008-12-02 2010-04-03    False
3  879 2010-04-04 2199-05-11    False
4  879 2016-05-12 2199-12-31     True
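For reference, the flag above compares each row's `from` with the previous row's `to` inside the same `id`. A minimal sketch of the same test without `apply` is shown below; it assumes the rows are already sorted by `from` within each `id`, and the names `prev_to` and `overlap_flag` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(columns=['id', 'from', 'to'],
                  data=[[878, '2006-01-01', '2007-10-01'],
                        [878, '2007-10-02', '2008-12-01'],
                        [878, '2008-12-02', '2010-04-03'],
                        [879, '2010-04-04', '2199-05-11'],
                        [879, '2016-05-12', '2199-12-31']])
df['from'] = pd.to_datetime(df['from'])
df['to'] = pd.to_datetime(df['to'])

# End date of the previous row within the same id.
# The first row of each id has no predecessor, so it gets NaT.
# Assumption: rows are sorted by 'from' within each id.
prev_to = df.groupby('id')['to'].shift()

# A row overlaps its predecessor when the predecessor ends after this row starts.
# Comparisons against NaT evaluate to False, so first rows are never flagged.
overlap_flag = prev_to > df['from']
print(overlap_flag)
```

Because `shift` is applied per group, no `reset_index` step is needed and the result stays aligned with the original five rows.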

I would now like to extend this solution by also keeping the start and end of the overlap whenever one occurs. I tried having apply return a pd.Series:
df.groupby('id').apply(lambda x:
    pd.Series([x['to'].shift() - x['from'] > pd.Timedelta(seconds=0),
               x['from'],
               x['to'].shift()],
              index=['is_overlap', 'start_overlap', 'end_overlap']))

But the resulting dataframe has a completely different shape (no longer 5 rows). What I would like to get is simply:

[49]: 
        id       from         to  is_overlap    start_overlap   end_overlap
    0  878 2006-01-01 2007-10-01    False    np.NaT       np.NaT
    1  878 2007-10-02 2008-12-01    False    np.NaT       np.NaT
    2  878 2008-12-02 2010-04-03    False    np.NaT       np.NaT
    3  879 2010-04-04 2199-05-11    False    np.NaT       np.NaT
    4  879 2016-05-12 2199-12-31     True    2016-05-12   2199-05-11
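One possible way to get this shape, offered as a sketch rather than a definitive answer, is to have `apply` return a DataFrame indexed like the group instead of a Series of Series, and then join it back onto `df`. The helper name `overlap_bounds` is hypothetical, and the code assumes rows are sorted by `from` within each `id`.

```python
import pandas as pd

df = pd.DataFrame(columns=['id', 'from', 'to'],
                  data=[[878, '2006-01-01', '2007-10-01'],
                        [878, '2007-10-02', '2008-12-01'],
                        [878, '2008-12-02', '2010-04-03'],
                        [879, '2010-04-04', '2199-05-11'],
                        [879, '2016-05-12', '2199-12-31']])
df['from'] = pd.to_datetime(df['from'])
df['to'] = pd.to_datetime(df['to'])


def overlap_bounds(g):
    """Hypothetical helper: one output row per input row of the group."""
    prev_to = g['to'].shift()          # end of the previous range (NaT for the first row)
    flag = prev_to > g['from']         # same overlap test as in the question
    return pd.DataFrame({
        'is_overlap': flag,
        # Keep the bounds only where an overlap was flagged; other rows become NaT.
        'start_overlap': g['from'].where(flag),
        'end_overlap': prev_to.where(flag),
    })


# group_keys=False keeps the original row index, so the result can be joined back.
bounds = df.groupby('id', group_keys=False)[['from', 'to']].apply(overlap_bounds)
result = df.join(bounds)
print(result)
```

As in the attempt above, `end_overlap` is the previous row's `to`. If a range can end before its predecessor does, the element-wise minimum of the two end dates would be needed instead.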

  [1]: https://stackoverflow.com/questions/42462218/find-date-range-overlap-in-python

0 answers:

No answers yet