Pandas time range overlap problem

Time: 2021-01-21 00:22:58

Tags: python pandas

I have a question about a Pandas DataFrame.

Below is my DataFrame:

                ELEMENT                                    TEXT         ID               START                 END  newid
 OLT2227-LT3-PON0-ONT03           USECASE1 - ALARM1 -NO OVERLAP  772874243 2021-01-19 18:00:00 2021-01-19 19:00:00      0
 OLT2227-LT3-PON0-ONT03          USECASE1 - ALARM2 - NO OVERLAP  772874243 2021-01-19 19:10:00 2021-01-19 20:00:12      1
 OLT2227-LT3-PON0-ONT05     USECASE2 - ALARM1 - Fully Contained  772874243 2021-01-19 18:00:00 2021-01-19 23:00:00      1
 OLT2227-LT3-PON0-ONT05     USECASE2 - ALARM2 - Fully Contained  772874243 2021-01-19 19:00:00 2021-01-19 20:00:12      1
 OLT2227-LT3-PON0-ONT10  USECASE3 - ALARM1 - START-END-RELATION  772874243 2021-01-19 22:00:00 2021-01-19 22:30:00      2
 OLT2227-LT3-PON0-ONT10  USECASE3 - ALARM2 - START-END-RELATION  772874243 2021-01-19 22:30:00 2021-01-19 23:00:12      2
 OLT2227-LT3-PON0-ONT21                         USECASE3-ALARM1  772874243 2021-01-19 22:00:00 2021-01-19 22:10:00      2
 OLT2227-LT3-PON0-ONT21                  USECASE3-ALARM2-NO-END  772874243 2021-01-19 22:15:00                 NaT      3
  OLT2227-LT3-PON0-ONT4                               USECASE-4  772874243 2021-01-19 17:30:00                 NaT      3
  OLT2227-LT3-PON0-ONT4                               USECASE-4  772874243 2021-01-19 20:00:00 2021-01-19 23:00:00      3
 OLT2227-LT3-PON0-ONT99                               USECASE-5  772874243 2021-01-19 17:30:00 2021-01-19 22:00:00      3
 OLT2227-LT3-PON0-ONT99                               USECASE-5  772874243 2021-01-19 20:00:00                 NaT      3

The output I currently get is:

                ELEMENT               START                 END
 OLT2227-LT3-PON0-ONT03 2021-01-19 18:00:00 2021-01-19 19:00:00
 OLT2227-LT3-PON0-ONT03 2021-01-19 19:10:00 2021-01-19 20:00:12
 OLT2227-LT3-PON0-ONT05 2021-01-19 18:00:00 2021-01-19 23:00:00
 OLT2227-LT3-PON0-ONT10 2021-01-19 22:00:00 2021-01-19 23:00:12
 OLT2227-LT3-PON0-ONT21 2021-01-19 22:00:00 2021-01-19 22:10:00
 OLT2227-LT3-PON0-ONT21 2021-01-19 22:15:00                 NaT
  OLT2227-LT3-PON0-ONT4 2021-01-19 17:30:00 2021-01-19 23:00:00
 OLT2227-LT3-PON0-ONT99 2021-01-19 17:30:00 2021-01-19 22:00:00

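It works for every use case except 4 and 5, where the time ranges overlap and one of the END values is missing. For those I need the merged END to be NaT instead of the following: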

  OLT2227-LT3-PON0-ONT4 2021-01-19 17:30:00 2021-01-19 23:00:00
 OLT2227-LT3-PON0-ONT99 2021-01-19 17:30:00 2021-01-19 22:00:00

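Because the ranges overlap, I want the result to take the minimum START and the maximum END of the range, where a missing END (NaT) counts as the maximum. So the expected result is: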

  OLT2227-LT3-PON0-ONT4 2021-01-19 17:30:00                 NaT
 OLT2227-LT3-PON0-ONT99 2021-01-19 17:30:00                 NaT

The final expected result for all use cases is:

                ELEMENT               START                 END
 OLT2227-LT3-PON0-ONT03 2021-01-19 18:00:00 2021-01-19 19:00:00
 OLT2227-LT3-PON0-ONT03 2021-01-19 19:10:00 2021-01-19 20:00:12
 OLT2227-LT3-PON0-ONT05 2021-01-19 18:00:00 2021-01-19 23:00:00
 OLT2227-LT3-PON0-ONT10 2021-01-19 22:00:00 2021-01-19 23:00:12
 OLT2227-LT3-PON0-ONT21 2021-01-19 22:00:00 2021-01-19 22:10:00
 OLT2227-LT3-PON0-ONT21 2021-01-19 22:15:00                 NaT
  OLT2227-LT3-PON0-ONT4 2021-01-19 17:30:00                 NaT
 OLT2227-LT3-PON0-ONT99 2021-01-19 17:30:00                 NaT

Here is the code I used:

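# Start a new group whenever a row's START is later than the previous row's END
df['newid'] = (df['START'] - df['END'].shift()).dt.total_seconds().gt(0).cumsum()
print(df.to_string(index=False))
# Merge each group into one range per ELEMENT: earliest START, latest END
newdf = df.groupby(['newid', 'ELEMENT']).agg({'START': 'min', 'END': 'max'}).reset_index(level=1)
print(newdf.to_string(index=False))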

Can anyone shed some light on how to achieve this? Thanks.

1 Answer:

Answer 0 (score: 0)

I found the answer. I converted NaT to a date in the future, which is a workaround that makes the logic work.
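For anyone landing here, this is a rough sketch of what that workaround might look like. The original code returns a real END for use cases 4 and 5 because the `max` aggregation skips NaT by default. Filling NaT with a sentinel date before grouping lets an open-ended alarm win the `max`, and the sentinel is turned back into NaT afterwards. The name `FAR_FUTURE` and the date 2100-01-01 are my own assumptions, not from the question. Any date later than every real END should do. The sample frame below is rebuilt from the question's table, with only the columns the logic needs.

```python
import pandas as pd

# Hypothetical sentinel: must be later than every real END in the data.
FAR_FUTURE = pd.Timestamp("2100-01-01")

# (ELEMENT suffix, START time, END time or None) taken from the question's table
rows = [
    ("ONT03", "18:00:00", "19:00:00"), ("ONT03", "19:10:00", "20:00:12"),
    ("ONT05", "18:00:00", "23:00:00"), ("ONT05", "19:00:00", "20:00:12"),
    ("ONT10", "22:00:00", "22:30:00"), ("ONT10", "22:30:00", "23:00:12"),
    ("ONT21", "22:00:00", "22:10:00"), ("ONT21", "22:15:00", None),
    ("ONT4", "17:30:00", None), ("ONT4", "20:00:00", "23:00:00"),
    ("ONT99", "17:30:00", "22:00:00"), ("ONT99", "20:00:00", None),
]
day = "2021-01-19 "
df = pd.DataFrame({
    "ELEMENT": ["OLT2227-LT3-PON0-" + e for e, _, _ in rows],
    "START": pd.to_datetime([day + s for _, s, _ in rows]),
    "END": pd.to_datetime([day + e if e else None for _, _, e in rows]),
})

# Treat an open-ended alarm (END is NaT) as ending far in the future.
df["END"] = df["END"].fillna(FAR_FUTURE)

# Same grouping logic as in the question: a new group starts whenever
# START is later than the previous row's END.
df["newid"] = (df["START"] - df["END"].shift()).dt.total_seconds().gt(0).cumsum()

newdf = (
    df.groupby(["newid", "ELEMENT"], sort=False)
      .agg({"START": "min", "END": "max"})
      .reset_index(level=1)
)

# Turn the sentinel back into NaT in the merged result.
newdf["END"] = newdf["END"].mask(newdf["END"].eq(FAR_FUTURE))

print(newdf.to_string(index=False))
```

With the sample data this should give NaT as the END for ONT21 (second row), ONT4 and ONT99. Note that it keeps the question's row-by-row comparison, so it still assumes the frame is sorted by ELEMENT and START as shown.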