What is an efficient way to calculate the number of days since the last maintenance?

Date: 2019-03-10 06:54:06

Tags: python pandas

Here is the sample dataset I am working with:

            maint id
datetime            
2015-01-01    1.0  a
2015-01-02    NaN  a
2015-01-03    NaN  a
2015-01-04    1.0  a
2015-01-05    NaN  a
2015-01-06    NaN  a
2015-01-07    NaN  a
2015-01-01    NaN  b
2015-01-02    NaN  b
2015-01-03    1.0  b
2015-01-04    1.0  b
2015-01-05    NaN  b
2015-01-06    NaN  b
2015-01-07    NaN  b
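To make the example reproducible, the sample frame above can be rebuilt with the sketch below. It assumes, based on how the printed frame reads, that `maint` is a float column (1.0 or NaN) and that the index is a `DatetimeIndex` named `datetime` that repeats the same dates once per id. The real data may be stored differently.

```python
import numpy as np
import pandas as pd

# Assumed layout: the same 7 daily dates repeated for ids "a" and "b",
# maint is 1.0 on maintenance days and NaN otherwise.
dates = pd.date_range("2015-01-01", periods=7, freq="D")
df = pd.DataFrame(
    {
        "maint": [1.0, np.nan, np.nan, 1.0, np.nan, np.nan, np.nan,   # id a
                  np.nan, np.nan, 1.0, 1.0, np.nan, np.nan, np.nan],  # id b
        "id": ["a"] * 7 + ["b"] * 7,
    },
    index=dates.append(dates).rename("datetime"),  # duplicated DatetimeIndex
)
print(df)
```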

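What I want is the number of days since `df['maint']` was last 1 within each id (0 for rows before an id's first maintenance):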

            maint id  days
datetime                  
2015-01-01    1.0  a     0
2015-01-02    NaN  a     1
2015-01-03    NaN  a     2
2015-01-04    1.0  a     0
2015-01-05    NaN  a     1
2015-01-06    NaN  a     2
2015-01-07    NaN  a     3
2015-01-01    NaN  b     0
2015-01-02    NaN  b     0
2015-01-03    1.0  b     0
2015-01-04    1.0  b     0
2015-01-05    NaN  b     1
2015-01-06    NaN  b     2
2015-01-07    NaN  b     3

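I have thousands of different ids, each with several years of maintenance records, so I am looking for an efficient way to compute this day difference.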
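Before reaching for a vectorized solution, it can help to pin down the rule the desired table encodes. The plain-Python loop below is only a slow reference restatement of that rule, useful for checking a faster version on small inputs; it is not meant for the full dataset. The function name `days_since_last_maint` is hypothetical, and it assumes rows are ordered by date within each id.

```python
from datetime import date, timedelta


def days_since_last_maint(dates, maint, ids):
    """Hypothetical reference implementation (loop-based, slow).

    For each row, return the calendar days since the most recent row of the
    same id whose maint value is 1; rows before an id's first maintenance get 0.
    Assumes rows are ordered by date within each id.
    """
    last_seen = {}  # id -> date of the most recent maintenance seen so far
    out = []
    for d, m, i in zip(dates, maint, ids):
        if m == 1:
            last_seen[i] = d
        prev = last_seen.get(i)
        out.append(0 if prev is None else (d - prev).days)
    return out


# Sample data from the question, as plain lists.
nan = float("nan")
dates = [date(2015, 1, 1) + timedelta(days=k) for k in range(7)] * 2
maint = [1, nan, nan, 1, nan, nan, nan, nan, nan, 1, 1, nan, nan, nan]
ids = ["a"] * 7 + ["b"] * 7

result = days_since_last_maint(dates, maint, ids)
print(result)  # [0, 1, 2, 0, 1, 2, 3, 0, 0, 0, 0, 1, 2, 3]
```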

1 answer:

Answer 0 (score: 2)

Use:

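import pandas as pd
# keep each row's date where maint == 1, NaT elsewhere
df['days'] = df.index.where(df['maint'].eq(1))
# per id: forward-fill the last maintenance date, subtract it from the row's date, NaT -> 0 days
df['days'] = (df.index - df.groupby('id')['days'].ffill()).fillna(pd.Timedelta(0)).dt.days
print(df)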
            maint id  days
datetime                  
2015-01-01    1.0  a     0
2015-01-02    NaN  a     1
2015-01-03    NaN  a     2
2015-01-04    1.0  a     0
2015-01-05    NaN  a     1
2015-01-06    NaN  a     2
2015-01-07    NaN  a     3
2015-01-01    NaN  b     0
2015-01-02    NaN  b     0
2015-01-03    1.0  b     0
2015-01-04    1.0  b     0
2015-01-05    NaN  b     1
2015-01-06    NaN  b     2
2015-01-07    NaN  b     3

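Explanation

  1. First create a new column `days` holding the values of `df.index` where `maint` is 1, and `NaT` everywhere else.
  2. Forward-fill that column within each id with `GroupBy.ffill`, subtract the result from the index, replace the remaining `NaT` values (rows before an id's first maintenance) with a 0 timedelta, and finally convert to day counts with `Series.dt.days`.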