Here is a sample dataset I'm working with:
maint id
datetime
2015-01-01 1.0 a
2015-01-02 NaN a
2015-01-03 NaN a
2015-01-04 1.0 a
2015-01-05 NaN a
2015-01-06 NaN a
2015-01-07 NaN a
2015-01-01 NaN b
2015-01-02 NaN b
2015-01-03 1.0 b
2015-01-04 1.0 b
2015-01-05 NaN b
2015-01-06 NaN b
2015-01-07 NaN b
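What I want is the number of days since df['maint'] was last 1, computed separately for each id (rows before an id's first maintenance record should be 0):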
maint id days
datetime
2015-01-01 1.0 a 0
2015-01-02 NaN a 1
2015-01-03 NaN a 2
2015-01-04 1.0 a 0
2015-01-05 NaN a 1
2015-01-06 NaN a 2
2015-01-07 NaN a 3
2015-01-01 NaN b 0
2015-01-02 NaN b 0
2015-01-03 1.0 b 0
2015-01-04 1.0 b 0
2015-01-05 NaN b 1
2015-01-06 NaN b 2
2015-01-07 NaN b 3
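I have thousands of different IDs, and each ID has several years of maintenance records, so I'm looking for an efficient way to compute this day difference.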
Answer 0 (score: 2)
Use:
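import pandas as pd
# Keep the index timestamp on rows where maint == 1; every other row becomes NaT
df['days'] = df.index.where(df['maint'].eq(1))
# Forward-fill the last maintenance date per id, subtract it from the index, convert to whole days
df['days'] = (df.index - df.groupby('id')['days'].ffill()).fillna(pd.Timedelta(0)).dt.days
print(df)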
maint id days
datetime
2015-01-01 1.0 a 0
2015-01-02 NaN a 1
2015-01-03 NaN a 2
2015-01-04 1.0 a 0
2015-01-05 NaN a 1
2015-01-06 NaN a 2
2015-01-07 NaN a 3
2015-01-01 NaN b 0
2015-01-02 NaN b 0
2015-01-03 1.0 b 0
2015-01-04 1.0 b 0
2015-01-05 NaN b 1
2015-01-06 NaN b 2
2015-01-07 NaN b 3
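Explanation:
First, create a new column days from df.index where maint is 1; all other values become NaT.
Then subtract from the index the new Series created by GroupBy.ffill, replace NaN with a 0 timedelta, and finally convert the result to days with Series.dt.days.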
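For anyone who wants to see the intermediate values, the steps above can be sketched as a self-contained script. It rebuilds the sample data from the question by hand, and the names is_maint and last_maint are illustrative only; they do not appear in the answer. Treat it as a minimal sketch of the same approach rather than a tuned implementation.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question: two ids, seven daily rows each.
dates = pd.date_range("2015-01-01", periods=7, freq="D")
df = pd.DataFrame(
    {
        "maint": [1, np.nan, np.nan, 1, np.nan, np.nan, np.nan,
                  np.nan, np.nan, 1, 1, np.nan, np.nan, np.nan],
        "id": ["a"] * 7 + ["b"] * 7,
    },
    index=pd.DatetimeIndex(dates.append(dates), name="datetime"),
)

# Step 1: keep the timestamp on maintenance rows, NaT everywhere else.
# (is_maint and last_maint are hypothetical helper names for illustration.)
is_maint = df["maint"].eq(1).to_numpy()
last_maint = pd.Series(df.index.where(is_maint).to_numpy(), index=df.index)

# Step 2: carry the most recent maintenance date forward within each id,
# so values never leak from one id into the next.
last_maint = last_maint.groupby(df["id"].to_numpy()).ffill()

# Step 3: subtract from the index; rows before an id's first maintenance
# are still NaT, so they are filled with a zero timedelta before taking days.
df["days"] = (df.index - last_maint).fillna(pd.Timedelta(0)).dt.days

print(df)
```

Grouping by the id values as a plain array avoids any index alignment, which matters here because the datetime index repeats across ids.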