I have a pandas DataFrame:
ticket  num  loadtype  start             diff                       end
2       1    FIRST     12/28/18 7:40 PM  0 days 06:05:48.928732000  12/29/18 1:45 AM
2       2    MIDDLE    12/28/18 7:40 PM  0 days 09:21:17.652138000  12/29/18 5:01 AM
2       3    LAST      12/28/18 7:40 PM  0 days 13:11:39.585263000  12/29/18 8:51 AM
4       4    FIRST     12/29/18 7:00 AM  1 days 00:00:00.000000000  12/30/18 7:00 AM
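To experiment with this, the table above can be rebuilt as a DataFrame. This is a minimal sketch: the values are copied from the question, and the format string `%m/%d/%y %I:%M %p` is an assumption based on how the timestamps are displayed. Because `start` and `end` are shown only to the minute, the recomputed differences come out at minute precision (e.g. `06:05:00`) rather than matching the sub-second `diff` values in the question.

```python
import pandas as pd

# Rows copied from the table above; 'diff' is omitted because it gets recomputed
df = pd.DataFrame({
    "ticket": [2, 2, 2, 4],
    "num": [1, 2, 3, 4],
    "loadtype": ["FIRST", "MIDDLE", "LAST", "FIRST"],
    "start": ["12/28/18 7:40 PM"] * 3 + ["12/29/18 7:00 AM"],
    "end": ["12/29/18 1:45 AM", "12/29/18 5:01 AM",
            "12/29/18 8:51 AM", "12/30/18 7:00 AM"],
})

# Assumed format: month/day/2-digit year, 12-hour clock with AM/PM
fmt = "%m/%d/%y %I:%M %p"
df["start"] = pd.to_datetime(df["start"], format=fmt)
df["end"] = pd.to_datetime(df["end"], format=fmt)

# Subtracting two datetime64 columns yields a timedelta64 column
print(df["end"] - df["start"])
```

Once `start` and `end` are real datetimes, `end - start` is a plain column subtraction, which is what the answer below relies on.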
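The DataFrame is grouped by `ticket`, and each ticket has several loads. I want to compute `end - start` only for the first load of each ticket, and then carry that difference over to all the remaining loads of the same ticket.
So I want: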
ticket  num  loadtype  start             diff                       end
2       1    FIRST     12/28/18 7:40 PM  0 days 06:05:48.928732000  12/29/18 1:45 AM
2       2    MIDDLE    12/28/18 7:40 PM  0 days 06:05:48.928732000  12/29/18 5:01 AM
2       3    LAST      12/28/18 7:40 PM  0 days 06:05:48.928732000  12/29/18 8:51 AM
4       4    FIRST     12/29/18 7:00 AM  1 days 00:00:00.000000000  12/30/18 7:00 AM
How can I do this in pandas? Do I have to use `groupby` first and then some kind of `apply`?
Answer 0 (score: 1)
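This only works if the rows of each ticket group appear in the given order, i.e. each ticket's FIRST load comes before its other loads and tickets are not interleaved: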
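import pandas as pd

# 'start' and 'end' are assumed to already be datetime64 columns
# preset 'diff' with NaT, using a timedelta dtype so later assignments keep that dtype
df['diff'] = pd.Series(pd.NaT, index=df.index, dtype='timedelta64[ns]')
# update only the FIRST loadtype rows with end - start
df.loc[df.loadtype.eq('FIRST'), 'diff'] = df['end'] - df['start']
# forward-fill so the MIDDLE/LAST rows reuse the preceding FIRST row's diff
df['diff'] = df['diff'].ffill()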
Output:
num loadtype start diff end
0 1 FIRST 2018-12-28 19:40:00 06:05:00 2018-12-29 01:45:00
1 2 MIDDLE 2018-12-28 19:40:00 06:05:00 2018-12-29 05:01:00
2 3 LAST 2018-12-28 19:40:00 06:05:00 2018-12-29 08:51:00
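Since the forward-fill above depends on row order, a `groupby`-based variant may be safer when tickets are interleaved or a FIRST load is not the top row of its ticket. This is a sketch, not the answerer's code: it keeps `end - start` only on FIRST rows and broadcasts that value to every row of the same ticket with `groupby(...).transform('first')`, which takes the first non-null value in each group. The rows below are hypothetical, shuffled on purpose to show that order no longer matters.

```python
import pandas as pd

# Hypothetical rows: tickets interleaved, and ticket 2's FIRST load is not on top
df = pd.DataFrame({
    "ticket": [2, 4, 2, 2],
    "loadtype": ["MIDDLE", "FIRST", "FIRST", "LAST"],
    "start": pd.to_datetime(["2018-12-28 19:40", "2018-12-29 07:00",
                             "2018-12-28 19:40", "2018-12-28 19:40"]),
    "end": pd.to_datetime(["2018-12-29 05:01", "2018-12-30 07:00",
                           "2018-12-29 01:45", "2018-12-29 08:51"]),
})

# end - start on FIRST rows, NaT on every other row
first_diff = (df["end"] - df["start"]).where(df["loadtype"].eq("FIRST"))

# Broadcast each ticket's first non-null value (its FIRST diff) to all its rows
df["diff"] = first_diff.groupby(df["ticket"]).transform("first")

print(df)
```

This answers the `groupby` part of the question without a row-wise `apply`: `transform` returns a result aligned to the original index, so it can be assigned straight back as a column.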