How do I compute the difference only for the first value in each group?

Asked: 2019-06-18 17:07:42

Tags: python pandas

I have a pandas DataFrame:

ticket  num  loadtype  start             diff                       end
2       1    FIRST     12/28/18 7:40 PM  0 days 06:05:48.928732000  12/29/18 1:45 AM
2       2    MIDDLE    12/28/18 7:40 PM  0 days 09:21:17.652138000  12/29/18 5:01 AM
2       3    LAST      12/28/18 7:40 PM  0 days 13:11:39.585263000  12/29/18 8:51 AM
4       4    FIRST     12/29/18 7:00 AM  1 days 00:00:00.000000000  12/30/18 7:00 AM

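The DataFrame is grouped by "ticket", and each ticket has several loads. I want to compute the difference between end and start only for the first load of each ticket, and carry that difference over to all the remaining loads of that ticket.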

So I want:

ticket  num  loadtype  start             diff                       end
2       1    FIRST     12/28/18 7:40 PM  0 days 06:05:48.928732000  12/29/18 1:45 AM
2       2    MIDDLE    12/28/18 7:40 PM  0 days 06:05:48.928732000  12/29/18 5:01 AM
2       3    LAST      12/28/18 7:40 PM  0 days 06:05:48.928732000  12/29/18 8:51 AM
4       4    FIRST     12/29/18 7:00 AM  1 days 00:00:00.000000000  12/30/18 7:00 AM

How can I do this in pandas? Do I have to use groupby first and then some kind of apply?

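1 Answer:

Answer 0 (score: 1):

This approach only works if the rows of each ticket are contiguous and each ticket's FIRST row comes before its other rows, because the forward fill copies the most recent FIRST value down to the rows that follow it.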

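import pandas as pd

# assumes 'start' and 'end' are already datetime64 columns
# (e.g. converted with pd.to_datetime)

# preset with NaT, typed as timedelta so the assignment below keeps the dtype
df['diff'] = pd.Series(pd.NaT, index=df.index, dtype='timedelta64[ns]')

# compute end - start, but keep it only on the FIRST rows
df.loc[df.loadtype.eq('FIRST'), 'diff'] = (df['end'] - df['start'])

# forward-fill each FIRST value down to the following rows
df['diff'] = pd.to_timedelta(df['diff'].ffill())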

Output:

   num loadtype               start     diff                 end
0    1    FIRST 2018-12-28 19:40:00 06:05:00 2018-12-29 01:45:00
1    2   MIDDLE 2018-12-28 19:40:00 06:05:00 2018-12-29 05:01:00
2    3     LAST 2018-12-28 19:40:00 06:05:00 2018-12-29 08:51:00
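
If the row order cannot be relied on, a variant that looks the value up by ticket may be more robust than a forward fill. The following is only a sketch, not the answer's method: it rebuilds the question's sample data with an assumed date format, and it assumes each ticket has exactly one FIRST row.

```python
import pandas as pd

# Sample frame modeled on the question's data; the date format is an assumption.
fmt = "%m/%d/%y %I:%M %p"
df = pd.DataFrame({
    "ticket": [2, 2, 2, 4],
    "num": [1, 2, 3, 4],
    "loadtype": ["FIRST", "MIDDLE", "LAST", "FIRST"],
    "start": pd.to_datetime(
        ["12/28/18 7:40 PM", "12/28/18 7:40 PM",
         "12/28/18 7:40 PM", "12/29/18 7:00 AM"], format=fmt),
    "end": pd.to_datetime(
        ["12/29/18 1:45 AM", "12/29/18 5:01 AM",
         "12/29/18 8:51 AM", "12/30/18 7:00 AM"], format=fmt),
})

# Keep only the FIRST rows and index them by ticket
# (assumes exactly one FIRST row per ticket).
first = df[df["loadtype"].eq("FIRST")].set_index("ticket")

# end - start for each ticket's FIRST load, as a Series keyed by ticket.
first_diff = first["end"] - first["start"]

# Look up every row's ticket in that Series to spread the value over the group.
df["diff"] = df["ticket"].map(first_diff)

print(df)
```

Because the value is matched on the ticket key rather than on position, this version should give the same result even if the rows are shuffled. A ticket without a FIRST row would end up with NaT in diff.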