Python - Filtering rows from a DataFrame

Time: 2016-05-30 08:35:28

Tags: python pandas dataframe

I have a simple DataFrame:

ID  Stime       Etime
1   13:00:00    13:15:00
1   14:00:00    14:15:00
2   15:00:00    15:42:00
3   13:00:00    13:25:00
4   15:00:00    15:15:00
4   15:05:00    15:15:00
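
For anyone who wants to reproduce this, the sample can be rebuilt roughly as follows. This is only a sketch: the question does not say how the columns are stored, so parsing them with pd.to_datetime and anchoring them to the date 2016-05-30 are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 4, 4],
    "Stime": ["13:00:00", "14:00:00", "15:00:00", "13:00:00", "15:00:00", "15:05:00"],
    "Etime": ["13:15:00", "14:15:00", "15:42:00", "13:25:00", "15:15:00", "15:15:00"],
})

# Assumption: every time belongs to the same day, so a fixed date is prepended
# before parsing. This turns both columns into datetime64 values.
for col in ["Stime", "Etime"]:
    df[col] = pd.to_datetime("2016-05-30 " + df[col])

print(df.dtypes)
```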

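What I want to do is merge the last two rows into one, because they belong to the same ID (ID=4) and the time span of the last row is contained within the time span of the second-to-last row.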

The output I want is:

ID  Stime       Etime
1   13:00:00    13:15:00
1   14:00:00    14:15:00
2   15:00:00    15:42:00
3   13:00:00    13:25:00
4   15:00:00    15:15:00
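
The rule described above (same ID, time span contained in the previous row) can be written as a boolean mask. The following is a minimal sketch, not a complete solution: the helper name drop_contained is hypothetical, the times are parsed with pd.to_timedelta as an assumption, and only the immediately preceding row of the same ID is checked.

```python
import pandas as pd

def drop_contained(df):
    """Drop rows whose [Stime, Etime] span lies inside the previous row of the same ID."""
    # Previous row's start and end within each ID (NaT for the first row of an ID).
    prev = df.groupby("ID")[["Stime", "Etime"]].shift()
    # Comparisons against NaT are False, so first rows are never marked as contained.
    contained = (df["Stime"] >= prev["Stime"]) & (df["Etime"] <= prev["Etime"])
    return df[~contained].reset_index(drop=True)

df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 4, 4],
    "Stime": ["13:00:00", "14:00:00", "15:00:00", "13:00:00", "15:00:00", "15:05:00"],
    "Etime": ["13:15:00", "14:15:00", "15:42:00", "13:25:00", "15:15:00", "15:15:00"],
})
# Assumption: times are stored as strings and converted to timedeltas here.
for col in ["Stime", "Etime"]:
    df[col] = pd.to_timedelta(df[col])

result = drop_contained(df)
print(result)
```

On the sample data this keeps one row for ID 4. It does not handle spans that overlap only partially, or spans contained in a row further back than the previous one.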

1 Answer:

Answer 0: (score: 1)

Solution

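import pandas as pd

# Assumes Stime and Etime are datetime columns (e.g. parsed with pd.to_datetime)
# and that df is sorted by ID, then by Stime.

def setup(df):
    # Gap between each row's start and the previous row's end within one ID.
    td = df.Stime - df.Etime.shift()
    # A gap of more than one second starts a new group; NaT compares as False.
    td = td > pd.Timedelta(seconds=1)
    td.iloc[0] = True
    return td.cumsum()

def collapse(df):
    # Copy the first row so the assignments below do not write into the original frame.
    df_ = df.iloc[0, :].copy()
    df_.loc['Stime'] = df.Stime.min()
    df_.loc['Etime'] = df.Etime.max()
    return df_

# .values assigns by position, which relies on df being sorted by ID.
df['group id'] = df.groupby('ID').apply(setup).values

gbcols = ['ID', 'group id']
fcols = ['ID', 'Stime', 'Etime']

print(df.groupby(gbcols)[fcols].apply(collapse).reset_index(drop=True))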

   ID               Stime               Etime
0   1 2016-05-30 13:00:00 2016-05-30 13:15:00
1   1 2016-05-30 14:00:00 2016-05-30 14:15:00
2   2 2016-05-30 15:00:00 2016-05-30 15:42:00
3   3 2016-05-30 13:00:00 2016-05-30 13:25:00
4   4 2016-05-30 15:00:00 2016-05-30 15:15:00
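
The same grouping idea can also be sketched without a per-group Python function for the collapse step. This is a variant under stated assumptions, not the answer's own method: the helper name merge_overlaps and the temporary block column are hypothetical, the columns are assumed to be datetimes, and a running maximum of Etime is used in place of the previous row's Etime, so a span nested inside an earlier, longer span of the same ID is merged too. Unlike the one-second threshold above, a row here starts a new block only if it begins strictly after every earlier span of its ID has ended.

```python
import pandas as pd

def merge_overlaps(df):
    """Merge overlapping [Stime, Etime] spans within each ID."""
    df = df.sort_values(["ID", "Stime"]).reset_index(drop=True)
    # Latest end time seen so far within the ID, excluding the current row.
    prev_end = df.groupby("ID")["Etime"].transform(lambda s: s.cummax().shift())
    # A new block starts on the first row of an ID, or when the row begins
    # after all earlier spans of that ID have ended.
    new_block = prev_end.isna() | (df["Stime"] > prev_end)
    df = df.assign(block=new_block.cumsum())
    merged = df.groupby(["ID", "block"], as_index=False).agg(
        Stime=("Stime", "min"), Etime=("Etime", "max")
    )
    return merged.drop(columns="block")

df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 4, 4],
    "Stime": ["13:00:00", "14:00:00", "15:00:00", "13:00:00", "15:00:00", "15:05:00"],
    "Etime": ["13:15:00", "14:15:00", "15:42:00", "13:25:00", "15:15:00", "15:15:00"],
})
# Assumption: all times fall on the same day, matching the output shown above.
for col in ["Stime", "Etime"]:
    df[col] = pd.to_datetime("2016-05-30 " + df[col])

merged = merge_overlaps(df)
print(merged)
```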