Pandas groupby: aggregate over only some of the records

Posted: 2018-04-05 19:40:51

Tags: python-3.x pandas aggregate pandas-groupby

I have the following dataframe:

id   src     target     duration
001     A      C           4
001     B      C           3
001     C      C           2
002     B      D           5
002     C      D           2

I use the following code to do some aggregation, and it works fine:

df_new = df.groupby(['id','target']) \
        .apply(lambda x: pd.Series({'min_duration': min(x['duration']), \
                                    'total_duration':sum(x['duration']), \
                                    'all_src':list(x['src'])
                                   })).reset_index()
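For comparison, the same unconditional aggregation can be written without apply by using named aggregation (available since pandas 0.25). This is a sketch, not the asker's code; it rebuilds the sample frame and assumes the ids are strings, since the leading zeros suggest so.

```python
import pandas as pd

# Sample data from the question; ids stored as strings (assumption)
df = pd.DataFrame({
    'id': ['001', '001', '001', '002', '002'],
    'src': ['A', 'B', 'C', 'B', 'C'],
    'target': ['C', 'C', 'C', 'D', 'D'],
    'duration': [4, 3, 2, 5, 2],
})

# Named aggregation: output_column=(input_column, function)
df_new = (
    df.groupby(['id', 'target'])
      .agg(min_duration=('duration', 'min'),
           total_duration=('duration', 'sum'),
           all_src=('src', list))   # collect each group's src values into a list
      .reset_index()
)
print(df_new)
```

Each output column is computed by a built-in reducer per group, which avoids building a pd.Series for every group as the apply version does.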

Now I want to compute the sum only over the records where src != target. I modified my code as follows:

df_new = df.groupby(['id','target']) \
        .apply(lambda x: pd.Series({'min_duration': min(x['duration']), \
                                    'total_duration':sum(x['duration']), \
                                    'total_duration_condition':sum(x['duration']) if x['src'] != x['target'], \
                                    'all_src':list(x['src'])
                                   })).reset_index()

But the new line raises a SyntaxError (invalid syntax):

'total_duration_condition':sum(x['duration']) if x['src'] != x['target']

What is the correct way to compute a sum over only some of the records? Thanks!

1 answer:

Answer 0 (score: 2)

Try writing the code like this:

df_new = df.groupby(['id', 'target']).apply(lambda x: pd.Series({
    'min_duration': min(x['duration']),
    'total_duration': sum(x['duration']),
    # changed: the boolean mask keeps only rows where src != target
    'total_duration_condition': sum(x['duration'][x['src'] != x['target']]),
    'all_src': list(x['src'])
})).reset_index()
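A self-contained sketch of the answer's approach, run on the sample data from the question (ids assumed to be strings). The helper name summarize is hypothetical. The grouping column 'target' is selected explicitly after groupby because recent pandas versions (2.2+) deprecate passing grouping columns to apply implicitly, and the mask needs it.

```python
import pandas as pd

# Sample data from the question; ids stored as strings (assumption)
df = pd.DataFrame({
    'id': ['001', '001', '001', '002', '002'],
    'src': ['A', 'B', 'C', 'B', 'C'],
    'target': ['C', 'C', 'C', 'D', 'D'],
    'duration': [4, 3, 2, 5, 2],
})

def summarize(x):
    # Boolean mask: True for rows whose src differs from target
    differs = x['src'] != x['target']
    return pd.Series({
        'min_duration': x['duration'].min(),
        'total_duration': x['duration'].sum(),
        # Boolean indexing drops rows where src == target before summing
        'total_duration_condition': x['duration'][differs].sum(),
        'all_src': list(x['src']),
    })

# Explicitly selecting 'target' keeps it available inside summarize()
df_new = (
    df.groupby(['id', 'target'])[['src', 'target', 'duration']]
      .apply(summarize)
      .reset_index()
)
print(df_new)
```

For id 001 the row C -> C is excluded, so the conditional total is 4 + 3 = 7; for id 002 both rows differ from D, so it equals the full total of 7. If a group had no matching rows, the masked sum would simply be 0.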

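The only change is replacing the line

'total_duration_condition':sum(x['duration']) if x['src'] != x['target']

with

'total_duration_condition':sum(x['duration'][x['src'] != x['target']])

The original line is a syntax error because a conditional expression (a if cond else b) requires an else branch. Even with one, x['src'] != x['target'] is a boolean Series rather than a single True/False, so using it in an if would raise a ValueError about the truth value being ambiguous. Using the comparison as a boolean index instead selects only the matching rows of x['duration'] before they are summed.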
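An alternative not taken from the answer: zero out the non-matching durations up front with Series.where, then aggregate everything with named aggregation and no apply. The helper column name duration_condition is hypothetical, and the ids are assumed to be strings.

```python
import pandas as pd

# Sample data from the question; ids stored as strings (assumption)
df = pd.DataFrame({
    'id': ['001', '001', '001', '002', '002'],
    'src': ['A', 'B', 'C', 'B', 'C'],
    'target': ['C', 'C', 'C', 'D', 'D'],
    'duration': [4, 3, 2, 5, 2],
})

# Hypothetical helper column: duration where src != target, else 0
df = df.assign(
    duration_condition=df['duration'].where(df['src'] != df['target'], 0)
)

df_new = (
    df.groupby(['id', 'target'])
      .agg(min_duration=('duration', 'min'),
           total_duration=('duration', 'sum'),
           total_duration_condition=('duration_condition', 'sum'),
           all_src=('src', list))
      .reset_index()
)
print(df_new)
```

The condition is evaluated once over the whole frame rather than once per group, which usually scales better than a per-group lambda on large data.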