I have the following DataFrame:
id src target duration
001 A C 4
001 B C 3
001 C C 2
002 B D 5
002 C D 2
I use the following code to do some aggregation, and it works well.
df_new = df.groupby(['id','target']) \
.apply(lambda x: pd.Series({'min_duration': min(x['duration']), \
'total_duration':sum(x['duration']), \
'all_src':list(x['src'])
})).reset_index()
Now I want to compute the total only over the records where src != target. I modified my code as follows:
df_new = df.groupby(['id','target']) \
.apply(lambda x: pd.Series({'min_duration': min(x['duration']), \
'total_duration':sum(x['duration']), \
'total_duration_condition':sum(x['duration']) if x['src'] != x['target'], \
'all_src':list(x['src'])
})).reset_index()
But I get an Invalid Syntax error on the new line:
'total_duration_condition':sum(x['duration']) if x['src'] != x['target']
What is the correct way to sum over only some of the records? Thanks!
Answer 0 (score: 2)
Try writing the code as follows:
df.groupby(['id','target']).apply(lambda x: pd.Series({'min_duration': min(x['duration']),
'total_duration':sum(x['duration']),
'total_duration_condition':sum(x['duration'][x['src'] != x['target']]),  # I changed this part
'all_src':list(x['src'])
})).reset_index()
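To make the filtering step concrete, here is a minimal sketch that applies it to a single group outside of groupby. The DataFrame is rebuilt from the sample data in the question (assuming id is stored as a string), and the names group, mask, conditional_total and ambiguous are illustrative only.

```python
import pandas as pd

# Sample data from the question; ids kept as strings to preserve leading zeros.
df = pd.DataFrame({
    "id": ["001", "001", "001", "002", "002"],
    "src": ["A", "B", "C", "B", "C"],
    "target": ["C", "C", "C", "D", "D"],
    "duration": [4, 3, 2, 5, 2],
})

# The rows that groupby(['id', 'target']) would hand over for ('001', 'C').
group = df[(df["id"] == "001") & (df["target"] == "C")]

# Element-wise comparison gives a boolean Series, one flag per row.
mask = group["src"] != group["target"]

# Indexing with the mask keeps only the rows where src != target, then sums them.
conditional_total = group["duration"][mask].sum()

# A multi-row boolean Series has no single truth value,
# so it cannot be used as the condition of a plain `if`.
try:
    bool(mask)
    ambiguous = False
except ValueError:
    ambiguous = True

print(mask.tolist(), conditional_total, ambiguous)
```

For this group the row with src C is excluded, so only the durations 4 and 3 contribute to the conditional total.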
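Change the line
'total_duration_condition':sum(x['duration']) if x['src'] != x['target']
to
'total_duration_condition':sum(x['duration'][x['src'] != x['target']])
The original is a syntax error because a conditional expression needs an else branch. Even with one, x['src'] != x['target'] is a boolean Series rather than a single True/False, so it cannot serve as an if condition. Used as an index instead, it selects only the rows where src differs from target before they are summed.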
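As an alternative, the same result can be sketched without apply: zero out the durations that should not count, then use named aggregation. This is not the answer's method, only one possible variant. The helper column duration_condition is a hypothetical name, and the DataFrame is rebuilt from the question's sample data. It may be worth considering because pandas 2.2 deprecated operating on the grouping columns inside groupby.apply, which the lambda relies on when it reads x['target'].

```python
import pandas as pd

# Sample data from the question; ids kept as strings to preserve leading zeros.
df = pd.DataFrame({
    "id": ["001", "001", "001", "002", "002"],
    "src": ["A", "B", "C", "B", "C"],
    "target": ["C", "C", "C", "D", "D"],
    "duration": [4, 3, 2, 5, 2],
})

# Keep the duration where src != target, replace it with 0 elsewhere.
# 'duration_condition' is a helper column introduced for this sketch.
masked = df.assign(
    duration_condition=df["duration"].where(df["src"] != df["target"], 0)
)

# Named aggregation: output_column=(input_column, aggregation).
df_new = masked.groupby(["id", "target"], as_index=False).agg(
    min_duration=("duration", "min"),
    total_duration=("duration", "sum"),
    total_duration_condition=("duration_condition", "sum"),
    all_src=("src", list),
)

print(df_new)
```

On the sample data, the group (001, C) has a total_duration of 9 but a total_duration_condition of 7, since the C to C row is excluded.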