I have a DataFrame that I want to aggregate by three columns, with some calculated columns added at the end.
DataFrame columns:
cols = ["region_2",
"trade_flag",
"trade_target",
"broker",
"trade_shares",
"total_value",
"commission_in_gbp",
"IS/Order Start PTA - Realized Cost/Sh",
"IS/Order Start PTA - Realized Net Cost/Sh",
"IS/Order Start PTA - Base Bench Price",
"IS/Order Start PTA - P/L"]
Sample input:
region_2 trade_flag trade_target broker trade_shares total_value commission_in_gbp IS/Order Start PTA - Realized Cost/Sh IS/Order Start PTA - Realized Net Cost/Sh IS/Order Start PTA - Base Bench Price IS/Order Start PTA - P/L count
0 EMEA flag1 target1 broker1 3900 39532 0.00406 -0.067 -0.067 10.2037 -261.91 1
1 APAC flag2 target2 broker2 1700 17232 0.00406 -0.067 -0.067 10.2037 -114.17 1
2 AMER flag1 target1 broker3 1400 14191 0.00406 -0.067 -0.067 10.2037 -94.02 1
3 EMEA flag2 target2 broker2 2000 20273 0.00406 -0.067 -0.067 10.2037 -134.31 1
Desired output:
region_2 | trade_flag | broker | count | total_value | perf | net perf
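The final perf and net perf columns are weighted-average calculations.
I followed the code from another example, but it does not work (it raises a KeyError):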
df['count'] = 1
df['perf'] = ""
df['net perf'] = ""
wm = lambda x: x['IS/Order Start PTA - Realized Cost/Sh'] * x['trade_shares'] * 10000 / x['IS/Order Start PTA - Base Bench Price'] * x['trade_shares']
wm2 = lambda x: x['IS/Order Start PTA - Realized Net Cost/Sh'] * x['trade_shares'] * 10000 / x['IS/Order Start PTA - Base Bench Price'] * x['trade_shares']
f = {'trade_shares': ['sum'],
'total_value': ['sum'],
'count': ['sum'],
'perf': {'weighted mean' : wm},
'net perf': {'weighted mean' : wm2}}
df = df.groupby(['region_2', 'trade_flag', 'broker']).agg(f)
df = df[['region_2', 'trade_flag', 'broker', 'count', 'total_value', 'actual', 'net']]
Answer 0 (score: 0)
You can use pivot_table instead of groupby:
pivot = pd.pivot_table(
df,
index=[
'region_2',
'trade_flag',
'broker',
],
values=[
'trade_shares',
'total_value',
'count',
'perf',
'net perf'
],
aggfunc={
'trade_shares': np.sum,
'total_value': np.sum,
'count': np.sum,
'perf': wm,
'net perf': wm2
}
)
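That said, it would help to see the actual error message and some sample input to confirm whether this is the real problem.
Answer 1 (score: 0)
You need GroupBy.apply, because GroupBy.agg works on each column separately, which is what causes the KeyError: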
def f(x):
    a = x['trade_shares'].sum()
    b = x['total_value'].sum()
    c = len(x)
    #x['perf'] = x['IS/Order Start PTA - Realized Cost/Sh'] * x['trade_shares'] * 10000 / x['IS/Order Start PTA - Base Bench Price'] * x['trade_shares']
    #x['net perf'] = x['IS/Order Start PTA - Realized Net Cost/Sh'] * x['trade_shares'] * 10000 / x['IS/Order Start PTA - Base Bench Price'] * x['trade_shares']
    return pd.Series([a,b,c], index=['trade_shares','total_value','count'])

df = df.groupby(['region_2', 'trade_flag', 'broker']).apply(f).reset_index()
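
The function above leaves the two perf columns commented out. Below is a minimal sketch of how they might be filled in with the same GroupBy.apply approach. It assumes "weighted average" means the per-share cost expressed in basis points of the benchmark price, weighted by trade_shares; the question's formula is ambiguous on this point, so treat that definition as a guess. The helper names weighted_bps and summarize are made up for illustration, and the frame is rebuilt from the sample rows in the question.

```python
import numpy as np
import pandas as pd

COST = "IS/Order Start PTA - Realized Cost/Sh"
NET_COST = "IS/Order Start PTA - Realized Net Cost/Sh"
BENCH = "IS/Order Start PTA - Base Bench Price"

# Sample rows from the question (only the columns used below).
df = pd.DataFrame({
    "region_2": ["EMEA", "APAC", "AMER", "EMEA"],
    "trade_flag": ["flag1", "flag2", "flag1", "flag2"],
    "broker": ["broker1", "broker2", "broker3", "broker2"],
    "trade_shares": [3900, 1700, 1400, 2000],
    "total_value": [39532, 17232, 14191, 20273],
    COST: [-0.067] * 4,
    NET_COST: [-0.067] * 4,
    BENCH: [10.2037] * 4,
})


def weighted_bps(group, cost_col):
    # Assumption: cost per share in basis points of the benchmark price,
    # averaged using trade_shares as the weights.
    bps = group[cost_col] / group[BENCH] * 10000
    return np.average(bps, weights=group["trade_shares"])


def summarize(group):
    # apply passes the whole sub-frame, so several columns can be combined.
    return pd.Series({
        "count": len(group),
        "total_value": group["total_value"].sum(),
        "perf": weighted_bps(group, COST),
        "net perf": weighted_bps(group, NET_COST),
    })


keys = ["region_2", "trade_flag", "broker"]
value_cols = ["trade_shares", "total_value", COST, NET_COST, BENCH]

# Selecting the value columns explicitly keeps the grouping keys out of
# the frame handed to summarize.
result = df.groupby(keys)[value_cols].apply(summarize).reset_index()

# Returning a Series with mixed values upcasts everything to float,
# so restore the integer count.
result = result.astype({"count": "int64"})
print(result)
```

Every group in the sample input contains a single row, so each weighted average here simply equals that row's own value; the weighting only matters once a group holds several trades.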