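Suppose I have the following data as a pandas DataFrame:
               type       exdiv     paydate  amount
declared
2014-01-31    final  2014-03-03  2014-03-10    3.10
2014-06-27  interim  2014-08-11  2014-08-18    1.55
2015-01-30    final  2015-03-02  2015-03-09    2.33
2015-01-30    final  2015-03-02  2015-03-09    0.77
2015-06-26  interim  2015-08-07  2015-08-17    1.80
2016-01-29    final  2016-02-29  2016-03-07    3.45
The entry for 2015-01-30 is reported twice. What is the simplest way to sum those rows so that I end up with a single entry for 2015-01-30, equal to 3.10?
What I have tried so far creates a MultiIndex, and I can no longer use my current index column ('declared').
I know I could add the index as a regular column, run the command, and then try to convert the MultiIndex back to a single index, but surely there must be a better way in pandas?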
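For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds the frame from the table in the question and confirms that the index contains a repeated date. The variable name `rows` is illustrative only, and the dates are kept as plain strings, which is an assumption; the original frame may well hold datetime values.

```python
import pandas as pd

# Rows copied from the table in the question.
rows = [
    ("2014-01-31", "final", "2014-03-03", "2014-03-10", 3.10),
    ("2014-06-27", "interim", "2014-08-11", "2014-08-18", 1.55),
    ("2015-01-30", "final", "2015-03-02", "2015-03-09", 2.33),
    ("2015-01-30", "final", "2015-03-02", "2015-03-09", 0.77),
    ("2015-06-26", "interim", "2015-08-07", "2015-08-17", 1.80),
    ("2016-01-29", "final", "2016-02-29", "2016-03-07", 3.45),
]
df = pd.DataFrame(rows, columns=["declared", "type", "exdiv", "paydate", "amount"])

# 'declared' is the index, as in the question.
# Assumption: dates stay as strings; pd.to_datetime could be applied if needed.
df = df.set_index("declared")

# Boolean mask marking every repeat of an index value after its first occurrence.
dupes = df.index.duplicated()
print(df[dupes])
```

The printed frame should contain only the second 2015-01-30 row, which is the one that needs to be folded into the first.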
Answer 0 (score: 2)
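# Replace each amount with the total for its 'declared' date (index level 0)
df['amount'] = df.groupby(level=0)['amount'].transform('sum')
# Move 'declared' into a column and drop the rows that are now identical
df = df.reset_index().drop_duplicates(subset=['declared', 'type', 'exdiv', 'paydate'])
print(df)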
     declared     type       exdiv     paydate  amount
0  2014-01-31    final  2014-03-03  2014-03-10    3.10
1  2014-06-27  interim  2014-08-11  2014-08-18    1.55
2  2015-01-30    final  2015-03-02  2015-03-09    3.10
4  2015-06-26  interim  2015-08-07  2015-08-17    1.80
5  2016-01-29    final  2016-02-29  2016-03-07    3.45
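The result above moves 'declared' into an ordinary column. If the goal is to keep it as the index, one possible variant is to filter on `Index.duplicated` instead of calling `reset_index`. This is only a sketch: it assumes that rows sharing a 'declared' date agree in every column except 'amount', which holds for the sample data but may not hold in general. The frame is rebuilt here so the example runs on its own.

```python
import pandas as pd

# Sample frame with 'declared' as the index (values copied from the question).
df = pd.DataFrame(
    {
        "type": ["final", "interim", "final", "final", "interim", "final"],
        "exdiv": ["2014-03-03", "2014-08-11", "2015-03-02",
                  "2015-03-02", "2015-08-07", "2016-02-29"],
        "paydate": ["2014-03-10", "2014-08-18", "2015-03-09",
                    "2015-03-09", "2015-08-17", "2016-03-07"],
        "amount": [3.10, 1.55, 2.33, 0.77, 1.80, 3.45],
    },
    index=pd.Index(["2014-01-31", "2014-06-27", "2015-01-30",
                    "2015-01-30", "2015-06-26", "2016-01-29"], name="declared"),
)

# Broadcast the per-date total back onto every row of that date.
df["amount"] = df.groupby(level=0)["amount"].transform("sum")

# Keep only the first row per index value; 'declared' stays the index.
# Assumption: duplicate dates differ only in 'amount'.
result = df[~df.index.duplicated(keep="first")]
print(result)
```

With this variant `result.loc['2015-01-30']` returns a single row, so lookups by declaration date keep working as before.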
Alternatively, call reset_index and pass a sum as aggfunc to pivot_table:
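import pandas as pd
# Move 'declared' into a column, then sum 'amount' for each unique combination of the other fields
x = pd.pivot_table(df.reset_index(),
                   values='amount',
                   index=['declared', 'exdiv', 'paydate', 'type'],
                   aggfunc='sum').reset_index()
print(x)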
     declared       exdiv     paydate     type  amount
0  2014-01-31  2014-03-03  2014-03-10    final    3.10
1  2014-06-27  2014-08-11  2014-08-18  interim    1.55
2  2015-01-30  2015-03-02  2015-03-09    final    3.10
3  2015-06-26  2015-08-07  2015-08-17  interim    1.80
4  2016-01-29  2016-02-29  2016-03-07    final    3.45
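Both approaches above end with 'declared' as a regular column. As a further sketch, not part of the original answer, a plain groupby can mix the index level name with column names and then reset only the non-date levels, which should leave 'declared' as a single-level index. It assumes a pandas version that allows grouping by index level names alongside columns, and it rebuilds the sample frame so it runs on its own.

```python
import pandas as pd

# Sample frame with 'declared' as the index (values copied from the question).
df = pd.DataFrame(
    {
        "type": ["final", "interim", "final", "final", "interim", "final"],
        "exdiv": ["2014-03-03", "2014-08-11", "2015-03-02",
                  "2015-03-02", "2015-08-07", "2016-02-29"],
        "paydate": ["2014-03-10", "2014-08-18", "2015-03-09",
                    "2015-03-09", "2015-08-17", "2016-03-07"],
        "amount": [3.10, 1.55, 2.33, 0.77, 1.80, 3.45],
    },
    index=pd.Index(["2014-01-31", "2014-06-27", "2015-01-30",
                    "2015-01-30", "2015-06-26", "2016-01-29"], name="declared"),
)

result = (
    # 'declared' refers to the index level; the other keys are columns.
    df.groupby(["declared", "type", "exdiv", "paydate"])["amount"]
    .sum()
    # Turn every level except 'declared' back into a column.
    .reset_index(level=["type", "exdiv", "paydate"])
)
print(result)
```

Unlike the transform variant, this one sums only rows that match on all four keys, so two entries with the same declaration date but different pay dates would stay separate.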