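I created a pivot table with pd.pivot_table, and it has two levels of column headers. How can I add totals by selectively summing the columns under each top-level header?
Pivot table: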
    sum         count
    A | B | C | A | B | C
1   2 | 2 | 2 | 3 | 3 | 4
2   2 | 2 | 2 | 3 | 3 | 4
3   2 | 2 | 2 | 3 | 3 | 4
4   2 | 2 | 2 | 3 | 3 | 4
Desired output:
    sum         count       sum of sum | sum of count
    A | B | C | A | B | C |
1   2 | 2 | 2 | 3 | 3 | 4 | 6          | 10
2   2 | 2 | 2 | 3 | 3 | 4 | 6          | 10
3   2 | 2 | 2 | 3 | 3 | 4 | 6          | 10
4   2 | 2 | 2 | 3 | 3 | 4 | 6          | 10
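Answer 0 (score: 0)
Try this. It is a rough answer built on a sample DataFrame rather than your data, but I believe the same approach will help you solve the problem.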
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})

# Pivot on D: rows are indexed by (A, B), with one column per value of C.
# aggfunc="sum" replaces np.sum, which raises a FutureWarning in recent pandas.
table = pd.pivot_table(df, values="D", index=["A", "B"],
                       columns=["C"], aggfunc="sum", fill_value=0)

# Transpose so the two-level (A, B) header sits on the columns, like your table
table2 = table.T

# Sum the columns under each level-0 label. groupby(axis=1) is deprecated in
# recent pandas, so group the rows of the untransposed table and transpose back.
table3 = (table.groupby(level=0).sum().T
          .rename(columns={"bar": "sum_bar", "foo": "sum_foo"}))

# Concatenate table2 and table3 side by side to get the final result
result = pd.concat([table2, table3], axis=1, sort=False)
table2
Out[35]:
A     bar     foo
B     one two one two
C
large   4   7   4   0
small   5   6   1   6
result
Out[36]:
       (bar, one)  (bar, two)  (foo, one)  (foo, two)  sum_bar  sum_foo
C
large           4           7           4           0       11        4
small           5           6           1           6       11        7
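The same idea can be applied directly to the layout in the question. The following is only a minimal sketch: it rebuilds the question's table by hand (the real one would come from pd.pivot_table), and the names pivot and result are placeholders introduced for illustration. Selecting a top-level label such as "sum" returns the A/B/C columns under it, and a row-wise sum of that selection gives the total for each row. Assigning with a tuple key keeps the two-level column header instead of mixing tuples and plain strings.

```python
import pandas as pd

# Hand-built stand-in for the question's pivot table (values copied from the question).
# Top level: "sum" / "count"; second level: A / B / C.
columns = pd.MultiIndex.from_product([["sum", "count"], ["A", "B", "C"]])
pivot = pd.DataFrame(
    [[2, 2, 2, 3, 3, 4]] * 4,
    index=[1, 2, 3, 4],
    columns=columns,
)

result = pivot.copy()

# pivot["sum"] selects every column under the top-level label "sum";
# summing along axis=1 adds those columns together row by row.
result[("sum of sum", "")] = pivot["sum"].sum(axis=1)
result[("sum of count", "")] = pivot["count"].sum(axis=1)

print(result)
```

To total only some of the sub-columns, the selection can be narrowed before summing, for example pivot["sum"][["A", "B"]].sum(axis=1).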