How to get column subtotals in a pandas pivot table built from a DataFrame?
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7]})
print(df)
pd.pivot_table(df, values=['D'], index=['A'], columns=['C', 'B'], aggfunc={'D': np.sum}, margins=True, fill_value=0, margins_name="Total")
The expected output is:
           D
C      large           small
B        one two Total   one two Total
A
bar        4   7    11     5   6    11
foo        4   0     4     1   6     7
Total      8   7    15     6  12    33
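A minimal sketch of why margins=True alone does not give this layout: it appends a single grand-total column and a single total row, not one subtotal column per C group. It assumes values='D' is passed as a scalar; the name table is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7]})

# margins=True adds one overall "Total" column and one "Total" row,
# but no per-group subtotal such as ('large', 'Total')
table = pd.pivot_table(df, values='D', index='A', columns=['C', 'B'],
                       aggfunc='sum', fill_value=0,
                       margins=True, margins_name='Total')
print(table)
```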
Answer (score: 0)
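I think it is best to add a new value, Total, to the second level of the MultiIndex, so that the subtotals can still be selected by the first level. To get the correct column order, create an ordered categorical that includes Total: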
# ordered categories: the existing values first, then 'Total' last
df['B'] = pd.Categorical(df['B'],
                         categories=df['B'].unique().tolist() + ['Total'],
                         ordered=True)
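A small sketch of why the ordered categorical matters: plain strings sort lexicographically, and an uppercase 'T' sorts before lowercase letters, so 'Total' would land first. An ordered categorical sorts by the declared category order instead. The names labels and cat are illustrative only.

```python
import pandas as pd

labels = ['one', 'two', 'Total']

# Plain strings: uppercase 'T' (U+0054) sorts before lowercase 'o' and 't'
print(sorted(labels))  # ['Total', 'one', 'two']

# Ordered categorical: sorting follows the declared category order
cat = pd.Categorical(labels, categories=['one', 'two', 'Total'], ordered=True)
print(list(cat.sort_values()))  # ['one', 'two', 'Total']
```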
Pass values='D' instead of values=['D'] to avoid a three-level MultiIndex in the columns:
df1 = pd.pivot_table(df,
                     values='D',
                     index=['A'],
                     columns=['C', 'B'],
                     aggfunc='sum',
                     fill_value=0,
                     # only pivot observed categories, so the still-unused 'Total' category adds no columns here
                     observed=True)
print(df1)
C   large     small
B     one two   one two
A
bar     4   7     5   6
foo     4   0     1   6
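A quick sketch of the values='D' versus values=['D'] difference, using plain (non-categorical) columns; the names wide and narrow are illustrative. Passing a list keeps the value name as an extra top column level, while a scalar drops it.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7]})

# values as a list: columns are (value name, C, B)
wide = pd.pivot_table(df, values=['D'], index='A', columns=['C', 'B'],
                      aggfunc='sum', fill_value=0)
# values as a scalar: the value-name level is dropped, leaving (C, B)
narrow = pd.pivot_table(df, values='D', index='A', columns=['C', 'B'],
                        aggfunc='sum', fill_value=0)
print(wide.columns.nlevels, narrow.columns.nlevels)  # 3 2
```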
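Then create a new DataFrame of subtotals: sum the columns within each group of the first level, and build the new column labels with MultiIndex.from_product: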
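# DataFrame.sum(level=...) was removed in pandas 2.0; group the transposed frame instead
df2 = df1.T.groupby(level=0).sum().T
# reuse B's ordered categorical dtype so 'Total' sorts by category order after the join
df2.columns = pd.MultiIndex.from_product(
    [df2.columns, pd.CategoricalIndex(['Total'], dtype=df['B'].dtype)],
    names=df1.columns.names)
print(df2)
C   large small
B   Total Total
A
bar    11    11
foo     4     7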
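As a cross-check, the same subtotals can be computed straight from the raw data by grouping on A and C only (ignoring B) and spreading C across the columns. This is a sketch of an alternative route, not part of the approach above; the name subtotals is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7]})

# Sum D per (A, C) pair, then move C from the index to the columns
subtotals = df.groupby(['A', 'C'])['D'].sum().unstack('C', fill_value=0)
print(subtotals)
```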
Then DataFrame.join the two frames together, use DataFrame.sort_index to move each Total to the last position within its group, and finally append a row of column sums:
df = df1.join(df2).sort_index(axis=1)
df.loc['Total'] = df.sum()
print(df)
C     large           small
B       one two Total   one two Total
A
bar       4   7    11     5   6    11
foo       4   0     4     1   6     7
Total     8   7    15     6  12    18
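An alternative sketch that does not rely on the category order surviving the join: a hypothetical helper, add_column_subtotals, appends a subtotal column to each first-level group and reassembles the groups with pd.concat(..., keys=...), which keeps the groups in the order given. It assumes a two-level column MultiIndex like df1's and a plain (non-categorical) B column.

```python
import pandas as pd


def add_column_subtotals(table: pd.DataFrame, label: str = "Total") -> pd.DataFrame:
    """Hypothetical helper: subtotal column after each first-level group, plus a total row."""
    keys = list(table.columns.get_level_values(0).unique())
    blocks = []
    for key in keys:
        block = table[key].copy()         # this group's columns (second level only)
        block[label] = block.sum(axis=1)  # row-wise subtotal, appended last
        blocks.append(block)
    # keys= rebuilds the first column level in the order of `keys`
    out = pd.concat(blocks, axis=1, keys=keys, names=list(table.columns.names))
    out.loc[label] = out.sum()            # column sums as the final row
    return out


df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7]})
table = pd.pivot_table(df, values='D', index='A', columns=['C', 'B'],
                       aggfunc='sum', fill_value=0)
out = add_column_subtotals(table)
print(out)
```

Building each group explicitly trades the single join for a loop over groups, but the resulting column order no longer depends on how a particular pandas version merges categorical and object column levels.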