Question

Use a pivot table on a DataFrame to get subtotals across the columns.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

print(df)

pd.pivot_table(df, values=['D'], index=['A'], columns=['C', 'B'], aggfunc={'D': np.sum}, margins=True, fill_value=0, margins_name="Total")


The following should be the output:

     D
C      large         Total  small         Total
B        one    two           one    two
A
bar        4      7     11      5      6     11
foo        4      0      4      1      6      7
Total      8      7     15      6     12     33

Answer 1

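I think it is best to add the new Total values to the second level of the MultiIndex, so that they can still be filtered by the first level.

To get the correct column order, make B an ordered categorical that includes Total as its last category.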

df['B'] = pd.CategoricalIndex(df['B'],
                              categories=df['B'].unique().tolist() + ['Total'],
                              ordered=True)
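To illustrate why the explicit category order matters, here is a minimal sketch on a toy Series (the names `labels` and `ordered` are made up for this example). Plain strings sort by character code, so the capitalised Total comes first, while a categorical sorts by its declared category order:

```python
import pandas as pd

# Toy labels, for illustration only.
labels = pd.Series(["two", "Total", "one", "two"])

# Plain strings: uppercase "T" sorts before lowercase letters.
plain_sorted = labels.sort_values().tolist()

# Ordered categorical: sorting follows the declared category order.
ordered = pd.Series(
    pd.Categorical(labels, categories=["one", "two", "Total"], ordered=True)
)
cat_sorted = ordered.sort_values().tolist()

print(plain_sorted)  # ['Total', 'one', 'two', 'two']
print(cat_sorted)    # ['one', 'two', 'two', 'Total']
```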

For the aggregation, change values from ['D'] to 'D' to avoid a three-level MultiIndex in the columns:

df1 = pd.pivot_table(df,
                     values='D',
                     index=['A'],
                     columns=['C', 'B'],
                     aggfunc='sum',
                     fill_value=0)
print(df1)
C   large     small    
B     one two   one two
A                      
bar     4   7     5   6
foo     4   0     1   6
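The effect of passing a scalar instead of a list for values can be checked directly. This is a small sketch on a made-up four-row frame (`toy`, `with_list` and `with_scalar` are illustrative names, not part of the answer):

```python
import pandas as pd

# Hypothetical miniature data set, just large enough to pivot.
toy = pd.DataFrame({
    "A": ["foo", "foo", "bar", "bar"],
    "B": ["one", "two", "one", "two"],
    "C": ["small", "large", "small", "large"],
    "D": [1, 2, 3, 4],
})

# values as a list keeps an extra top column level holding "D".
with_list = pd.pivot_table(toy, values=["D"], index="A",
                           columns=["C", "B"], aggfunc="sum", fill_value=0)

# values as a scalar drops that level, leaving only C and B.
with_scalar = pd.pivot_table(toy, values="D", index="A",
                             columns=["C", "B"], aggfunc="sum", fill_value=0)

print(with_list.columns.nlevels)    # 3
print(with_scalar.columns.nlevels)  # 2
```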

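Then create a new DataFrame holding the subtotals by summing over the first column level, and give it MultiIndex columns with MultiIndex.from_product: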

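# DataFrame.sum(level=...) was removed in pandas 2.0;
# grouping the transposed frame by the first level gives the same subtotals.
df2 = df1.T.groupby(level=0).sum().T
df2.columns = pd.MultiIndex.from_product([df2.columns, ['Total']])
print(df2)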

    large small
    Total Total
A              
bar    11    11
foo     4     7
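As a small sketch of what MultiIndex.from_product does here, the subtotal values from the output above are typed in by hand below (the name `subtotals` is illustrative). Each first-level label is paired with the single second-level label Total, which keeps the subtotals reachable through the first level:

```python
import pandas as pd

# Subtotals copied from the printed df2, for illustration only.
subtotals = pd.DataFrame(
    {"large": [11, 4], "small": [11, 7]},
    index=pd.Index(["bar", "foo"], name="A"),
)

# Pair every existing column label with the second-level label "Total".
subtotals.columns = pd.MultiIndex.from_product([subtotals.columns, ["Total"]])

print(list(subtotals.columns))
# [('large', 'Total'), ('small', 'Total')]

# Selecting by the first level still works and returns the Total column.
print(subtotals["large"].columns.tolist())  # ['Total']
```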

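Then combine the two with DataFrame.join, use DataFrame.sort_index so that each Total column lands in the last position of its group, and finally add the row of column sums:

df = df1.join(df2).sort_index(axis=1)
df.loc['Total'] = df.sum()
print(df)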
C     large           small          
B       one two Total   one two Total
A                                    
bar       4   7    11     5   6    11
foo       4   0     4     1   6     7
Total     8   7    15     6  12    18
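For comparison, here is an alternative sketch that is not the approach used in this answer: it skips the categorical and the sort, and instead appends a Total column to each top-level group before gluing the groups back together with pd.concat. The names `base`, `blocks` and `result` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# Plain pivot without margins: rows are A, columns are a (C, B) MultiIndex.
base = pd.pivot_table(df, values="D", index="A", columns=["C", "B"],
                      aggfunc="sum", fill_value=0)

# For each top-level C value, take its sub-frame and append a Total column,
# so the subtotal sits after that group's own columns.
blocks = {}
for c in base.columns.get_level_values("C").unique():
    sub = base[c].copy()
    sub["Total"] = sub.sum(axis=1)
    blocks[c] = sub

# The dict keys become the first column level again.
result = pd.concat(blocks, axis=1, names=["C", "B"])

# Add the row of column sums at the bottom.
result.loc["Total"] = result.sum()
print(result)
```

Because the column order is built explicitly, this variant does not depend on how the column labels happen to sort.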
