Summing across columns after a pivot table

Time: 2019-12-11 01:55:09

Tags: python-3.x sum pivot-table multi-index

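I created a pivot table with pd.pivot_table, and it has two levels of column headers. How can I sum these columns while selectively filtering which ones are included?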

Pivot table:


      sum     |   count
   A | B | C | A | B | C
1  2 | 2 | 2 | 3 | 3 | 4
2  2 | 2 | 2 | 3 | 3 | 4
3  2 | 2 | 2 | 3 | 3 | 4
4  2 | 2 | 2 | 3 | 3 | 4

Desired output:

      sum     |   count   | sum of sum | sum of count
   A | B | C | A | B | C |            |
1  2 | 2 | 2 | 3 | 3 | 4 |     6      |     10
2  2 | 2 | 2 | 3 | 3 | 4 |     6      |     10
3  2 | 2 | 2 | 3 | 3 | 4 |     6      |     10
4  2 | 2 | 2 | 3 | 3 | 4 |     6      |     10

1 answer:

Answer 0 (score: 0)

Try this. It is a rough answer built on sample data rather than your table, but I believe it will help you solve the problem.

import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})

# Passing the aggregation by name avoids the warning newer pandas raises for np.sum
table = pd.pivot_table(df, values='D', index=['A', 'B'],
                       columns=['C'], aggfunc='sum', fill_value=0)

# Transpose so that (A, B) become two column levels, similar to the table in the question
table2 = table.T

# Sum the columns within each level-0 group (bar, foo).
# groupby(..., axis=1) is deprecated in recent pandas versions,
# so group the rows of the transposed frame and transpose back.
table3 = (table2.T.groupby(level=0).sum().T
          .rename(columns={'bar': 'sum_bar', 'foo': 'sum_foo'}))

# Concatenate table2 and table3 to get the final result
result = pd.concat([table2, table3], axis=1, sort=False)

Output:

table2
Out[35]: 
A     bar     foo    
B     one two one two
C                    
large   4   7   4   0
small   5   6   1   6

result
Out[36]: 
       (bar, one)  (bar, two)  (foo, one)  (foo, two)  sum_bar  sum_foo
C                                                                      
large           4           7           4           0       11        4
small           5           6           1           6       11        7
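
The same idea can be applied to the layout in the question itself. The following is only a minimal sketch: it rebuilds the question's table by hand (the values are copied from the question, not produced by a real pivot_table call), and the empty second-level label on the new columns as well as the choice of A and B for the filtered total are assumptions made for illustration.

```python
import pandas as pd

# Rebuild the question's layout by hand: two column levels,
# with A, B, C under each of "sum" and "count".
columns = pd.MultiIndex.from_product([["sum", "count"], ["A", "B", "C"]])
pivot = pd.DataFrame(
    [[2, 2, 2, 3, 3, 4]] * 4,
    index=[1, 2, 3, 4],
    columns=columns,
)

# Selecting a top-level label returns the sub-frame beneath it (columns A, B, C),
# so a row-wise sum gives one total per row.
sum_of_sum = pivot["sum"].sum(axis=1)
sum_of_count = pivot["count"].sum(axis=1)

# Selective filtering: total only A and B under "sum" (hypothetical choice of columns).
sum_of_sum_ab = pivot["sum"][["A", "B"]].sum(axis=1)

# Attach the totals as new columns; the empty second level is an arbitrary label.
result = pivot.copy()
result[("sum of sum", "")] = sum_of_sum
result[("sum of count", "")] = sum_of_count

print(result)
```

Selecting by the top-level label first keeps the filtering step explicit: any subset of the second-level columns can be chosen before calling sum(axis=1).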