Pandas pivot table with subtotals and a grand total using a multi-index

Date: 2020-05-13 19:30:31

Tags: python pandas pivot-table multi-index

I am trying to add a new column with subtotals for each variable, plus a final column with the grand total. For example, given

import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
               "B": ["one", "one", "one", "two", "two","one", "one", "two", "two"],
               "C": ["small", "large", "large", "small","small", "large", "small", "small", "large"],
               "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
               "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})

That is:

     A    B      C  D  E
0  foo  one  small  1  2
1  foo  one  large  2  4
2  foo  one  large  2  5
3  foo  two  small  3  5
4  foo  two  small  3  6
5  bar  one  large  4  6
6  bar  one  small  5  8
7  bar  two  small  6  9
8  bar  two  large  7  9

Now I pivot:

table = pd.pivot_table(df, values=['D',"E"], index=['A'],columns=['C'])

and add the totals:

table['total'] = table.sum(axis=1)
for t in ["D", "E"]:
   table[t, "partial_total"]  = table[t].sum(axis=1)

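This is numerically correct, but the layout is hard to read. I would like all of the D columns (including partial_total) first, then all of the E columns, and finally total. This is the df I get: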

        D               E                total             D             E
C   large     small large     small            partial_total partial_total
A                                                                         
bar   5.5  5.500000   7.5  8.500000  27.000000     11.000000     16.000000
foo   2.0  2.333333   4.5  4.333333  13.166667      4.333333      8.833333

So, how can I group together the values of the same (top-level) column?

3 Answers:

Answer 0 (score: 1)

You can pivot with margins:

new_df = (df.pivot_table(index='A', columns='C', 
                         values=['D','E'], aggfunc='sum',
                         margins=True, margins_name='partial_total')
   .assign(total=lambda x: x.loc[:, (slice(None),'partial_total')].sum(1))
)

Output:

                D                               E                               total
C               large   small   partial_total   large   small   partial_total   
A                           
bar             11      11      22              15      17      32              54
foo             4       7       11              9       13      22              33
partial_total   15      18      33              24      30      54              87
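Two things are worth noting about this approach: it uses aggfunc='sum', whereas the question's pivot uses the default mean, and margins=True also adds a bottom row named partial_total. A minimal sketch, assuming only the per-row subtotals are wanted, that drops that extra row afterwards (the name `columns_only` is illustrative and not part of the answer):

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})

# Same pivot as in the answer: sums per (A, C), with margins as subtotals
new_df = (
    df.pivot_table(index="A", columns="C", values=["D", "E"],
                   aggfunc="sum", margins=True, margins_name="partial_total")
    .assign(total=lambda x: x.loc[:, (slice(None), "partial_total")].sum(axis=1))
)

# Drop the margin row if only the column-wise subtotals are of interest
columns_only = new_df.drop(index="partial_total")
print(columns_only)
```

If the mean is required instead, keep in mind that the margins are then computed with the same aggregation over the underlying rows, so they are not the sum of the per-cell means shown in the question.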

Answer 1 (score: 1)

Try doing the aggregation before calling pivot_table:

g = df.groupby(['A', 'C'])[['D', 'E']]

d = (g.sum()/g.count()).reset_index()
m = d.groupby('A', as_index=False).sum().assign(C='partial')

final = pd.concat([m, d]).pivot_table(index='A', columns='C')

Output:

        D                          E
C   large     small    partial large     small    partial
A
bar   5.5  5.500000  11.000000   7.5  8.500000  16.000000
foo   2.0  2.333333   4.333333   4.5  4.333333   8.833333

To specifically answer your last question,

How can I group together the values of the same (top-level) column?

you could simply use sort_index:

table.sort_index(axis=1)
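As a sketch of what this does to the question's table: sort_index(axis=1) brings columns with the same top-level label together, but it orders the second level alphabetically, so partial_total ends up between large and small. If a specific order is wanted, an explicit reindex is one option. The names `order`, `sorted_table` and `ordered` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})

# Table built exactly as in the question
table = pd.pivot_table(df, values=["D", "E"], index=["A"], columns=["C"])
table["total"] = table.sum(axis=1)
for t in ["D", "E"]:
    table[t, "partial_total"] = table[t].sum(axis=1)

# Option 1: lexicographic sort groups D, E and total together
sorted_table = table.sort_index(axis=1)

# Option 2: spell out the desired column order explicitly
order = [(t, c) for t in ["D", "E"] for c in ["large", "small", "partial_total"]]
order.append(("total", ""))
ordered = table.reindex(
    columns=pd.MultiIndex.from_tuples(order, names=table.columns.names)
)
print(ordered)
```

The explicit order is more verbose, but it does not depend on how the second-level labels happen to sort.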

Answer 2 (score: 1)

Try using pd.concat:

table = pd.pivot_table(df, values=['D',"E"], index=['A'],columns=['C'])
table.columns = [f'{i}_{j}' for i, j in table.columns]
pd.concat([table,
           table.sum(axis=1, level=0).add_suffix('_partial_total'),
           table.sum(axis=1).to_frame(name='total')], axis=1)

Output:

     D_large   D_small  E_large   E_small  D_large_partial_total  D_small_partial_total  E_large_partial_total  E_small_partial_total      total
A                                                                                                                                               
bar      5.5  5.500000      7.5  8.500000                    5.5               5.500000                    7.5               8.500000  27.000000
foo      2.0  2.333333      4.5  4.333333                    2.0               2.333333                    4.5               4.333333  13.166667
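Note that in the output above each `_partial_total` column simply repeats its source column: once the column names are flattened to strings, level 0 contains one group per column. The `level` argument of `DataFrame.sum` was also deprecated and, as far as I know, is no longer available in pandas 2.x. A minimal sketch of a variant that computes the subtotals per top-level column before flattening, assuming a groupby on the transposed table is acceptable (the names `partial`, `total`, `flat` and `result` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})

table = pd.pivot_table(df, values=["D", "E"], index=["A"], columns=["C"])

# Subtotal per top-level column (D, E), taken while the MultiIndex still exists
partial = table.T.groupby(level=0).sum().T.add_suffix("_partial_total")

# Grand total over the original value columns only
total = table.sum(axis=1).to_frame(name="total")

# Flatten the column MultiIndex, then stitch the pieces together
flat = table.copy()
flat.columns = [f"{i}_{j}" for i, j in table.columns]
result = pd.concat([flat, partial, total], axis=1)
print(result)
```

The subtotal columns land after all of the value columns here; selecting the columns of `result` in an explicit order would place each subtotal next to its own group.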