pandas - row and column subtotals

Posted: 2018-02-05 06:25:36

Tags: python pandas pandas-groupby

I have the following dataframe:

category,name,stage,label,score,weight
a,p1,s1,l1,123,27
a,p1,s1,l1,124,42
a,p1,s1,l2,125,43
a,p1,s2,l1,126,36
a,p1,s2,l2,127,4
a,p1,s2,l2,128,62
a,p1,s2,l2,129,29
a,p2,s1,l1,134,100
a,p2,s1,l1,135,59
a,p2,s2,l1,136,11
b,p1,s1,l1,139,27
b,p1,s1,l1,140,42
b,p1,s1,l2,141,43
b,p1,s2,l2,142,36
b,p1,s2,l2,143,4
b,p1,s2,l2,144,62
b,p1,s2,l2,145,29

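I have category and name as my index, and stage and label as my columns. In my table I want subtotals for both the rows and the columns.

I am using the following code: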
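For reference, the base layout described here (category and name as the row index, stage and label as the columns) can be built with `pivot_table` before any subtotals are added. A minimal sketch, assuming the sample data is loaded from an inline CSV string and that only `score` is summed; the name `base` is illustrative:

```python
import pandas as pd
from io import StringIO

# The sample data from the question, loaded from an inline CSV string
csv = StringIO("""category,name,stage,label,score,weight
a,p1,s1,l1,123,27
a,p1,s1,l1,124,42
a,p1,s1,l2,125,43
a,p1,s2,l1,126,36
a,p1,s2,l2,127,4
a,p1,s2,l2,128,62
a,p1,s2,l2,129,29
a,p2,s1,l1,134,100
a,p2,s1,l1,135,59
a,p2,s2,l1,136,11
b,p1,s1,l1,139,27
b,p1,s1,l1,140,42
b,p1,s1,l2,141,43
b,p1,s2,l2,142,36
b,p1,s2,l2,143,4
b,p1,s2,l2,144,62
b,p1,s2,l2,145,29
""")
df = pd.read_csv(csv)

# Rows: category/name, columns: stage/label, cells: sum of score.
# Combinations with no data (e.g. a/p2 under s1/l2) come out as NaN.
base = df.pivot_table(index=['category', 'name'], columns=['stage', 'label'],
                      values='score', aggfunc='sum')
print(base)
```

The subtotals then have to be added on top of this table, on both axes.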

import pandas as pd

# dataframe holds the data shown above
col_names = ['stage', 'label']
row_names = ['category', 'name']
value_names = ['score']
aggregates = {
    'score': ['sum']
}
value_count = 1

cols = row_names + col_names
gb = pd.concat(
    [dataframe.assign(**{x: 'zzzz' for x in cols[i:]})
        .groupby(cols)
        .aggregate(aggregates) for
            i in range(1, len(cols))
    ]
).sort_index().unstack(col_names)

This gives me:

               score                                     
                 sum                                     
stage             s1                    s2               
label             l1     l2    zzzz     l1     l2    zzzz
category name                                            
a        p1    247.0  125.0     NaN  126.0  384.0     NaN
         p2    269.0    NaN     NaN  136.0    NaN     NaN
         zzzz  516.0  125.0     NaN  262.0  384.0     NaN
b        p1    279.0  141.0     NaN    NaN  574.0     NaN
         zzzz  279.0  141.0     NaN    NaN  574.0     NaN
zzzz     zzzz  795.0  266.0  1061.0  262.0  958.0  1220.0

zzzz marks the subtotal rows/columns. As you can see, I get subtotals for the rows but not for the columns. If I change the code to

cols = col_names + row_names

I get column subtotals but not row subtotals.

I suspect my approach is not the right way to get both row and column subtotals.
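A likely reason is that a single loop over trailing slices of `cols` ties the row roll-ups and the column roll-ups together, so the order of `cols` decides which combinations are ever computed and one axis loses out. A minimal sketch of one fix, assuming the question's `'zzzz'` placeholder and summing `score` only: keep the `assign`/`groupby` idea, but loop over the row roll-ups and the column roll-ups independently, so every combination of "keep the first i row levels" and "keep the first j column levels" is computed. The names `TOTAL`, `parts` and `result` are illustrative, not from the original code.

```python
import pandas as pd
from io import StringIO

csv = StringIO("""category,name,stage,label,score,weight
a,p1,s1,l1,123,27
a,p1,s1,l1,124,42
a,p1,s1,l2,125,43
a,p1,s2,l1,126,36
a,p1,s2,l2,127,4
a,p1,s2,l2,128,62
a,p1,s2,l2,129,29
a,p2,s1,l1,134,100
a,p2,s1,l1,135,59
a,p2,s2,l1,136,11
b,p1,s1,l1,139,27
b,p1,s1,l1,140,42
b,p1,s1,l2,141,43
b,p1,s2,l2,142,36
b,p1,s2,l2,143,4
b,p1,s2,l2,144,62
b,p1,s2,l2,145,29
""")
df = pd.read_csv(csv)

row_names = ['category', 'name']
col_names = ['stage', 'label']
TOTAL = 'zzzz'  # placeholder label that sorts after the real labels

parts = []
# i = number of row levels kept, j = number of column levels kept;
# every level after those is overwritten with TOTAL before grouping.
for i in range(len(row_names) + 1):
    for j in range(len(col_names) + 1):
        rolled_up = row_names[i:] + col_names[j:]
        parts.append(
            df.assign(**{c: TOTAL for c in rolled_up})
              .groupby(row_names + col_names)['score']
              .sum()
        )

# Each (i, j) pass yields distinct index tuples, so unstack sees a unique index
result = pd.concat(parts).sort_index().unstack(col_names)
print(result)
```

Because `'zzzz'` sorts after every real label in this data, `sort_index` places each subtotal row and column after the entries it summarizes; with other labels a different placeholder may be needed.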

Any suggestions?

Thanks.

1 answer:

Answer (score: 0)

It seems we need a workaround to compute the subtotals. :)

import pandas as pd
from io import StringIO

csv=StringIO("""category,name,stage,label,score,weight
a,p1,s1,l1,123,27
a,p1,s1,l1,124,42
a,p1,s1,l2,125,43
a,p1,s2,l1,126,36
a,p1,s2,l2,127,4
a,p1,s2,l2,128,62
a,p1,s2,l2,129,29
a,p2,s1,l1,134,100
a,p2,s1,l1,135,59
a,p2,s2,l1,136,11
b,p1,s1,l1,139,27
b,p1,s1,l1,140,42
b,p1,s1,l2,141,43
b,p1,s2,l2,142,36
b,p1,s2,l2,143,4
b,p1,s2,l2,144,62
b,p1,s2,l2,145,29
""")

df = pd.read_csv(csv)
print(df)

# Pivot table; margins=True adds 'All' totals. This is almost what we need.
table = pd.pivot_table(df, index=['category', 'name'], columns=['stage', 'label'],
                       values=['score', 'weight'], aggfunc='sum', margins=True)
# Drop the grand-total 'All' row; keep the 'All' columns (per-row totals)
table = table.drop(index='All', level=0)
print("\ntable=\n", table)

# Subtotal per category; min_count=1 keeps all-NaN groups as NaN instead of 0
df_subtotal = table.groupby(level='category').sum(min_count=1)

# Align df_subtotal's index with table's (category, name) index, using 'total' as the name
df_subtotal.index = pd.MultiIndex.from_product(
    [df_subtotal.index, ['total']], names=['category', 'name'])

# Add the subtotal rows to table; sorting puts each 'total' after its category's rows
df_result = pd.concat([table, df_subtotal], axis=0).sort_index()
print('\ndf_result=\n', df_result)

   category name stage label  score  weight
0         a   p1    s1    l1    123      27
1         a   p1    s1    l1    124      42
2         a   p1    s1    l2    125      43
3         a   p1    s2    l1    126      36
4         a   p1    s2    l2    127       4
5         a   p1    s2    l2    128      62
6         a   p1    s2    l2    129      29
7         a   p2    s1    l1    134     100
8         a   p2    s1    l1    135      59
9         a   p2    s2    l1    136      11
10        b   p1    s1    l1    139      27
11        b   p1    s1    l1    140      42
12        b   p1    s1    l2    141      43
13        b   p1    s2    l2    142      36
14        b   p1    s2    l2    143       4
15        b   p1    s2    l2    144      62
16        b   p1    s2    l2    145      29

table=
                score                           weight                        
stage             s1            s2         All     s1          s2         All
label             l1     l2     l1     l2          l1    l2    l1     l2     
category name                                                                
a        p1    247.0  125.0  126.0  384.0  882   69.0  43.0  36.0   95.0  243
         p2    269.0    NaN  136.0    NaN  405  159.0   NaN  11.0    NaN  170
b        p1    279.0  141.0    NaN  574.0  994   69.0  43.0   NaN  131.0  243

df_result=
                 score                            weight                     \
stage              s1            s2          All     s1          s2          
label              l1     l2     l1     l2           l1    l2    l1     l2   
category name                                                                
a        p1     247.0  125.0  126.0  384.0   882   69.0  43.0  36.0   95.0   
         p2     269.0    NaN  136.0    NaN   405  159.0   NaN  11.0    NaN   
         total  516.0  125.0  262.0  384.0  1287  228.0  43.0  47.0   95.0   
b        p1     279.0  141.0    NaN  574.0   994   69.0  43.0   NaN  131.0   
         total  279.0  141.0    NaN  574.0   994   69.0  43.0   NaN  131.0   


stage           All  
label                
category name        
a        p1     243  
         p2     170  
         total  413  
b        p1     243  
         total  243
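The answer adds row subtotals per category, while `margins=True` only contributes an overall `All` column per value, not a subtotal per stage. A possible extension in the same pivot/concat style, sketched here for `score` only (an assumption, to keep two column levels), computes per-stage column subtotals by grouping the transposed table on its `stage` level and then adds the per-category row subtotals as in the answer. The names `stage_totals`, `with_cols`, `cat_totals` and `result` are illustrative.

```python
import pandas as pd
from io import StringIO

csv = StringIO("""category,name,stage,label,score,weight
a,p1,s1,l1,123,27
a,p1,s1,l1,124,42
a,p1,s1,l2,125,43
a,p1,s2,l1,126,36
a,p1,s2,l2,127,4
a,p1,s2,l2,128,62
a,p1,s2,l2,129,29
a,p2,s1,l1,134,100
a,p2,s1,l1,135,59
a,p2,s2,l1,136,11
b,p1,s1,l1,139,27
b,p1,s1,l1,140,42
b,p1,s1,l2,141,43
b,p1,s2,l2,142,36
b,p1,s2,l2,143,4
b,p1,s2,l2,144,62
b,p1,s2,l2,145,29
""")
df = pd.read_csv(csv)

# Base table without margins: rows category/name, columns stage/label
table = df.pivot_table(index=['category', 'name'], columns=['stage', 'label'],
                       values='score', aggfunc='sum')

# Column subtotals: sum the labels within each stage by grouping the transposed table
stage_totals = table.T.groupby(level='stage').sum(min_count=1).T
stage_totals.columns = pd.MultiIndex.from_product(
    [stage_totals.columns, ['total']], names=['stage', 'label'])
with_cols = pd.concat([table, stage_totals], axis=1).sort_index(axis=1)

# Row subtotals per category, as in the answer; min_count=1 keeps all-NaN sums as NaN
cat_totals = with_cols.groupby(level='category').sum(min_count=1)
cat_totals.index = pd.MultiIndex.from_product(
    [cat_totals.index, ['total']], names=['category', 'name'])
result = pd.concat([with_cols, cat_totals]).sort_index()
print(result)
```

Transposing instead of passing `axis=1` to `groupby` avoids the column-wise groupby form, which recent pandas versions deprecate.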