I have the following DataFrame:
category,name,stage,label,score,weight
a,p1,s1,l1,123,27
a,p1,s1,l1,124,42
a,p1,s1,l2,125,43
a,p1,s2,l1,126,36
a,p1,s2,l2,127,4
a,p1,s2,l2,128,62
a,p1,s2,l2,129,29
a,p2,s1,l1,134,100
a,p2,s1,l1,135,59
a,p2,s2,l1,136,11
b,p1,s1,l1,139,27
b,p1,s1,l1,140,42
b,p1,s1,l2,141,43
b,p1,s2,l2,142,36
b,p1,s2,l2,143,4
b,p1,s2,l2,144,62
b,p1,s2,l2,145,29
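I use category and name as my index, and stage and label as my columns. In the resulting table, I want subtotals for both the rows and the columns.
I am using the following code: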
col_names = ['stage', 'label']
row_names = ['category', 'name']
value_names = ['score']
aggregates = {
    'score': ['sum']
}
value_count = 1
cols = row_names + col_names
gb = pd.concat(
    [dataframe.assign(**{x: 'zzzz' for x in cols[i:]})
              .groupby(cols)
              .aggregate(aggregates)
     for i in range(1, len(cols))
    ]
).sort_index().unstack(col_names)
This gives me:
               score
                 sum
stage             s1                    s2
label             l1     l2    zzzz     l1     l2    zzzz
category name
a        p1    247.0  125.0     NaN  126.0  384.0     NaN
         p2    269.0    NaN     NaN  136.0    NaN     NaN
         zzzz  516.0  125.0     NaN  262.0  384.0     NaN
b        p1    279.0  141.0     NaN    NaN  574.0     NaN
         zzzz  279.0  141.0     NaN    NaN  574.0     NaN
zzzz     zzzz  795.0  266.0  1061.0  262.0  958.0  1220.0
zzzz marks the subtotal rows/columns. As you can see, I am getting subtotals for the rows but not for the columns. If I change the code so that
cols = col_names + row_names
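I get column subtotals but not row subtotals.
I suspect my approach is not the right way to get both row and column subtotals.
Any suggestions?
Thanks.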
Answer 0 (score: 0)
It looks like we need a workaround to compute the subtotals. :)
import pandas as pd
from io import StringIO
csv=StringIO("""category,name,stage,label,score,weight
a,p1,s1,l1,123,27
a,p1,s1,l1,124,42
a,p1,s1,l2,125,43
a,p1,s2,l1,126,36
a,p1,s2,l2,127,4
a,p1,s2,l2,128,62
a,p1,s2,l2,129,29
a,p2,s1,l1,134,100
a,p2,s1,l1,135,59
a,p2,s2,l1,136,11
b,p1,s1,l1,139,27
b,p1,s1,l1,140,42
b,p1,s1,l2,141,43
b,p1,s2,l2,142,36
b,p1,s2,l2,143,4
b,p1,s2,l2,144,62
b,p1,s2,l2,145,29
""")
df=pd.read_csv(csv)
print(df)
# Compute the pivot table. This is almost what we need.
# aggfunc='sum' is used so that no numpy import is required.
table = pd.pivot_table(df, index=['category', 'name'], columns=['stage', 'label'],
                       values=['score', 'weight'], aggfunc='sum', margins=True)
table.drop(index='All', level=0, inplace=True)
print("\ntable=\n", table)
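# Compute the subtotal per category.
# min_count=1 keeps a cell as NaN when every value in the group is NaN (recent pandas would otherwise return 0).
df_subtotal = table.groupby(['category']).sum(min_count=1)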
#align df_subtotal index with table's index
df_subtotal['name']='total'
df_subtotal.set_index(['name'], append=True, inplace=True)
# Add the subtotal rows to the pivot table
df_result = pd.concat([table, df_subtotal], axis=0).sort_index()
print('\ndf_result=\n', df_result)
category name stage label score weight
0 a p1 s1 l1 123 27
1 a p1 s1 l1 124 42
2 a p1 s1 l2 125 43
3 a p1 s2 l1 126 36
4 a p1 s2 l2 127 4
5 a p1 s2 l2 128 62
6 a p1 s2 l2 129 29
7 a p2 s1 l1 134 100
8 a p2 s1 l1 135 59
9 a p2 s2 l1 136 11
10 b p1 s1 l1 139 27
11 b p1 s1 l1 140 42
12 b p1 s1 l2 141 43
13 b p1 s2 l2 142 36
14 b p1 s2 l2 143 4
15 b p1 s2 l2 144 62
16 b p1 s2 l2 145 29
table=
               score                            weight
stage             s1            s2         All      s1          s2         All
label             l1     l2     l1     l2           l1    l2    l1     l2
category name
a        p1    247.0  125.0  126.0  384.0  882    69.0  43.0  36.0   95.0  243
         p2    269.0    NaN  136.0    NaN  405   159.0   NaN  11.0    NaN  170
b        p1    279.0  141.0    NaN  574.0  994    69.0  43.0   NaN  131.0  243
df_result=
                score                             weight                     \
stage              s1            s2          All      s1          s2
label              l1     l2     l1     l2            l1    l2    l1     l2
category name
a        p1     247.0  125.0  126.0  384.0   882    69.0  43.0  36.0   95.0
         p2     269.0    NaN  136.0    NaN   405   159.0   NaN  11.0    NaN
         total  516.0  125.0  262.0  384.0  1287   228.0  43.0  47.0   95.0
b        p1     279.0  141.0    NaN  574.0   994    69.0  43.0   NaN  131.0
         total  279.0  141.0    NaN  574.0   994    69.0  43.0   NaN  131.0

stage           All
label
category name
a        p1     243
         p2     170
         total  413
b        p1     243
         total  243
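
The result above has row subtotals per category and an overall All column, but no subtotal column per stage. As a rough sketch of one way to get subtotals on both axes in a single table, the question's idea of concatenating grouped copies of the data can be applied to every combination of "row keys kept" and "column keys kept". The helper name `with_subtotals` and the `TOTAL` constant are made up for this illustration, and only `score` is aggregated to keep it short; treat it as a starting point rather than a definitive solution.

```python
from io import StringIO
from itertools import product

import pandas as pd

csv = StringIO("""category,name,stage,label,score,weight
a,p1,s1,l1,123,27
a,p1,s1,l1,124,42
a,p1,s1,l2,125,43
a,p1,s2,l1,126,36
a,p1,s2,l2,127,4
a,p1,s2,l2,128,62
a,p1,s2,l2,129,29
a,p2,s1,l1,134,100
a,p2,s1,l1,135,59
a,p2,s2,l1,136,11
b,p1,s1,l1,139,27
b,p1,s1,l1,140,42
b,p1,s1,l2,141,43
b,p1,s2,l2,142,36
b,p1,s2,l2,143,4
b,p1,s2,l2,144,62
b,p1,s2,l2,145,29
""")
df = pd.read_csv(csv)

row_names = ['category', 'name']
col_names = ['stage', 'label']
TOTAL = 'zzzz'  # hypothetical placeholder; chosen so it sorts after the real labels


def with_subtotals(data, rows, cols, value):
    """Sum `value` for every combination of kept row keys and kept column keys."""
    pieces = []
    # r = number of leading row keys kept, c = number of leading column keys kept.
    # The remaining keys are overwritten with TOTAL, so grouping collapses them.
    for r, c in product(range(len(rows) + 1), range(len(cols) + 1)):
        masked = data.assign(**{k: TOTAL for k in rows[r:] + cols[c:]})
        pieces.append(masked.groupby(rows + cols)[value].sum())
    # Each (r, c) pass produces a distinct key pattern, so the index stays unique
    # and can be unstacked into the column levels.
    return pd.concat(pieces).sort_index().unstack(cols)


result = with_subtotals(df, row_names, col_names, 'score')
print(result)
```

In this sketch, `result.loc[('a', 'zzzz'), ('s1', 'zzzz')]` is the subtotal for category a within stage s1, and the `('zzzz', 'zzzz')` row and column hold the grand totals. The pass with `r == len(rows)` and `c == len(cols)` masks nothing, which is what keeps the detail cells in the table.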