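I have the following snippet, which tries to compute a summary aggregate over a pivot table and combine the resulting totals back into the pivot table DataFrame. However, I'm having trouble joining tables whose columns have different numbers of levels.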
import pandas as pd

data = [
    ["alice", "school 1", "math", 95],
    ["alice", "school 1", "science", 87],
    ["charlie", "school 1", "math", 72],
    ["charlie", "school 1", "science", 63],
    ["bob", "school 2", "math", 92],
    ["bob", "school 2", "science", 68],
    ["dale", "school 2", "math", 56],
    ["dale", "school 2", "science", 78],
]
df = pd.DataFrame(data, columns=["student_name", "school", "class", "class score"])
pvt = pd.pivot_table(df, index=["class"], columns=["school", "student_name"])
print(pvt)
print()
aggregate_sum = pvt.groupby(level=1, axis=1).sum()
print(aggregate_sum)
Pivot table output:
             class score
school       school 1         school 2
student_name    alice charlie      bob dale
class
math               95      72       92   56
science            87      63       68   78
Aggregate output:
school   school 1  school 2
class
math          167       148
science       150       146
How can I join the aggregate output into the pivot table at the same level as the student names?
Expected output:
             class score
school       school 1             school 2
student_name    alice charlie sum      bob dale sum
class
math               95      72 167       92   56 148
science            87      63 150       68   78 146
Answer 0 (score: 1)
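Merge the two frames with merge and rename the merged columns, then use pd.MultiIndex.from_tuples() to build a MultiIndex that replaces the merged column labels.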
final = pvt.merge(aggregate_sum, on='class', how='inner')
final = final.rename(columns={
    'school 1': ('class score', 'school 1', 'sum'),
    'school 2': ('class score', 'school 2', 'sum'),
})
cols = final.columns
index = pd.MultiIndex.from_tuples(cols)
final.columns = index
final = final[[
    ('class score', 'school 1', 'alice'), ('class score', 'school 1', 'charlie'),
    ('class score', 'school 1', 'sum'), ('class score', 'school 2', 'bob'),
    ('class score', 'school 2', 'dale'), ('class score', 'school 2', 'sum'),
]]
final
        class score
        school 1             school 2
           alice charlie sum      bob dale sum
class
math          95      72 167       92   56 148
science       87      63 150       68   78 146
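
Depending on the pandas version, merging a frame with three column levels against one with a single level may emit a warning or be rejected, and groupby(..., axis=1) is deprecated in recent releases. As a hedged alternative (not the approach used in the answer above), here is a minimal sketch that gives the totals the same three-level column labels as the pivot table and then combines them with pd.concat. The names sums, top, order and combined are hypothetical helpers introduced for illustration; the sketch assumes the pivot table has a single top-level label ("class score") and that the column level named "school" exists, as in the question.

```python
import pandas as pd

data = [
    ["alice", "school 1", "math", 95],
    ["alice", "school 1", "science", 87],
    ["charlie", "school 1", "math", 72],
    ["charlie", "school 1", "science", 63],
    ["bob", "school 2", "math", 92],
    ["bob", "school 2", "science", 68],
    ["dale", "school 2", "math", 56],
    ["dale", "school 2", "science", 78],
]
df = pd.DataFrame(data, columns=["student_name", "school", "class", "class score"])
pvt = pd.pivot_table(df, index=["class"], columns=["school", "student_name"])

# Per-school totals: transpose, group rows by the "school" level, transpose back.
# This avoids passing axis=1 to groupby.
sums = pvt.T.groupby(level="school").sum().T

# Assumption: the pivot table has exactly one top-level label ("class score").
top = pvt.columns.get_level_values(0)[0]

# Give the totals the same three column levels as the pivot table.
sums.columns = pd.MultiIndex.from_tuples(
    [(top, school, "sum") for school in sums.columns],
    names=pvt.columns.names,
)

# Both frames now share the same index and column depth, so concat lines them up.
combined = pd.concat([pvt, sums], axis=1)

# Explicit column order: each school's students first, then that school's "sum".
order = []
for school in pvt.columns.get_level_values("school").unique():
    order.extend(col for col in pvt.columns if col[1] == school)
    order.append((top, school, "sum"))

final = combined.reindex(
    columns=pd.MultiIndex.from_tuples(order, names=pvt.columns.names)
)
print(final)
```

Building the column order explicitly, rather than sorting the columns, keeps "sum" last within each school even when a student name would sort after it alphabetically. Because pivot_table aggregates with the mean by default, the values may be displayed as floats.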