I have a DataFrame df in the following format:
df =
MONTH WEEKDAY EVAL
1 0 1
1 0 0
1 0 0
1 1 1
1 1 0
2 0 0
2 0 0
2 1 1
I group the data as follows:
result = df.groupby(['MONTH','WEEKDAY','EVAL']).size().reset_index()
result
The output is not in the format I want:
MONTH WEEKDAY EVAL 0
1 0 0 400
1 0 1 20
1 1 0 300
1 1 1 20
2 0 0 200
2 0 1 35
2 1 0 450
2 1 1 26
I want to change result into this format:
WEEKDAY EVAL_0 EVAL_1
0 400 20
0 200 35
1 300 20
1 450 26
How can I do this?
Answer 0 (score: 1)
I think you need to reshape with unstack and then do some data cleaning. The commented lines cover the case where the index contains duplicates:
# result is the frame from the question; size().reset_index() labels the count
# column with the integer 0 (use '0' instead if your column label is a string)
df = result.set_index(['MONTH','WEEKDAY','EVAL'])[0].unstack()
# If this raises "ValueError: Index contains duplicate entries, cannot reshape",
# aggregate the duplicates first (mean, sum, ...):
#df = result.groupby(['MONTH','WEEKDAY','EVAL'])[0].mean().unstack()
#df = result.pivot_table(index=['MONTH','WEEKDAY'], columns='EVAL', values=0, aggfunc='mean')
print(df)
EVAL             0   1
MONTH WEEKDAY
1     0        400  20
      1        300  20
2     0        200  35
      1        450  26
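The duplicate-index case mentioned in the comments can be sketched as follows. This is a minimal illustration, not part of the original answer: the frame counts, its column name N, and the choice of sum as the aggregation are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical counts table in which the key (MONTH=1, WEEKDAY=0, EVAL=0)
# appears twice; the column name 'N' is an assumption for this sketch.
counts = pd.DataFrame({
    'MONTH':   [1, 1, 1, 1],
    'WEEKDAY': [0, 0, 0, 1],
    'EVAL':    [0, 0, 1, 0],
    'N':       [400, 100, 20, 300],
})

# unstack() requires a unique index, so the duplicated key raises ValueError.
try:
    counts.set_index(['MONTH', 'WEEKDAY', 'EVAL'])['N'].unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# Aggregating first (here with sum) collapses the duplicates before reshaping;
# fill_value=0 fills MONTH/WEEKDAY pairs that have no row for some EVAL value.
wide = counts.pivot_table(index=['MONTH', 'WEEKDAY'], columns='EVAL',
                          values='N', aggfunc='sum', fill_value=0)
print(wide)
```

Whether sum, mean, or another function is appropriate depends on what the duplicated rows represent in your data.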
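# Parentheses are required so the method chain can span several lines
df = (df.sort_index(level=[1,0])          # order by WEEKDAY, then MONTH
        .reset_index(level=0, drop=True)  # drop the MONTH level
        .add_prefix('EVAL_')              # columns 0, 1 -> EVAL_0, EVAL_1
        .reset_index()                    # WEEKDAY back to a regular column
        .rename_axis(None, axis=1))       # remove the 'EVAL' columns name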
print (df)
WEEKDAY EVAL_0 EVAL_1
0 0 400 20
1 0 200 35
2 1 300 20
3 1 450 26
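Putting the steps together, the whole reshape can probably be written as one chain that starts from the raw rows, skipping the intermediate reset_index(). This is a sketch under assumptions: it uses only the eight sample rows shown in the question (so the counts are small rather than 400, 20, ...), and the name out and the fill_value=0 argument are additions for the example.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    'MONTH':   [1, 1, 1, 1, 1, 2, 2, 2],
    'WEEKDAY': [0, 0, 0, 1, 1, 0, 0, 1],
    'EVAL':    [1, 0, 0, 1, 0, 0, 0, 1],
})

out = (
    df.groupby(['MONTH', 'WEEKDAY', 'EVAL'])
      .size()                            # count rows per (MONTH, WEEKDAY, EVAL)
      .unstack(fill_value=0)             # EVAL values become columns; gaps -> 0
      .sort_index(level=[1, 0])          # order by WEEKDAY, then MONTH
      .reset_index(level=0, drop=True)   # drop the MONTH level
      .add_prefix('EVAL_')               # columns 0, 1 -> EVAL_0, EVAL_1
      .reset_index()                     # WEEKDAY back to a regular column
      .rename_axis(None, axis=1)         # remove the 'EVAL' columns name
)
print(out)
```

Because size() already yields one value per key combination, the index is unique here and unstack() does not hit the duplicate-entries error.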