Here is the DataFrame:
A B val val2 loc
1 march 3 2 NY
1 april 5 1 NY
1 may 12 4 NY
2 march 4 1 NJ
2 april 7 5 NJ
2 may 12 1 NJ
3 march 1 8 CA
3 april 54 6 CA
3 may 2 9 CA
I want to convert it to:
march march april april may may
val1 val2 val1 val2 val1 val2
A B
1 NY 3 5 12 2 1 4
2 NJ 4 7 12 1 5 5
3 CA 1 54 2 8 6 9
I have been looking into pivot tables as well as stack and unstack, but I'm really stuck. I'm not sure where to start.
Answer 0 (score: 0)
Use pd.pivot_table and some level swapping:
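import pandas as pd

new_df = (pd.pivot_table(df, values=['val', 'val2'], index=['A', 'loc'], columns=['B'])
            .sort_index(axis=1, level=1)   # order the columns by month (level 1)
            .swaplevel(0, axis=1))         # move the month level to the top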
>>> new_df
B     april      march        may
        val val2   val val2   val val2
A loc
1 NY      5    1     3    2    12    4
2 NJ      7    5     4    1    12    1
3 CA     54    6     1    8     2    9
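Since the question also mentions stack and unstack, here is a minimal sketch of the same reshape using set_index and unstack instead of pivot_table. It is not part of the original answer. It rebuilds the sample frame from the question so that it runs on its own, the name wide is only illustrative, and it assumes every (A, loc, B) combination occurs exactly once (unstack raises on duplicates, whereas pivot_table would aggregate them).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "B": ["march", "april", "may"] * 3,
    "val": [3, 5, 12, 4, 7, 12, 1, 54, 2],
    "val2": [2, 1, 4, 1, 5, 1, 8, 6, 9],
    "loc": ["NY"] * 3 + ["NJ"] * 3 + ["CA"] * 3,
})

wide = (
    df.set_index(["A", "loc", "B"])  # assumes each (A, loc, B) triple is unique
      .unstack("B")                  # move B from the row index into the columns
      .swaplevel(0, 1, axis=1)       # put the month level on top
      .sort_index(axis=1)            # keep val/val2 together under each month
)
print(wide)
```

As with the pivot_table version, the months come out in alphabetical order (april, march, may) because the column index is sorted.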
If the order of the columns matters (for example, you need them in the order march, april, may), you can make B an ordered categorical:
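new_df = (pd.pivot_table(df, values=['val', 'val2'], index=['A', 'loc'],
                         columns=[pd.Categorical(df.B, categories=['march', 'april', 'may'],
                                                 ordered=True)])
            .dropna(how='all')             # drop all-NaN rows that the categorical grouping can introduce
            .sort_index(axis=1, level=1)   # categorical order: march, april, may
            .swaplevel(0, axis=1))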
>>> new_df
B     march      april        may
        val val2   val val2   val val2
A loc
1 NY    3.0  2.0   5.0  1.0  12.0  4.0
2 NJ    4.0  1.0   7.0  5.0  12.0  1.0
3 CA    1.0  8.0  54.0  6.0   2.0  9.0
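If you would rather not convert B to a categorical, another option is to pivot first and then reindex the columns against an explicit MultiIndex in the order you want. This is only a sketch and not part of the original answer: months, target, and wide are illustrative names, and the month order is assumed to be known in advance.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "B": ["march", "april", "may"] * 3,
    "val": [3, 5, 12, 4, 7, 12, 1, 54, 2],
    "val2": [2, 1, 4, 1, 5, 1, 8, 6, 9],
    "loc": ["NY"] * 3 + ["NJ"] * 3 + ["CA"] * 3,
})

months = ["march", "april", "may"]  # assumed: desired order, known up front

wide = (
    df.pivot_table(index=["A", "loc"], columns="B", values=["val", "val2"])
      .swaplevel(0, 1, axis=1)      # month level on top, val/val2 underneath
)

# Build the exact column layout we want, then reorder the columns to match it.
target = pd.MultiIndex.from_product(
    [months, ["val", "val2"]], names=wide.columns.names
)
wide = wide.reindex(columns=target)
print(wide)
```

Reindexing against a full MultiIndex spells out both the month order and the val/val2 order, so nothing depends on how the column index happens to sort.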