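I'm having trouble finding a way to reshape a DataFrame with MultiIndex columns in two ways:
(a) melt both levels of the MultiIndex, keeping one column as the identifier, and
(b) melt only level 1, keeping one value column per level 0 label.
I may have figured out (a), although probably not in the most programmatic way, and I can get close to (b), but no cigar.
For example, given the DataFrame: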
import pandas as pd

# 'dealer' is listed last so the data lines up with the MultiIndex labels assigned below
df = pd.DataFrame({'col2': {0: 1, 1: 3},
                   'col3': {0: 2, 1: 4},
                   'col4': {0: 3, 1: 6},
                   'col5': {0: 7, 1: 2},
                   'dealer': {0: 'SF', 1: 'LA'},
                   })
df.columns = [['Jan','Jan','Feb','Feb','dealer'], ['cars','trucks','cars','trucks','dealer']]
Out[209]:
   Jan         Feb        dealer
  cars trucks cars trucks dealer
0    1      2    3      7     SF
1    3      4    6      2     LA
I can get to (a) with:
melted = df.melt(id_vars = 'dealer',col_level=0, var_name='month')
melted['product']=df.melt(id_vars = 'dealer',col_level=1)['variable']
melted.sort_values('dealer', inplace=True)
melted
Out[211]:
  dealer month  value product
1     LA   Jan      3    cars
3     LA   Jan      4  trucks
5     LA   Feb      6    cars
7     LA   Feb      2  trucks
0     SF   Jan      1    cars
2     SF   Jan      2  trucks
4     SF   Feb      3    cars
6     SF   Feb      7  trucks
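A possibly more direct route to (a) is to move the dealer column into the index, name the two column levels, and melt once. This is only a sketch: it assumes pandas 1.1 or later (for `ignore_index`), and the names `values` and `long_df` are illustrative, not from the original post.

```python
import pandas as pd

# Rebuild the example frame; 'dealer' is last so the labels line up.
df = pd.DataFrame({
    "col2": [1, 3],
    "col3": [2, 4],
    "col4": [3, 6],
    "col5": [7, 2],
    "dealer": ["SF", "LA"],
})
df.columns = pd.MultiIndex.from_arrays([
    ["Jan", "Jan", "Feb", "Feb", "dealer"],
    ["cars", "trucks", "cars", "trucks", "dealer"],
])

# Move the dealer column into the index so only the numeric columns remain.
values = df.drop(columns="dealer", level=0)
values.index = pd.Index(df[("dealer", "dealer")], name="dealer")

# Name the column levels; melt uses these names for the new columns.
values = values.rename_axis(columns=["month", "product"])

# Melt both levels at once, keeping the dealer index, then turn it into a column.
long_df = (
    values.melt(ignore_index=False)
    .reset_index()
    .sort_values(["dealer", "month", "product"], ignore_index=True)
)
print(long_df)
```

Because the column levels carry names, there is no need to call `melt` twice and stitch the results together.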
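But for (b) I can't seem to get ['dealer','product','Jan','Feb'] as the column labels, with the Jan and Feb values in their respective columns. The target looks like this: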
pivotedd = pd.DataFrame({'dealer': {0: 'LA', 1: 'LA',2: 'SF', 3: 'SF'},
'product': {0: 'cars', 1: 'trucks',2: 'cars', 3: 'trucks'},
'Jan': {0: 3, 1: 4,2:1,3:2},
'Feb': {0: 6, 1: 2,2:3,3:7},
})
Out[215]:
   Feb  Jan dealer product
0    6    3     LA    cars
1    2    4     LA  trucks
2    3    1     SF    cars
3    7    2     SF  trucks
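Obviously I'd want dealer and product as the first columns and the months in chronological order (I haven't read up on why pd.DataFrame changes the order of the input data), but this is essentially what I'm after.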
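On the column order: as far as I understand, older pandas releases sorted dict keys alphabetically when building a DataFrame, while newer ones keep insertion order. Either way, passing `columns=` to the constructor pins the order explicitly. A minimal sketch (the names `data` and `target` are illustrative):

```python
import pandas as pd

data = {
    "dealer": ["LA", "LA", "SF", "SF"],
    "product": ["cars", "trucks", "cars", "trucks"],
    "Jan": [3, 4, 1, 2],
    "Feb": [6, 2, 3, 7],
}

# columns= fixes the column order regardless of how the dict is ordered.
target = pd.DataFrame(data, columns=["dealer", "product", "Jan", "Feb"])
print(target)
```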
Thanks!
Answer 0 (score: 1)
You can try:
df.set_index('dealer').stack(1).reset_index().rename(columns={'level_1':'product'})
  dealer product  Feb  Jan
0  (SF,)    cars    3    1
1  (SF,)  trucks    7    2
2  (LA,)    cars    6    3
3  (LA,)  trucks    2    4
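In the output above the dealer values come back as one-element tuples and the months are in alphabetical order. A variant that should avoid both is sketched below: it builds the index from the full `('dealer', 'dealer')` column key, stacks the product level by name, and reorders the months explicitly. The names `values` and `wide` are illustrative, and recent pandas 2.x releases may emit a FutureWarning about the `stack` implementation.

```python
import pandas as pd

# Rebuild the example frame; 'dealer' is last so the labels line up.
df = pd.DataFrame({
    "col2": [1, 3],
    "col3": [2, 4],
    "col4": [3, 6],
    "col5": [7, 2],
    "dealer": ["SF", "LA"],
})
df.columns = pd.MultiIndex.from_arrays([
    ["Jan", "Jan", "Feb", "Feb", "dealer"],
    ["cars", "trucks", "cars", "trucks", "dealer"],
])

# Use the full tuple key so the index holds plain strings, not tuples.
values = df.drop(columns="dealer", level=0)
values.index = pd.Index(df[("dealer", "dealer")], name="dealer")
values = values.rename_axis(columns=["month", "product"])

wide = (
    values.stack("product")        # product moves from the columns into the index
    .loc[:, ["Jan", "Feb"]]        # put the months in chronological order
    .reset_index()
    .sort_values(["dealer", "product"], ignore_index=True)
)
wide.columns.name = None           # drop the leftover 'month' label on the columns
print(wide)
```

This should reproduce the target frame from the question, with dealer and product as the first two columns.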