MultiIndex melt / pivot

Time: 2018-09-19 23:02:55

Tags: python pandas

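I'm having a hard time finding a way to reshape a DataFrame with MultiIndex columns in two ways:

(a) melt both levels of the MultiIndex, keeping one column (dealer) as the identifier, and

(b) melt only level 1, so that each level 0 label (the month) stays as its own column.

I may have figured out (a), though probably not in the most programmatic way, and I'm close on (b) but no cigar.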

For example, given the DataFrame:

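import pandas as pd

# 'dealer' is listed last so the column order matches the labels assigned below
# (current pandas keeps dict insertion order; older versions sorted the keys)
df = pd.DataFrame({'col2': {0: 1, 1: 3},
                   'col3': {0: 2, 1: 4},
                   'col4': {0: 3, 1: 6},
                   'col5': {0: 7, 1: 2},
                   'dealer': {0: 'SF', 1: 'LA'},
                  })
df.columns = [['Jan','Jan','Feb','Feb','dealer'], ['cars','trucks','cars','trucks','dealer']]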

Out[209]: 
   Jan         Feb        dealer
  cars trucks cars trucks dealer
0    1      2    3      7     SF
1    3      4    6      2     LA

I can get to (a) with:

melted = df.melt(id_vars = 'dealer',col_level=0, var_name='month')
melted['product']=df.melt(id_vars = 'dealer',col_level=1)['variable']
melted.sort_values('dealer', inplace=True)

melted
Out[211]: 
  dealer month  value product
1     LA   Jan      3    cars
3     LA   Jan      4  trucks
5     LA   Feb      6    cars
7     LA   Feb      2  trucks
0     SF   Jan      1    cars
2     SF   Jan      2  trucks
4     SF   Feb      3    cars
6     SF   Feb      7  trucks
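For reference, (a) can probably also be produced in a single pass rather than by melting twice and stitching the results together. The following is only a sketch under a few assumptions: the level names `month` and `product` are labels introduced here (the original columns are unnamed), the dealer column is addressed by its full tuple key `('dealer', 'dealer')`, and the final sort is alphabetical rather than chronological.

```python
import pandas as pd

# Rebuild the example frame with an explicit MultiIndex on the columns.
df = pd.DataFrame({"col2": [1, 3], "col3": [2, 4], "col4": [3, 6],
                   "col5": [7, 2], "dealer": ["SF", "LA"]})
df.columns = pd.MultiIndex.from_arrays(
    [["Jan", "Jan", "Feb", "Feb", "dealer"],
     ["cars", "trucks", "cars", "trucks", "dealer"]])

# Move the dealer column into the index, addressing it by its full tuple key.
wide = df.set_index([("dealer", "dealer")]).rename_axis(index="dealer")

# Drop the now-unused 'dealer' labels and name the two column levels
# ("month" and "product" are names chosen for this sketch).
wide.columns = wide.columns.remove_unused_levels().set_names(["month", "product"])

# Stacking both column levels yields a Series indexed by dealer/month/product.
long = (wide.stack(["month", "product"])
            .rename("value")
            .reset_index()
            .sort_values(["dealer", "month", "product"])
            .reset_index(drop=True))
print(long)
```

Depending on the pandas version, `stack` may emit a FutureWarning about its newer implementation; the reshaped values should be the same either way.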

But I can't seem to get to (b): ['dealer','product','Jan','Feb'] as the column labels, with the Jan and Feb columns holding the corresponding values, like this:

pivotedd =  pd.DataFrame({'dealer': {0: 'LA', 1: 'LA',2: 'SF', 3: 'SF'},
                   'product': {0: 'cars', 1: 'trucks',2: 'cars', 3: 'trucks'},
                   'Jan': {0: 3, 1: 4,2:1,3:2},
                   'Feb': {0: 6, 1: 2,2:3,3:7},
                  })

Out[215]: 
   Feb  Jan dealer product
0    6    3     LA    cars
1    2    4     LA  trucks
2    3    1     SF    cars
3    7    2     SF  trucks

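Obviously, I'd want dealer and product as the first columns and the months in chronological order (I haven't read up yet on why pd.DataFrame changes the order of the input data), but this is essentially what I'm after.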

Thanks!

1 answer:

Answer 0 (score: 1)

You can try:

df.set_index('dealer').stack(1).reset_index().rename(columns={'level_1':'product'})

  dealer  product  Feb  Jan
0  (SF,)    cars    3    1
1  (SF,)  trucks    7    2
2  (LA,)    cars    6    3
3  (LA,)  trucks    2    4
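A variation on the same idea, sketched under some assumptions: keying the dealer column by its full tuple `('dealer', 'dealer')` should avoid the tuple-valued entries such as `(SF,)` shown above, and naming the column levels (`month` and `product` are names chosen here, not part of the original frame) removes the need to rename `level_1` afterwards. The months are put in chronological order by selecting the columns explicitly.

```python
import pandas as pd

# Rebuild the example frame with an explicit MultiIndex on the columns.
df = pd.DataFrame({"col2": [1, 3], "col3": [2, 4], "col4": [3, 6],
                   "col5": [7, 2], "dealer": ["SF", "LA"]})
df.columns = pd.MultiIndex.from_arrays(
    [["Jan", "Jan", "Feb", "Feb", "dealer"],
     ["cars", "trucks", "cars", "trucks", "dealer"]])

# Use the full tuple key so the index holds plain strings, not 1-tuples.
wide = df.set_index([("dealer", "dealer")]).rename_axis(index="dealer")

# Drop the now-unused 'dealer' labels and name the levels (assumed names).
wide.columns = wide.columns.remove_unused_levels().set_names(["month", "product"])

# Stack only the product level; the months stay as columns.
result = (wide.stack("product")[["Jan", "Feb"]]   # explicit chronological order
              .reset_index()
              .sort_values(["dealer", "product"])
              .reset_index(drop=True))
result.columns.name = None  # drop the leftover "month" label on the columns
print(result)
```

The rows should match the `pivotedd` frame from the question, with dealer and product as the leading columns. On recent pandas versions `stack` may emit a FutureWarning about its newer implementation, which does not affect the result here.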