How to change the order of a MultiIndex using Python (Pandas)

Time: 2018-08-23 21:51:29

Tags: python pandas multidimensional-array slice

The data is shown in this image.

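How can I use Python to reorder the "Progress" level as "Start - Develop - Middle - Operate"?

Python automatically sorts it alphabetically, but I don't want that.

Can anyone help me?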

1 Answer:

Answer 0: (score: 1)

Let's try this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Country':['France']*4+['China']*4,'Progress':['Develop','Middle','Operate','Start']*2,'NumTrans':np.random.randint(100,900,8),'TransValue':np.random.randint(10000,9999999,8)})

# Move Country and Progress into a column MultiIndex
df = df.set_index(['Country','Progress']).T
print(df)

Source dataframe:

Country      France                              China                           
Progress    Develop   Middle  Operate    Start Develop   Middle  Operate    Start
NumTrans        603      661      251      110     747      780      390      346
TransValue  8662422  5226407  4679673  2589011  695373  5655969  2079905  7878596

Convert the "Progress" level to a categorical dtype and define the category order:

df.columns = df.columns.set_levels([df.columns.levels[0],
                      df.columns.levels[1].astype('category').reorder_categories(['Start','Develop','Middle','Operate'])])

# Sort the columns; the categorical level sorts by category order, not alphabetically
df = df.sort_index(axis=1)
print(df)

Output:

Country       China                             France                           
Progress      Start Develop   Middle  Operate    Start  Develop   Middle  Operate
NumTrans        346     747      780      390      110      603      661      251
TransValue  7878596  695373  5655969  2079905  2589011  8662422  5226407  4679673
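
Note that sort_index also sorts the "Country" level, which is why China now comes before France. If only the "Progress" level should change, one possible alternative is to reindex the columns on that level. This is a minimal sketch, not part of the original answer: the names `order` and `reordered` and the seeded generator are assumptions added for illustration, and to my understanding the outer level order should be left as it was.

```python
import numpy as np
import pandas as pd

# Same layout as the example above; the seed only makes the values repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Country': ['France'] * 4 + ['China'] * 4,
    'Progress': ['Develop', 'Middle', 'Operate', 'Start'] * 2,
    'NumTrans': rng.integers(100, 900, 8),
    'TransValue': rng.integers(10000, 9999999, 8),
})
df = df.set_index(['Country', 'Progress']).T

# Desired order of the inner column level (hypothetical helper name).
order = ['Start', 'Develop', 'Middle', 'Operate']

# Reorder only the 'Progress' level; no categorical dtype is involved.
reordered = df.reindex(columns=order, level='Progress')
print(reordered)
```

The values move together with their labels, so each (Country, Progress) column keeps its data.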

Update: testing the case where a progress step is missing for one country.

df = pd.DataFrame({'Country':['France']*4+['China']*4,'Progress':['Develop','Middle','Operate','Start']*2,'NumTrans':np.random.randint(100,900,8),'TransValue':np.random.randint(10000,9999999,8)})

df = df.set_index(['Country','Progress']).T
df2 = df.drop(('China','Operate'), axis=1)
print(df2)

Input dataframe:

Country      France                               China                  
Progress    Develop   Middle  Operate    Start  Develop   Middle    Start
NumTrans        672      496      319      394      346      402      462
TransValue  6341768  5832091  9580341  5739947  6399118  6826113  1501382

Convert to categorical and sort:

df2.columns = df2.columns.set_levels([df2.columns.levels[0],
                      df2.columns.levels[1].astype('category').reorder_categories(['Start','Develop','Middle','Operate'])])

# Sort the columns; the categorical level sorts by category order, not alphabetically
df2 = df2.sort_index(axis=1)
print(df2)

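Output (the values differ from the input shown above, presumably because the random sample data was regenerated between runs; the column order is what matters):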

Country       China                     France                           
Progress      Start  Develop   Middle    Start  Develop   Middle  Operate
NumTrans        359      496      191      886      685      814      581
TransValue  1369593  8810118  5527613  8970396  1424341  8017561  7749721
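
The same idea can also be applied before the data is reshaped, by making "Progress" an ordered categorical column and only then building the MultiIndex. The following is a sketch under assumptions rather than the answer's own code: `long_df`, `wide`, `order` and the fixed seed are hypothetical names and choices used only for illustration.

```python
import numpy as np
import pandas as pd

# Desired order of the progress steps.
order = ['Start', 'Develop', 'Middle', 'Operate']

# Seeded generator so the printed values are repeatable.
rng = np.random.default_rng(0)
long_df = pd.DataFrame({
    'Country': ['France'] * 4 + ['China'] * 4,
    'Progress': ['Develop', 'Middle', 'Operate', 'Start'] * 2,
    'NumTrans': rng.integers(100, 900, 8),
    'TransValue': rng.integers(10000, 9999999, 8),
})

# Drop one row so that China has no 'Operate' step.
missing = (long_df['Country'] == 'China') & (long_df['Progress'] == 'Operate')
long_df = long_df[~missing].copy()

# Give 'Progress' an explicit, ordered set of categories before indexing.
long_df['Progress'] = pd.Categorical(long_df['Progress'], categories=order, ordered=True)

# Sorting now follows the category order instead of the alphabet.
wide = long_df.set_index(['Country', 'Progress']).sort_index().T
print(wide)
```

The missing (China, Operate) combination is simply absent from the result; the remaining steps still appear in the requested order.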