How do I arrange a DataFrame by multiple indexes?

Time: 2016-12-14 10:11:50

Tags: python pandas dataframe multi-index

I have a DataFrame like this:

Date    Shift   Machine_number  production
9/1/2016    C   1   0.795578112
9/1/2016    C   2   0.40730688              
9/1/2016    C   3   0.41150592
9/1/2016    C   4   0.40310784              
9/1/2016    C   5   0.376233984
9/2/2016    A   1   0.470486495             
9/2/2016    A   2   0.41360544
9/2/2016    A   3   0.41780448
9/2/2016    A   4   0.40520736              
9/2/2016    A   5   0.329204736
9/2/2016    B   1   0.472911683             
9/2/2016    B   2   0.4094064
9/2/2016    B   3   0.4094064               
9/2/2016    B   4   0.41570496
9/2/2016    B   5   0.366436224

I want to create a multi-indexed DataFrame like this:

Date Machine No. Shift production
9/1/2016 1 c 0.795578112
9/2/2016 1 a 0.470486495
9/2/2016 1 b 0.472911683

Thanks.

I tried:

idx0 = np.array(df['Machine_number'])
idx1 = np.array(df['Shift'])
df2 = DataFrame(index=[idx0, idx1], columns=df["production"])

1 answer:

Answer 0: (score: 1)

I think you need set_index:

# by 2 columns (assign to a new name so the original df stays available)
df1 = df.set_index(['Machine_number','Shift'])
print(df1)
                          Date  production
Machine_number Shift                      
1              C      9/1/2016    0.795578
2              C      9/1/2016    0.407307
3              C      9/1/2016    0.411506
4              C      9/1/2016    0.403108
5              C      9/1/2016    0.376234
1              A      9/2/2016    0.470486
2              A      9/2/2016    0.413605
3              A      9/2/2016    0.417804
4              A      9/2/2016    0.405207
5              A      9/2/2016    0.329205
1              B      9/2/2016    0.472912
2              B      9/2/2016    0.409406
3              B      9/2/2016    0.409406
4              B      9/2/2016    0.415705
5              B      9/2/2016    0.366436
# by 2 columns, keeping only the production column
df2 = df.set_index(['Machine_number','Shift'])[['production']]
print(df2)
                      production
Machine_number Shift            
1              C        0.795578
2              C        0.407307
3              C        0.411506
4              C        0.403108
5              C        0.376234
1              A        0.470486
2              A        0.413605
3              A        0.417804
4              A        0.405207
5              A        0.329205
1              B        0.472912
2              B        0.409406
3              B        0.409406
4              B        0.415705
5              B        0.366436
# by 3 columns
df3 = df.set_index(['Date', 'Machine_number','Shift'])
print(df3)
                               production
Date     Machine_number Shift            
9/1/2016 1              C        0.795578
         2              C        0.407307
         3              C        0.411506
         4              C        0.403108
         5              C        0.376234
9/2/2016 1              A        0.470486
         2              A        0.413605
         3              A        0.417804
         4              A        0.405207
         5              A        0.329205
         1              B        0.472912
         2              B        0.409406
         3              B        0.409406
         4              B        0.415705
         5              B        0.366436
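
To try the three-column variant end to end, here is a minimal self-contained sketch. It rebuilds only a subset of the question's data (machines 1 and 2), and the names `data`, `indexed`, and `machine1` are illustrative, not from the original post. It also assumes that the goal is to pull out the rows for one machine, as the desired output in the question suggests.

```python
import pandas as pd

# A subset of the question's data (machines 1 and 2 only), for illustration.
data = pd.DataFrame({
    "Date": ["9/1/2016", "9/1/2016", "9/2/2016",
             "9/2/2016", "9/2/2016", "9/2/2016"],
    "Shift": ["C", "C", "A", "A", "B", "B"],
    "Machine_number": [1, 2, 1, 2, 1, 2],
    "production": [0.795578112, 0.40730688, 0.470486495,
                   0.41360544, 0.472911683, 0.4094064],
})

# Build a three-level MultiIndex; the remaining column stays as data.
indexed = data.set_index(["Date", "Machine_number", "Shift"])

# Cross-section on one level: all rows for machine 1.
# drop_level=False keeps Machine_number visible in the index.
machine1 = indexed.xs(1, level="Machine_number", drop_level=False)
print(machine1)
```

`xs` is one of several ways to select on an inner level; boolean filtering on `indexed.index.get_level_values("Machine_number")` would work as well.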

Alternatively, a solution with sort_values:

# parentheses allow the method chain to continue on the next line
df = (df.sort_values(['Machine_number','Shift'], ascending=[True,False])
        .reset_index(drop=True))
# if you need to change the order of the columns
df = df[['Date','Machine_number','Shift','production']]
print(df)
        Date  Machine_number Shift  production
0   9/1/2016               1     C    0.795578
1   9/2/2016               1     B    0.472912
2   9/2/2016               1     A    0.470486
3   9/1/2016               2     C    0.407307
4   9/2/2016               2     B    0.409406
5   9/2/2016               2     A    0.413605
6   9/1/2016               3     C    0.411506
7   9/2/2016               3     B    0.409406
8   9/2/2016               3     A    0.417804
9   9/1/2016               4     C    0.403108
10  9/2/2016               4     B    0.415705
11  9/2/2016               4     A    0.405207
12  9/1/2016               5     C    0.376234
13  9/2/2016               5     B    0.366436
14  9/2/2016               5     A    0.329205
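
The mixed-direction sort can be checked in isolation with a small sketch. It uses a subset of the question's data (machines 1 and 2); the names `data` and `result` are illustrative. The `ascending` list has one entry per sort key, in the same order as the keys.

```python
import pandas as pd

# A subset of the question's data (machines 1 and 2 only), for illustration.
data = pd.DataFrame({
    "Date": ["9/1/2016", "9/1/2016", "9/2/2016",
             "9/2/2016", "9/2/2016", "9/2/2016"],
    "Shift": ["C", "C", "A", "A", "B", "B"],
    "Machine_number": [1, 2, 1, 2, 1, 2],
    "production": [0.795578112, 0.40730688, 0.470486495,
                   0.41360544, 0.472911683, 0.4094064],
})

# Machines ascending; within each machine, shifts in descending
# alphabetical order (C, B, A). reset_index gives a fresh 0..n-1 index.
result = (
    data.sort_values(["Machine_number", "Shift"], ascending=[True, False])
        .reset_index(drop=True)
)
print(result)
```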

If you need to change the order to C, A, B, use an ordered Categorical and set the order in the categories parameter:

# ordered Categorical with an explicit category order (assumes import pandas as pd)
df['Shift'] = pd.Categorical(df['Shift'], categories=['C','A','B'], ordered=True)
df = df.sort_values(['Machine_number','Shift']).reset_index(drop=True)
print(df)
        Date Shift  Machine_number  production
0   9/1/2016     C               1    0.795578
1   9/2/2016     A               1    0.470486
2   9/2/2016     B               1    0.472912
3   9/1/2016     C               2    0.407307
4   9/2/2016     A               2    0.413605
5   9/2/2016     B               2    0.409406
6   9/1/2016     C               3    0.411506
7   9/2/2016     A               3    0.417804
8   9/2/2016     B               3    0.409406
9   9/1/2016     C               4    0.403108
10  9/2/2016     A               4    0.405207
11  9/2/2016     B               4    0.415705
12  9/1/2016     C               5    0.376234
13  9/2/2016     A               5    0.329205
14  9/2/2016     B               5    0.366436
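
A custom shift order can also be expressed as a reusable dtype. The sketch below assumes a reasonably recent pandas where `pd.CategoricalDtype` is available; the names `data`, `shift_order`, and `result` are illustrative, and the data is again a subset of the question's (machines 1 and 2).

```python
import pandas as pd

# A subset of the question's data (machines 1 and 2 only), for illustration.
data = pd.DataFrame({
    "Date": ["9/1/2016", "9/1/2016", "9/2/2016",
             "9/2/2016", "9/2/2016", "9/2/2016"],
    "Shift": ["C", "C", "A", "A", "B", "B"],
    "Machine_number": [1, 2, 1, 2, 1, 2],
    "production": [0.795578112, 0.40730688, 0.470486495,
                   0.41360544, 0.472911683, 0.4094064],
})

# Custom order: C first, then A, then B.
shift_order = pd.CategoricalDtype(categories=["C", "A", "B"], ordered=True)
data["Shift"] = data["Shift"].astype(shift_order)

# Sorting now follows the category order rather than alphabetical order.
result = data.sort_values(["Machine_number", "Shift"]).reset_index(drop=True)
print(result)
```

Values that are not listed in `categories` become NaN after the conversion, so the list should cover every shift present in the data.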