I have a DataFrame like this:
Date Shift Machine_number production
9/1/2016 C 1 0.795578112
9/1/2016 C 2 0.40730688
9/1/2016 C 3 0.41150592
9/1/2016 C 4 0.40310784
9/1/2016 C 5 0.376233984
9/2/2016 A 1 0.470486495
9/2/2016 A 2 0.41360544
9/2/2016 A 3 0.41780448
9/2/2016 A 4 0.40520736
9/2/2016 A 5 0.329204736
9/2/2016 B 1 0.472911683
9/2/2016 B 2 0.4094064
9/2/2016 B 3 0.4094064
9/2/2016 B 4 0.41570496
9/2/2016 B 5 0.366436224
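To reproduce the frame above, here is a minimal sketch. It assumes the table is plain whitespace-separated text; the variable name raw is only for illustration, and Date is left as a string.

```python
import io
import pandas as pd

# Rows copied verbatim from the table above.
raw = """Date Shift Machine_number production
9/1/2016 C 1 0.795578112
9/1/2016 C 2 0.40730688
9/1/2016 C 3 0.41150592
9/1/2016 C 4 0.40310784
9/1/2016 C 5 0.376233984
9/2/2016 A 1 0.470486495
9/2/2016 A 2 0.41360544
9/2/2016 A 3 0.41780448
9/2/2016 A 4 0.40520736
9/2/2016 A 5 0.329204736
9/2/2016 B 1 0.472911683
9/2/2016 B 2 0.4094064
9/2/2016 B 3 0.4094064
9/2/2016 B 4 0.41570496
9/2/2016 B 5 0.366436224
"""

# Split columns on any run of whitespace.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df.head())
```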
I want to create a DataFrame with a MultiIndex:
Date Machine No. Shift production
9/1/2016 1 c 0.795578112
9/2/2016 1 a 0.470486495
9/2/2016 1 b 0.472911683
Thanks.
I tried:
idx0=np.array(df['Machine_number'])
idx1 = np.array(df['Shift'])
df2 = DataFrame(index = [idx0,idx1], columns = df["production"])
Answer 0 (score: 1)
I think you need set_index:
# by 2 columns (assigned to a new name so the original df stays available)
df1 = df.set_index(['Machine_number','Shift'])
print(df1)
Date production
Machine_number Shift
1 C 9/1/2016 0.795578
2 C 9/1/2016 0.407307
3 C 9/1/2016 0.411506
4 C 9/1/2016 0.403108
5 C 9/1/2016 0.376234
1 A 9/2/2016 0.470486
2 A 9/2/2016 0.413605
3 A 9/2/2016 0.417804
4 A 9/2/2016 0.405207
5 A 9/2/2016 0.329205
1 B 9/2/2016 0.472912
2 B 9/2/2016 0.409406
3 B 9/2/2016 0.409406
4 B 9/2/2016 0.415705
5 B 9/2/2016 0.366436
# by 2 columns, keeping only the production column
df2 = df.set_index(['Machine_number','Shift'])[['production']]
print(df2)
production
Machine_number Shift
1 C 0.795578
2 C 0.407307
3 C 0.411506
4 C 0.403108
5 C 0.376234
1 A 0.470486
2 A 0.413605
3 A 0.417804
4 A 0.405207
5 A 0.329205
1 B 0.472912
2 B 0.409406
3 B 0.409406
4 B 0.415705
5 B 0.366436
# by 3 columns
df3 = df.set_index(['Date', 'Machine_number','Shift'])
print(df3)
                               production
Date     Machine_number Shift
9/1/2016 1              C        0.795578
         2              C        0.407307
         3              C        0.411506
         4              C        0.403108
         5              C        0.376234
9/2/2016 1              A        0.470486
         2              A        0.413605
         3              A        0.417804
         4              A        0.405207
         5              A        0.329205
         1              B        0.472912
         2              B        0.409406
         3              B        0.409406
         4              B        0.415705
         5              B        0.366436
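Once the three columns are in the index, rows for a single machine can be pulled out by index level. The sketch below is only illustrative: it uses a six-row subset of the question's data (machines 1 and 2), and the names indexed, machine_1 and value are made up for the example.

```python
import pandas as pd

# Subset of the question's data: machines 1 and 2 only.
df = pd.DataFrame({
    "Date": ["9/1/2016", "9/1/2016", "9/2/2016", "9/2/2016", "9/2/2016", "9/2/2016"],
    "Shift": ["C", "C", "A", "A", "B", "B"],
    "Machine_number": [1, 2, 1, 2, 1, 2],
    "production": [0.795578112, 0.40730688, 0.470486495,
                   0.41360544, 0.472911683, 0.4094064],
})

indexed = df.set_index(["Date", "Machine_number", "Shift"])

# Cross-section: every row for machine 1, keeping all three index levels.
machine_1 = indexed.xs(1, level="Machine_number", drop_level=False)
print(machine_1)

# A single cell, addressed by the full (Date, Machine_number, Shift) key.
value = indexed.loc[("9/2/2016", 1, "B"), "production"]
print(value)
```

Passing drop_level=False keeps Machine_number in the result, which matches the layout asked for in the question.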
To get the row order instead, a first solution is sort_values:
# parentheses let the method chain continue on the next line
df = (df.sort_values(['Machine_number','Shift'], ascending=[True,False])
        .reset_index(drop=True))
# if you need to change the order of the columns
df = df[['Date','Machine_number','Shift','production']]
print(df)
Date Machine_number Shift production
0 9/1/2016 1 C 0.795578
1 9/2/2016 1 B 0.472912
2 9/2/2016 1 A 0.470486
3 9/1/2016 2 C 0.407307
4 9/2/2016 2 B 0.409406
5 9/2/2016 2 A 0.413605
6 9/1/2016 3 C 0.411506
7 9/2/2016 3 B 0.409406
8 9/2/2016 3 A 0.417804
9 9/1/2016 4 C 0.403108
10 9/2/2016 4 B 0.415705
11 9/2/2016 4 A 0.405207
12 9/1/2016 5 C 0.376234
13 9/2/2016 5 B 0.366436
14 9/2/2016 5 A 0.329205
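If you need to change the order to C, A, B, use an ordered Categorical and set the order in the categories parameter:
import pandas as pd
# astype('category', ordered=..., categories=...) was removed from pandas; build the Categorical directly
df['Shift'] = pd.Categorical(df['Shift'], categories=['C','A','B'], ordered=True)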
df = df.sort_values(['Machine_number','Shift']).reset_index(drop=True)
print (df)
Date Shift Machine_number production
0 9/1/2016 C 1 0.795578
1 9/2/2016 A 1 0.470486
2 9/2/2016 B 1 0.472912
3 9/1/2016 C 2 0.407307
4 9/2/2016 A 2 0.413605
5 9/2/2016 B 2 0.409406
6 9/1/2016 C 3 0.411506
7 9/2/2016 A 3 0.417804
8 9/2/2016 B 3 0.409406
9 9/1/2016 C 4 0.403108
10 9/2/2016 A 4 0.405207
11 9/2/2016 B 4 0.415705
12 9/1/2016 C 5 0.376234
13 9/2/2016 A 5 0.329205
14 9/2/2016 B 5 0.366436
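The two ideas can be combined to get the MultiIndex layout from the question with shifts in C, A, B order. This is a rough sketch on a six-row subset of the data (machines 1 and 2); the name result is only for illustration.

```python
import pandas as pd

# Subset of the question's data: machines 1 and 2 only.
df = pd.DataFrame({
    "Date": ["9/1/2016", "9/1/2016", "9/2/2016", "9/2/2016", "9/2/2016", "9/2/2016"],
    "Shift": ["C", "C", "A", "A", "B", "B"],
    "Machine_number": [1, 2, 1, 2, 1, 2],
    "production": [0.795578112, 0.40730688, 0.470486495,
                   0.41360544, 0.472911683, 0.4094064],
})

# Ordered categorical: sorting follows C < A < B instead of alphabetical order.
df["Shift"] = pd.Categorical(df["Shift"], categories=["C", "A", "B"], ordered=True)

# Sort first, then move the three key columns into the index.
result = (
    df.sort_values(["Machine_number", "Shift"])
      .set_index(["Date", "Machine_number", "Shift"])
)
print(result)
```

Sorting before set_index keeps the rows grouped by machine while the index levels stay in the Date, Machine_number, Shift order.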