I have a DataFrame with MultiIndex columns like this:
                                           mean             std
                                Happiness Score Happiness Score
Region
Australia and New Zealand              7.302500        0.020936
Central and Eastern Europe             5.371184        0.578274
Eastern Asia                           5.632333        0.502100
Latin America and Caribbean            6.069074        0.728157
Middle East and Northern Africa        5.387879        1.031656
North America                          7.227167        0.179331
Southeastern Asia                      5.364077        0.882637
Southern Asia                          4.590857        0.535978
Sub-Saharan Africa                     4.150957        0.584945
Western Europe                         6.693000        0.777886
I want to sort it by the standard deviation.
My attempt:
import numpy as np
import pandas as pd
df1.sort_values(by=('Region','std'))
How can I fix this?
Answer 0 (score: 1)
Setup
np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 2)))
df.columns = pd.MultiIndex.from_arrays([['mean', 'std'], ['Happiness Score'] * 2])
df
             mean             std
  Happiness Score Happiness Score
0               5               0
1               3               3
2               7               9
3               3               5
4               2               4
You can use argsort to get the row positions in sorted order and then reorder df with iloc:
df.loc[:, ('std', 'Happiness Score')].argsort().values
# array([0, 1, 4, 3, 2])
df.iloc[df.loc[:, ('std', 'Happiness Score')].argsort().values]
# df.iloc[np.argsort(df.loc[:, ('std', 'Happiness Score')])]
             mean             std
  Happiness Score Happiness Score
0               5               0
1               3               3
4               2               4
3               3               5
2               7               9
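As a rough sketch of what the argsort approach is doing, the snippet below rebuilds the same frame from the literal values shown above (instead of the random setup) and separates the two steps. The names std_col, order, ascending and descending are only illustrative, and kind='stable' is an added assumption so that ties keep their original row order.

```python
import numpy as np
import pandas as pd

# Same values as the setup frame above, written out explicitly.
df = pd.DataFrame([[5, 0], [3, 3], [7, 9], [3, 5], [2, 4]])
df.columns = pd.MultiIndex.from_arrays([['mean', 'std'], ['Happiness Score'] * 2])

# Selecting with the full column tuple returns a Series.
std_col = df[('std', 'Happiness Score')]

# argsort returns row positions, not labels, so pair it with iloc.
order = np.argsort(std_col.to_numpy(), kind='stable')

ascending = df.iloc[order]
descending = df.iloc[order[::-1]]  # reversed positions: largest std first
```

Because argsort yields positions, this works even when the index labels are not 0..n-1, as long as iloc (not loc) is used to reorder.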
Another solution is sort_values, passing the column as a tuple:
df.sort_values(by=('std', 'Happiness Score'), axis=0)
             mean             std
  Happiness Score Happiness Score
0               5               0
1               3               3
4               2               4
3               3               5
2               7               9
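I think you had the right idea, but the tuple was wrong. It has to name the column levels from top to bottom, i.e. ('std', 'Happiness Score'). 'Region' is the name of the row index, not a column level.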
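To tie this back to the frame in the question, here is a minimal sketch that rebuilds a few of its rows by hand (how df1 was originally created is not shown, so the construction below is an assumption) and compares the two tuples. The names lookup_failed, by_std and by_std_desc are only illustrative.

```python
import pandas as pd

# Hand-built subset of the question's data; the real df1 may have been
# produced differently (this construction is an assumption).
columns = pd.MultiIndex.from_arrays([['mean', 'std'], ['Happiness Score'] * 2])
index = pd.Index(
    ['Western Europe', 'Australia and New Zealand', 'Southern Asia', 'North America'],
    name='Region',
)
df1 = pd.DataFrame(
    [[6.693000, 0.777886], [7.302500, 0.020936], [4.590857, 0.535978], [7.227167, 0.179331]],
    index=index,
    columns=columns,
)

# The tuple from the question is not a column key, so the lookup fails.
try:
    df1.sort_values(by=('Region', 'std'))
    lookup_failed = False
except KeyError:
    lookup_failed = True

# The full column key, top level first, sorts as intended.
by_std = df1.sort_values(by=('std', 'Happiness Score'))
by_std_desc = df1.sort_values(by=('std', 'Happiness Score'), ascending=False)
print(by_std)
```

Passing ascending=False gives the regions with the largest spread first, if that is the order you need.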