I have a Series with a three-level MultiIndex:
print(ser_test)
                         Value
Date       Group Country
2014-01-31 3     AE      example
                 AR      example
2014-02-28 3     AE      example
                 AR      example
2014-03-31 3     AE      example
                 AR      example
2014-04-30 3     AE      example
                 AR      example
2014-05-30 3     AR      example
2014-06-30 2     AE      example
           3     AR      example
2014-07-31 2     AE      example
           3     AR      example
2014-08-29 2     AE      example
           3     AR      example
2014-09-30 2     AE      example
           3     AR      example
2014-10-31 2     AE      example
           3     AR      example
2014-11-28 2     AE      example
           3     AR      example
2014-12-31 2     AE      example
           3     AR      example
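For anyone who wants to experiment, the data above can be approximated with a small construction. This is only a sketch: the question does not show how ser_test was built, so the loop below is an assumption that reproduces the same index layout, with the placeholder string "example" as every value.

```python
import pandas as pd

# Month-end dates exactly as they appear in the printout.
dates = pd.to_datetime([
    "2014-01-31", "2014-02-28", "2014-03-31", "2014-04-30",
    "2014-05-30", "2014-06-30", "2014-07-31", "2014-08-29",
    "2014-09-30", "2014-10-31", "2014-11-28", "2014-12-31",
])

rows = []
for d in dates:
    # AE is in group 3 through April, missing in May, and in group 2 from June.
    if d.month <= 4:
        rows.append((d, 3, "AE"))
    elif d.month >= 6:
        rows.append((d, 2, "AE"))
    # AR is in group 3 for every month.
    rows.append((d, 3, "AR"))

index = pd.MultiIndex.from_tuples(rows, names=["Date", "Group", "Country"])
ser_test = pd.Series("example", index=index, name="Value")
print(ser_test)
```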
My goal is to sort the series by Country first and then by Date, ignoring the Group level, to get the following result:
Value
Date Group Country
2014-01-31 3 AE example
2014-02-28 3 AE example
2014-03-31 3 AE example
2014-04-30 3 AE example
2014-06-30 2 AE example
2014-07-31 2 AE example
2014-08-29 2 AE example
2014-09-30 2 AE example
2014-10-31 2 AE example
2014-11-28 2 AE example
2014-12-31 2 AE example
2014-01-31 3 AR example
2014-02-28 3 AR example
2014-03-31 3 AR example
2014-04-30 3 AR example
2014-05-30 3 AR example
2014-06-30 3 AR example
2014-07-31 3 AR example
2014-08-29 3 AR example
2014-09-30 3 AR example
2014-10-31 3 AR example
2014-11-28 3 AR example
2014-12-31 3 AR example
I still need the Group level, so I can't simply drop it.
So I tried using the sort_index method like this:
print(ser_test.sort_index(level = ['Country', 'Date']))
or something like this:
print(ser_test.sort_index(level = ['Country', 'Date'], sort_remaining = False))
In both cases I get a result in which the Group level takes part in the sort and takes priority over the Date level:
Value
Date Group Country
2014-06-30 2 AE example
2014-07-31 2 AE example
2014-08-29 2 AE example
2014-09-30 2 AE example
2014-10-31 2 AE example
2014-11-28 2 AE example
2014-12-31 2 AE example
2014-01-31 3 AE example
2014-02-28 3 AE example
2014-03-31 3 AE example
2014-04-30 3 AE example
2014-01-31 3 AR example
2014-02-28 3 AR example
2014-03-31 3 AR example
2014-04-30 3 AR example
2014-05-30 3 AR example
2014-06-30 3 AR example
2014-07-31 3 AR example
2014-08-29 3 AR example
2014-09-30 3 AR example
2014-10-31 3 AR example
2014-11-28 3 AR example
2014-12-31 3 AR example
I tried every option of sort_index and, unexpectedly, got the result I wanted with this code:
print(ser_test.sort_index(level = ['Country', 'Date'], ascending = [True, True]))
Value
Date Group Country
2014-01-31 3 AE example
2014-02-28 3 AE example
2014-03-31 3 AE example
2014-04-30 3 AE example
2014-06-30 2 AE example
2014-07-31 2 AE example
2014-08-29 2 AE example
2014-09-30 2 AE example
2014-10-31 2 AE example
2014-11-28 2 AE example
2014-12-31 2 AE example
2014-01-31 3 AR example
2014-02-28 3 AR example
2014-03-31 3 AR example
2014-04-30 3 AR example
2014-05-30 3 AR example
2014-06-30 3 AR example
2014-07-31 3 AR example
2014-08-29 3 AR example
2014-09-30 3 AR example
2014-10-31 3 AR example
2014-11-28 3 AR example
2014-12-31 3 AR example
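This is strange, and I'm not sure it is a general way to get the expected sort order reliably. Using a MultiIndex is essential for me.
So, could you help me understand how sort_index works and share some code for this particular case?
Answer 0 (score: 1)
You could try upgrading to the latest version of pandas. I tested this in pandas 0.25.0 and it works correctly: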
print(df.sort_index(level = ['Country', 'Date']))
Value
Date Group Country
2014-01-31 3 AE example
2014-02-28 3 AE example
2014-03-31 3 AE example
2014-04-30 3 AE example
2014-06-30 2 AE example
2014-07-31 2 AE example
2014-08-29 2 AE example
2014-09-30 2 AE example
2014-10-31 2 AE example
2014-11-28 2 AE example
2014-12-31 2 AE example
2014-01-31 3 AR example
2014-02-28 3 AR example
2014-03-31 3 AR example
2014-04-30 3 AR example
2014-05-30 3 AR example
2014-06-30 3 AR example
2014-07-31 3 AR example
2014-08-29 3 AR example
2014-09-30 3 AR example
2014-10-31 3 AR example
2014-11-28 3 AR example
2014-12-31 3 AR example
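If upgrading is not an option, or you would rather not depend on how sort_index treats the remaining levels, one alternative is to make the sort keys explicit. The sketch below is not part of the original answer. It rebuilds sample data shaped like the question's (that construction and the helper name sort_by_levels are assumptions for illustration), turns the index into a temporary frame, sorts that frame by Country and then Date, and reorders the series by position. Group is never used as a key, but it stays in the index.

```python
import pandas as pd

# Assumed sample data with the same layout as the question's ser_test.
dates = pd.to_datetime([
    "2014-01-31", "2014-02-28", "2014-03-31", "2014-04-30",
    "2014-05-30", "2014-06-30", "2014-07-31", "2014-08-29",
    "2014-09-30", "2014-10-31", "2014-11-28", "2014-12-31",
])
rows = []
for d in dates:
    if d.month <= 4:
        rows.append((d, 3, "AE"))
    elif d.month >= 6:
        rows.append((d, 2, "AE"))
    rows.append((d, 3, "AR"))
index = pd.MultiIndex.from_tuples(rows, names=["Date", "Group", "Country"])
ser_test = pd.Series("example", index=index, name="Value")


def sort_by_levels(ser, levels):
    """Hypothetical helper: order `ser` by the named index levels only."""
    # One column per index level, with a fresh 0..n-1 positional index.
    keys = ser.index.to_frame(index=False)
    # After sorting by the chosen levels, the frame's index holds the
    # original row positions in their new order.
    order = keys.sort_values(list(levels)).index.to_numpy()
    return ser.iloc[order]


result = sort_by_levels(ser_test, ["Country", "Date"])
print(result)
```

Because only the listed levels are passed to sort_values, the outcome does not depend on the sort_remaining or ascending arguments of sort_index.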