Trouble sorting by a subset of levels in a pandas MultiIndex Series

Time: 2019-10-18 08:31:25

Tags: python-3.x pandas sorting multi-index

I have a Series with a three-level MultiIndex:

print(ser_test)
                            Value
Date       Group Country         
2014-01-31 3     AE       example
                 AR       example
2014-02-28 3     AE       example
                 AR       example
2014-03-31 3     AE       example
                 AR       example
2014-04-30 3     AE       example
                 AR       example
2014-05-30 3     AR       example
2014-06-30 2     AE       example
           3     AR       example
2014-07-31 2     AE       example
           3     AR       example
2014-08-29 2     AE       example
           3     AR       example
2014-09-30 2     AE       example
           3     AR       example
2014-10-31 2     AE       example
           3     AR       example
2014-11-28 2     AE       example
           3     AR       example
2014-12-31 2     AE       example
           3     AR       example

My goal is to sort the Series first by Country and then by Date, ignoring the Group level, to get the following result:

                            Value
Date       Group Country         
2014-01-31 3     AE       example
2014-02-28 3     AE       example
2014-03-31 3     AE       example
2014-04-30 3     AE       example
2014-06-30 2     AE       example
2014-07-31 2     AE       example
2014-08-29 2     AE       example
2014-09-30 2     AE       example
2014-10-31 2     AE       example
2014-11-28 2     AE       example
2014-12-31 2     AE       example
2014-01-31 3     AR       example
2014-02-28 3     AR       example
2014-03-31 3     AR       example
2014-04-30 3     AR       example
2014-05-30 3     AR       example
2014-06-30 3     AR       example
2014-07-31 3     AR       example
2014-08-29 3     AR       example
2014-09-30 3     AR       example
2014-10-31 3     AR       example
2014-11-28 3     AR       example
2014-12-31 3     AR       example

I still need the Group level, so I cannot simply drop it.

So I tried the sort_index method like this:

print(ser_test.sort_index(level = ['Country', 'Date']))

or something like this:

print(ser_test.sort_index(level = ['Country', 'Date'], sort_remaining = False))

In both cases I get a result in which the Group level takes part in the sort and takes precedence over the Date level:

                            Value
Date       Group Country         
2014-06-30 2     AE       example
2014-07-31 2     AE       example
2014-08-29 2     AE       example
2014-09-30 2     AE       example
2014-10-31 2     AE       example
2014-11-28 2     AE       example
2014-12-31 2     AE       example
2014-01-31 3     AE       example
2014-02-28 3     AE       example
2014-03-31 3     AE       example
2014-04-30 3     AE       example
2014-01-31 3     AR       example
2014-02-28 3     AR       example
2014-03-31 3     AR       example
2014-04-30 3     AR       example
2014-05-30 3     AR       example
2014-06-30 3     AR       example
2014-07-31 3     AR       example
2014-08-29 3     AR       example
2014-09-30 3     AR       example
2014-10-31 3     AR       example
2014-11-28 3     AR       example
2014-12-31 3     AR       example

I tried every option of sort_index and, unexpectedly, got the result I wanted with this code:

print(ser_test.sort_index(level = ['Country', 'Date'], ascending = [True, True]))

                            Value
Date       Group Country         
2014-01-31 3     AE       example
2014-02-28 3     AE       example
2014-03-31 3     AE       example
2014-04-30 3     AE       example
2014-06-30 2     AE       example
2014-07-31 2     AE       example
2014-08-29 2     AE       example
2014-09-30 2     AE       example
2014-10-31 2     AE       example
2014-11-28 2     AE       example
2014-12-31 2     AE       example
2014-01-31 3     AR       example
2014-02-28 3     AR       example
2014-03-31 3     AR       example
2014-04-30 3     AR       example
2014-05-30 3     AR       example
2014-06-30 3     AR       example
2014-07-31 3     AR       example
2014-08-29 3     AR       example
2014-09-30 3     AR       example
2014-10-31 3     AR       example
2014-11-28 3     AR       example
2014-12-31 3     AR       example

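This is odd, and I am not sure it is a general way to get a guaranteed sort order, and using a MultiIndex is essential for me.

So, could you help me understand how sort_index works and share some code for this particular case?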

1 Answer:

Answer 0: (score: 1)

You can try upgrading to the latest version of pandas. I tested this in pandas 0.25.0 and it works correctly:

print(df.sort_index(level = ['Country', 'Date']))
                            Value
Date       Group Country         
2014-01-31 3     AE       example
2014-02-28 3     AE       example
2014-03-31 3     AE       example
2014-04-30 3     AE       example
2014-06-30 2     AE       example
2014-07-31 2     AE       example
2014-08-29 2     AE       example
2014-09-30 2     AE       example
2014-10-31 2     AE       example
2014-11-28 2     AE       example
2014-12-31 2     AE       example
2014-01-31 3     AR       example
2014-02-28 3     AR       example
2014-03-31 3     AR       example
2014-04-30 3     AR       example
2014-05-30 3     AR       example
2014-06-30 3     AR       example
2014-07-31 3     AR       example
2014-08-29 3     AR       example
2014-09-30 3     AR       example
2014-10-31 3     AR       example
2014-11-28 3     AR       example
2014-12-31 3     AR       example
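
If upgrading is not an option, or you would rather not depend on how sort_index treats the levels you leave out, one possible workaround is to compute the row order from the chosen levels yourself and then reorder with iloc. The sketch below is only an illustration under assumptions: the helper name sort_by_levels is made up for this example, and the sample data is a small made-up stand-in for ser_test, not the original data.

```python
import pandas as pd

# Small made-up stand-in for the Series in the question.
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2014-04-30"), 3, "AE"),
        (pd.Timestamp("2014-04-30"), 3, "AR"),
        (pd.Timestamp("2014-05-30"), 3, "AR"),
        (pd.Timestamp("2014-06-30"), 2, "AE"),
        (pd.Timestamp("2014-06-30"), 3, "AR"),
    ],
    names=["Date", "Group", "Country"],
)
ser_test = pd.Series(range(5), index=index, name="Value")


def sort_by_levels(obj, levels):
    """Hypothetical helper: reorder rows using only the named index levels."""
    # Turn the index levels into ordinary columns with a 0..n-1 RangeIndex.
    keys = obj.index.to_frame(index=False)
    # After sorting, the frame's index holds the original row positions
    # in the desired order.
    order = keys.sort_values(list(levels)).index.to_numpy()
    # Positional reordering leaves every index level, including Group, intact.
    return obj.iloc[order]


result = sort_by_levels(ser_test, ["Country", "Date"])
print(result)
```

Because only Country and Date are passed to sort_values, Group never influences the order but is still present in the result. If it is acceptable for Group to act as a final tie-breaker, an alternative is ser_test.reorder_levels(['Country', 'Date', 'Group']).sort_index().reorder_levels(['Date', 'Group', 'Country']).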