Question

I have a DataFrame with a (year, foo) index, and I want to select the X largest observations of foo where year == someYear.

My approach is

df.sort_index(level=[0, 1], ascending=[1, 0], inplace=True)
df.loc[pd.IndexSlice[2002, :10], :]

But I get

KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'

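I tried different sorting variants (for example ascending = [0, 0]), but they all resulted in some kind of error.

If I only wanted the xth row, I could call df.groupby(level=[0]).nth(x) after sorting, but since I want a set of rows, that does not feel efficient.

What is the best way to select these rows? Some data: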

                   rank_int  rank
year foo                         
2015 1.381845             2   320
     1.234795             2   259
     1.148488           199     2
     0.866704             2   363
     0.738022             2   319

Answer 1

First, you should sort like this:

df.sort_index(level=['year','foo'], ascending=[1, 0], inplace=True)

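That should fix the KeyError. However, df.loc[pd.IndexSlice[2002, :10], :] will not give you the result you want. loc is not iloc: it is label-based, so it will look for foo labels up to 10 rather than the first ten rows. Positional (iloc-style) selection is not supported on the secondary level of a MultiIndex, so I suggest using groupby. If you already have this MultiIndex, you should do this: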

df = df.reset_index()  # reset_index returns a new frame, so assign the result
df = df.sort_values(by=['year','foo'],ascending=[True,False])
df.groupby('year').head(10)

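If you need the n entries with the smallest foo, you can use tail(n). If you need, say, the first, third, and fifth entries, you can use nth([0,2,4]), as mentioned in the question. I think this is the most efficient way.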
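
As a minimal sketch of the groupby approach above: the sample values below are made up for illustration (they are not the asker's data), and the names flat, top2, and bottom1 are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, loosely modelled on the frame in the question.
df = pd.DataFrame(
    {
        "year": [2014, 2014, 2014, 2015, 2015, 2015],
        "foo": [0.5, 1.5, 1.0, 0.7, 1.4, 1.2],
        "rank": [10, 20, 30, 40, 50, 60],
    }
).set_index(["year", "foo"])

# Move the index levels into columns, then sort: year ascending, foo descending.
flat = df.reset_index().sort_values(by=["year", "foo"], ascending=[True, False])

# Because foo is sorted descending within each year, head(n) keeps the
# n largest foo values per year and tail(n) keeps the n smallest.
top2 = flat.groupby("year").head(2)
bottom1 = flat.groupby("year").tail(1)

print(top2)
print(bottom1)
```

To restrict the result to a single year afterwards, an ordinary boolean filter such as top2[top2["year"] == 2015] should be enough.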

Answer 2

ascending should be a boolean, not a list. Try sorting this way:

df.sort_index(ascending=True, inplace=True)
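
A small sketch of what a plain ascending sort buys you, using made-up sample data (the values and the name subset are hypothetical). Once every level is sorted ascending, label-based slicing with pd.IndexSlice no longer raises the KeyError. Note that the slice 1.0: is a cutoff on the foo labels, not a row count.

```python
import pandas as pd

# Hypothetical sample data with an unsorted (year, foo) MultiIndex.
df = pd.DataFrame(
    {"rank": [10, 20, 30, 40, 50, 60]},
    index=pd.MultiIndex.from_tuples(
        [(2014, 0.5), (2014, 1.5), (2014, 1.0), (2015, 0.7), (2015, 1.4), (2015, 1.2)],
        names=["year", "foo"],
    ),
)

# Sort every level in ascending order so the index is fully lexsorted.
df = df.sort_index(ascending=True)
print(df.index.is_monotonic_increasing)

# Label-based slice: rows for 2015 whose foo label is 1.0 or greater.
subset = df.loc[pd.IndexSlice[2015, 1.0:], :]
print(subset)
```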

Answer 3

For me, it worked with sort_index(axis=1):

df = df.sort_index(axis=1)

After doing this, you can use slice or pandas.IndexSlice, for example:

idx = pd.IndexSlice
df.loc[:, idx[:, 'A']]
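
For the column case, a self-contained sketch might look like the following. The two-level column labels and the values are invented for illustration; this applies when the MultiIndex is on the columns rather than the rows.

```python
import pandas as pd

# Hypothetical frame with two-level column labels, deliberately unsorted.
columns = pd.MultiIndex.from_tuples([("y", "A"), ("x", "B"), ("x", "A")])
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=columns)

# Lexsort the column labels so slicing across levels is allowed.
df = df.sort_index(axis=1)

# Select every column whose second-level label is "A".
idx = pd.IndexSlice
selected = df.loc[:, idx[:, "A"]]
print(selected)
```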

Answer 4

To get the desired first x observations on the second level, you can combine loc with iloc:

df.sort_index(level=[0, 1], ascending=[1, 0], inplace=True)
df.loc[2015].iloc[:10]

This works as expected. It does not, however, explain the odd index behavior with respect to lexsorting.
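
A runnable sketch of the loc plus iloc combination, on made-up data (the values and the name top2_2015 are hypothetical): loc picks the year by label, and iloc then takes the first rows by position.

```python
import pandas as pd

# Hypothetical sample data with a (year, foo) MultiIndex.
df = pd.DataFrame(
    {"rank": [10, 20, 30, 40, 50, 60]},
    index=pd.MultiIndex.from_tuples(
        [(2014, 0.5), (2014, 1.5), (2014, 1.0), (2015, 0.7), (2015, 1.4), (2015, 1.2)],
        names=["year", "foo"],
    ),
)

# Sort year ascending and foo descending.
df = df.sort_index(level=[0, 1], ascending=[True, False])

# loc selects the year by label (leaving a frame indexed by foo);
# iloc then takes the first two rows by position, i.e. the two largest foo.
top2_2015 = df.loc[2015].iloc[:2]
print(top2_2015)
```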
