Question

I have set up a DataFrame with two indices, but slicing does not behave the way I expect. I realize this is a very basic question, so I searched for similar questions:

pandas: slice a MultiIndex by range of secondary index

Python Pandas slice multiindex by second level index (or any other level)

I also looked at the corresponding documentation.

Strangely, none of the proposed solutions work for me. I have set up a simple example that shows the problem:

import pandas as pd

# this is my DataFrame
frame = pd.DataFrame([
{"a":1, "b":1, "c":"11"},
{"a":1, "b":2, "c":"12"},
{"a":2, "b":1, "c":"21"},
{"a":2, "b":2, "c":"22"},
{"a":3, "b":1, "c":"31"},
{"a":3, "b":2, "c":"32"}])

# now set a and b as multiindex
frame = frame.set_index(["a","b"])

Now I am trying different ways of slicing. The first two lines work; the third one raises an exception:

# selecting a specific cell works
frame.loc[1,2]

# slicing along the second index works
frame.loc[1,:]

# slicing along the first doesn't work
frame.loc[:,1]

It raises a TypeError:

TypeError: cannot do label indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [1] of <class 'int'>
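A likely explanation, based on how .loc reads its key (not something stated in the question): Python passes frame.loc[:, 1] to pandas as the single tuple (slice(None), 1), and for a DataFrame .loc splits a two-element tuple into (row indexer, column indexer). The 1 is therefore looked up among the column labels, which contain only "c". The sketch below reproduces this on the question's data. The exception type depends on the pandas version (older releases raise the TypeError quoted above, newer ones a KeyError), so it catches both.

```python
import pandas as pd

frame = pd.DataFrame(
    {"a": [1, 1, 2, 2, 3, 3], "b": [1, 2, 1, 2, 1, 2],
     "c": ["11", "12", "21", "22", "31", "32"]}
).set_index(["a", "b"])

# The integers live in the row MultiIndex; the only column label is "c".
print(frame.columns.tolist())

# frame.loc[:, 1] means rows ":" and column 1, and there is no column 1.
failed = False
try:
    frame.loc[:, 1]
except (KeyError, TypeError):  # exact type varies across pandas versions
    failed = True

# Nesting the row part in its own tuple makes the column part explicit.
rows_b1 = frame.loc[(slice(None), 1), :]
print(rows_b1)
```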

Solution 1: using a tuple of slices

This was proposed in this question: pandas: slice a MultiIndex by range of secondary index

Indeed, you can pass a slice for each level

But this does not work for me; it raises the same TypeError as above.

frame.loc[(slice(1,2), 1)]
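One plausible reading of why this fails: frame.loc[(slice(1,2), 1)] is the same call as frame.loc[slice(1,2), 1], because the extra parentheses only spell out the tuple Python builds anyway, so 1 is again treated as a column label. A minimal sketch of the nested form, assuming the question's data; sort_index() is added defensively because range slicing on a MultiIndex level generally expects a sorted index (the example data happens to be sorted already):

```python
import pandas as pd

frame = pd.DataFrame(
    {"a": [1, 1, 2, 2, 3, 3], "b": [1, 2, 1, 2, 1, 2],
     "c": ["11", "12", "21", "22", "31", "32"]}
).set_index(["a", "b"]).sort_index()  # sorted index keeps level slicing well-defined

# The whole row selector is one tuple; the column selector is passed separately.
row_key = (slice(1, 2), 1)  # a in [1, 2] (label-based, inclusive), b == 1
result = frame.loc[row_key, :]
print(result)
```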

Solution 2: using IndexSlice

Python Pandas slice multiindex by second level index (or any other level)

With an indexer you can slice arbitrary values in arbitrary dimensions

Again, this does not work for me; it raises the same TypeError.

frame.loc[pd.IndexSlice[:,2]]
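pd.IndexSlice is only a convenience for writing tuples with ":" syntax: pd.IndexSlice[:, 2] evaluates to (slice(None, None, None), 2), so frame.loc[pd.IndexSlice[:,2]] ends up as frame.loc[:, 2]. A sketch of two ways around that on the question's data: pass the IndexSlice as the row part with an explicit column selector, or restrict .loc to the row axis with .loc(axis=0), which the pandas MultiIndex documentation describes for slicers.

```python
import pandas as pd

frame = pd.DataFrame(
    {"a": [1, 1, 2, 2, 3, 3], "b": [1, 2, 1, 2, 1, 2],
     "c": ["11", "12", "21", "22", "31", "32"]}
).set_index(["a", "b"])

idx = pd.IndexSlice
# IndexSlice just returns the bracket contents as a plain tuple.
print(idx[:, 2])

# Option 1: IndexSlice as the row selector, ":" for the columns.
by_rows_and_cols = frame.loc[idx[:, 2], :]

# Option 2: tell .loc that the whole key refers to the rows.
by_axis0 = frame.loc(axis=0)[:, 2]

print(by_axis0)
```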

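I don't understand how this error comes about. After all, I can use integers to select a specific cell, and slicing a range along the second dimension works fine. Googling my exact error message did not really help. For example, here someone was trying to slice a float-typed index with integers: https://github.com/pandas-dev/pandas/issues/12333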
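The asymmetry described here can likely be explained by how .loc treats a tuple when the rows carry a MultiIndex (my reading of pandas' behavior; worth checking against your version). If no element of the key is a slice, pandas first tries the whole tuple as one row label, so frame.loc[1, 2] finds the row (a=1, b=2). Once a slice is present, the tuple is split into (rows, columns): frame.loc[1, :] means key 1 on the first level plus all columns, which only looks like slicing along the second level. The integer type plays no role. The sketch contrasts the spellings; frame.loc[(1, 2), :] is the unambiguous form of the first one.

```python
import pandas as pd

frame = pd.DataFrame(
    {"a": [1, 1, 2, 2, 3, 3], "b": [1, 2, 1, 2, 1, 2],
     "c": ["11", "12", "21", "22", "31", "32"]}
).set_index(["a", "b"])

cell_row = frame.loc[1, 2]       # no slice: (1, 2) tried as a full row label
explicit = frame.loc[(1, 2), :]  # same row, with the column selector spelled out
first_level = frame.loc[1, :]    # slice present: 1 selects a == 1, ":" the columns

print(cell_row)
print(first_level)  # level "a" is dropped, leaving the "b" values as the index
```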

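I tried explicitly converting my indices to int; maybe the numpy backend stores everything as float by default? But that did not change anything, and afterwards the same error as above appeared: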

frame["a"]=frame["a"].apply(lambda x : int(x))
frame["b"]=frame["b"].apply(lambda x : int(x))

type(frame["b"][0])  # it's numpy.int64
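The float hypothesis can be checked on the index itself rather than on the columns (the conversion above has to run before set_index, since with the default drop=True the columns a and b no longer exist afterwards). A quick inspection sketch on the question's data: both index levels are already integer-typed and the only column label is "c", so no dtype conversion can make frame.loc[:, 1] work, because that 1 is compared with column labels.

```python
import pandas as pd

frame = pd.DataFrame(
    {"a": [1, 1, 2, 2, 3, 3], "b": [1, 2, 1, 2, 1, 2],
     "c": ["11", "12", "21", "22", "31", "32"]}
).set_index(["a", "b"])

# Inspect the dtype of each MultiIndex level.
level_dtypes = {name: frame.index.get_level_values(name).dtype for name in frame.index.names}
print(level_dtypes)

print(frame.columns.tolist())
print(1 in frame.columns)      # the label 1 is not a column
print((1, 2) in frame.index)   # but (1, 2) is a row label
```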

Answer 1

IIUC, when indexing a MultiIndex DF you just need to specify : for the columns as well:

In [40]: frame.loc[pd.IndexSlice[:,2], :]
Out[40]:
      c
a b
1 2  12
2 2  22
3 2  32
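Two other standard ways to select b == 2 (not part of the original answer; shown on the question's data): DataFrame.xs with level= and drop_level=False, and a boolean mask built from index.get_level_values. A sketch comparing them with the IndexSlice form:

```python
import pandas as pd

frame = pd.DataFrame(
    {"a": [1, 1, 2, 2, 3, 3], "b": [1, 2, 1, 2, 1, 2],
     "c": ["11", "12", "21", "22", "31", "32"]}
).set_index(["a", "b"])

via_indexslice = frame.loc[pd.IndexSlice[:, 2], :]
# Cross-section on level "b"; drop_level=False keeps both index levels.
via_xs = frame.xs(2, level="b", drop_level=False)
# Boolean mask over the "b" level values.
via_mask = frame[frame.index.get_level_values("b") == 2]

print(via_xs)
```

xs is handy for a single value on one level; the IndexSlice form also accepts ranges on several levels at once.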

Pandas: slicing along the first level of a MultiIndex
