Question

I have a DataFrame like the one below. How can I select the rows whose second index level is in ['two', 'three']?

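import numpy as np
from pandas import DataFrame, MultiIndex

# `codes=` replaced the older `labels=` keyword in pandas 0.24
index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                           ['one', 'two', 'three']],
                   codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
                          [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]])
hdf = DataFrame(np.random.randn(10, 3), index=index,
                columns=['A', 'B', 'C'])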

In [3]: hdf
Out[3]: 
                  A         B         C
foo one   -1.274689  0.946294 -0.149131
    two   -0.015483  1.630099  0.085461
    three  1.396752 -0.272583 -0.760000
bar one   -1.151217  1.269658  2.457231
    two   -1.657258 -1.271384 -2.429598
baz two    1.124609  0.138720 -1.994984
    three  0.124298 -0.127099 -0.409736
qux one    0.535038  1.139026  0.414842
    two    0.287724  0.461041 -0.268918
    three -0.259649  0.226574 -0.558334

Answer 1

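One way is to use the DataFrame's select method, which applies a predicate to each index label (here a tuple, so x[1] is the second level). Note that select was deprecated in pandas 0.21 and removed in 1.0: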

In [4]: hdf.select(lambda x: x[1] in ['two', 'three'])
Out[4]: 
                  A         B         C
foo two   -0.015483  1.630099  0.085461
    three  1.396752 -0.272583 -0.760000
bar two   -1.657258 -1.271384 -2.429598
baz two    1.124609  0.138720 -1.994984
    three  0.124298 -0.127099 -0.409736
qux two    0.287724  0.461041 -0.268918
    three -0.259649  0.226574 -0.558334
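
For readers on a pandas version where `select` is no longer available, the same predicate can be evaluated by hand. This is only a minimal sketch, assuming pandas 0.24 or later (for the `codes=` keyword); the names `wanted`, `mask`, and `selected` are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (`codes=` assumes pandas >= 0.24).
index = pd.MultiIndex(
    levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', 'three']],
    codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
           [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
)
hdf = pd.DataFrame(np.random.randn(10, 3), index=index, columns=['A', 'B', 'C'])

wanted = ['two', 'three']

# Each index label is a tuple such as ('foo', 'two'); test its second element,
# which is the same predicate that was passed to select().
mask = np.array([label[1] in wanted for label in hdf.index])

# Boolean indexing keeps the rows where the predicate is True.
selected = hdf[mask]
```

With the index from the question, `selected` should contain the same 7 rows as the `select` output above.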

Answer 2

Note that you can also build a boolean mask from the values of the second index level:

In [9]: hdf.index.get_level_values(1).isin(['two', 'three'])
Out[9]: array([False,  True,  True, False,  True,  True,  True, False,  True,  True], dtype=bool)
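
The boolean array above can be passed straight back to the DataFrame to do the actual row selection. A minimal sketch, again assuming pandas 0.24 or later for the `codes=` keyword; `mask` and `filtered` are illustrative names only.

```python
import numpy as np
import pandas as pd

# Same frame as in the question (`codes=` assumes pandas >= 0.24).
index = pd.MultiIndex(
    levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', 'three']],
    codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
           [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
)
hdf = pd.DataFrame(np.random.randn(10, 3), index=index, columns=['A', 'B', 'C'])

# get_level_values(1) returns the second-level label of every row;
# isin() turns that into one boolean per row.
mask = hdf.index.get_level_values(1).isin(['two', 'three'])

# Use the mask for boolean row selection.
filtered = hdf[mask]
```

This is vectorised, so it avoids calling a Python function once per row as the lambda-based approach does.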

There really should be a better syntax for this.
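
Later pandas releases did add label-based slicing across MultiIndex levels, which comes close to the nicer syntax wished for here. The following is a sketch rather than part of the original answer, and it assumes a pandas version that provides `pd.IndexSlice` and accepts `codes=`; `by_slice` and `by_mask` are illustrative names.

```python
import numpy as np
import pandas as pd

# Same frame as in the question (`codes=` assumes pandas >= 0.24).
index = pd.MultiIndex(
    levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', 'three']],
    codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
           [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
)
hdf = pd.DataFrame(np.random.randn(10, 3), index=index, columns=['A', 'B', 'C'])

# ":" means "anything" on the first level; the list restricts the second level.
# The trailing ":" selects all columns.
by_slice = hdf.loc[pd.IndexSlice[:, ['two', 'three']], :]

# The mask-based selection, kept here only to compare the two results.
by_mask = hdf[hdf.index.get_level_values(1).isin(['two', 'three'])]
```

Both selections should pick the same 7 rows; the `IndexSlice` form just states the intent per index level instead of going through an explicit mask.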

Hierarchical indexing based on a sub-level
