How to select combinations of levels in a pandas MultiIndex?

Time: 2017-09-23 18:08:51

Tags: python pandas

I have the following DataFrame:

import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[1, 2], ['a', 'b', 'c'], ['a', 'b', 'c']],
                                   names=['one', 'two', 'three'])

df = pd.DataFrame(np.random.rand(18, 3), index=index)

            0           1           2
one two three           
1   a   b   0.002568    0.390393    0.040717
        c   0.943853    0.105594    0.738587
    b   b   0.049197    0.500431    0.001677
        c   0.615704    0.051979    0.191894
2   a   b   0.748473    0.479230    0.042476
        c   0.691627    0.898222    0.252423
    b   b   0.270330    0.909611    0.085801
        c   0.913392    0.519698    0.451158

I want to select the rows where the index levels two and three form the combination (a, b) or (b, c). How can I do this?

I tried df.loc[(slice(None), ['a', 'b'], ['b', 'c']), :], but that gives me every combination of [a, b] and [b, c], including (a, c) and (b, b), which I don't want.
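A minimal sketch of why that attempt over-selects. loc matches each list against its own level independently, so the result is the cross product ['a', 'b'] x ['b', 'c'] rather than the two intended pairs. The seeded np.random.default_rng(0) is an assumption used only to make the values reproducible; it is not in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; the seed only makes the values reproducible
index = pd.MultiIndex.from_product([[1, 2], ['a', 'b', 'c'], ['a', 'b', 'c']],
                                   names=['one', 'two', 'three'])
df = pd.DataFrame(np.random.default_rng(0).random((18, 3)), index=index)

# Each list is matched on its own level, so every (two, three) pair drawn
# from ['a', 'b'] x ['b', 'c'] survives, not just (a, b) and (b, c)
attempt = df.loc[(slice(None), ['a', 'b'], ['b', 'c']), :]
pairs = sorted(set(attempt.index.droplevel('one')))
print(pairs)  # [('a', 'b'), ('a', 'c'), ('b', 'b'), ('b', 'c')]
```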

I also tried df.loc[pd.MultiIndex.from_tuples([(None, 'a', 'b'), (None, 'b', 'c')])], but that returns NaN for the one level of the index (and NaN for all the values):

df.loc[pd.MultiIndex.from_tuples([(None, 'a', 'b'), (None, 'b', 'c')])]

            0   1   2
NaN a   b   NaN NaN NaN
    b   c   NaN NaN NaN

So I figured I need a slice at level one, but that gives me a TypeError:

pd.MultiIndex.from_tuples([(slice(None), 'a', 'b'), (slice(None), 'b', 'c')])

TypeError: unhashable type: 'slice'

I feel like I'm missing some simple one-liner here :)
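For comparison, here is a one-liner that does not come from either answer: a sketch assuming a pandas version that provides MultiIndex.droplevel and MultiIndex.isin. It drops the one level so each row is keyed by its (two, three) pair, then tests that pair against a list of tuples. The seeded generator is again only for reproducibility.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[1, 2], ['a', 'b', 'c'], ['a', 'b', 'c']],
                                   names=['one', 'two', 'three'])
df = pd.DataFrame(np.random.default_rng(0).random((18, 3)), index=index)

wanted = [('a', 'b'), ('b', 'c')]
# droplevel('one') leaves a (two, three) MultiIndex; isin checks each row's
# pair against the wanted tuples and returns a boolean array
result = df[df.index.droplevel('one').isin(wanted)]
print(result.index.tolist())
# [(1, 'a', 'b'), (1, 'b', 'c'), (2, 'a', 'b'), (2, 'b', 'c')]
```

Because the pairs are compared as whole tuples, (a, c) and (b, b) are never matched, which is the behaviour the slice attempt lacked.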

2 answers:

Answer 0 (score: 2)

Use df.query():

In [174]: df.query("(two=='a' and three=='b') or (two=='b' and three=='c')")
Out[174]:
                      0         1         2
one two three
1   a   b      0.211555  0.193317  0.623895
    b   c      0.685047  0.369135  0.899151
2   a   b      0.082099  0.555929  0.524365
    b   c      0.901859  0.068025  0.742212

UPDATE: we can also generate such a query dynamically:

In [185]: l = [('a','b'), ('b','c')]

In [186]: q = ' or '.join(["(two=='{}' and three=='{}')".format(x,y) for x,y in l])

In [187]: q
Out[187]: "(two=='a' and three=='b') or (two=='b' and three=='c')"

In [188]: df.query(q)
Out[188]:
                      0         1         2
one two three
1   a   b      0.211555  0.193317  0.623895
    b   c      0.685047  0.369135  0.899151
2   a   b      0.082099  0.555929  0.524365
    b   c      0.901859  0.068025  0.742212
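The same idea wrapped in a small function, as a sketch. build_pair_query is a hypothetical helper name, not part of pandas. It uses !r instead of hand-written quotes, so string values are quoted and numeric values are not; this assumes each value's repr is a valid query literal, which holds for plain str, int and float.

```python
import numpy as np
import pandas as pd


def build_pair_query(pairs, levels=('two', 'three')):
    """Hypothetical helper: one '(level == value and ...)' clause per pair, joined by 'or'."""
    clauses = []
    for pair in pairs:
        # !r writes each value as a Python literal: 'a' stays quoted, 1 does not
        terms = [f"{name} == {value!r}" for name, value in zip(levels, pair)]
        clauses.append("(" + " and ".join(terms) + ")")
    return " or ".join(clauses)


index = pd.MultiIndex.from_product([[1, 2], ['a', 'b', 'c'], ['a', 'b', 'c']],
                                   names=['one', 'two', 'three'])
df = pd.DataFrame(np.random.default_rng(0).random((18, 3)), index=index)

q = build_pair_query([('a', 'b'), ('b', 'c')])
print(q)  # (two == 'a' and three == 'b') or (two == 'b' and three == 'c')
# query() resolves the index level names two and three like column names
result = df.query(q)
```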

Answer 1 (score: 1)

Here's one way, using loc and get_level_values:

In [3231]: idx = df.index.get_level_values

In [3232]: df.loc[((idx('two') == 'a') & (idx('three') == 'b')) |
                  ((idx('two') == 'b') & (idx('three') == 'c'))]
Out[3232]:
                      0         1         2
one two three
1   a   b      0.442332  0.380669  0.832598
    b   c      0.458145  0.017310  0.068655
2   a   b      0.933427  0.148962  0.569479
    b   c      0.727993  0.172090  0.384461

A generic way:

In [3262]: conds = [('a', 'b'), ('b', 'c')]

In [3263]: mask = np.column_stack(
                      [(idx('two') == c[0]) & (idx('three') == c[1]) for c in conds]
                    ).any(axis=1)

In [3264]: df.loc[mask]
Out[3264]:
                      0         1         2
one two three
1   a   b      0.442332  0.380669  0.832598
    b   c      0.458145  0.017310  0.068655
2   a   b      0.933427  0.148962  0.569479
    b   c      0.727993  0.172090  0.384461
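The generic mask above can be stretched to any number of levels. The following is a sketch: combo_mask is a hypothetical helper name, and np.logical_or.reduce stands in for np.column_stack(...).any(axis=1). Both OR the per-combination masks row by row, so the result is the same.

```python
import numpy as np
import pandas as pd


def combo_mask(index, combos, levels):
    """Hypothetical helper: True where the row's values at `levels` equal any tuple in `combos`."""
    level_values = [index.get_level_values(name) for name in levels]
    masks = []
    for combo in combos:
        # AND the per-level comparisons for one combination
        mask = np.ones(len(index), dtype=bool)
        for values, wanted in zip(level_values, combo):
            mask &= np.asarray(values == wanted)
        masks.append(mask)
    # OR across combinations; assumes combos is non-empty
    return np.logical_or.reduce(masks)


index = pd.MultiIndex.from_product([[1, 2], ['a', 'b', 'c'], ['a', 'b', 'c']],
                                   names=['one', 'two', 'three'])
df = pd.DataFrame(np.random.default_rng(0).random((18, 3)), index=index)

mask = combo_mask(df.index, [('a', 'b'), ('b', 'c')], levels=['two', 'three'])
result = df.loc[mask]
```

Passing levels=['one', 'two', 'three'] with 3-tuples selects exact rows in the same way, with no change to the function.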