Comparing one part of a DataFrame to another using a MultiIndex

Date: 2021-04-21 21:45:51

Tags: python pandas

I have a DataFrame with a 3-level MultiIndex:

>>> import numpy as np
>>> import pandas as pd
>>> np.random.seed(0)
>>> df = pd.DataFrame(np.random.randint(10, size=(18, 2)),
                      index=pd.MultiIndex.from_product([[True, False],
                                                        ['yes', 'no', 'maybe'],
                                                        ['one', 'two', 'three']],
                                                       names=['bool', 'ans', 'count']),
                      columns=['A', 'B'])
>>> df
                   A  B
bool  ans   count      
True  yes   one    5  0
            two    3  3
            three  7  9
      no    one    3  5
            two    2  4
            three  7  6
      maybe one    8  8
            two    1  6
            three  7  7
False yes   one    8  1
            two    5  9
            three  8  9
      no    one    4  3
            two    0  3
            three  5  0
      maybe one    2  3
            two    8  1
            three  3  3

My goal is to subtract the 'maybe' values from all the other values that have the same bool and count. The subtrahend is

>>> sub = df.loc[(slice(None), 'maybe', slice(None)), :]
>>> sub
                   A  B
bool  ans   count      
True  maybe one    8  8
            two    1  6
            three  7  7
False maybe one    2  3
            two    8  1
            three  3  3

The problem is that when I try to subtract it from the other rows, the indexes do not align the way I expected:

>>> df - sub
                     A    B
bool  ans   count          
False maybe one    0.0  0.0
            three  0.0  0.0
            two    0.0  0.0
      no    one    NaN  NaN
            three  NaN  NaN
            two    NaN  NaN
      yes   one    NaN  NaN
            three  NaN  NaN
            two    NaN  NaN
True  maybe one    0.0  0.0
            three  0.0  0.0
            two    0.0  0.0
      no    one    NaN  NaN
            three  NaN  NaN
            two    NaN  NaN
      yes   one    NaN  NaN
            three  NaN  NaN
            two    NaN  NaN

The result I want is

                   A   B
bool  ans   count
True  yes   one   -3  -8
            two    2  -3
            three  0   2
True  no    one   -5  -3
            two    1  -2
            three  0  -1
True  maybe one    0   0
            two    0   0
            three  0   0
False yes   one    6  -2
            two   -3   8
            three  5   6
False no    one    2   0
            two   -8   2
            three  2  -3
False maybe one    0   0
            two    0   0
            three  0   0

How can I tell pandas to respect the bool and count levels but ignore the ans level?

1 Answer:

Answer 0: (score: 2)

A cross-section with xs may be a good option here:

df.sub(
    df.xs('maybe', level=1)
).swaplevel().reindex(df.index)

Output:

                   A  B
bool  ans   count      
True  yes   one   -3 -8
            two    2 -3
            three  0  2
      no    one   -5 -3
            two    1 -2
            three  0 -1
      maybe one    0  0
            two    0  0
            three  0  0
False yes   one    6 -2
            two   -3  8
            three  5  6
      no    one    2  0
            two   -8  2
            three  2 -3
      maybe one    0  0
            two    0  0
            three  0  0
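
If you would rather not depend on how pandas orders the levels after aligning a 3-level index with a 2-level one, a positional variant is possible. The following is a minimal sketch, not part of the answer above: it looks up the 'maybe' row for each (bool, count) pair with reindex and then subtracts the raw values. The names values, ref, keys, aligned and result are only illustrative, and the data is typed in from the frame printed in the question.

```python
import numpy as np
import pandas as pd

# Same values as the frame printed in the question, typed in explicitly.
values = np.array([
    [5, 0], [3, 3], [7, 9], [3, 5], [2, 4], [7, 6],
    [8, 8], [1, 6], [7, 7], [8, 1], [5, 9], [8, 9],
    [4, 3], [0, 3], [5, 0], [2, 3], [8, 1], [3, 3],
])
index = pd.MultiIndex.from_product(
    [[True, False], ['yes', 'no', 'maybe'], ['one', 'two', 'three']],
    names=['bool', 'ans', 'count'])
df = pd.DataFrame(values, index=index, columns=['A', 'B'])

# Reference rows: the 'maybe' cross-section, indexed by (bool, count) only.
ref = df.xs('maybe', level='ans')

# The (bool, count) key of every row in df, in df's own row order.
keys = df.index.droplevel('ans')

# One reference row per row of df; ref's index is unique, so this is a lookup.
aligned = ref.reindex(keys)

# Subtract by position; the result keeps df's original 3-level index.
result = df - aligned.to_numpy()
print(result)
```

Because the subtraction is positional, no swaplevel or final reindex is needed. The trade-off is that this sketch assumes each (bool, count) pair has exactly one 'maybe' row.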