Drop a list of rows from a multi-level pandas DataFrame where the first two levels have duplicate index entries

Time: 2019-07-18 12:11:04

Tags: pandas

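I want to drop a list of row indices from a multi-level DataFrame in which the first two levels contain duplicate entries. I assume this can be done without a loop, but so far I have not found out how.

I tried DataFrame.drop with a list of row-index combinations, but it did not have the desired effect. For example: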

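import numpy as np
import pandas as pd


def mklbl(prefix, n):
    # Build labels such as C0, C1, ..., C{n-1}
    return ["%s%s" % (prefix, i) for i in range(n)]


def src_rec(n, mult):
    # src repeats each value mult times: 1, 1, ..., 2, 2, ...
    src = [[no] * mult for no in range(1, n)]
    src = [item for sublist in src for item in sublist]
    # rec cycles through the values mult times: 1, 2, ..., 1, 2, ...
    rec = [no for no in range(1, n)] * mult
    return src, rec


src, rec = src_rec(4, 4)

# Three-level index; the (level 0, level 1) pairs are not unique
miindex = pd.MultiIndex.from_arrays([src * 2,
                                     rec * 2,
                                     mklbl('C', 24)])

dfmi = pd.DataFrame(np.arange(len(miindex) * 2)
                    .reshape((len(miindex), 2)),
                    index=miindex)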

I want to drop all rows with the index values (1, 2, :) and (2, 3, :):

As = [1, 2]
Bs = [2, 3]

dfmi.drop(pd.MultiIndex.from_arrays([As,Bs]))

The result is:

          0   1
1 1 C0    0   1
2 1 C18  36  37
  2 C19  38  39
3 3 C20  40  41
  1 C21  42  43
  2 C22  44  45
  3 C23  46  47

The result I want is:

          0   1
1 1 C0    0   1
  3 C2    4   5
  1 C3    6   7
2 2 C4    8   9
  1 C6   12  13
  2 C7   14  15
3 3 C8   16  17
  1 C9   18  19
  2 C10  20  21
  3 C11  22  23
1 1 C12  24  25
  3 C14  28  29
  1 C15  30  31
2 2 C16  32  33
  1 C18  36  37
  2 C19  38  39
3 3 C20  40  41
  1 C21  42  43
  2 C22  44  45
  3 C23  46  47

An example with a loop is:

for A, B in zip(As, Bs):
    # Collect the full three-level labels for every row matching (A, B, *)
    dfmi_drop_idx = dfmi.loc[(A, B, slice(None)), :].index
    dfmi.drop(dfmi_drop_idx, inplace=True, errors='raise')

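1 answer:

Answer 0 (score: 1):

Use boolean indexing with Index.isin to test membership. Dropping the third level first leaves a two-level index that can be compared with the two-level index of pairs to remove: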

m = pd.MultiIndex.from_arrays([As,Bs])
df = dfmi[~dfmi.reset_index(level=2, drop=True).index.isin(m)]
print(df)
          0   1
1 1 C0    0   1
  3 C2    4   5
  1 C3    6   7
2 2 C4    8   9
  1 C6   12  13
  2 C7   14  15
3 3 C8   16  17
  1 C9   18  19
  2 C10  20  21
  3 C11  22  23
1 1 C12  24  25
  3 C14  28  29
  1 C15  30  31
2 2 C16  32  33
  1 C18  36  37
  2 C19  38  39
3 3 C20  40  41
  1 C21  42  43
  2 C22  44  45
  3 C23  46  47
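
The approach above can also be written as a self-contained sketch. It rebuilds the example frame from the question and uses Index.droplevel in place of reset_index(level=2, drop=True), which is a swapped-in but equivalent way to obtain the two-level index. The names pairs, first_two, mask and result are chosen here for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example index from the question
src = [no for no in range(1, 4) for _ in range(4)]  # 1,1,1,1,2,2,2,2,3,3,3,3
rec = list(range(1, 4)) * 4                         # 1,2,3,1,2,3,...
labels = ["C%s" % i for i in range(24)]
miindex = pd.MultiIndex.from_arrays([src * 2, rec * 2, labels])
dfmi = pd.DataFrame(np.arange(len(miindex) * 2).reshape((len(miindex), 2)),
                    index=miindex)

As = [1, 2]
Bs = [2, 3]

# Two-level index holding the (level 0, level 1) pairs to remove
pairs = pd.MultiIndex.from_arrays([As, Bs])

# Drop the third level so only the first two levels are compared
first_two = dfmi.index.droplevel(2)

# True for every row whose first two levels match one of the pairs
mask = first_two.isin(pairs)

# Keep the rows that do not match; dfmi itself is left unchanged
result = dfmi[~mask]
print(result)
```

Because the mask is built from the index alone, it removes every row that shares a (level 0, level 1) pair, regardless of the third level. isin also accepts a plain list of tuples, so list(zip(As, Bs)) could be passed instead of building pairs.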