Question

I am writing a Python script that uses pandas with the following DataFrame:

dog   dog     1   1   1   1   1   1   0   0   1   1
      fox     1   1   1   1   1   1   0   0   1   1
      the     1   1   1   1   1   1   1   0   1   1
      jumps   1   1   1   1   1   1   0   1   1   1
      over    1   1   1   1   1   1   0   0   1   1
fox   dog     1   1   1   1   1   1   0   0   1   1
      fox     1   1   1   1   1   1   0   0   1   1
      the     1   1   1   1   1   1   1   0   1   1
      jumps   1   1   1   1   1   1   0   1   1   1
      over    1   1   1   1   1   1   0   0   1   1
jumps dog     1   1   1   1   1   1   1   0   1   0
      fox     1   1   1   1   1   1   1   0   1   0
      the     1   0   1   1   1   1   0   0   1   0
      jumps   1   1   1   1   1   1   0   0   1   0
      over    1   0   1   1   1   0   0   1   1   0
over  dog     1   1   1   1   1   1   0   0   1   0
      fox     1   1   1   1   1   1   0   0   1   0
      the     1   0   1   1   1   0   0   1   1   0
      jumps   1   1   0   1   0   1   1   0   1   0
      over    1   1   1   1   1   1   0   0   1   0
the   dog     1   1   1   1   1   1   0   1   1   0
      fox     1   1   1   1   1   1   0   1   1   0
      the     1   1   1   1   1   1   0   0   1   0
      jumps   1   1   0   1   1   1   0   0   1   0
      over    1   1   0   1   0   1   1   0   1   0

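Here I want to drop every row that contains the word 'fox' in either the first or the second level of the row index, so that the new DataFrame becomes: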

dog   dog     1   1   1   1   1   1   0   0   1   1
      the     1   1   1   1   1   1   1   0   1   1
      jumps   1   1   1   1   1   1   0   1   1   1
      over    1   1   1   1   1   1   0   0   1   1
jumps dog     1   1   1   1   1   1   1   0   1   0
      the     1   0   1   1   1   1   0   0   1   0
      jumps   1   1   1   1   1   1   0   0   1   0
      over    1   0   1   1   1   0   0   1   1   0
over  dog     1   1   1   1   1   1   0   0   1   0
      the     1   0   1   1   1   0   0   1   1   0
      jumps   1   1   0   1   0   1   1   0   1   0
      over    1   1   1   1   1   1   0   0   1   0
the   dog     1   1   1   1   1   1   0   1   1   0
      the     1   1   1   1   1   1   0   0   1   0
      jumps   1   1   0   1   1   1   0   0   1   0
      over    1   1   0   1   0   1   1   0   1   0

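It would be a bonus if I could eliminate several words this way in a single query, for example 'fox' and 'over'. I have tried combinations of df.xs and df.drop, but nothing seems to work correctly. Any ideas?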

Answer 1

Here is a minimal example:

import pandas as pd

df = pd.DataFrame([['dog', 'dog', 1], ['dog', 'fox', 1], ['dog', 'the', 1],
                   ['fox', 'dog', 0], ['fox', 'fox', 0], ['fox', 'the', 0],
                   ['jumps', 'dog', 1], ['jumps', 'fox', 1], ['jumps', 'the', 1]],
                  columns=['A', 'B', 'C'])

df = df.set_index(['A', 'B'])

#            C
# A     B     
# dog   dog  1
#       fox  1
#       the  1
# fox   dog  0
#       fox  0
#       the  0
# jumps dog  1
#       fox  1
#       the  1

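def remover(df, lst):
    # Drop every label in lst from the first index level, then from the second.
    return df.drop(lst, level=0).drop(lst, level=1)

# Remove all rows whose index contains 'fox' or 'dog' in either level.
df = df.pipe(remover, ['fox', 'dog'])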

#            C
# A     B     
# jumps the  1
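
A mask-based variant of the same idea may be useful when some of the words might not occur in every level. This is only a sketch: the helper name drop_labels and the small sample frame are hypothetical, and it swaps df.drop for Index.get_level_values combined with isin, so words that are absent from a level are simply ignored.

```python
import numpy as np
import pandas as pd

def drop_labels(df, words):
    """Keep only rows whose index contains none of `words` in any level.

    Hypothetical helper, not part of pandas.
    """
    keep = np.ones(len(df), dtype=bool)
    for level in range(df.index.nlevels):
        # isin returns a boolean array marking labels that should go.
        keep &= ~df.index.get_level_values(level).isin(words)
    return df[keep]

# Small sample frame (made up for illustration).
sample = pd.DataFrame(
    {"C": [1, 1, 1, 0, 0, 0, 1, 1, 1]},
    index=pd.MultiIndex.from_product(
        [["dog", "fox", "jumps"], ["dog", "fox", "the"]], names=["A", "B"]
    ),
)

# 'over' does not occur in the sample index; the mask just never matches it.
result = drop_labels(sample, ["fox", "over"])
print(result)
```

Because the mask is built per level, the same helper should work for an index with any number of levels.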

Answer 2

If you have a column name defined (colname), this may work:

df = df.loc[(df.index != 'fox') & (df.colname != 'fox')]

Or, if it is a MultiIndex DataFrame, you can do it by resetting the index first:

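df = df.reset_index(drop=False)
# After the reset, df.index is a plain RangeIndex, so filter on the former
# index levels instead (unnamed levels become the columns 'level_0' and 'level_1').
df = df.loc[(df['level_0'] != 'fox') & (df['level_1'] != 'fox')]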
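
The reset-index route can be sketched end to end as follows. The sample frame and the banned list are made up for illustration, and the sketch assumes the index levels are unnamed, in which case reset_index names the new columns level_0 and level_1. It also restores the MultiIndex afterwards, which the answer above does not do.

```python
import pandas as pd

# Small frame with an unnamed two-level index (made up for illustration).
frame = pd.DataFrame(
    {"C": [1, 1, 0, 0]},
    index=pd.MultiIndex.from_tuples(
        [("dog", "dog"), ("dog", "fox"), ("fox", "dog"), ("the", "over")]
    ),
)

# Unnamed levels become ordinary columns called 'level_0' and 'level_1'.
flat = frame.reset_index()

banned = ["fox", "over"]
kept = flat[~flat["level_0"].isin(banned) & ~flat["level_1"].isin(banned)]

# Put the two columns back as the index and clear the placeholder names.
restored = kept.set_index(["level_0", "level_1"])
restored.index.names = [None, None]
print(restored)
```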

Completely eliminating row index labels and their rows from a DataFrame
