I'm writing a Python script that works with the following DataFrame in pandas:
dog   dog    1 1 1 1 1 1 0 0 1 1
      fox    1 1 1 1 1 1 0 0 1 1
      the    1 1 1 1 1 1 1 0 1 1
      jumps  1 1 1 1 1 1 0 1 1 1
      over   1 1 1 1 1 1 0 0 1 1
fox   dog    1 1 1 1 1 1 0 0 1 1
      fox    1 1 1 1 1 1 0 0 1 1
      the    1 1 1 1 1 1 1 0 1 1
      jumps  1 1 1 1 1 1 0 1 1 1
      over   1 1 1 1 1 1 0 0 1 1
jumps dog    1 1 1 1 1 1 1 0 1 0
      fox    1 1 1 1 1 1 1 0 1 0
      the    1 0 1 1 1 1 0 0 1 0
      jumps  1 1 1 1 1 1 0 0 1 0
      over   1 0 1 1 1 0 0 1 1 0
over  dog    1 1 1 1 1 1 0 0 1 0
      fox    1 1 1 1 1 1 0 0 1 0
      the    1 0 1 1 1 0 0 1 1 0
      jumps  1 1 0 1 0 1 1 0 1 0
      over   1 1 1 1 1 1 0 0 1 0
the   dog    1 1 1 1 1 1 0 1 1 0
      fox    1 1 1 1 1 1 0 1 1 0
      the    1 1 1 1 1 1 0 0 1 0
      jumps  1 1 0 1 1 1 0 0 1 0
      over   1 1 0 1 0 1 1 0 1 0
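Here I want to drop every row that has the word 'fox' in either the first or the second level of the row index, so that the new DataFrame becomes: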
dog   dog    1 1 1 1 1 1 0 0 1 1
      the    1 1 1 1 1 1 1 0 1 1
      jumps  1 1 1 1 1 1 0 1 1 1
      over   1 1 1 1 1 1 0 0 1 1
jumps dog    1 1 1 1 1 1 1 0 1 0
      the    1 0 1 1 1 1 0 0 1 0
      jumps  1 1 1 1 1 1 0 0 1 0
      over   1 0 1 1 1 0 0 1 1 0
over  dog    1 1 1 1 1 1 0 0 1 0
      the    1 0 1 1 1 0 0 1 1 0
      jumps  1 1 0 1 0 1 1 0 1 0
      over   1 1 1 1 1 1 0 0 1 0
the   dog    1 1 1 1 1 1 0 1 1 0
      the    1 1 1 1 1 1 0 0 1 0
      jumps  1 1 0 1 1 1 0 0 1 0
      over   1 1 0 1 0 1 1 0 1 0
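Ideally I could remove several words like this in a single query, for example both 'fox' and 'over'. I've tried combinations of df.xs and df.drop, but nothing seems to work properly. Any ideas?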
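A minimal sketch of one way to remove several words in a single pass: build one boolean mask from both index levels with `Index.get_level_values` and `Index.isin`, then keep the rows where neither level matches. The frame below is a hypothetical stand-in for the question's data (same two-level index, but a single placeholder column `val`), and `drop_words` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical stand-in for the question's frame: same two-level index,
# but one placeholder column instead of the real 0/1 data.
words = ['dog', 'fox', 'the', 'jumps', 'over']
idx = pd.MultiIndex.from_product([words, words])
df = pd.DataFrame({'val': range(len(idx))}, index=idx)


def drop_words(frame, bad):
    """Return `frame` without rows whose level-0 or level-1 label is in `bad`."""
    lvl0 = frame.index.get_level_values(0)
    lvl1 = frame.index.get_level_values(1)
    mask = lvl0.isin(bad) | lvl1.isin(bad)  # True where either level matches
    return frame[~mask]


only_fox = drop_words(df, ['fox'])             # 16 of the 25 rows remain
no_fox_over = drop_words(df, ['fox', 'over'])  # 9 rows remain
print(no_fox_over.index.tolist())
```

Because the mask is computed once from both levels, adding more words only means extending the list passed as `bad`.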
Answer 0 (score: 1)
Here is a minimal example:
import pandas as pd

df = pd.DataFrame([['dog', 'dog', 1], ['dog', 'fox', 1], ['dog', 'the', 1],
                   ['fox', 'dog', 0], ['fox', 'fox', 0], ['fox', 'the', 0],
                   ['jumps', 'dog', 1], ['jumps', 'fox', 1], ['jumps', 'the', 1]],
                  columns=['A', 'B', 'C'])
df = df.set_index(['A', 'B'])
#            C
# A     B
# dog   dog  1
#       fox  1
#       the  1
# fox   dog  0
#       fox  0
#       the  0
# jumps dog  1
#       fox  1
#       the  1
def remover(df, lst):
    # drop rows whose label is in lst, first in level 0 (A), then in level 1 (B)
    return df.drop(lst, level=0).drop(lst, level=1)

df = df.pipe(remover, ['fox', 'dog'])
#            C
# A     B
# jumps the  1
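One caveat with `remover`: in recent pandas versions, `DataFrame.drop(..., level=...)` may raise a `KeyError` when a word in the list does not occur in that level (for example a word that only appears in the second level). A hedged variant passes `errors='ignore'` to both calls so absent words are simply skipped. The small frame and the extra word `'cat'` below are hypothetical, chosen only to trigger that case.

```python
import pandas as pd

# Hypothetical frame: 'fox' appears only in level 1, 'cat' appears nowhere.
df = pd.DataFrame([['dog', 'dog', 1], ['dog', 'fox', 1], ['jumps', 'the', 1]],
                  columns=['A', 'B', 'C']).set_index(['A', 'B'])


def remover(df, lst):
    # errors='ignore' skips any word in lst that is absent from that level
    return (df.drop(lst, level=0, errors='ignore')
              .drop(lst, level=1, errors='ignore'))


out = df.pipe(remover, ['fox', 'cat'])
print(out)
```

Only the ('dog', 'fox') row is removed; ('dog', 'dog') and ('jumps', 'the') remain.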
Answer 1 (score: 0)
If you have defined a column name (colname), this may work:
df = df.loc[(df.index != 'fox') & (df.colname != 'fox')]
Or, if it is a MultiIndex DataFrame, you can reset the index and then filter:
df = df.reset_index(drop=False)
# reset_index() turns unnamed index levels into the columns 'level_0' and 'level_1'
df = df.loc[(df['level_0'] != 'fox') & (df['level_1'] != 'fox')]
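The reset-index idea extends to several words with `Series.isin`, and the two-level index can be restored afterwards with `set_index`. A minimal sketch, assuming the index levels are unnamed (so `reset_index` names them `level_0` and `level_1`); the frame and its `val` column are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical two-level frame with unnamed index levels.
words = ['dog', 'fox', 'over']
idx = pd.MultiIndex.from_product([words, words])
df = pd.DataFrame({'val': range(len(idx))}, index=idx)

bad = ['fox', 'over']
flat = df.reset_index()  # unnamed levels become columns 'level_0' and 'level_1'
keep = ~flat['level_0'].isin(bad) & ~flat['level_1'].isin(bad)
# Restore the two-level index and clear the temporary level names.
out = flat.loc[keep].set_index(['level_0', 'level_1']).rename_axis([None, None])
print(out)
```

Filtering directly on the index (as in the question's setting) avoids the round trip, but this form stays close to the answer's reset-and-filter approach.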