Question

I have this in my code:

df2 = df[df['A'].str.contains("Hello|World")]

However, I want all the rows that do not contain Hello or World. What is the most efficient way to invert this?

Answer 1

You can use the tilde ~ to invert the boolean values:

>>> df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"]})
>>> df.A.str.contains("Hello|World")
0     True
1    False
2     True
3    False
Name: A, dtype: bool
>>> ~df.A.str.contains("Hello|World")
0    False
1     True
2    False
3     True
Name: A, dtype: bool
>>> df[~df.A.str.contains("Hello|World")]
       A
1   this
3  apple

[2 rows x 1 columns]

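Whether this is the most efficient way, I don't know; you'd have to time it against the other options. Sometimes a regular expression is slower than something like df[~(df.A.str.contains("Hello") | (df.A.str.contains("World")))], but I'm bad at guessing where the crossover points are.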
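A minimal sketch of how the two variants could be timed against each other, assuming pandas is installed. The enlarged sample column, the repeat count, and the helper names mask_regex and mask_plain are illustrative and not part of the original answer.

```python
import timeit

import pandas as pd

# Illustrative data: the four sample values repeated to get a larger column.
df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"] * 10_000})


def mask_regex():
    # One regular expression with alternation, negated with ~.
    return ~df["A"].str.contains("Hello|World")


def mask_plain():
    # Two separate searches combined with |, then negated.
    return ~(df["A"].str.contains("Hello") | df["A"].str.contains("World"))


# Both approaches should select exactly the same rows.
assert mask_regex().equals(mask_plain())

# Timings depend on data size, pattern, and pandas version.
t_regex = timeit.timeit(mask_regex, number=5)
t_plain = timeit.timeit(mask_plain, number=5)
print(f"alternation: {t_regex:.4f}s, two calls: {t_plain:.4f}s")
```

Which variant wins is likely to depend on the data and the pandas version, so it is worth measuring on a column that resembles the real one.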

Answer 2

The .contains() method uses regular expressions, so you can use a negative lookahead test to check that a word is not contained:

df['A'].str.contains(r'^(?:(?!Hello|World).)*$')

This expression matches any string in which the words Hello and World are not found anywhere.

Demo:

>>> df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"]})
>>> df['A'].str.contains(r'^(?:(?!Hello|World).)*$')
0    False
1     True
2    False
3     True
Name: A, dtype: bool
>>> df[df['A'].str.contains(r'^(?:(?!Hello|World).)*$')]
       A
1   this
3  apple
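To see why the pattern behaves this way, here is a small sketch that runs the same expression through Python's re module, outside pandas. The extra sample strings ("say Hello" and the empty string) are added here for illustration and are not from the original answer.

```python
import re

# ^ ... $            anchor the match to the whole string
# (?: ... )*         repeat the group zero or more times
# (?!Hello|World).   consume one character only if neither word starts here
pattern = re.compile(r'^(?:(?!Hello|World).)*$')

samples = ["Hello", "this", "World", "apple", "say Hello", ""]
results = [bool(pattern.search(s)) for s in samples]

for s, matched in zip(samples, results):
    print(repr(s), matched)
```

Note that the empty string matches, and that . does not match a newline by default, so multi-line values may need separate handling.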

Inverse of string.contains in Python, pandas
