Question

So I receive files like this from a client (4 lines shown below):

Some text #instagram_h1 #instagram_h2 some more text #instagram_h3 more texts
Some text #instagram_h3 #instagram_h2 some more text #instagram_h1 more texts
Some text #instagram_h2 some more text #instagram_h3 more texts
Some text some more text #instagram_h3 more texts

I want to find only the lines that contain #instagram_h3 and discard any line that contains #instagram_h1, #instagram_h2, or both. #instagram_h3 will always be present.

My attempt:

import re

h1 = '#instagram_h1'
h2 = '#instagram_h2'
h3 = '#instagram_h3'
# str holds the line being checked
result = re.search(r"(!h1|!h2)", str)
print(result)

Here the result is always None. Can anyone explain what I'm doing wrong?
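A quick check of what the pattern in the attempt actually matches, as a small sketch using one of the four sample lines above. Inside a regex, ! is an ordinary character, and h1/h2 written inside the raw string are literal text rather than the Python variables, so the pattern only looks for the literal strings !h1 or !h2. The extra test string "odd !h1 text" is made up for illustration.

```python
import re

line = "Some text #instagram_h1 #instagram_h2 some more text #instagram_h3 more texts"

# "!" has no special meaning in a regex, and "h1"/"h2" here are plain text,
# so this pattern searches for the literal substrings "!h1" or "!h2"
pattern = r"(!h1|!h2)"

print(re.search(pattern, line))            # None: the line has no "!h1" or "!h2"
print(re.search(pattern, "odd !h1 text"))  # finds the literal "!h1"
```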

Answer 1

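Regular expressions have no ! (negation) operator. What you can do instead is find the lines that do contain those strings and then exclude them.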

import re
if re.search(r"#instagram_(h1|h2)\b", str):
    # no good! discard this line
    pass

Note that I added \b to avoid matching something like #instagram_h123.
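Applied to the four sample lines from the question, the exclusion test above can be paired with a check for #instagram_h3. This is a minimal sketch: the names lines, unwanted, wanted and keep are hypothetical, and putting \b after h3 as well is an assumption that mirrors the reasoning for h1/h2.

```python
import re

# The four sample lines from the question
lines = [
    "Some text #instagram_h1 #instagram_h2 some more text #instagram_h3 more texts",
    "Some text #instagram_h3 #instagram_h2 some more text #instagram_h1 more texts",
    "Some text #instagram_h2 some more text #instagram_h3 more texts",
    "Some text some more text #instagram_h3 more texts",
]

unwanted = re.compile(r"#instagram_(h1|h2)\b")  # whole tags h1 or h2
wanted = re.compile(r"#instagram_h3\b")         # whole tag h3

def keep(line):
    """Keep a line only if it has #instagram_h3 and neither #instagram_h1 nor #instagram_h2."""
    return bool(wanted.search(line)) and not unwanted.search(line)

for line in lines:
    if keep(line):
        print(line)  # only the fourth sample line survives
```

To run this over a real file, iterate over the open file object instead of the list; the trailing newline on each line does not affect the checks.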

Alternatively, for a search this simple, you can skip regular expressions and check for the substrings directly.

if '#instagram_h1' in str or '#instagram_h2' in str:
    # no good!
    pass

# or

hashtags = ['#instagram_h1', '#instagram_h2']
if any(hashtag in str for hashtag in hashtags):
    # sorry!
    pass

Note that these simple tests would also match #instagram_h123 or #instagram_h234, which may not be what you want.
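The caveat above can be checked directly. The sketch below compares the plain substring test with the \b regex on a line containing a hypothetical tag #instagram_h123. It also shows a negative lookahead (?!...), which is the closest regex gets to the ! idea in the question; this single-pattern version is an alternative added here for illustration, not part of the answer above.

```python
import re

line = "Some text #instagram_h123 some more text #instagram_h3 more texts"

# Substring test: "#instagram_h1" is a prefix of "#instagram_h123", so this is True
print("#instagram_h1" in line)

# Regex with \b: "h1" is followed by "2", so there is no word boundary and no match
print(re.search(r"#instagram_(h1|h2)\b", line))

# One pattern with a negative lookahead: at the start of the line, fail if a whole
# #instagram_h1 or #instagram_h2 tag appears anywhere later, then require #instagram_h3
only_h3 = re.compile(r"^(?!.*#instagram_h[12]\b).*#instagram_h3\b")
print(bool(only_h3.search(line)))
```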

