I want to check whether both hashtags #python and #conf are present in the following tweets:
tweets = ['conferences you would like to attend #python #conf',
'conferences you would like to attend #conf #python']
I tried the following code, but it does not match either tweet.
import re
for tweet in tweets:
    if re.search(r'^(?=.*\b#python\b)(?=.*\b#conf\b).*$', tweet):
        print(tweet)
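If I remove the # symbol from the regex, both tweets match, but then it also matches tweets that contain the plain words python and conf rather than the hashtags.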
Answer 0 (score: 1)
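\b matches at the beginning or end of a word. According to {{3}}, # is not considered a word character, so \b#python can match only when a word character comes immediately before the #. In these tweets the # follows a space, so the leading \b never matches.

\b
Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string.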
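To make the boundary rule concrete, here is a small sketch that checks where \b can and cannot match around a #. The variable names are made up for this illustration and do not come from the question.

```python
import re

text = 'attend #python #conf'

# '\b#python' needs a word character right before '#'; here it is a space,
# so there is no \w/\W boundary at that position and the search fails.
with_leading_boundary = re.search(r'\b#python\b', text) is not None

# Dropping the leading \b lets the pattern match; the trailing \b still
# ensures that 'python' is not just the prefix of a longer word.
without_leading_boundary = re.search(r'#python\b', text) is not None

# When a word character sits directly before '#', the leading \b does match.
glued_to_word = re.search(r'\b#python\b', 'x#python') is not None

print(with_leading_boundary, without_leading_boundary, glued_to_word)
```

The first result is False and the other two are True, which is the behaviour the documentation quote describes.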
Try the following regex (the ^ and .*$ are unnecessary):
(?=.*#python\b)(?=.*#conf\b)
>>> tweets = ['conferences you would like to attend #python #conf',
... 'conferences you would like to attend #conf #python',
... 'conferences you would like to attend #conf #snake']
>>>
>>> import re
>>> for tweet in tweets:
...     if re.search(r'(?=.*#python\b)(?=.*#conf\b)', tweet):
...         print(tweet)
...
conferences you would like to attend #python #conf
conferences you would like to attend #conf #python
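
If the list of hashtags varies, the same lookahead idea can be wrapped in a helper. This is only a sketch: has_all_hashtags and its parameters are hypothetical names, not part of the question or of the re module.

```python
import re

def has_all_hashtags(tweet, tags):
    """Return True if every hashtag in `tags` (given without '#') occurs in `tweet`."""
    # Build one lookahead per tag, e.g. (?=.*#python\b)(?=.*#conf\b).
    # re.escape protects against regex metacharacters in a tag.
    pattern = ''.join(r'(?=.*#{}\b)'.format(re.escape(tag)) for tag in tags)
    return re.search(pattern, tweet) is not None

tweets = ['conferences you would like to attend #python #conf',
          'conferences you would like to attend #conf #python',
          'conferences you would like to attend #conf #snake',
          'conferences about python and conf topics']

matching = [t for t in tweets if has_all_hashtags(t, ['python', 'conf'])]
for t in matching:
    print(t)
```

Only the first two tweets are printed. Because . does not match a newline by default, a tweet with the two hashtags on different lines would need flags=re.DOTALL passed to re.search.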