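Regular expression to check whether hashtags are present in a tweet

Time: 2014-02-16 05:46:51

Tags: python regex

I want to check whether both the #python and #conf hashtags are present in the following tweets: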

tweets = ['conferences you would like to attend #python #conf',
          'conferences you would like to attend #conf #python']

I tried the following code, but it does not match either tweet.

import re
for tweet in tweets:
    if re.search(r'^(?=.*\b#python\b)(?=.*\b#conf\b).*$', tweet):
        print(tweet)

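If I remove the # symbols from the regex, both tweets match, but it then also matches tweets that contain the plain words python and conf without a hashtag.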

1 answer:

Answer 0: (score: 1)

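\b matches at the beginning or end of a word, and # is not considered a word character. In the tweets above, each # is preceded by a space, which is also a non-word character, so there is no word boundary in front of the # and \b#python cannot match there. According to the Python re module documentation:

\b

Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string.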
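The boundary rule quoted above can be checked directly. The following is a minimal sketch, not part of the original answer; the variable names leading, trailing and glued are chosen here only for illustration.

```python
import re

text = 'conferences you would like to attend #python #conf'

# ' ' and '#' are both non-word characters, so there is no boundary
# between them and the leading \b prevents any match.
leading = re.search(r'\b#python\b', text)

# 'n' is a word character followed by a space, so the trailing \b matches.
trailing = re.search(r'#python\b', text)

# A \b in front of '#' only succeeds when a word character comes right before it.
glued = re.search(r'\b#python\b', 'x#python')

print(leading)            # None
print(trailing.group())   # #python
print(glued.group())      # #python
```

This is why dropping only the leading \b is enough to make the pattern match, while the trailing \b still prevents #python from matching a longer tag such as #pythonista.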

Try the following regular expression (the ^ and .*$ are not necessary):

(?=.*#python\b)(?=.*#conf\b)

>>> tweets = ['conferences you would like to attend #python #conf',
...           'conferences you would like to attend #conf #python',
...           'conferences you would like to attend #conf #snake']
>>>
>>> import re
>>> for tweet in tweets:
...     if re.search(r'(?=.*#python\b)(?=.*#conf\b)', tweet):
...         print(tweet)
...
conferences you would like to attend #python #conf
conferences you would like to attend #conf #python
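As a possible alternative to the lookahead pattern, the hashtags can be extracted first and compared as a set. This is only a sketch: the helper has_all_hashtags and the HASHTAG pattern are hypothetical names introduced here, and treating hashtags as case-insensitive is an assumption, not something stated in the question.

```python
import re

# Hypothetical pattern: a '#' that is not glued to a preceding word
# character, followed by one or more word characters (the tag name).
HASHTAG = re.compile(r'(?<!\w)#(\w+)')

def has_all_hashtags(tweet, required):
    """Return True if every tag in `required` appears as a hashtag in `tweet`."""
    # Assumption: hashtags are compared case-insensitively.
    found = {tag.lower() for tag in HASHTAG.findall(tweet)}
    return {tag.lower() for tag in required} <= found

tweets = ['conferences you would like to attend #python #conf',
          'conferences you would like to attend #conf #python',
          'conferences you would like to attend #conf #snake']

matching = [t for t in tweets if has_all_hashtags(t, ['python', 'conf'])]
for tweet in matching:
    print(tweet)
```

Compared with the regex above, this variant also rejects text such as x#python, where the # directly follows a word character, and it extends to any number of required tags without building a longer pattern.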