Question

I want to check whether a hashtag in a Python string is followed by regular text or by another hashtag. For example:

"my adjectives names #Day #Night which are in the description"

should give False, because the first hashtag is followed by another hashtag. But in the other case

"my adjectives names #Day which is in the description"

I should get True. How can I do this with regular expressions in Python?

I tried:

import re

tweet_text = "my adjectives names #Day #Night which are in the description"
pattern = re.findall(r'\B#\w*[a-zA-Z0-9]+\B#\w*[a-zA-Z0-9]*', tweet_text)
print(pattern)

But it finds nothing: the output is an empty list.

Answer 1

An example from the interpreter:

>>> import re
>>> pat = re.compile(r'(#\w+\s+){2,}')
>>>
>>> text = 'my adjectives names #Day  which are in the description'
>>> pat.search(text)
>>>
>>> text = 'my adjectives names #Day #Night which are in the description'
>>> pat.search(text)
<_sre.SRE_Match object; span=(20, 32), match='#Day #Night '>
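To get the True/False result the question asks for, the search above could be wrapped in a small function. This is only a sketch: the names CONSECUTIVE_TAGS and followed_by_text are made up for illustration, and the pattern is the same one used in the session above.

```python
import re

# Two or more hashtags in a row, each followed by whitespace
# (same pattern as in the interpreter session above).
CONSECUTIVE_TAGS = re.compile(r'(#\w+\s+){2,}')

def followed_by_text(tweet_text):
    """Return True when no hashtag is directly followed by another hashtag."""
    return CONSECUTIVE_TAGS.search(tweet_text) is None

print(followed_by_text("my adjectives names #Day which is in the description"))
print(followed_by_text("my adjectives names #Day #Night which are in the description"))
```

Because the pattern requires whitespace after every hashtag, two hashtags at the very end of the string with no trailing whitespace would not be detected.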

Answer 2

For hashtags that are not followed by another hashtag, use:

import re

text = "my adjectives names #Day #Night which are in the description"
matches = re.findall(r'#[^#\s]+\b(?!\s+#[^#]+)', text)
print(matches)

['#Night']

For hashtags that are followed by another hashtag, simply replace the negative lookahead with a positive lookahead:

matches = re.findall(r'#[^#\s]+\b(?=\s+#[^#]+)', text)
print(matches)

['#Day']
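If you want both results in one pass, a possible variation is to find every hashtag first and then test what comes right after it. The sketch below is an assumption on my part rather than part of the patterns above: classify_hashtags, TAG and NEXT_IS_TAG are hypothetical names, and the lookahead is replaced by a second match anchored at the end of each hashtag.

```python
import re

# A hashtag: '#' followed by characters that are neither '#' nor whitespace.
TAG = re.compile(r'#[^#\s]+\b')
# Whitespace and then another hashtag, tested right after a match.
NEXT_IS_TAG = re.compile(r'\s+#[^#]+')

def classify_hashtags(text):
    """Return (hashtag, followed_by_hashtag) pairs in order of appearance."""
    result = []
    for m in TAG.finditer(text):
        # Pattern.match(string, pos) anchors the attempt at position pos.
        followed = NEXT_IS_TAG.match(text, m.end()) is not None
        result.append((m.group(), followed))
    return result

text = "my adjectives names #Day #Night which are in the description"
pairs = classify_hashtags(text)
print(pairs)  # [('#Day', True), ('#Night', False)]

# The True/False answer from the question: no hashtag is followed by another.
print(not any(followed for _, followed in pairs))  # False
```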

Hashtag followed by regular text