Question

I have a string like this: "sometext #Syrup #nshit #thebluntislit"

I want to get a list of all the words that start with '#'.

I used the following code:

import re
line = "blahblahblah #Syrup #nshit #thebluntislit"
# re.search returns only the first match in the string
ht = re.search(r'#\w*', line)
ht = ht.group(0)
print(ht)

I get the following:

#Syrup

I would like to know whether there is a way to get a list like the following instead:

[#Syrup,#nshit,#thebluntislit]

That is, all the words that start with '#', not just the first one.

Answer 1

In a programming language like Python you don't need a regular expression for this:

  hashed = [ word for word in line.split() if word.startswith("#") ]
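
A minimal runnable sketch of the same idea, wrapped in a helper so it can be compared with the regex answers. The function name hashtags_by_split is hypothetical and not part of the original answer. Note that splitting on whitespace keeps any punctuation attached to a word, which the \w-based patterns would cut off.

```python
def hashtags_by_split(line):
    """Return whitespace-separated tokens that begin with '#' (hypothetical helper)."""
    return [word for word in line.split() if word.startswith("#")]

line = "sometext #Syrup #nshit #thebluntislit"
print(hashtags_by_split(line))
# ['#Syrup', '#nshit', '#thebluntislit']

# Punctuation stays attached, and 'beg#end' is skipped because it does not start with '#'.
print(hashtags_by_split("try #Syrup, then beg#end"))
# ['#Syrup,']
```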

Answer 2

You can use

compiled = re.compile(r'#\w*')
compiled.findall(line)

Output:

['#Syrup', '#nshit', '#thebluntislit']

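There is a catch, though. If you search a string such as 'blahblahblah #Syrup #nshit #thebluntislit beg#end', the output will be ['#Syrup', '#nshit', '#thebluntislit', '#end'], because the '#end' inside 'beg#end' also matches.

A positive lookbehind fixes this: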

compiled = re.compile(r'(?<=\s)#\w*')

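(\b, the word boundary, cannot be used here. \b matches only between a \w character [0-9a-zA-Z_] and a non-\w character, and # is not a \w character, so \b# would match only when # directly follows a word character, which is the opposite of what is wanted.)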
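
One thing to watch with (?<=\s) is that it requires an actual whitespace character before '#', so a hashtag at the very start of the string is skipped. The sketch below compares it with a variant that is my own assumption rather than part of the answer: a negative lookbehind (?<!\S), which also succeeds at the start of the string, combined with \w+ so that a bare '#' is not returned. The sample string and variable names are made up for illustration.

```python
import re

line = "#first blahblahblah #Syrup beg#end #"

# Pattern from the answer: '#' must be preceded by a whitespace character.
after_space = re.compile(r'(?<=\s)#\w*')

# Assumed variant: '#' must not be preceded by a non-whitespace character,
# which is also true at the start of the string; \w+ skips a bare '#'.
at_token_start = re.compile(r'(?<!\S)#\w+')

print(after_space.findall(line))
# ['#Syrup', '#']
print(at_token_start.findall(line))
# ['#first', '#Syrup']
```

Whether a leading hashtag or a lone '#' should count depends on the data, so treat the second pattern as one option rather than the required fix.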

Answer 3

It looks like re.findall() will do what you want.

matches = re.findall(r'#\w*', line)