Question

I need to write a regular expression that captures the following:

Fixed unicode text:
<br>
<strong>
   text I am looking for
</strong>

So far I have:

regex = re.compile(unicode('Fixed unicode text:.*','utf-8'))

How can I modify it to capture the rest of the text?

Answer 1

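Simply add the u prefix (in Python 2.x; in Python 3 no prefix is needed) to get a Unicode string, use parentheses to capture the remaining text, and pass re.DOTALL so that . also matches newlines, like this: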

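import re

haystack = u'Fixed unicode text:\n<br><strong>\ntext I\nam looking for</strong>'
# re.DOTALL lets '.' match newlines, so group 1 captures everything after the prefix.
# Plain u'' works in Python 2 and 3.3+; the ur'' prefix is a syntax error in Python 3.
match = re.search(u'Fixed unicode text:(.*)', haystack, re.DOTALL)
print(match.group(1))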
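In Python 3 every str is already Unicode, so the same approach works even when the fixed prefix itself contains non-ASCII characters. A minimal sketch, assuming a hypothetical Cyrillic prefix as a stand-in for the real fixed text: re.escape keeps any punctuation in the prefix literal, and re.DOTALL lets the group run across line breaks.

```python
import re

# Hypothetical fixed prefix with non-ASCII characters; in Python 3, str is Unicode.
prefix = "Фиксированный текст:"
haystack = prefix + "\n<br>\n<strong>\n   text I am looking for\n</strong>"

# re.escape makes the prefix match literally; DOTALL lets "." match newlines too.
pattern = re.compile(re.escape(prefix) + r"(.*)", re.DOTALL)

match = pattern.search(haystack)
if match:
    rest = match.group(1)  # everything after the prefix, tags included
    print(rest)
```

Note that the captured group still contains the <br> and <strong> markup; the regex only splits the string at the prefix.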

However, your input looks like HTML. If that is the case, you should not use regular expressions; parse the HTML with lxml, BeautifulSoup, or another HTML parser instead.
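To avoid a third-party dependency, the sketch below swaps in Python 3's standard-library html.parser.HTMLParser rather than lxml or BeautifulSoup. It assumes the wanted text always sits inside a <strong> element; the class name StrongTextParser is hypothetical.

```python
from html.parser import HTMLParser


class StrongTextParser(HTMLParser):
    """Hypothetical helper: collects the text found inside <strong> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <strong> elements we are currently inside
        self.chunks = []  # text pieces seen while inside <strong>

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "strong" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


html = "Fixed unicode text:\n<br>\n<strong>\n   text I am looking for\n</strong>"
parser = StrongTextParser()
parser.feed(html)
parser.close()
# Collapse the surrounding newlines and indentation into single spaces.
text = " ".join("".join(parser.chunks).split())
print(text)  # text I am looking for
```

With BeautifulSoup installed, the equivalent lookup is typically soup.find('strong').get_text(strip=True).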
