Question

I need to build a program that reads multiple lines of code and extracts the right information from each line. Sample text:

no matches
one match <'found'>
<'one'> match <found>
<'three'><'matches'><'found'>

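In this case, the program should detect <'found'>, <'one'>, <'three'>, <'matches'> and <'found'> as matches, because each of them contains both "<" and "'". However, I can't get a regular expression to pick up multiple matches on the same line. I am using: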

re.search('^<.*>$')

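But if a line contains more than one match, the extra "'>" and "<'" in between are swallowed as part of .* instead of being treated as the boundaries of separate matches. How can I fix this?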

Answer 1

This works (here s holds the sample text from the question):

>>> r = re.compile(r"\<\'.*?\'\>")
>>> r.findall(s)
["<'found'>", "<'one'>", "<'three'>", "<'matches'>", "<'found'>"]
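
To see why the non-greedy .*? makes the difference, here is a minimal sketch comparing it with the greedy .* on one line from the sample. The variable names (line, greedy, lazy) are only for illustration and are not from the question.

```python
import re

line = "<'three'><'matches'><'found'>"

# Greedy: .* runs on to the last '> on the line, so the whole line is one match.
greedy = re.findall(r"<'.*'>", line)

# Non-greedy: .*? stops at the first '> it can reach, giving one match per tag.
lazy = re.findall(r"<'.*?'>", line)

print(greedy)  # ["<'three'><'matches'><'found'>"]
print(lazy)    # ["<'three'>", "<'matches'>", "<'found'>"]
```

Note that the backslashes in the pattern above are harmless but not needed, since <, ' and > have no special meaning in a Python regular expression.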

Answer 2

You can use re.findall and match any characters other than > inside the angle brackets:

>>> re.findall('<[^>]*>', "<'three'><'matches'><'found'>")
["<'three'>", "<'matches'>", "<'found'>"]

The non-greedy quantifier '?', as suggested by anubhava, is also an option.
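
One caveat worth checking: the negated class on its own does not require the quotes, so it would also pick up <found> from the third sample line, which the question does not count as a match. A small sketch of the difference follows. The stricter pattern is my own variation, and it assumes the quoted value never contains a ' itself.

```python
import re

line = "<'one'> match <found>"

# The negated class alone accepts anything between < and >, quoted or not.
loose = re.findall(r"<[^>]*>", line)

# Requiring the quotes keeps only the <'...'> form
# (assumption: no ' appears inside the quoted value).
strict = re.findall(r"<'[^']*'>", line)

print(loose)   # ["<'one'>", '<found>']
print(strict)  # ["<'one'>"]
```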

Answer 3

Use findall instead of search:

re.findall( r"<'.*?'>", str )
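
Since the question is about handling several lines, here is one possible way to apply this line by line. It is a sketch, not the only approach. The names text, pattern and per_line are made up for the example, and the capturing group is an optional extra for when only the text between the quotes is wanted.

```python
import re

# The sample text from the question.
text = """no matches
one match <'found'>
<'one'> match <found>
<'three'><'matches'><'found'>"""

# The group captures only the text between the quotes.
pattern = re.compile(r"<'(.*?)'>")

# With exactly one capturing group, findall returns the group contents
# rather than the whole match; lines without a match give an empty list.
per_line = [pattern.findall(line) for line in text.splitlines()]

for line, found in zip(text.splitlines(), per_line):
    print(repr(line), "->", found)
```

Dropping the parentheses from the pattern gives back the full <'...'> strings, as in the answers above.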