Question

For example:

a = "bzzzzzz <!-- blabla --> blibli * bloblo * blublu"

I want to grab the first comment. A comment can be either

(<!-- .* -->) or (\* .* \*)

This works fine:

re.search("<!--(?P<comment> .* )-->",a).group(1)

And so does this:

re.search("\*(?P<comment> .* )\*",a).group(1)

But when I want either one kind of comment or the other, I tried something like:

re.search("(<!--(?P<comment> .* )-->|\*(?P<comment> .* )\*)",a).group(1)

But it doesn't work.

Thanks

Answer 1

Try a conditional expression:

>>> for m in re.finditer(r"(?:(<!--)|(\*))(?P<comment> .*? )(?(1)-->)(?(2)\*)", a):
...   print(m.group('comment'))
...
 blabla
 bloblo
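
Since the question asks only for the first comment, here is a minimal sketch that wraps the same pattern in re.search instead of re.finditer. The helper name first_comment is made up for illustration, and stripping the surrounding spaces is an assumption about what you want back.

```python
import re

a = "bzzzzzz <!-- blabla --> blibli * bloblo * blublu"

# Group 1 records an opening "<!--", group 2 an opening "*".
# (?(1)-->) requires "-->" only when group 1 matched,
# and (?(2)\*) requires "*" only when group 2 matched.
pattern = re.compile(r"(?:(<!--)|(\*))(?P<comment> .*? )(?(1)-->)(?(2)\*)")


def first_comment(text):
    """Return the body of the first comment without its padding, or None."""
    m = pattern.search(text)
    return m.group("comment").strip() if m else None


print(first_comment(a))
```

Because the closing delimiter is tied to whichever opener matched, a single named group is enough and no name has to be repeated.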

Answer 2

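As Gurney pointed out, you have two captures with the same name. Since you aren't actually using the name, just drop it.

Also, the r"" raw-string notation is a good habit.

Oh, and a third thing: you're grabbing the wrong index. 0 is the full match, 1 is the whole either-or block, and 2 is the inner capture that succeeded here (when the second alternative matches instead, its capture lands in 3).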

re.search(r"(<!--( .* )-->|\*( .* )\*)",a).group(2)
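
If the first comment might be of either kind, the index to read depends on which alternative matched. A rough sketch of picking whichever inner group is set, using the same pattern; the function name first_comment_body is hypothetical:

```python
import re

# Same pattern as above: group 2 belongs to the "<!-- -->" alternative,
# group 3 to the "* *" alternative. Only one of them is set per match.
pattern = re.compile(r"(<!--( .* )-->|\*( .* )\*)")


def first_comment_body(text):
    """Return the inner capture of the first comment, or None if there is none."""
    m = pattern.search(text)
    if m is None:
        return None
    html_body, star_body = m.group(2), m.group(3)
    # Test against None explicitly rather than relying on truthiness.
    return html_body if html_body is not None else star_body
```

Note that the body keeps its surrounding spaces, because they sit inside the capturing groups.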

Answer 3

The exception you get in the "doesn't work" part is a very explicit error:

sre_constants.error: redefinition of group name 'comment' as group 3; was group 2

Both groups have the same name: just rename the second one

>>> re.search("(<!--(?P<comment> .* )-->|\*(?P<comment2> .* )\*)",a).group(1)
'<!-- blabla -->'
>>> re.search("(<!--(?P<comment> .* )-->|\*(?P<comment2> .* )\*)",a).groups()
('<!-- blabla -->', ' blabla ', None)
>>> re.findall("(<!--(?P<comment> .* )-->|\*(?P<comment2> .* )\*)",a)
[('<!-- blabla -->', ' blabla ', ''), ('* bloblo *', '', ' bloblo ')]
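
For what it's worth, here is a small sketch of both halves of this answer: the duplicate name being rejected at compile time, and the renamed groups read back through groupdict(). The outer group is dropped for brevity, and comment_bodies is a made-up helper name.

```python
import re

a = "bzzzzzz <!-- blabla --> blibli * bloblo * blublu"

# Reusing a group name is rejected when the pattern is compiled.
try:
    re.compile(r"(<!--(?P<comment> .* )-->|\*(?P<comment> .* )\*)")
    duplicate_rejected = False
except re.error:
    duplicate_rejected = True

# With distinct names, exactly one of the two named groups is set per match.
renamed = re.compile(r"<!--(?P<comment> .* )-->|\*(?P<comment2> .* )\*")


def comment_bodies(text):
    """Return the body of every comment found, whichever syntax it used."""
    bodies = []
    for m in renamed.finditer(text):
        found = m.groupdict()
        if found["comment"] is not None:
            bodies.append(found["comment"])
        else:
            bodies.append(found["comment2"])
    return bodies


print(comment_bodies(a))
```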

Answer 4

re.findall may be a better fit for this:

import re

# Keep your regex simple. You'll thank yourself a year from now. Note that
# this doesn't include the surrounding spaces. It also uses non-greedy matching
# so that you can embed multiple comments on the same line, and it doesn't
# break on strings like '<!-- first comment --> fragment -->'.
pattern = re.compile(r"(?:<!-- (.*?) -->|\* (.*?) \*)")

inputstring = 'bzzzzzz <!-- blabla --> blibli * bloblo * blublu foo ' \
              '<!-- another comment --> goes here'

# Now use re.findall to search the string. Each match will return a tuple
# with two elements: one for each of the groups in the regex above. Pick the
# non-blank one. This works even when both groups are empty; you just get an
# empty string.
results = [first or second for first, second in pattern.findall(inputstring)]
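
If only the first comment is needed, as in the question, the same pattern can be used with search so the scan stops at the first hit. A minimal sketch; first_comment is a hypothetical helper, and it tests against None so that an empty comment comes back as an empty string rather than None:

```python
import re

# Same pattern as in the answer above.
pattern = re.compile(r"(?:<!-- (.*?) -->|\* (.*?) \*)")


def first_comment(text):
    """Return the text of the first comment, or None when there is no comment."""
    m = pattern.search(text)
    if m is None:
        return None
    first, second = m.groups()
    # Exactly one group is set on a match; the other is None.
    return first if first is not None else second
```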

Answer 5

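You can go one of 2 ways (if Python supports them) -

2: Conditional expression (?(condition)yes-pattern|no-pattern)
(?:(|\*) here the condition is that we captured grp1

Modifiers sg: single-line and global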
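
Python's re module does support the (?(id)yes-pattern|no-pattern) form. The sketch below is only a guess at how this could look in Python: the pattern is my own reconstruction, not the one from the answer, and it assumes group 1 captures the "<!--" opener. re.DOTALL stands in for the "s" modifier and finditer for "g"; all_comments is a made-up name.

```python
import re

# Assumed reconstruction: group 1 is set only when "<!--" opened the comment.
# (?(1)-->|\*) then demands "-->" if group 1 matched, and "*" otherwise.
# re.DOTALL lets "." cross newlines, like the "s" (single-line) modifier.
pattern = re.compile(r"(?:(<!--)|\*)(.*?)(?(1)-->|\*)", re.DOTALL)


def all_comments(text):
    """Return every comment body with surrounding whitespace stripped."""
    # finditer walks over all matches, which is what the "g" modifier does elsewhere.
    return [m.group(2).strip() for m in pattern.finditer(text)]
```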

Python regex with two kinds of comments
