Question

I have text whose lines all follow a similar syntax:

1_they 3_lens 2'' adapter (adjective)
1_we need 2_zoom6.0 3_system (adjective)
...

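I need to split each line into 4 groups:

a group for everything that starts with 1_ (e.g. they, we need)
a group for everything that starts with 2_ (e.g. zoom6.0)
a group for everything that starts with 3_ (e.g. lens 2'' adapter, system)
a group for everything inside the final parentheses (e.g. adjective)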

I would like to use a single regular expression that, ideally, gives me None for any group that is not found in the line. E.g.:

>>> line = "1_they 3_lens 2'' adapter (adjective)"
>>> our_match = OUR_REGEXP.match(line)
>>> our_match.groups()
("they", None, "lens 2'' adapter", "adjective")

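Note that all groups are optional (except the last one, in parentheses), and that there are also other characters I need to match.

Any suggestions?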

Answer 1

This is not especially elegant, but it does almost what you need:

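import re

# Raw string so that \w, \( and \) reach the regex engine unchanged.
regex = r'1_(?P<one>\w+)|2_(?P<two>\w+)|3_(?P<three>[^()]+)|(?P<bracket>\(\w+\))'

strg = "1_they 3_lens 2'' adapter (adjective)"

# With several groups, findall returns one tuple per match; unmatched groups are ''.
match = re.findall(regex, strg)
print(match)
# Transpose the tuples and join each column to get one string per group.
res = [''.join(item) for item in zip(*match)]
print(res)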

Output:

[('they', '', '', ''), ('', '', "lens 2'' adapter ", ''),
 ('', '', '', '(adjective)')]
['they', '', "lens 2'' adapter ", '(adjective)']
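The result above differs from what the question asks for in a few ways: missing groups come back as '' rather than None, the parentheses and a trailing space are kept, and \w+ stops at the first non-word character, so "1_we need" would yield only "we" and "2_zoom6.0" only "zoom6". As a possible alternative, here is a minimal sketch of a single pattern built from optional non-capturing wrappers. It assumes the prefixes always appear in the order 1_, 2_, 3_ and that the parenthesised part ends the line; neither assumption is confirmed by the question, and the group names are simply carried over from the answer above.

```python
import re

# Sketch only: assumes the 1_/2_/3_ parts appear in that order and that the
# line ends with the parenthesised part.
OUR_REGEXP = re.compile(
    r"""
    ^\s*
    (?:1_(?P<one>.*?)\s*)?         # optional "1_..." part, lazy so it stops early
    (?:2_(?P<two>.*?)\s*)?         # optional "2_..." part
    (?:3_(?P<three>.*?)\s*)?       # optional "3_..." part
    \((?P<bracket>[^()]*)\)\s*$    # mandatory "(...)" at the end of the line
    """,
    re.VERBOSE,
)

lines = [
    "1_they 3_lens 2'' adapter (adjective)",
    "1_we need 2_zoom6.0 3_system (adjective)",
]

for line in lines:
    our_match = OUR_REGEXP.match(line)
    # groups() reports None for every optional group that did not take part.
    print(our_match.groups())
# ('they', None, "lens 2'' adapter", 'adjective')
# ('we need', 'zoom6.0', 'system', 'adjective')
```

Because each optional part is wrapped in (?:...)?, a group that does not take part in the match is reported as None by groups(), and the lazy .*? lets values contain spaces, dots and quotes. If the parts can occur in any order, this sketch does not apply as written.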

Python regular expression with multiple optional groups
