Question

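I want to use a regular expression to extract some fields. As a simplified example, given the text below, I want the second field to come back empty when there is no underscore: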

main_opt.otherstuff should return three fields: "main", "opt", "otherstuff"
main.otherstuff should return three fields: "main", "", "otherstuff"

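I can get this if I specify the regex as ^([^_]+)_?([^.]+)?\\.(.+)$. However, I wonder whether I can change the part _?([^.]+)? so that only one ? is specified, since the two pieces belong to the same sub-pattern.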

I have tried ([^_]+)((?=_)[^.]+)?\\.(.+)$ and ([^_]+)((?:_)[^.]+)?\\.(.+)$, but they return "_opt" rather than "opt" for the second field. (In case it matters, I am using Python's re package.)

Answer 1

You can wrap them in a non-capturing group (?: ... ), although it is not any prettier than the original solution:

^([^_]+)(?:_([^.]+))?\\.(.+)$

See the demo.

Tested in the Python console:

>>> import re
>>> re.findall(r'^([^_]+)(?:_([^.]+))?\.(.+)$', "main_opt.otherstuff")
[('main', 'opt', 'otherstuff')]
>>> re.findall(r'^([^_]+)(?:_([^.]+))?\.(.+)$', "main.otherstuff")
[('main', '', 'otherstuff')]
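
To illustrate why the underscore has to sit outside the inner capturing group, here is a small comparison sketch. The helper name fields and the constant names are made up for this example. It also uses match.groups(default=''), because a group that did not participate is reported as None by groups(), whereas findall reports it as an empty string.

```python
import re

# Pattern from this answer: the underscore lives in the non-capturing
# wrapper, so only the text after it is stored in group 2.
PATTERN = re.compile(r'^([^_]+)(?:_([^.]+))?\.(.+)$')

# The two attempts from the question. A lookahead consumes nothing, so
# [^.]+ still matches the underscore; (?:_) is non-capturing by itself,
# but it sits inside a capturing group, so its text is captured anyway.
LOOKAHEAD = re.compile(r'([^_]+)((?=_)[^.]+)?\.(.+)$')
INNER_NONCAPTURING = re.compile(r'([^_]+)((?:_)[^.]+)?\.(.+)$')


def fields(pattern, text):
    """Return the three captured fields, or None if there is no match."""
    m = pattern.match(text)
    if m is None:
        return None
    # default='' replaces the None of an unmatched optional group.
    return m.groups(default='')


for name, pat in [("answer", PATTERN),
                  ("lookahead", LOOKAHEAD),
                  ("inner (?:_)", INNER_NONCAPTURING)]:
    print(name, fields(pat, "main_opt.otherstuff"), fields(pat, "main.otherstuff"))
```

With re.match or re.search, calling groups() without a default would give None instead of '' for the missing second field, so pick whichever is more convenient for the calling code.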

Answer 2

Splitting may be a simpler approach:

>>> import re
>>> re.split(r'_|\.', 'main_opt.otherstuff')
['main', 'opt', 'otherstuff']
>>> re.split(r'_|\.', 'main.otherstuff')
['main', 'otherstuff']
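
Since the second call returns two fields rather than the three the question asks for, the result may need padding. A minimal sketch, assuming every input contains exactly one dot and at most one underscore (the helper name split_three is made up for this example):

```python
import re


def split_three(text):
    """Split on '_' or '.', inserting '' when the middle field is missing."""
    parts = re.split(r'_|\.', text)
    if len(parts) == 2:
        # Assumption: two parts means the underscore was absent,
        # so the empty field belongs in the middle.
        main, rest = parts
        return [main, '', rest]
    return parts


print(split_three('main_opt.otherstuff'))
print(split_three('main.otherstuff'))
```

Inputs that break the assumption (no dot, or extra underscores or dots) would need different handling, because the split alone cannot tell which delimiter produced which piece.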

Answer 3

I think this can be done more simply by just splitting twice, with no regular expressions needed.

mainopt, _, otherstuff = wholething.partition('.')
main, _, opt = mainopt.partition('_')

Here is what it does for different inputs:

`"main_opt.otherstuff"` -> `"main"`, `"opt"`, `"otherstuff"`
`"main.otherstuff"` -> `"main"`, `""`, `"otherstuff"`
`"main_opt"` -> `"main"`, `"opt"`, `""`

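You can change the order if you want a different precedence, or replace one or the other of the `partition` calls with `rpartition` if you want different associativity (for example, if the last example should instead give a result in which two of the three fields are `""`).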
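
As a sketch of how the two calls interact, the snippet can be wrapped in a function and compared with one possible `rpartition` variant. The function names are made up for this example, and the variant is only one of several ways to change the behavior, not necessarily the one intended above.

```python
def split_fields(wholething):
    """Split at the first '.', then split the left part at the first '_'."""
    mainopt, _, otherstuff = wholething.partition('.')
    main, _, opt = mainopt.partition('_')
    return main, opt, otherstuff


def split_fields_rdot(wholething):
    """Same idea, but split at the LAST '.' instead of the first."""
    # When there is no '.', rpartition returns ('', '', wholething),
    # so the whole string ends up in the last field.
    mainopt, _, otherstuff = wholething.rpartition('.')
    main, _, opt = mainopt.partition('_')
    return main, opt, otherstuff


for text in ("main_opt.otherstuff", "main.otherstuff", "main_opt"):
    print(text, split_fields(text), split_fields_rdot(text))
```

The two versions agree on inputs with exactly one dot and differ only when the dot is missing or repeated.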

Matching a regex without capturing
