Question

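I want to use a regular expression to extract parameters (the command-line kind). I take a string as input and want to get the parameters back as groups.

Basically, I want a character set in the regex that both excludes and includes certain characters.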

import re

ppatt=r"( --(?P<param>([^( --)]*)))"
a=[x.group("param") for x in re.finditer(ppatt,"command --m=psrmcc;ld -  --kkk gtodf --klfj")]
print(a)

I want the output to be

['m=psrmcc;ld - ', 'kkk gtodf', 'klfj']

but the output is

['m=psrmcc;ld', 'kkk', 'klfj']

Answer 1

You can use re.split.

For example:

import re

print(re.split(r"--", "command --m=psrmcc;ld -  --kkk gtodf --klfj")[1:])
#or
print("command --m=psrmcc;ld -  --kkk gtodf --klfj".split("--")[1:])

Output:

['m=psrmcc;ld -  ', 'kkk gtodf ', 'klfj']
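Note that splitting on "--" alone leaves the space before each delimiter attached to the previous item. A minimal sketch of one way to get the list from the question exactly, assuming every parameter is introduced by " --" (a space followed by two hyphens); the helper name extract_params is hypothetical:

```python
def extract_params(command_line):
    """Return the text that follows each ' --' delimiter (hypothetical helper)."""
    # Splitting on " --" (including the leading space) consumes one space
    # before each flag. The first piece is the command name, so it is dropped.
    return command_line.split(" --")[1:]


params = extract_params("command --m=psrmcc;ld -  --kkk gtodf --klfj")
print(params)
```

If surrounding whitespace is not wanted at all, calling str.strip() on each item is another option.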

Answer 2

We may be able to solve this with a character list delimited by word boundaries, perhaps with an expression similar to:

(?:.+?)(\b[A-Za-z=;\s]+\b)

If we want to allow more characters, we add them to:

[A-Za-z=;\s]

Here, we skip the unwanted characters with a non-capturing group:

(?:.+?)

Then we collect the desired characters in a capturing group, which we can reference simply as $1 (written \1 in Python's re module):

(\b[A-Za-z=;\s]+\b)
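To see what the capturing group collects, the same expression can be passed to re.findall, which returns the contents of the single capturing group for each match. This is only a sketch for inspection; the variable name captured is an assumption:

```python
import re

regex = r"(?:.+?)(\b[A-Za-z=;\s]+\b)"
test_str = "command --m=psrmcc;ld -  --kkk gtodf --klfj"

# With exactly one capturing group, findall returns a list of that group's text.
captured = re.findall(regex, test_str)
print(captured)
```

With this character list the first item comes back as 'm=psrmcc;ld', because "-" is not in the list and the match must end at a word boundary.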

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(?:.+?)(\b[A-Za-z=;\s]+\b)"

test_str = "command --m=psrmcc;ld -  --kkk gtodf --klfj"

subst = "\\1\\n"

# You can limit the number of replacements with the count argument (0 means replace all)
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
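For comparison, a negated character class such as [^( --)] excludes each listed character individually, not the sequence " --". One regex-only alternative is a lazy match that stops at a lookahead for the next delimiter. A minimal sketch, assuming every parameter is introduced by " --"; the names PARAM_PATTERN and find_params are hypothetical:

```python
import re

# Assumption: each parameter starts with " --" (space, two hyphens).
# .*? expands lazily until the lookahead sees the next " --" or the end.
PARAM_PATTERN = re.compile(r" --(?P<param>.*?)(?= --|$)")


def find_params(command_line):
    """Return the 'param' group of every match (hypothetical helper)."""
    return [m.group("param") for m in PARAM_PATTERN.finditer(command_line)]


found = find_params("command --m=psrmcc;ld -  --kkk gtodf --klfj")
print(found)
```

Because the lookahead does not consume the delimiter, the next match can begin at that same " --".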

RegEx Circuit

jex.im visualizes regular expressions.
