Question

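I am trying to extract the value/parameters of each trigger in Jenkinsfiles, i.e. the text between the parentheses and, if they exist, the quotes.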

For example, given the following:

upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)  # just parentheses
pollSCM('H * * * *')     # single quotes and parentheses

I want these results, respectively:

upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS
H * * * *

My current results:

upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS
H * * * *'        # Notice the trailing single quote

So far it works for the first trigger (the upstream one) but not for the second one (pollSCM), because the result still has a trailing single quote.

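After the group (.+), the pattern does not capture the trailing single quote with \'*, but it does capture the closing parenthesis with \). I could simply remove the quote with .replace() or .strip(), but what is wrong with my regex pattern, and how can I improve it? Here is my code: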

import re

pattern = r"[A-Za-z]*\(\'*\"*(.+)\'*\"*\)"
text1 = r"upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)"
text2 = r"pollSCM('H * * * *')"
trigger_value1 = re.search(pattern, text1).group(1)
trigger_value2 = re.search(pattern, text2).group(1)

Answer 1

import re
s = """upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)  # just parentheses
pollSCM('H * * * *')"""
print(re.findall(r"\((.*?)\)", s))  # raw string, so "\(" is not treated as a string escape

Output:

["upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS", "'H * * * *'"]
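The output above still keeps the enclosing quotes around the pollSCM value. A minimal sketch of the post-processing step the question mentions (.strip()/.replace()), assuming a value counts as quoted only when it starts and ends with the same quote character; unquote is a hypothetical helper name:

```python
import re

s = """upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)
pollSCM('H * * * *')"""

def unquote(value):
    # Hypothetical helper: drop one pair of matching enclosing quotes,
    # e.g. "'H * * * *'" -> "H * * * *"; anything else is returned unchanged.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value

print([unquote(v) for v in re.findall(r"\((.*?)\)", s)])
```

This check would still mangle an argument list made of several quoted values, such as 'a', 'b', and the lazy \((.*?)\) stops at the first ), so arguments containing nested parentheses are cut short.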

Answer 2

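The \'* part means zero or more matches of your single quote, so the greedy .+ grabs the final ' as well: \'* is still satisfied by matching nothing. You need to add ? to (.+) to make it non-greedy. (.+?) expands one character at a time and stops as soon as the rest of the pattern (\'*\"*\)) can match, so it basically grabs everything until it runs into the closing '.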

This pattern will work for you: [A-Za-z]*\(\'*\"*(.+?)\'*\"*\)
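A quick side-by-side check of the question's pattern and the lazy version on the two sample strings; the variable names greedy and lazy are just for illustration:

```python
import re

text1 = "upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)"
text2 = "pollSCM('H * * * *')"

greedy = r"[A-Za-z]*\(\'*\"*(.+)\'*\"*\)"  # pattern from the question
lazy = r"[A-Za-z]*\(\'*\"*(.+?)\'*\"*\)"   # same pattern, lazy (.+?)

# Print group 1 for both sample strings, once per pattern.
for name, pattern in (("greedy", greedy), ("lazy", lazy)):
    print(name, [re.search(pattern, t).group(1) for t in (text1, text2)])
```

Only the quantifier changes. The greedy (.+) gives back just enough characters for \) to match, so the quote stays inside the group; the lazy (.+?) stops at the first point where \'*\"*\) can match, leaving the quote to \'*.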

[UPDATE]

To answer the question from the comments below, I'm adding the answer here.

So the ? will make it not greedy up until the next character indicated in the pattern?

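Yes. It basically makes the repetition operator non-greedy (a lazy quantifier), since quantifiers are greedy by default. So .*?a matches everything up to the first a, whereas .*a matches everything, including any a found along the way, up to the last a in the string. If your string is aaaaaaaa and your regex is .*?a, it will actually match each a separately. For example, if you substitute b for every match of .*?a in the string aaaaaaaa, you get the string bbbbbbbb. With .*a and the same substitution on aaaaaaaa, however, you get a single b.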
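The aaaaaaaa example can be checked directly with re.sub and re.findall:

```python
import re

s = "aaaaaaaa"

# Lazy: each match ends at the first 'a' it reaches, so every 'a' is a separate match.
print(re.findall(r".*?a", s))   # eight matches of 'a'
print(re.sub(r".*?a", "b", s))  # bbbbbbbb

# Greedy: the first match runs to the last 'a', consuming the whole string.
print(re.findall(r".*a", s))    # one match: 'aaaaaaaa'
print(re.sub(r".*a", "b", s))   # b
```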

Here is a link that explains the different quantifier types (greedy, lazy, possessive): http://www.rexegg.com/regex-quantifiers.html

Answer 3

For the sample data, you could make the ' optional with '?, capture your value in a group, and then loop over the captured groups.

\('?(.*?)'?\)

import re

regex = r"\('?(.*?)'?\)"

test_str = ("upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)  # just parentheses\n"
    "pollSCM('H * * * *')     # single quotes and parentheses")

matches = re.finditer(regex, test_str)

# Print every capturing group of every match (group numbers start at 1).
for match in matches:
    for groupNum in range(1, len(match.groups()) + 1):
        print(match.group(groupNum))


That will give you:

upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS
H * * * *
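Because this pattern has a single capturing group, re.findall returns the captured text directly, which shortens the loop. A minimal sketch; the third sample line is an added, hypothetical input showing that the optional '? on both sides also accepts an unbalanced quote, which is what the stricter pattern below rules out:

```python
import re

# Pattern from this answer: an optional quote on each side, value captured lazily.
regex = r"\('?(.*?)'?\)"

lines = [
    "upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)",
    "pollSCM('H * * * *')",
    "pollSCM('H * * * *)",  # hypothetical input with an unbalanced quote
]

# With one capturing group, re.findall returns a list of the group's text.
for line in lines:
    print(re.findall(regex, line))
```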

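For a stricter match, you could use an alternation that matches either between () or between (''), but not an unbalanced single ' as in ('H * * * *), and then loop over the capturing groups. Because you now capture 2 groups and only 1 of them takes part in each match, you can check that you only retrieve the non-empty group.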

\((?:'(.*?)'|([^'].*?[^']))\)
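A sketch of keeping only the non-empty group with re.findall, which yields a (group1, group2) tuple per match and uses an empty string for the group that did not participate. extract_trigger_args and the sample text are hypothetical:

```python
import re

# Stricter pattern from this answer: a quoted value ('...') lands in group 1,
# an unquoted value that neither starts nor ends with ' lands in group 2.
STRICT = r"\((?:'(.*?)'|([^'].*?[^']))\)"

def extract_trigger_args(text):
    # Hypothetical helper: findall gives ('', value) or (value, '') per match,
    # so `quoted or unquoted` keeps whichever group actually matched.
    return [quoted or unquoted for quoted, unquoted in re.findall(STRICT, text)]

jenkinsfile = (
    "upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)\n"
    "pollSCM('H * * * *')\n"
    "pollSCM('H * * * *)\n"  # unbalanced quote: not matched
)

for value in extract_trigger_args(jenkinsfile):
    print(value)
```

As written, the unquoted branch [^'].*?[^'] needs at least two characters, so a one-character argument such as foo(x) is not matched either.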


Python Regex: How to extract a string between parentheses and quotes
