Question

I have the following pattern:

find_pattern = re.compile(r'(ga:country:\s)([a-zA-Z()\s]*)(.*users:\s)(\d+),')

This is what the input that should match looks like:

        ga:country: (not set),Date range:0,ga:users:60,
        ga:country: Albania,Date range:0,ga:users:7,
        ga:country: Algeria,Date range:0,ga:users:10,
        ...
        ga:country: Argentina,Date range:0,ga:users:61,
        ga:country: Armenia,Date range:0,ga:users:2,

This is how the output will be formatted (in case it adds any value to the question):

        ['(not set)', 60],
        ['Albania', 7],

When I run my test:

matches = find_pattern.finditer(self.data)
print('matches:', matches)
for match in matches:
    print(match)

no matches are found.
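One way to narrow this down is to test the pattern against a single sample line from above. The sketch below compares the original pattern with a copy whose only change is `\s` to `\s*` after `users:`; the names `line`, `original` and `relaxed` are just for illustration. It suggests the mandatory whitespace after `users:` is what fails, since the input has none there (`ga:users:7`).

```python
import re

line = "ga:country: Albania,Date range:0,ga:users:7,"

# The original pattern: note the mandatory \s right after "users:"
original = re.compile(r'(ga:country:\s)([a-zA-Z()\s]*)(.*users:\s)(\d+),')
print(original.search(line))  # None: there is no whitespace after "users:"

# Same pattern, but the whitespace after "users:" is optional (\s*)
relaxed = re.compile(r'(ga:country:\s)([a-zA-Z()\s]*)(.*users:\s*)(\d+),')
m = relaxed.search(line)
print(m.group(2), m.group(4))  # Albania 7
```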

Hopefully someone can help.

Answer 1

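I suggest using 2 capture groups instead of 4, allowing optional whitespace characters after `ga:`, and not requiring a whitespace character after `users:`. Your input has none there (`ga:users:60`), so the mandatory `\s` in your pattern prevents any match.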
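With `re.findall`, each capture group becomes one element of the returned tuple, so the number of groups determines the shape of the result. A small comparison on one sample line; both patterns here are illustrative variants written for this sketch, not the exact regexes from the question or the answer.

```python
import re

line = "ga:country: Albania,Date range:0,ga:users:7,"

# Four groups: findall returns 4-tuples, including pieces that are not needed
four = r'(ga:\s*country:\s*)([a-zA-Z()\s]*)(.*users:\s*)(\d+),'
print(re.findall(four, line))
# [('ga:country: ', 'Albania', ',Date range:0,ga:users:', '7')]

# Two groups: only the country name and the user count are captured
two = r'ga:\s*country:\s*([a-zA-Z()\s]*),.*users:\s*(\d+),'
print(re.findall(two, line))
# [('Albania', '7')]
```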

The `.*` part can also be made non-greedy (`.*?`) so that it stops at the first occurrence of `users:` if there are more.
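To illustrate the difference, here is a hypothetical line (not from the real data) that contains `users:` twice: the greedy `.*` backtracks from the end of the line and so stops at the last `users:`, while the non-greedy `.*?` stops at the first one.

```python
import re

# Hypothetical line with two fields ending in "users:" (made up for illustration)
line = "ga:country: Albania,ga:users:7,ga:newusers:3,"

greedy = r'ga:country:\s*([a-zA-Z()\s]*),.*users:(\d+)'
lazy = r'ga:country:\s*([a-zA-Z()\s]*),.*?users:(\d+)'

print(re.findall(greedy, line))  # [('Albania', '3')]: .* runs to the last "users:"
print(re.findall(lazy, line))    # [('Albania', '7')]: .*? stops at the first one
```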

To prevent `users:` from matching as the end of a longer word, you can make it more specific by matching `:users:`.
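A sketch with a hypothetical field `ga:newusers` placed before the real `ga:users` field: matching plain `users:` picks up the wrong value, while `:users:` only accepts a `users:` that directly follows a colon.

```python
import re

# Hypothetical line where a longer field name ending in "users:" comes first
line = "ga:country: Albania,ga:newusers:3,ga:users:7,"

loose = r'\bga:\s*country:\s*([a-zA-Z()\s]*),.*?users:(\d+)'
strict = r'\bga:\s*country:\s*([a-zA-Z()\s]*),.*?:users:(\d+)'

print(re.findall(loose, line))   # [('Albania', '3')]: matched inside "newusers:"
print(re.findall(strict, line))  # [('Albania', '7')]: the real ga:users field
```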

\bga:\s*country:\s*([a-zA-Z()\s]*),.*?:users:(\d+)


An example using re.findall, which returns the values of the capture groups:

import re

regex = r"\bga:\s*country:\s*([a-zA-Z()\s]*),.*?:users:(\d+)"

s = ("ga:country: (not set),Date range:0,ga:users:60,\n"
    "ga:country: Albania,Date range:0,ga:users:7,\n"
    "ga:country: Algeria,Date range:0,ga:users:10,\n"
    "ga:country: Argentina,Date range:0,ga:users:61,\n"
    "ga:country: Armenia,Date range:0,ga:users:2,")

print(re.findall(regex, s))

Output

[('(not set)', '60'), ('Albania', '7'), ('Algeria', '10'), ('Argentina', '61'), ('Armenia', '2')]
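The question shows the target format as `['(not set)', 60]`, i.e. the country name plus the user count as an integer. A minimal sketch of that conversion on top of the answer's regex, using a shortened input; the name `rows` is just illustrative.

```python
import re

regex = r"\bga:\s*country:\s*([a-zA-Z()\s]*),.*?:users:(\d+)"

s = ("ga:country: (not set),Date range:0,ga:users:60,\n"
     "ga:country: Albania,Date range:0,ga:users:7,\n")

# Turn each (country, users) tuple from findall into [country, int(users)]
rows = [[country, int(users)] for country, users in re.findall(regex, s)]
print(rows)  # [['(not set)', 60], ['Albania', 7]]
```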

Why doesn't my regex match?
