I have the following pattern:
find_pattern = re.compile(r'(ga:country:\s)([a-zA-Z()\s]*)(.*users:\s)(\d+),')
This is what the input that should match looks like:
ga:country: (not set),Date range:0,ga:users:60,
ga:country: Albania,Date range:0,ga:users:7,
ga:country: Algeria,Date range:0,ga:users:10,
...
ga:country: Argentina,Date range:0,ga:users:61,
ga:country: Armenia,Date range:0,ga:users:2,
This is how the output will be formatted (in case it adds any value to the question):
['(not set)', 60],
['Albania', 7],
When I run the test:
matches = find_pattern.finditer(self.data)
print('matches:', matches)
for match in matches:
    print(match)
No matches are found.
Hopefully someone can help.
Answer 0 (score: 0)
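The pattern finds nothing because users:\s requires a whitespace character after users:, and the input has none (ga:users:60).
I would suggest using 2 capturing groups instead of 4, allowing optional whitespace characters after ga:, and making the whitespace character after users: optional (or dropping it).
If a line can contain more than one users: part, the .* can also be made non-greedy, .*?, so that the first one is matched.
To prevent users: from being part of a larger word, you can match it more specifically as :users: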
\bga:\s*country:\s*([a-zA-Z()\s]*),.*?:users:(\d+)
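As a rough illustration of the points above, the sketch below compares the pattern from the question with a copy in which only the whitespace after users: is made optional, and then shows the effect of the non-greedy .*? and the stricter :users: on a line with two users: parts. The variable names and the ga:newusers field are made up for the demonstration and do not come from the question.

```python
import re

line = "ga:country: Albania,Date range:0,ga:users:7,"

# Pattern from the question: users:\s demands a whitespace character
original = re.compile(r'(ga:country:\s)([a-zA-Z()\s]*)(.*users:\s)(\d+),')
# Same pattern with only that whitespace made optional
relaxed = re.compile(r'(ga:country:\s)([a-zA-Z()\s]*)(.*users:\s*)(\d+),')

print(original.search(line))             # None
print(relaxed.search(line).group(2, 4))  # ('Albania', '7')

# Hypothetical line with a second field whose name ends in "users:"
two_parts = "ga:country: Albania,Date range:0,ga:users:7,ga:newusers:3,"

loose = r"ga:country:\s*([a-zA-Z()\s]*),.*users:(\d+)"
strict = r"\bga:\s*country:\s*([a-zA-Z()\s]*),.*?:users:(\d+)"

print(re.search(loose, two_parts).groups())   # ('Albania', '3')
print(re.search(strict, two_parts).groups())  # ('Albania', '7')
```

The greedy .* runs to the end of the line and backtracks to the last users: it can find, which here sits inside newusers:, while the non-greedy version stops at the first :users:.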
Example, where re.findall returns the values of the capturing groups:
import re
regex = r"\bga:\s*country:\s*([a-zA-Z()\s]*),.*?:users:(\d+)"
s = ("ga:country: (not set),Date range:0,ga:users:60,\n"
"ga:country: Albania,Date range:0,ga:users:7,\n"
"ga:country: Algeria,Date range:0,ga:users:10,\n"
"ga:country: Argentina,Date range:0,ga:users:61,\n"
"ga:country: Armenia,Date range:0,ga:users:2,")
print(re.findall(regex, s))
Output
[('(not set)', '60'), ('Albania', '7'), ('Algeria', '10'), ('Argentina', '61'), ('Armenia', '2')]
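The question asks for lists with the user count as a number, and it uses finditer rather than findall. A minimal sketch of that variant, assuming the data is a single string as in the example above (the names pattern, data and rows are only illustrative):

```python
import re

pattern = re.compile(r"\bga:\s*country:\s*([a-zA-Z()\s]*),.*?:users:(\d+)")

data = (
    "ga:country: (not set),Date range:0,ga:users:60,\n"
    "ga:country: Albania,Date range:0,ga:users:7,\n"
)

# Build [country, users] pairs, converting the digit string to an int
rows = [[m.group(1), int(m.group(2))] for m in pattern.finditer(data)]
print(rows)  # [['(not set)', 60], ['Albania', 7]]
```

Note that finditer returns an iterator, so printing it directly, as in the question, shows the iterator object rather than the matches; the matches only appear once it is looped over.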