Question

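Just starting to learn Python / regex.

I have error log files, and I want to capture the strings that match a particular pattern and build a list from them. There is one error per line. I already have the datetime part worked out. I need to extract 'company' and 'errorline', assign them to variables, and append them to my nested list.

An error line looks like this: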

2013-02-02 12:20:15 blahblahblah=123214, moreblah=1021, blah.blah.blah, company=201944, errorline=#2043

f = open("/path/error.log","r")

errorlist = [["datetime","company","errorline"]]     #I want to append to nested list

for line in f:
    datetime = line[:19]
    company = re.search(r"=[0-9]{6},",line)
    company = company.group[1:-1]                    #to remove the '=' and ','
    errorline = re.search(r"#[0-9]{1,}",line)
    errorline = errorline.group()[1:]

    errorlist.append([datetime,company,errorline])

I know this code doesn't work, because I can't assign .group() to a variable.

Please help!

Answer 1

It should be:

company = re.search(r'company=([0-9]{6}),', line).group(1)
errorline = re.search(r'#([0-9]{1,})', line).group(1)

Note the parentheses (they create a capture group) and the call to .group(1). You can also do everything in a single search:

company, errorline = re.search(r'company=([0-9]{6}),.*?#([0-9]{1,})', line).groups()
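Anchoring the pattern on the key names matters for the sample line, because it contains more than one `=` followed by six digits and a comma. The following is a minimal sketch of the difference, using the sample line from the question; the variable name `first_six` is only illustrative.

```python
import re

line = ("2013-02-02 12:20:15 blahblahblah=123214, moreblah=1021, "
        "blah.blah.blah, company=201944, errorline=#2043")

# An unanchored pattern stops at the first "=" followed by six digits and a comma,
# which here belongs to "blahblahblah", not to "company".
first_six = re.search(r"=([0-9]{6}),", line).group(1)

# Naming the keys in the pattern picks out the intended fields.
company, errorline = re.search(
    r"company=([0-9]{6}),.*?errorline=#([0-9]+)", line
).groups()

print(first_six, company, errorline)
```

Both calls assume the line matches; on a line that does not, `re.search` returns None and the chained `.group(1)` or `.groups()` call raises AttributeError.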

Answer 2

re.search returns a Match object, or None when the pattern does not match.

Classically, your matching code should look like this:

import re

match = re.search(r'(\d+)', 'abc 123 def')
if match:
    digits = match.group(1)    # the text captured by the first group
else:
    digits = None              # react to no match
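To see why the `if match:` check is worth having, here is a small sketch of what happens when the pattern does not match and the result is used without a check. The input string and the variable name `error` are made up for illustration.

```python
import re

# No digits in the input, so re.search returns None instead of a Match object.
match = re.search(r"(\d+)", "no digits here")

try:
    match.group(1)             # None has no .group method
except AttributeError as exc:
    error = str(exc)

print(match)
print(error)
```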

You can also collapse the two searches from your example into a single one, as shown here:

>>> s='2013-02-02 12:20:15 blahblahblah=123214, moreblah=1021, blah.blah.blah, company=201944, errorline=#2043'
>>> match=re.search(r'^.*company=(\d+)\D+(\d+)', s)
>>> match.group(1)
'201944'
>>> match.group(2)
'2043'

The matching part of your code would then look like this:

match = re.search(r'^.*company=(\d+)\D+(\d+)', line)
if match:
    company = match.group(1)
    errorline = match.group(2)
    # do whatever with company and errorline
else:
    pass  # react to an unexpected line format...
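Putting the pieces together, the loop from the question could be sketched roughly as follows. This is one possible arrangement, not the only one: the names `LINE_RE` and `build_errorlist` are hypothetical, the leading `^.*` is dropped because `search` already scans the whole line, and lines that do not match are simply skipped, which is an assumption about how malformed lines should be handled.

```python
import re

# Hypothetical name; compiled once because it is reused for every line.
LINE_RE = re.compile(r"company=(\d+)\D+(\d+)")

def build_errorlist(lines):
    """Return a nested list with a header row and one row per matching line."""
    errorlist = [["datetime", "company", "errorline"]]
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # assumption: silently skip lines in an unexpected format
        # The first 19 characters hold the timestamp, as in the question.
        errorlist.append([line[:19], match.group(1), match.group(2)])
    return errorlist

sample_lines = [
    "2013-02-02 12:20:15 blahblahblah=123214, moreblah=1021, "
    "blah.blah.blah, company=201944, errorline=#2043\n",
    "a line without the expected fields\n",
]
errorlist = build_errorlist(sample_lines)
print(errorlist)
```

Because a file object yields its lines when iterated, the same function should also accept an open log file, for example inside `with open("/path/error.log") as f:`.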

Capturing a regex match in Python and assigning the captured string values to variables
