Question

I have a log file in the following format:

2016-02-18 10:01:45.423  [a-b] [one two three] [2126]
2016-02-18 10:01:45.623  [x-y] [one two three four] [123]
2016-02-18 10:01:45.823  [z-w] [one two three four-five] [0]

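I want to split the fields into variables. For the first line, for example:

Field1 = 2016-02-18

Field2 = 10:01:45.423

Field3 = a-b

Field4 = one two three

Field5 = 2126

I managed to get the last three fields, but I am trying to work out how to get the first two: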

>>> import re
>>> data = """2016-02-18 10:01:45.423  [a-b] [one two three] [2126]"""
>>> PATTERN = re.compile(r'''\[(.*?)\]''')
>>> print (PATTERN.split(data)[1::2])
['a-b', 'one two three', '2126']
>>>

The content of Field4 can vary in length, and Field2 and Field3 are separated by two spaces.

How can I change the code above to capture every field?

Thanks!

Answer 1

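I don't think using split is a good idea here (although it could work with your existing pattern). Why not write a regular expression with proper capture groups?

For example: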

import re

data = r"2016-02-18 10:01:45.423  [a-b] [one two three] [2126]"
re.match(r"^([\d\-]*) ([\d:.]*)  \[(.*)\] \[(.*)\] \[(.*)\]$", data).groups()
# gives ('2016-02-18', '10:01:45.423', 'a-b', 'one two three', '2126')
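
As a sketch of the same idea applied to every line of the sample log, the pattern below uses named groups and `[^\]]*` instead of `.*`, so a bracketed field cannot run past its own closing bracket. The names `LINE_RE` and `parse_line` and the group names are illustrative choices, not part of the original answer, and the layout (date, one space, time, whitespace, three bracketed fields) is assumed from the sample lines in the question.

```python
import re

# Assumed layout: date, one space, time, whitespace, then three [bracketed] fields.
LINE_RE = re.compile(
    r"^(?P<date>[\d-]+) (?P<time>[\d:.]+)\s+"
    r"\[(?P<tag>[^\]]*)\] \[(?P<message>[^\]]*)\] \[(?P<number>[^\]]*)\]$"
)


def parse_line(line):
    """Return a dict of the five fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


# The three sample lines from the question.
sample = [
    "2016-02-18 10:01:45.423  [a-b] [one two three] [2126]",
    "2016-02-18 10:01:45.623  [x-y] [one two three four] [123]",
    "2016-02-18 10:01:45.823  [z-w] [one two three four-five] [0]",
]

for line in sample:
    print(parse_line(line))
```

Returning None for a non-matching line lets the caller decide whether to skip or report it, instead of failing with an AttributeError on `.groups()`.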

Answer 2

This can also be done without regular expressions:

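with open("your_log.log") as f:
    for x in f:
        fields = x.strip().split()
        # The first two tokens are the date and the time.
        field1, field2 = fields[0], fields[1]
        # Strip the surrounding brackets so the values match the requested output.
        field3, field5 = fields[2].strip("[]"), fields[-1].strip("[]")
        field4 = " ".join(fields[3:-1]).strip("[]")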
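
A variation on this approach, as a minimal sketch: split off the date and time first, then cut the remainder on the `] [` separator between the bracketed fields, which keeps any run of spaces inside Field4 intact. The function name `split_fields` is hypothetical, and the code assumes every line is well formed, as in the samples from the question.

```python
def split_fields(line):
    """Split one log line into five fields without regular expressions."""
    # Assumes a well-formed line: date, time, then three [bracketed] fields.
    date, time, rest = line.strip().split(None, 2)
    # Drop the outermost brackets, then cut on the "] [" between fields.
    tag, message, number = rest[1:-1].split("] [")
    return date, time, tag, message, number


print(split_fields("2016-02-18 10:01:45.823  [z-w] [one two three four-five] [0]"))
# ('2016-02-18', '10:01:45.823', 'z-w', 'one two three four-five', '0')
```

A line that does not have this shape makes the tuple unpacking raise ValueError, so callers reading a real file may want to catch that and skip the line.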

Python: matching multiple patterns from a log file
