I have a log file in the following format:
2016-02-18 10:01:45.423 [a-b] [one two three] [2126]
2016-02-18 10:01:45.623 [x-y] [one two three four] [123]
2016-02-18 10:01:45.823 [z-w] [one two three four-five] [0]
I want to split the fields into variables. For example, for the first line:
Field1 = 2016-02-18
Field2 = 10:01:45.423
Field3 = a-b
Field4 = one two three
Field5 = 2126
I am trying to work out how to get the first two fields, since I have already managed to get the last three:
>>> import re
>>> data = """2016-02-18 10:01:45.423 [a-b] [one two three] [2126]"""
>>> PATTERN = re.compile(r'''\[(.*?)\]''')
>>> print (PATTERN.split(data)[1::2])
['a-b', 'one two three', '2126']
>>>
The content of Field4 can vary in length, and the separator between Field2 and Field3 is two spaces.
How can I change the code above to capture every field?
Thanks!
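Answer 0 (score: 0)
I don't think using split is a good idea (although it could work with your existing pattern). Why not write a regular expression with proper capture groups?
For example: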
import re

data = r"2016-02-18 10:01:45.423 [a-b] [one two three] [2126]"
re.match(r"^([\d\-]*) ([\d:.]*) \[(.*)\] \[(.*)\] \[(.*)\]$", data).groups()
# gives ('2016-02-18', '10:01:45.423', 'a-b', 'one two three', '2126')
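The pattern above expects exactly one space between fields, while the question mentions two spaces between Field2 and Field3. A minimal sketch of a more tolerant variant follows, assuming `\s+` between fields is acceptable and that no field contains a `]`. The names `LINE_RE` and `parse_line` and the group names are hypothetical, and the sample lines are written with the two-space separator described in the question.

```python
import re

# Hypothetical pattern name. \s+ accepts one or more spaces between fields,
# and [^\]]* stops at the first closing bracket of each bracketed field.
LINE_RE = re.compile(
    r"^(?P<date>[\d-]+)\s+(?P<time>[\d:.]+)\s+"
    r"\[(?P<tag>[^\]]*)\]\s+\[(?P<message>[^\]]*)\]\s+\[(?P<number>[^\]]*)\]\s*$"
)

def parse_line(line):
    """Return the five fields as a tuple, or None if the line does not match."""
    m = LINE_RE.match(line)
    return m.groups() if m else None

# Sample lines, assuming two spaces after the time field.
lines = [
    "2016-02-18 10:01:45.423  [a-b] [one two three] [2126]",
    "2016-02-18 10:01:45.623  [x-y] [one two three four] [123]",
    "2016-02-18 10:01:45.823  [z-w] [one two three four-five] [0]",
]
for line in lines:
    print(parse_line(line))
```

Returning None for a non-matching line avoids the AttributeError that calling `.groups()` directly on a failed match would raise.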
Answer 1 (score: 0)
It can also be done without a regular expression:
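with open("your_log.log") as f:
    for x in f:
        fields = x.strip().split()
        # split() keeps the square brackets, so strip them from the last three fields
        field1, field2 = fields[0], fields[1]
        field3 = fields[2].strip("[]")
        field4 = " ".join(fields[3:-1]).strip("[]")
        field5 = fields[-1].strip("[]")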
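Another way to avoid regular expressions is to split on the opening bracket instead of on whitespace. The following is only a sketch, assuming that no field contains a `[` and that every line has exactly two fields before the first bracket. The function name `split_fields` is hypothetical.

```python
def split_fields(line):
    """Split one log line into (date, time, field3, field4, field5)."""
    # Everything before the first "[" holds the date and the time.
    head, *bracketed = line.strip().split("[")
    date, time = head.split()
    # Each remaining piece looks like "a-b] ", so drop the spaces and the "]".
    rest = [part.strip().rstrip("]") for part in bracketed]
    return (date, time, *rest)

print(split_fields("2016-02-18 10:01:45.423  [a-b] [one two three] [2126]"))
```

Because `head.split()` splits on any run of whitespace, this works whether the fields are separated by one space or two.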