Hi, I have a log file with the following content:
[ 06-15 14:07:48.377 15012:15012 D/ViewRootImpl ]
ViewPostImeInputStage processKey 0
[ 06-15 14:07:48.397 3539: 4649 D/AudioService ]
active stream is 0x8
[ 06-15 14:07:48.407 4277: 4293 D/vol.VolumeDialogControl.VC ]
isSafeVolumeDialogShowing : false
I want to extract some information from the log file. The expected format is:
[('06-15 14:07:48.377', '15012', 'D', 'ViewRootImpl', 'ViewPostImeInputStage processKey 0'),
('06-15 14:07:48.397', '3539', '4649', 'D', 'AudioService', 'active stream is 0x8'),
('06-15 14:07:48.407', '4277', '4293', 'D', 'vol.VolumeDialogControl.VC', 'isSafeVolumeDialogShowing : false')]
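Question: what is the best Python regular expression for extracting the information in the expected format? Thanks a lot!
Update: I have tried the following code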
import re
regex = r"(\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{3})\s(\d+).*(\w{1})/(.*)\](.*)"
data = [g.groups() for g in re.finditer(regex, log, re.M | re.I)]
The result I get is
data=[('06-15 14:07:48.377', '15012', 'D', 'ViewRootImpl', '\r'), (
'06-15 14:07:48.397', '3539', 'D', 'AudioService', '\r'), ('06-15 14:07:48.407',
'4277', 'D', 'vol.VolumeDialogControl.VC', '\r')]
I cannot get the last element (the message text on the line after each header).
Answer 0 (score: 2)
Use the following approach:
import re

with open('yourlogfile', 'r') as log:
    lines = log.read()

# Each entry spans two lines: a bracketed header, then the message.
result = re.sub(r'^\[ (\S+) *(\S+) *(\d+): *(\d+) *([A-Z]+)\/(\S+) \]\n([^\n]+)',
                r'\1 \2 \3 \4 \5 \6 \7', lines, flags=re.MULTILINE)
print(result)
Output:
06-15 14:07:48.377 15012 15012 D ViewRootImpl ViewPostImeInputStage processKey 0
06-15 14:07:48.397 3539 4649 D AudioService active stream is 0x8
06-15 14:07:48.407 4277 4293 D vol.VolumeDialogControl.VC isSafeVolumeDialogShowing : false
To get the result as a list of matches, use the re.findall() function:
...
result = re.findall(r'^\[ (\S+) *(\S+) *(\d+): *(\d+) *([A-Z]+)\/(\S+) \]\n([^\n]+)\n?', lines, flags=re.MULTILINE)
print(result)
Output:
[('06-15', '14:07:48.377', '15012', '15012', 'D', 'ViewRootImpl', 'ViewPostImeInputStage processKey 0'), ('06-15', '14:07:48.397', '3539', '4649', 'D', 'AudioService', 'active stream is 0x8'), ('06-15', '14:07:48.407', '4277', '4293', 'D', 'vol.VolumeDialogControl.VC', 'isSafeVolumeDialogShowing : false')]
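The tuples above split the date and the time into two elements, while the question asks for a single timestamp string. The '\r' in the question's own output also suggests that the file may use CRLF line endings, which a pattern ending in \]\n would not match. The following is a minimal sketch under those assumptions: the sample text, the CRLF endings, and the names log, ENTRY and entries are illustrative and not taken from the real file.

```python
import re

# Sample text mirroring the question's log. The "\r\n" line endings are an
# assumption, inferred from the '\r' that shows up in the asker's result.
log = (
    "[ 06-15 14:07:48.377 15012:15012 D/ViewRootImpl ]\r\n"
    "ViewPostImeInputStage processKey 0\r\n"
    "[ 06-15 14:07:48.397 3539: 4649 D/AudioService ]\r\n"
    "active stream is 0x8\r\n"
    "[ 06-15 14:07:48.407 4277: 4293 D/vol.VolumeDialogControl.VC ]\r\n"
    "isSafeVolumeDialogShowing : false\r\n"
)

ENTRY = re.compile(
    r"^\[\s*(\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d{3})"  # date and time as one group
    r"\s+(\d+):\s*(\d+)"                                # the two numeric IDs around ':'
    r"\s+([A-Z])/(\S+)\s*\]"                            # level letter and tag
    r"[ \t]*\r?\n"                                      # end of header line, LF or CRLF
    r"([^\r\n]*)",                                      # message text on the next line
    re.MULTILINE,
)

# One 6-tuple per entry: (timestamp, id1, id2, level, tag, message)
entries = ENTRY.findall(log)
for entry in entries:
    print(entry)
```

Escaping the dot before the milliseconds (\.) keeps it from matching any character, and excluding \r from the message group keeps a trailing carriage return out of the last element.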
Answer 1 (score: 1)
#!/usr/bin/python2
# -*- coding: utf-8 -*-
import re
input = """
[ 06-15 14:07:48.377 15012:15012 D/ViewRootImpl ]
ViewPostImeInputStage processKey 0
[ 06-15 14:07:48.397 3539: 4649 D/AudioService ]
active stream is 0x8
[ 06-15 14:07:48.407 4277: 4293 D/vol.VolumeDialogControl.VC ]
isSafeVolumeDialogShowing : false
"""
# join each header line with the message line that follows it
input = re.sub(r'(\])\s+', r'\1 ', input)
# replace "D/Something ]" with "D Something"
input = re.sub(r'([A-Z])/(\S+)\s+\]\s+', r'\1 \2 ', input)
# remove the leading "[" before the date
input = re.sub(r'\[\s+([0-9]{2}-[0-9]{2})', r'\1', input)
print(input)
Output:
06-15 14:07:48.377 15012:15012 D ViewRootImpl ViewPostImeInputStage processKey 0
06-15 14:07:48.397 3539: 4649 D AudioService active stream is 0x8
06-15 14:07:48.407 4277: 4293 D vol.VolumeDialogControl.VC isSafeVolumeDialogShowing : false