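Forgive me if this isn't the proper place to ask a question like this, but I'm struggling to come up with a workable way to split some text.
Here is an example of the text I'm trying to split: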
[Thu Feb 2 12:45:38 2017][428423.3] (file_name:0xcb61) Invalid variable type
call stack:
-----------
[0cb61:+33] larray, r#26, fp(3),
[031ff:Mug::Request.preHandlers+17] refcall, fp(1), string#245, # from: fp(1)
[0339d:Mug::Request.process+77] call, addr(0x80001d), -, # Mug::Request.preHandlers()
[02ffd:Mug::Request.recv+93] call, addr(0x800026), -, # Mug::Request.process()
[02d03:Mug::Connection.on_client+101] refcall, fp(0), string#734, # from: fp(0)
[14a5b:+4] refcall, fp(-2), string#3103, # from: fp(-2)
[1e24a:main+9664] eop, -, -,
[Thu Feb 2 14:09:07 2017][428423.8] Warning: writing 0 byte file (/the_directory/) to tar archive
[Thu Feb 2 18:55:27 2017][449547.25] Warning: writing 0 byte file (/the_directory/) to tar archive
[Fri Feb 3 12:21:33 2017][451135.3] (file_name:0xcb61) Invalid variable type
call stack:
-----------
[0cb61:+33] larray, r#26, fp(3),
[031ff:Mug::Request.preHandlers+17] refcall, fp(1), string#245, # from: fp(1)
[0339d:Mug::Request.process+77] call, addr(0x80001d), -, # Mug::Request.preHandlers()
[02ffd:Mug::Request.recv+93] call, addr(0x800026), -, # Mug::Request.process()
[02d03:Mug::Connection.on_client+101] refcall, fp(0), string#734, # from: fp(0)
[14a5b:+4] refcall, fp(-2), string#3103, # from: fp(-2)
[1e24a:main+9664] eop, -, -,
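As you can see above, the text doesn't follow any consistent pattern: some errors are followed by a blank line and some aren't. Ideally, what I'd like to end up with is something like this...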
[[Thu Feb 2 14:09:07 2017][428423.8] Warning: writing 0 byte file (/the_directory/) to tar archive], [Thu Feb 2 12:45:38 2017][428423.3] (file_name:0xcb61) Invalid variable type \ncall stack:\n-----------\n[0cb61:+33] larray, r#26, fp(3),\n[031ff:Mug::Request.preHandlers+17] refcall, fp(1), string#245, # from: fp(1)\n[0339d:Mug::Request.process+77] call, addr(0x80001d), -, # Mug::Request.preHandlers()\n[02ffd:Mug::Request.recv+93] call, addr(0x800026), -, # Mug::Request.process()\n[02d03:Mug::Connection.on_client+101] refcall, fp(0), string#734, # from: fp(0)\n[14a5b:+4] refcall, fp(-2), string#3103, # from: fp(-2)\n[1e24a:main+9664] eop, -, -,]
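Then I could loop through each error. Right now I'm approaching this with some regular expressions that filter out known-good data and then simply throw away the call stack, but I'd like to be able to store the entire call stack if possible.
Here is my current code: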
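A minimal sketch of that idea, assuming every entry begins with a ctime-style timestamp such as [Thu Feb 2 12:45:38 2017] and every other line (call stack:, the dashes, the stack frames) belongs to the entry above it. ENTRY_START and split_entries are hypothetical names, not part of any library.

```python
import re

# Assumption: a new entry starts with "[Www Mmm D HH:MM:SS YYYY]"; " +" accepts
# one or more spaces before the day, so "Feb 2" and "Feb  2" both match.
ENTRY_START = re.compile(
    r"^\[[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}\]"
)


def split_entries(lines):
    """Group an iterable of log lines into one string per log entry."""
    entries = []
    current = []
    for line in lines:
        line = line.rstrip()  # drop trailing spaces and the newline
        if not line:
            continue  # blank separator lines carry no information
        if ENTRY_START.match(line) and current:
            entries.append("\n".join(current))  # close the previous entry
            current = []
        current.append(line)  # header, "call stack:", dashes or a frame
    if current:
        entries.append("\n".join(current))  # flush the last entry
    return entries
```

Because it only needs an iterable of lines, it can be fed the open file directly (entries = split_entries(ifile) inside the existing with open(local_dump, 'r') as ifile: block), and it does not matter whether an entry is followed by a blank line or not.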
import re

# Compile the patterns once, outside the loop.
filename_pattern = re.compile(r'\((\w*\.\w*)\:\w*\)\s(.*$)')
# [a-zA-Z] instead of [a-zA-z]; \s+ so one or two spaces before the day both match.
date_pattern = re.compile(r"^\[([a-zA-Z]{3,})\s([a-zA-Z]{3,})\s+(\d{1,2})\s(\d{1,2}\:\d{1,2}\:\d{1,2})\s(\d{4})\]\[\d*\.\d*\]\s(.*$)")

with open(local_dump, 'r') as ifile:
    for line in ifile:
        data = date_pattern.search(line)
        if data:
            # Look for "(name.ext:addr) message" in the text after the timestamp.
            source = filename_pattern.search(data[6])
            if source:
                print("{0}: {1}".format(source.group(1), source.group(2)))
        elif "call stack" in line:
            print(line.strip())
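As a side note on the code above, here is a sketch that pulls the header fields out with named groups instead of positional indexes like data[6]. Two assumptions are worth flagging: [a-zA-z] in the original was almost certainly meant to be [a-zA-Z] (the A-z range also matches [, \, ], ^, _ and `), and the day may be padded with one or two spaces. HEADER, SOURCE and parse_header are illustrative names only.

```python
import re

# Assumptions: "[A-Za-z]" is what "[a-zA-z]" was meant to be, and " +" allows
# one or more spaces before a single-digit day ("Feb 2" or "Feb  2").
HEADER = re.compile(
    r"^\[(?P<weekday>[A-Za-z]{3}) (?P<month>[A-Za-z]{3}) +(?P<day>\d{1,2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) (?P<year>\d{4})\]"
    r"\[(?P<seq>[\d.]+)\]\s*(?P<message>.*?)\s*$"
)
# Optional "(source:address)" prefix, e.g. "(file_name:0xcb61)".
SOURCE = re.compile(r"^\((?P<source>[^:()]+):(?P<addr>\w+)\)\s*(?P<text>.*)$")


def parse_header(line):
    """Return a dict of header fields, or None if the line has no timestamp."""
    m = HEADER.match(line)
    if m is None:
        return None  # call-stack lines, separators, blank lines
    fields = m.groupdict()
    src = SOURCE.match(fields["message"])
    if src:
        fields.update(src.groupdict())  # adds "source", "addr" and "text"
    return fields
```

Returning a dict keeps each field addressable by name, so the loop no longer depends on how many groups precede the message.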
I was able to get this working with the following code:
with open(local_dump, 'r') as ifile:
    lines = ifile.read()
    for line in lines.split('\n\n'):
        print("LINE: " + line)
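The code above does split the call stacks into their own entries, but I run into problems when an entry ends with just ' \n' instead of a blank line. The output looks like this:
LINE: [Thu Feb 2 12:45:38 2017][428423.3] (file_name:0xcb61) Invalid variable type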
call stack:
-----------
[0cb61:+33] larray, r#26, fp(3),
[031ff:Mug::Request.preHandlers+17] refcall, fp(1), string#245, # from: fp(1)
[0339d:Mug::Request.process+77] call, addr(0x80001d), -, # Mug::Request.preHandlers()
[02ffd:Mug::Request.recv+93] call, addr(0x800026), -, # Mug::Request.process()
[02d03:Mug::Connection.on_client+101] refcall, fp(0), string#734, # from: fp(0)
[14a5b:+4] refcall, fp(-2), string#3103, # from: fp(-2)
[1e24a:main+9664] eop, -, -,
LINE: [Thu Feb 2 14:09:07 2017][428423.8] Warning: writing 0 byte file (/the_directory/) to tar archive
[Thu Feb 2 18:55:27 2017][449547.25] Warning: writing 0 byte file (/the_directory/) to tar archive
[Fri Feb 3 12:21:33 2017][451135.3] (file_name:0xcb61) Invalid variable type
call stack:
-----------
[0cb61:+33] larray, r#26, fp(3),
[031ff:Mug::Request.preHandlers+17] refcall, fp(1), string#245, # from: fp(1)
[0339d:Mug::Request.process+77] call, addr(0x80001d), -, # Mug::Request.preHandlers()
[02ffd:Mug::Request.recv+93] call, addr(0x800026), -, # Mug::Request.process()
[02d03:Mug::Connection.on_client+101] refcall, fp(0), string#734, # from: fp(0)
[14a5b:+4] refcall, fp(-2), string#3103, # from: fp(-2)
[1e24a:main+9664] eop, -, -,
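The merging happens because split('\n\n') only breaks where there is a blank line. One way around it, sketched here on the assumption that every entry starts with a bracketed timestamp at the beginning of a line, is to split on a zero-width lookahead for that timestamp. This relies on re.split() accepting patterns that match an empty string, which Python supports from 3.7 on. BOUNDARY and split_log are hypothetical names.

```python
import re

# Zero-width lookahead: split *before* each timestamp line instead of on "\n\n".
# Assumption: only entry headers start with "[Www Mmm D HH:MM:SS YYYY]".
BOUNDARY = re.compile(
    r"^(?=\[[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}\])",
    re.MULTILINE,
)


def split_log(text):
    """Split raw log text into entries, tolerating trailing spaces and blank lines."""
    chunks = BOUNDARY.split(text)
    # Trim each chunk and drop the empty chunk produced when text starts with a match.
    return [chunk.strip() for chunk in chunks if chunk.strip()]
```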
Here is the text in its raw format:
'[Thu Feb 2 14:09:07 2017][428423.8] Warning: writing 0 byte file (/the_directory/) to tar archive \n[Thu Feb 2 18:55:27 2017][449547.25] Warning: writing 0 byte file (/the_directory/) to tar archive \n[Fri Feb 3 12:21:33 2017][451135.3] (file_name:0xcb61) Invalid variable type \ncall stack:\n-----------\n[0cb61:+33] larray, r#26, fp(3), \n[031ff:Mug::Request.preHandlers+17] refcall, fp(1), string#245, # from: fp(1)\n[0339d:Mug::Request.process+77] call, addr(0x80001d), -, # Mug::Request.preHandlers()\n[02ffd:Mug::Request.recv+93] call, addr(0x800026), -, # Mug::Request.process()\n[02d03:Mug::Connection.on_client+101] refcall, fp(0), string#734, # from: fp(0)\n[14a5b:+4] refcall, fp(-2), sting#3103, # from: fp(-2)\n[1e24a:main+9664] eop, -, -, '
Thanks for any tips, tricks, and help you can provide.
Answer 0 (score: 2)
You can split on \n and then remove the empty lines.
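text = "your input"
lines = text.split("\n")
lines = list(filter(None, lines))  # drop empty strings; filter() is lazy in Python 3
If you just want to get all the error messages from the log, you can try:
import re

matches = re.finditer(r"\[.*?\]\[.*?\]\s*(.*)$", text, re.MULTILINE)
for match in matches:
    print("Error: " + match.group(1))
This assumes every error starts with two [...] groups.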
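Building on that same assumption, a sketch that keeps the call stack too: the pattern captures the header line and then every following line that does not itself start with two bracketed groups. ENTRY and errors_with_stacks are hypothetical names; the lookahead assumes a stack frame like [0cb61:+33] is never immediately followed by another [.

```python
import re

# Header: "[stamp][seq] message", then a body of following lines that do NOT
# begin with "[...][" (assumption: only entry headers start that way).
ENTRY = re.compile(
    r"^\[(?P<stamp>[^\]\n]+)\]\[(?P<seq>[^\]\n]+)\][ \t]*(?P<message>.*?)[ \t]*$"
    r"(?P<body>(?:\n(?!\[[^\]\n]*\]\[).*)*)",
    re.MULTILINE,
)


def errors_with_stacks(text):
    """Yield (timestamp, message, call-stack lines) for each log entry."""
    for m in ENTRY.finditer(text):
        # Keep non-blank body lines, minus trailing spaces.
        stack = [ln.rstrip() for ln in m.group("body").splitlines() if ln.strip()]
        yield m.group("stamp"), m.group("message"), stack
```

Entries without a call stack simply come back with an empty list, so warnings and errors can be handled in the same loop.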