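I have a large log file. I want to extract the lines that contain java/javax/org/com followed by . or :. For each such line, I also want to extract the lines that follow it, which make up the stack trace and start with 'at'. For example: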
Line1: java.line.something.somethingexception
line 2: at something something
line 3: at something something
line 4: at something something
line 5-20:Junk I don't want to extract.
line 21: javax.line.something.somethingexception
line 22: at something something
line 23: at something something
line 24: at something something
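and so on...
Here I want to copy lines 1-4, and then lines 21-24. So far my code collects the lines that contain the keywords, but I can't figure out how to write the lines that follow, skip some lines, and then start writing again. The number of lines between one block and the next is random: it could be 100 lines or 250 lines, so there is no pattern.
Here is my code: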
import re
import sys
from itertools import islice
file = open(sys.argv[1], "r")
file1 = open(sys.argv[2],"w")
i = 0
for line in file:
    if re.search(r'[java|javax|org|com]+?[\.|:]+?', line, re.I) and not (re.search(r'at\s', line, re.I) or re.search(r'mdcloginid:|webcontainer|c\.h\.i\.h\.p\.u\.e|threadPoolTaskExecutor|caused\sby', line, re.I)):
        file1.write(line)
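This code only extracts the lines containing the keywords. I'm stuck on the next part: copying the following lines that contain 'at' into the new file, stopping where the 'at' lines end, then searching for the next line that contains a keyword and doing the same again.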
Answer 0: (score: 1)
This can be solved with a flag that you set whenever a particular condition is met:
import re

java_regex = re.compile(...)  # pattern for the java/javax/org/com lines
at_regex = re.compile(...)  # pattern for the "at" lines
copy = False  # flag that controls whether lines are copied to the output
# file_in and file_out are the already opened input and output files
for line in file_in:
    if java_regex.search(line):
        # start copying if "java" is in the input
        copy = True
    elif copy and not at_regex.search(line):
        # stop copying if "at" is not in the input
        copy = False
    if copy:
        file_out.write(line)
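For anyone who wants something to run, here is a minimal sketch of the same flag idea with the patterns filled in. The two regexes and the name extract_traces are my own assumptions, not part of the answer above, and would need adjusting to the real log format. One deliberate difference: the 'at' check comes first, because stack-frame lines usually contain com. or org. themselves and would otherwise look like header lines.

```python
import re

# Assumed patterns; the answer above leaves them elided.
HEADER_RE = re.compile(r'\b(?:javax?|org|com)[.:]')  # e.g. "java.lang.Foo"
FRAME_RE = re.compile(r'^\s*at\s')                   # e.g. "    at com.x.Y(...)"


def extract_traces(lines):
    """Yield each header line and the 'at' lines that directly follow it."""
    copy = False  # True while we are inside a stack trace
    for line in lines:
        if FRAME_RE.search(line):
            # a frame is only wanted if it belongs to a header seen just before
            if copy:
                yield line
        elif HEADER_RE.search(line):
            copy = True
            yield line
        else:
            # anything else ends the current block
            copy = False
```

Since it takes any iterable of lines, it can be fed an open file directly, for example file_out.writelines(extract_traces(file_in)), so the log is never loaded into memory as a whole.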
Answer 1: (score: 1)
Set a flag that indicates whether the line you are processing is inside an exception block:
import re
import sys
from itertools import islice
file = open(sys.argv[1], "r")
file1 = open(sys.argv[2],"w")
i = 0
ex = False  # True while we are inside an exception block
for line in file:
    if re.search(r'[java|javax|org|com]+?[\.|:]+?', line, re.I) and not (re.search(r'at\s', line, re.I) or re.search(r'mdcloginid:|webcontainer|c\.h\.i\.h\.p\.u\.e|threadPoolTaskExecutor|caused\sby', line, re.I)):
        # header line of an exception: write it and start a block
        file1.write(line)
        ex = True
    elif ex:
        if line.startswith('at'):
            # stack trace line belonging to the current block
            file1.write(line)
        else:
            # first line that does not start with 'at' ends the block
            ex = False
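A side note on the pattern carried over from the question: square brackets create a character class, so [java|javax|org|com] matches any single one of those characters, not the words. The sketch below compares it with a grouped alternative. The names strict and is_frame and the replacement pattern are my own assumptions, and is_frame additionally tolerates leading whitespace, which line.startswith('at') does not.

```python
import re

# Pattern from the question: matches ONE of the characters
# j a v | x o r g c m followed by one of . | :
loose = re.compile(r'[java|javax|org|com]+?[\.|:]+?', re.I)

# Assumed replacement: a non-capturing group gives real alternation.
strict = re.compile(r'\b(?:javax?|org|com)[.:]', re.I)


def is_frame(line):
    """True for stack-frame lines, with or without leading whitespace."""
    return line.lstrip().startswith('at ')


# Print whether each pattern matches an unrelated line and a real header.
for sample in ("Error: disk full", "javax.servlet.ServletException: oops"):
    print(sample, bool(loose.search(sample)), bool(strict.search(sample)))
```

The loose pattern accepts "Error: disk full" because the r before the colon is in the class, which is probably why the original needed such a long exclusion list.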