Copying parts of a file in Python using regular expressions

Asked: 2014-01-31 15:44:55

Tags: python regex file exception copying

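I have a large log file. I want to extract the lines that contain java/javax/org/com followed by . or :. For each such line, I also want to extract the lines that belong to it: the stack trace lines, which start with at. For example: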

Line1: java.line.something.somethingexception
line 2: at something something
line 3: at something something
line 4: at something something

line 5-20:Junk I don't want to extract.
line 21: javax.line.something.somethingexception
line 22: at something something
line 23: at something something
line 24: at something something

And so on...

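Here I want to copy lines 1-4 and then lines 21-24. So far, my code collects the lines containing the keywords, but I can't figure out how to write a certain number of lines after each of them, skip a few lines, and then start writing again. The positions where these blocks start are random, i.e. they could be 100 lines apart or 250 lines apart, so there is no pattern.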

Here is my code:

import re
import sys
from itertools import islice

file = open(sys.argv[1], "r")
file1 = open(sys.argv[2],"w")
i = 0
for line in file:
    if re.search(r'[java|javax|org|com]+?[\.|:]+?', line, re.I) and not (re.search(r'at\s', line, re.I) or re.search(r'mdcloginid:|webcontainer|c\.h\.i\.h\.p\.u\.e|threadPoolTaskExecutor|caused\sby', line, re.I)):
          file1.write(line)

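This code only extracts the lines containing the keywords. I'm stuck on the next part: copying the following lines that contain at, writing them to the new file, and stopping where the at lines end, then searching for the next line containing a keyword and doing the same again.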

2 Answers:

Answer 0 (score: 1)

This can be solved with a flag that you set when a particular condition is met:

java_regex = re.compile(...)  # matches the "java" header lines
at_regex = re.compile(...)    # matches the "at" stack-trace lines

copy = False  # flag controlling whether lines are copied to the output

for line in file_in:
    if java_regex.search(line):
        # start copying when a "java" line is seen
        copy = True
    elif copy and not at_regex.search(line):
        # stop copying at the first line that is not an "at" line
        copy = False

    if copy:
        file_out.write(line)
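
A minimal runnable sketch of the flag approach above, assuming concrete patterns for the two regexes the answer leaves out. The names `HEADER_RE`, `AT_RE` and `copy_traces` are illustrative and not from the answer, and the patterns are guesses based on the sample log in the question, so real logs may need extra exclusions. Unlike the snippet above, it tests for an "at" line first, because real stack frames such as `at com.example.Foo` would otherwise also look like header lines.

```python
import re

# Assumed patterns: neither is given in the answer above.
# A header line contains java/javax/org/com followed by '.' or ':'.
HEADER_RE = re.compile(r'\b(?:javax?|org|com)[.:]')
# A stack-frame line starts with "at " after optional whitespace.
AT_RE = re.compile(r'^\s*at\s')


def copy_traces(lines):
    """Yield each header line plus the "at" lines that directly follow it."""
    copying = False  # True while we are inside a stack trace
    for line in lines:
        if AT_RE.search(line):
            # Frame line: keep it only if it belongs to a header we copied.
            if copying:
                yield line
        elif HEADER_RE.search(line):
            # New exception header: start (or restart) copying.
            copying = True
            yield line
        else:
            # Anything else ends the current trace.
            copying = False
```

With the file objects from the answer, this could be driven as `file_out.writelines(copy_traces(file_in))`.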

Answer 1 (score: 1)

Set a flag that indicates whether the line you are processing is inside an exception block:

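import re
import sys

# Header lines: java/javax/org/com followed by '.' or ':'.
# Use a group, not [java|javax|org|com]: square brackets match single characters.
header_re = re.compile(r'\b(?:java|javax|org|com)[.:]', re.I)
# Lines to ignore even if they look like a header.
skip_re = re.compile(r'at\s|mdcloginid:|webcontainer|c\.h\.i\.h\.p\.u\.e|threadPoolTaskExecutor|caused\sby', re.I)

with open(sys.argv[1], "r") as infile, open(sys.argv[2], "w") as outfile:
    ex = False  # True while we are inside an exception block
    for line in infile:
        if header_re.search(line) and not skip_re.search(line):
            outfile.write(line)
            ex = True
        elif ex:
            # Stack-trace lines are usually indented, so strip before testing.
            if line.lstrip().startswith('at'):
                outfile.write(line)
            else:
                ex = False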
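
One detail worth checking is the pattern `[java|javax|org|com]+?[\.|:]+?` from the question. Square brackets form a character class, so it matches any single one of the characters j, a, v, x, o, r, g, c, m or `|`, followed by `.`, `|` or `:`, rather than one of the four words. The short sketch below contrasts it with a grouped alternation. The variable names and sample strings are made up for illustration, and the second pattern is only an assumption about what was intended.

```python
import re

# Pattern as written in the question: the brackets are character classes.
char_class = re.compile(r'[java|javax|org|com]+?[\.|:]+?', re.I)
# Assumed intent: one of the four words, then '.' or ':'.
alternation = re.compile(r'\b(?:java|javax|org|com)[.:]', re.I)

samples = [
    "java.lang.NullPointerException",  # an exception header
    "connection error: retrying",      # an unrelated log line
]

for text in samples:
    # Print whether each pattern finds a match anywhere in the line.
    print(bool(char_class.search(text)), bool(alternation.search(text)), text)
```

The character-class version matches both sample lines (the `r:` in "error:" is enough), while the grouped version matches only the exception header.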