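Extracting multiple strings using Python's regular expressions

Asked: 2015-01-19 02:25:54

Tags: python regex

I have a log file with the following output. I have shortened it here, because the real file runs to thousands of lines: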

Time = 1

smoothSolver:  Solving for Ux, Initial residual = 0.230812, Final residual = 0.0134171, No Iterations 2
smoothSolver:  Solving for Uy, Initial residual = 0.283614, Final residual = 0.0158797, No Iterations 3
smoothSolver:  Solving for Uz, Initial residual = 0.190444, Final residual = 0.016567, No Iterations 2
GAMG:  Solving for p, Initial residual = 0.0850116, Final residual = 0.00375608, No Iterations 3
time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00142109
smoothSolver:  Solving for omega, Initial residual = 0.00267604, Final residual = 0.000166675, No Iterations 3
bounding omega, min: -26.6597 max: 18468.7 average: 219.43
smoothSolver:  Solving for k, Initial residual = 1, Final residual = 0.0862096, No Iterations 2
ExecutionTime = 4.84 s  ClockTime = 5 s

Time = 2

smoothSolver:  Solving for Ux, Initial residual = 0.0299872, Final residual = 0.00230507, No Iterations 2
smoothSolver:  Solving for Uy, Initial residual = 0.145767, Final residual = 0.00882969, No Iterations 3
smoothSolver:  Solving for Uz, Initial residual = 0.0863129, Final residual = 0.00858536, No Iterations 2
GAMG:  Solving for p, Initial residual = 0.394189, Final residual = 0.0175138, No Iterations 3
time step continuity errors : sum local = 0.00862823, global = 0.00212477, cumulative = 0.00354587
smoothSolver:  Solving for omega, Initial residual = 0.00258475, Final residual = 0.000222705, No Iterations 3
smoothSolver:  Solving for k, Initial residual = 0.112805, Final residual = 0.0054572, No Iterations 3
ExecutionTime = 5.9 s  ClockTime = 6 s

Time = 3

smoothSolver:  Solving for Ux, Initial residual = 0.128298, Final residual = 0.0070293, No Iterations 2
smoothSolver:  Solving for Uy, Initial residual = 0.138825, Final residual = 0.0116437, No Iterations 3
smoothSolver:  Solving for Uz, Initial residual = 0.0798979, Final residual = 0.00491246, No Iterations 3
GAMG:  Solving for p, Initial residual = 0.108748, Final residual = 0.00429273, No Iterations 2
time step continuity errors : sum local = 0.0073211, global = -0.00187909, cumulative = 0.00166678
smoothSolver:  Solving for omega, Initial residual = 0.00238456, Final residual = 0.000224435, No Iterations 3
smoothSolver:  Solving for k, Initial residual = 0.0529661, Final residual = 0.00280851, No Iterations 3
ExecutionTime = 6.92 s  ClockTime = 7 s

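I need to use Python's regular expressions to extract Time = 1, 2, 3 and the corresponding cumulative values. More precisely, I only need the values 1, 2, 3 and 0.00142109, 0.00354587, 0.00166678, which are the cumulative values for Time = 1, 2 and 3, and I need to write them to another file.

This is what I have so far: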

contCumulative_0_out = open('contCumulative_0', 'w+')

with open(logFile, 'r') as logfile_read:
    for line in logfile_read:
        line = line.rstrip()
        iteration_time = re.findall(r'^Time = ([0-9]+)', line)
        print iteration_time
        contCumulative_0 = re.search(r'cumulative = ((\d|.)+)', line)
        if contCumulative_0:
            cumvalue = contCumulative_0.groups(1)
            contCumulative_0_out.write('\n'.join(cumvalue))

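The variable iteration_time captures the Time value, but that value is no longer available inside the following if block, so I cannot combine Time and cumulative to write 1 0.00142109 to the output file.

3 Answers:

Answer 0 (score: 1)

Your code overwrites iteration_time on every iteration of the for loop. That is the problem. Once a Time line has been found, you need to stop overwriting the variable until the next Time line appears.

To do this, check the result of the Time search inside the for loop and only update iteration_time when that search actually matched. On every other line, keep the previous value and search for the cumulative value instead. You could do it like this: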

import re

# logFile is the path to the log file, defined elsewhere as in the question.
with open(logFile, 'r') as logfile_read, open('contCumulative_0', 'w') as contCumulative_0_out:
    iteration_time = None  # last Time value seen so far
    for line in logfile_read:
        line = line.rstrip()
        time_match = re.findall(r'^Time = ([0-9]+)', line)
        if time_match:
            iteration_time = time_match[0]
            print(iteration_time)
        else:  # A line that matches 'Time = ...' has no 'cumulative = ...'
            # Single capture group; the dot is literal inside the character class.
            contCumulative_0 = re.search(r'cumulative = ([-+0-9.eE]+)', line)
            if contCumulative_0 and iteration_time is not None:
                cumvalue = contCumulative_0.group(1)
                # Write the pair on one line, e.g. "1 0.00142109"
                contCumulative_0_out.write('%s %s\n' % (iteration_time, cumvalue))
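
To try the same carry-the-last-Time idea without the log file, it can be sketched as a small generator. This is only an illustration in Python 3 syntax: the name extract_pairs, the sample lines, and the stricter number pattern are assumptions, not part of the original answer.

```python
import re

TIME_RE = re.compile(r'^Time = ([0-9]+)')
# Assumed number format: optional sign, digits and dots, optional exponent.
CUMULATIVE_RE = re.compile(r'cumulative = ([-+]?[0-9.]+(?:[eE][-+]?[0-9]+)?)')


def extract_pairs(lines):
    """Yield (time, cumulative) string pairs from an iterable of log lines."""
    current_time = None  # last Time value seen so far
    for line in lines:
        line = line.rstrip()
        time_match = TIME_RE.match(line)
        if time_match:
            current_time = time_match.group(1)
            continue
        cumulative_match = CUMULATIVE_RE.search(line)
        if cumulative_match and current_time is not None:
            yield current_time, cumulative_match.group(1)


sample = [
    "Time = 1",
    "",
    "time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00142109",
    "ExecutionTime = 4.84 s  ClockTime = 5 s",
    "",
    "Time = 2",
    "time step continuity errors : sum local = 0.00862823, global = 0.00212477, cumulative = 0.00354587",
]

pairs = list(extract_pairs(sample))
for time_value, cumulative in pairs:
    print(time_value, cumulative)
```

Because the Time value is kept in a variable that outlives the current line, it is still available when the cumulative line is reached a few lines later.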

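Hope this helps.

Answer 1 (score: 1)

When a line contains neither 'Time' nor 'cumulative', there is no need to overwrite the variable. You could do it like this: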

...
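with open(logFile, 'r') as logfile_read:
    for line in logfile_read:
        line = line.rstrip()
        if 'Time' in line:
            time_match = re.findall(r'^Time = ([0-9]+)', line)
            if time_match:  # skips 'ExecutionTime = ...' lines, which also contain 'Time'
                iteration_time = time_match[0]
                print(iteration_time)
        if 'cumulative' in line:
            # Single capture group; the dot is literal inside the character class.
            contCumulative_0 = re.search(r'cumulative = ([-+0-9.eE]+)', line)
            if contCumulative_0:
                cumvalue = contCumulative_0.group(1)
                contCumulative_0_out.write('%s %s\n' % (iteration_time, cumvalue))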
...
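
One detail of the pattern from the question, cumulative = ((\d|.)+), is worth checking on its own: it has two capture groups and an unescaped dot, so .groups(1) returns a two-element tuple rather than just the number. A small sketch in Python 3 syntax follows; the variable names and the alternative pattern are illustrative assumptions.

```python
import re

line = ("time step continuity errors : sum local = 0.00999678, "
        "global = 0.00142109, cumulative = 0.00142109")

# Pattern from the question: two groups, and '.' matches any character.
loose = re.search(r'cumulative = ((\d|.)+)', line)
loose_groups = loose.groups(1)
print(loose_groups)  # outer group, then the last character matched by the inner group

# Assumed alternative: one group, with the dot inside a character class.
strict = re.search(r'cumulative = ([-+0-9.eE]+)', line)
strict_value = strict.group(1)
print(strict_value)
```

Joining the two-element tuple with '\n' would therefore write an extra stray character after each value, which is why a single group read with .group(1) is the safer choice here.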

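Answer 2 (score: 1)

You can do this with a single regular expression, assuming the log format is the same for all of your entries. An explanation of what is happening follows the code: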

import re

s = """Time = 1

smoothSolver:  Solving for Ux, Initial residual = 0.230812, Final residual = 0.0134171, No Iterations 2
smoothSolver:  Solving for Uy, Initial residual = 0.283614, Final residual = 0.0158797, No Iterations 3
smoothSolver:  Solving for Uz, Initial residual = 0.190444, Final residual = 0.016567, No Iterations 2
GAMG:  Solving for p, Initial residual = 0.0850116, Final residual = 0.00375608, No Iterations 3
time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00142109
smoothSolver:  Solving for omega, Initial residual = 0.00267604, Final residual = 0.000166675, No Iterations 3
bounding omega, min: -26.6597 max: 18468.7 average: 219.43
smoothSolver:  Solving for k, Initial residual = 1, Final residual = 0.0862096, No Iterations 2
ExecutionTime = 4.84 s  ClockTime = 5 s

Time = 2

smoothSolver:  Solving for Ux, Initial residual = 0.230812, Final residual = 0.0134171, No Iterations 2
smoothSolver:  Solving for Uy, Initial residual = 0.283614, Final residual = 0.0158797, No Iterations 3
smoothSolver:  Solving for Uz, Initial residual = 0.190444, Final residual = 0.016567, No Iterations 2
GAMG:  Solving for p, Initial residual = 0.0850116, Final residual = 0.00375608, No Iterations 3
time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00123456
smoothSolver:  Solving for omega, Initial residual = 0.00267604, Final residual = 0.000166675, No Iterations 3
bounding omega, min: -26.6597 max: 18468.7 average: 219.43
smoothSolver:  Solving for k, Initial residual = 1, Final residual = 0.0862096, No Iterations 2
ExecutionTime = 4.84 s  ClockTime = 5 s
"""

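# Greedy \d+ so that multi-digit times such as 10 are captured in full;
# a lazy \d+? would stop after the first digit.
regex = re.compile(r"^Time = (\d+).*?cumulative = (\d{0,10}\.\d{0,10})", re.DOTALL | re.MULTILINE)

for x in regex.findall(s):
    print("{} => {}".format(x[0], x[1]))

This prints two results (because I added two log entries rather than only the one you provided):

1 => 0.00142109
2 => 0.00123456

What is happening?

The regex being used is:

^Time = (\d+).*?cumulative = (\d{0,10}\.\d{0,10})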

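This regex looks for your Time = string at the beginning of a line and captures the number that follows it. It then does a non-greedy match up to the string cumulative = and captures the number after that. The non-greedy match is important: without it you would get only one result for the whole log, because the pattern would match from the first instance of Time = to the last instance of cumulative =.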
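
The difference between the non-greedy and greedy forms can be checked on a tiny made-up log. This is a sketch in Python 3 syntax; the two-entry log string and the simplified number pattern are assumptions chosen to keep the example short.

```python
import re

log = (
    "Time = 1\n"
    "sum local = 0.1, global = 0.2, cumulative = 0.11\n"
    "Time = 2\n"
    "sum local = 0.3, global = 0.4, cumulative = 0.22\n"
)
flags = re.DOTALL | re.MULTILINE

# .*? stops at the nearest 'cumulative =', .* runs on to the last one.
lazy = re.findall(r"^Time = (\d+).*?cumulative = (\d*\.\d*)", log, flags)
greedy = re.findall(r"^Time = (\d+).*cumulative = (\d*\.\d*)", log, flags)

print(lazy)    # one pair per Time block
print(greedy)  # a single pair spanning the whole log
```

The greedy version pairs the first Time with the last cumulative value, which is exactly the failure described above.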

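Each result is then printed. Every captured result contains the time value and the cumulative value. If needed, this part of the code can be changed to write to a file instead.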
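
Writing to a file instead of printing might look like the following sketch (Python 3 syntax). The helper name write_pairs, the temporary output directory, and the shortened sample log are assumptions for illustration; only the file name contCumulative_0 comes from the question.

```python
import os
import re
import tempfile

PAIR_RE = re.compile(r"^Time = (\d+).*?cumulative = (\d*\.\d*)",
                     re.DOTALL | re.MULTILINE)


def write_pairs(log_text, out_path):
    """Write one 'time cumulative' line per match; return how many were written."""
    pairs = PAIR_RE.findall(log_text)
    with open(out_path, "w") as out:
        for time_value, cumulative in pairs:
            out.write("{} {}\n".format(time_value, cumulative))
    return len(pairs)


log_text = (
    "Time = 1\n"
    "global = 0.00142109, cumulative = 0.00142109\n"
    "Time = 2\n"
    "global = 0.00212477, cumulative = 0.00354587\n"
)
out_path = os.path.join(tempfile.mkdtemp(), "contCumulative_0")
count = write_pairs(log_text, out_path)

with open(out_path) as f:
    written = f.read()
print(written)
```

Note that this approach reads the whole log into memory as one string, which is usually fine for a few thousand lines; for very large logs the line-by-line approach from the other answers may be preferable.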

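This regex works across multiple lines because it uses two flags: DOTALL, which lets . match newline characters, and MULTILINE, which lets ^ match at the start of every line rather than only at the start of the string.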
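
The effect of each flag can be seen in isolation with a short sketch (Python 3 syntax; the three-line sample text is an assumption made up for the demonstration).

```python
import re

text = "Time = 1\nExecutionTime = 4.84 s\nTime = 2\n"

# Without MULTILINE, ^ only matches at the very start of the string.
start_only = re.findall(r"^Time = (\d+)", text)
# With MULTILINE, ^ also matches right after each newline.
every_line = re.findall(r"^Time = (\d+)", text, re.MULTILINE)

# Without DOTALL, '.' stops at a newline, so the match cannot span lines.
no_dotall = re.search(r"Time = 1.*Time = 2", text)
with_dotall = re.search(r"Time = 1.*Time = 2", text, re.DOTALL)

print(start_only, every_line)
print(no_dotall is None, with_dotall is not None)
```

The ^ anchor is also what keeps ExecutionTime = ... from being mistaken for a Time line, since that text does not begin at the start of a line with Time.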