Extracting details from text and writing them to separate files

Time: 2016-02-28 20:00:17

Tags: python regex

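Below is example.txt, which contains statistics for a number of queues. My goal is to extract the details of each individual queue (from example.txt) and put them in separate log files (queue_0.log, queue_1.log, and so on).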

some details:

sn  size    fbe     lbe     fbl     lbl     latency
log_rx_packets_start_queue 0
0   512     1.6     3.2     3.2     4.8     1.6
1   512     3.2     4.8     4.8     6.4     1.6
.
.
97  512     156.8   158.4   158.4   160     1.59999999999999
98  512     158.4   160     160     161.6   1.59999999999999
99  512     160     161.6   161.6   163.2   1.59999999999999
log_rx_packets_end_queue 0

************************************
--- Received Packet Statistics --- For Queue 0
***********************************

Number of packets sent : 100
Number of packets received : 100


log_rx_packets_start_queue 1
0   512     161.6   163.2   163.2   164.8   1.59999999999999
1   512     163.2   164.8   164.8   166.4   1.59999999999999
2   512     164.8   166.4   166.4   168     1.59999999999999
.
.
98  512     318.4   320     320     321.6   1.60000000000002
99  512     320     321.6   321.6   323.2   1.60000000000002
log_rx_packets_end_queue 1

************************************
--- Received Packet Statistics --- For Queue 1
***********************************

Number of packets sent : 100
Number of packets received : 100


log_rx_packets_start_queue 2
0   512     321.6   323.2   323.2   324.8   1.60000000000002
1   512     323.2   324.8   324.8   326.4   1.60000000000002
.
.
99  512     480.000000000003        481.600000000003        481.600000000003        483.200000000003        1.60000000000002
log_rx_packets_end_queue 2

************************************
--- Received Packet Statistics --- For Queue 2
***********************************

Number of packets sent : 100
Number of packets received : 100

...
...
// Similarly continues

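So far I have managed to get the log for only one queue (queue_0.log). Can anyone give me some ideas on how to extend this so that it gets the logs for all the queues and puts them in a separate directory?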

import os
import re

newpath = 'results'
if not os.path.exists(newpath):
    os.makedirs(newpath) # need to check exception here


#input_file = open("log.txt", "r")
qno_start = 0
qno_end = 0
grab_lines = False
filename='output'

with open('example.txt','r') as ip_file:
   print "Reading"
   data = []
   for line in ip_file:
       match_qno_start = re.match(r'\s*log_rx_packets_start_queue ([0_9])', line)
       match_qno_end = re.match(r'\s*log_rx_packets_end_queue ([0_9])', line)

       if match_qno_start:
           qno_start = match_qno_start.group(0)
           grab_lines = True
           output_name='queue_'+match_qno_start.group(1)+'.log'
           output_file = open(output_name, "w")
           continue


       elif match_qno_end:  
           qno_end = match_qno_end.group(0)
           grab_lines = False

       if grab_lines:
           new_line = data.append(line);

for i in data:
    output_file.write(i)

1 answer:

Answer 0 (score: 1)

This can be done using a regular expression as follows:

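import re

# Read the whole file at once so the pattern can match across lines.
with open('example.txt') as f_input:
    data = f_input.read()

# re.S lets '.' match newlines; the non-greedy (.*?) stops at the first end marker.
for match in re.finditer(r'log_rx_packets_start_queue (\d+)\n(.*?)log_rx_packets_end_queue', data, re.S):
    queue, block = match.groups()

    # One log file per queue, named after the captured queue number.
    with open('queue_{}.log'.format(queue), 'w') as f_output:
        f_output.write(block)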

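For each block, it creates a log file named after the queue number. It does assume that your data file fits in memory. For example, your first output log file would appear as: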

0   512     1.6     3.2     3.2     4.8     1.6
1   512     3.2     4.8     4.8     6.4     1.6
97  512     156.8   158.4   158.4   160     1.59999999999999
98  512     158.4   160     160     161.6   1.59999999999999
99  512     160     161.6   161.6   163.2   1.59999999999999
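
If the data file might not fit in memory, the same split can probably be done one line at a time. The following is only a sketch under a few assumptions: Python 3 (for `exist_ok`), start and end marker lines that look exactly like the ones in example.txt, and a hypothetical helper name `split_queues` with a `results` output directory borrowed from the question.

```python
import os
import re

# Marker lines as they appear in example.txt (assumed format).
START = re.compile(r'\s*log_rx_packets_start_queue (\d+)')
END = re.compile(r'\s*log_rx_packets_end_queue (\d+)')


def split_queues(input_path, output_dir='results'):
    """Copy each queue block into output_dir/queue_<n>.log, line by line.

    Returns the list of log file paths that were written.
    """
    os.makedirs(output_dir, exist_ok=True)
    written = []
    output_file = None  # open only while we are inside a queue block
    try:
        with open(input_path) as f_input:
            for line in f_input:
                start = START.match(line)
                if start:
                    # A new block begins: close any unfinished one first.
                    if output_file is not None:
                        output_file.close()
                    name = 'queue_{}.log'.format(start.group(1))
                    path = os.path.join(output_dir, name)
                    output_file = open(path, 'w')
                    written.append(path)
                elif END.match(line):
                    # The block is finished: stop copying lines.
                    if output_file is not None:
                        output_file.close()
                        output_file = None
                elif output_file is not None:
                    output_file.write(line)
    finally:
        # Covers a file that ends without an end marker.
        if output_file is not None:
            output_file.close()
    return written
```

Calling `split_queues('example.txt')` should then leave one `queue_<n>.log` per queue inside `results`, holding only the lines between the start and end markers. Only one line and one output file are held at a time, at the cost of a little more bookkeeping than the single regular expression.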