Question

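I have a log file that I would like to parse and plot with matplotlib. The data I am interested in starts after skipping the first 6 lines. For example, my log file looks like this: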

# 2014-05-09 17:51:50,473 - root - INFO - Epoch = 1, batch = 216, Classif Err = 52.926, lg(p) -1.0350
# 2014-05-09 17:51:53,749 - root - INFO - Test set error = 37.2317

I want to make a plot of Classif Err vs Test set error for each epoch.

My first attempt:

import numpy
from numpy import *
from pylab import *

f1 = open('log.txt', 'r')
FILE = f1.readlines()
f1.close()

for line in FILE:
    line = line.strip()
    if ('Epoch' in line):
        epoch += line.split('Epoch = ')
    elif('Test set error' in line):
        test_err += line.split('Test set error = ')

I get this error:

Traceback (most recent call last):
  File "logfileparse.py", line 18, in <module>
    epoch += line.split('Epoch = ')
NameError: name 'epoch' is not defined

Answer 1

This finds Epoch and its value and appends it to a list.

epoch = []       # define epoch
test_error = []  # define test_error as well, otherwise the same NameError appears
with open('log.txt', 'r') as f:  # use with to open files as it automatically closes the file
    for line in f:
        if "Epoch" in line:
            epoch.append(line[line.find("Epoch ="):].split(',')[0])
        elif 'Test set error' in line:
            # strip() drops the trailing newline
            test_error.append(line[line.find("Test set error ="):].split(',')[0].strip())
print(epoch)
# ['Epoch = 1']
print(test_error)
# ['Test set error = 37.2317']

Slice the string starting at the index of "Epoch", split on ',' and append the first element, "Epoch = ...", to the epoch list.
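For plotting, the numbers themselves are more useful than strings such as 'Epoch = 1'. The same slice-and-split idea can be sketched as follows, assuming every matching line has the same shape as the two sample lines in the question. The helper name `parse_value` is made up for illustration, and it does not handle a line where the key is missing.

```python
def parse_value(line, key):
    """Return the number that follows `key` in `line` as a float."""
    # Everything after the key, e.g. ' 1, batch = 216, ...'
    rest = line[line.find(key) + len(key):]
    # The value ends at the next comma (or at the end of the line);
    # float() ignores the surrounding whitespace
    return float(rest.split(',')[0])


line1 = "# 2014-05-09 17:51:50,473 - root - INFO - Epoch = 1, batch = 216, Classif Err = 52.926, lg(p) -1.0350"
line2 = "# 2014-05-09 17:51:53,749 - root - INFO - Test set error = 37.2317"

epoch = parse_value(line1, "Epoch =")
classif_err = parse_value(line1, "Classif Err =")
test_err = parse_value(line2, "Test set error =")
print(epoch, classif_err, test_err)
```

The comma inside the timestamp does no harm here, because the slice starts at the key and everything before it is discarded.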

Answer 2

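As I tried your code further, I found another problem beyond the undefined epoch variable: as written, your code tries to concatenate a list object onto a string object. I tried to fix the code and ended up with something like this: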

epoch = []
test_err = []
with open('log.txt', 'r') as f1:
    for line in f1:
        line_list = line.strip().split(' ')
        if 'Epoch' in line_list:
            # keep everything from the 'Epoch' token onward
            epoch_index = line_list.index('Epoch')
            message = ' '.join(line_list[epoch_index:])
            epoch.append(message)
        elif 'Test set error' in line:
            # the phrase spans three tokens, so test the raw line
            # and locate its first token, 'Test'
            error_index = line_list.index('Test')
            message = ' '.join(line_list[error_index:])
            test_err.append(message)
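One thing to keep in mind with this token-based approach: `in` on a list compares whole elements, while `in` on a string searches for a substring. A short sketch using the second sample line from the question shows the difference:

```python
line = "# 2014-05-09 17:51:53,749 - root - INFO - Test set error = 37.2317"
tokens = line.split(' ')

# List membership compares whole elements, so a multi-word phrase never matches
phrase_in_tokens = 'Test set error' in tokens
# String membership looks for a substring, so the phrase is found
phrase_in_line = 'Test set error' in line
# A single word does appear as its own token
word_in_tokens = 'Test' in tokens

print(phrase_in_tokens, phrase_in_line, word_in_tokens)  # False True True
```

So a multi-word phrase has to be checked against the raw line, or against a single token such as 'Test'.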

Answer 3

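I think you need to collect the epochs and test set errors in order to plot them. Assuming the error line always comes right after the 'Epoch' line, try: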

import re

data_points = []
# capture the epoch number and the Classif Err value
ep = r'Epoch = (\d+), batch = \d+, Classif Err = (\d+\.?\d+)'

with open('file.txt') as f:
    for line in f:
        epoch = re.findall(ep, line)
        if epoch:
            error_line = next(f)  # grab the next line, which is the error line
            error_value = error_line[error_line.rfind('=')+1:]
            # list(...) so that a list is stored on Python 3 as well
            data_points.append(list(map(float, epoch[0] + (error_value,))))

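Now data_points will be a list of lists, where the first value is the epoch, the second is the Classif Err value, and the third is the test set error value.

The regular expression returns a list containing a tuple: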

>>> re.findall(ep, i)
[('1', '52.926')]

Here i is your first line.

To get the error value, find the last = and take the remaining characters:

>>> i2 = '# 2014-05-09 17:51:53,749 - root - INFO - Test set error = 37.2317'
>>> i2[i2.rfind('=')+1:]
' 37.2317'

I use map(float,epoch[0]+(error_value,)) to convert the string values to floats (on Python 3, wrap the call in list() to get a list back):

>>> list(map(float, re.findall(ep, i)[0]+(i2[i2.rfind('=')+1:],)))
[1.0, 52.926, 37.2317]
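Once the rows are collected, the plot asked for in the question could be sketched roughly as below. This is only one possible layout, not part of the original answer: the function names `split_columns` and `plot_errors` and the output file name are made up for illustration, and the second sample row is invented to give the curves more than one point. matplotlib is imported inside the plotting function so the column-splitting part can be tried without it.

```python
def split_columns(data_points):
    """Turn [[epoch, classif_err, test_err], ...] into three parallel lists."""
    if not data_points:
        return [], [], []
    epochs, classif_errs, test_errs = (list(col) for col in zip(*data_points))
    return epochs, classif_errs, test_errs


def plot_errors(data_points, out_path="errors.png"):
    """Plot both error curves against the epoch and save the figure."""
    import matplotlib.pyplot as plt  # imported lazily; requires matplotlib

    epochs, classif_errs, test_errs = split_columns(data_points)
    fig, ax = plt.subplots()
    ax.plot(epochs, classif_errs, marker="o", label="Classif Err")
    ax.plot(epochs, test_errs, marker="o", label="Test set error")
    ax.set_xlabel("Epoch")
    ax.set_ylabel("Error")
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)


# Only the first row comes from the sample log lines in the question;
# the second row is invented for illustration.
sample = [[1.0, 52.926, 37.2317], [2.0, 48.5, 35.0]]
epochs, classif_errs, test_errs = split_columns(sample)
print(epochs, classif_errs, test_errs)
```

Calling plot_errors(data_points) after the parsing loop would then write the figure to errors.png.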

Answer 4

You did not initialize the variable epoch. It is important to do that before this line:

epoch += line.split('Epoch = ')
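Initializing the variable only removes the NameError. A small sketch of what `+=` does afterwards, using the sample line from the question: `split` returns the text on both sides of the separator, so both pieces end up in the list.

```python
line = "# 2014-05-09 17:51:50,473 - root - INFO - Epoch = 1, batch = 216, Classif Err = 52.926, lg(p) -1.0350"

epoch = []                       # initialize before the loop
epoch += line.split('Epoch = ')  # += extends the list with every piece

print(epoch)
# ['# 2014-05-09 17:51:50,473 - root - INFO - ', '1, batch = 216, Classif Err = 52.926, lg(p) -1.0350']

# Keeping only the part after the separator, up to the first comma:
epoch_value = int(line.split('Epoch = ')[1].split(',')[0])
print(epoch_value)  # 1
```

To collect just the epoch number, take element [1] of the split result and cut it at the first comma, as in the last two lines.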

Parsing a text file with Python
